From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	tim.c.chen@intel.com, dave.hansen@intel.com,
	andi.kleen@intel.com, aaron.lu@intel.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,
	Andrea Arcangeli <aarcange@redhat.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Hugh Dickins <hughd@google.com>, Shaohua Li <shli@kernel.org>,
	Minchan Kim <minchan@kernel.org>, Rik van Riel <riel@redhat.com>
Subject: Re: [PATCH -v3 05/10] mm, THP, swap: Add get_huge_swap_page()
Date: Thu, 8 Sep 2016 14:13:53 +0300
Message-ID: <20160908111353.GD17331@node>
In-Reply-To: <1473266769-2155-6-git-send-email-ying.huang@intel.com>

On Wed, Sep 07, 2016 at 09:46:04AM -0700, Huang, Ying wrote:
> From: Huang Ying <ying.huang@intel.com>
> 
> A variation of get_swap_page(), get_huge_swap_page(), is added to
> allocate a swap cluster (512 swap slots) based on the swap cluster
> allocation function.  A fair simple algorithm is used, that is, only the
> first swap device in priority list will be tried to allocate the swap
> cluster.  The function will fail if the trying is not successful, and
> the caller will fallback to allocate a single swap slot instead.  This
> works good enough for normal cases.

For normal cases, yes. But the limitation is not obvious to users, and a
performance difference after a small change in configuration could be
puzzling.

At least this must be documented somewhere.
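
To make the concern concrete, here is a minimal userspace sketch of the
behaviour described in the changelog. It is not kernel code. The names
model_swap_dev, model_alloc and loose_slots are hypothetical, and the array
order stands in for the swap priority list. The only things taken from the
patch are the 512-slot cluster size and the rule that a huge allocation
gives up after the first device instead of moving on to the next one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CLUSTER_SLOTS 512	/* slots per swap cluster, per the changelog */

/* Hypothetical, heavily simplified stand-in for a swap device. */
struct model_swap_dev {
	int free_clusters;	/* completely free clusters */
	int loose_slots;	/* free slots outside any free cluster */
};

/*
 * Returns the index of the device that satisfied the request, or -1.
 * devs[0] models the first device in the priority list.
 */
static int model_alloc(struct model_swap_dev *devs, size_t ndevs, bool huge)
{
	size_t i;

	assert(devs != NULL || ndevs == 0);

	for (i = 0; i < ndevs; i++) {
		struct model_swap_dev *d = &devs[i];

		if (huge) {
			/* Like "goto fail_alloc": never try the next device. */
			if (d->free_clusters == 0)
				return -1;
			d->free_clusters--;
			return (int)i;
		}
		if (d->loose_slots > 0) {
			d->loose_slots--;
			return (int)i;
		}
		if (d->free_clusters > 0) {
			/* Break up a free cluster to serve a single slot. */
			d->free_clusters--;
			d->loose_slots += CLUSTER_SLOTS - 1;
			return (int)i;
		}
		/* Single-slot requests fall through to the next device. */
	}
	return -1;
}
```

With a first device that has only loose slots and a second device that has
free clusters, model_alloc(devs, 2, true) fails even though a whole cluster
is available further down the list. Swapping the two entries makes the same
call succeed, which is the kind of configuration sensitivity I mean.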

> 
> This will be used for the THP (Transparent Huge Page) swap support.
> Where get_huge_swap_page() will be used to allocate one swap cluster for
> each THP swapped out.
> 
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Shaohua Li <shli@kernel.org>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Rik van Riel <riel@redhat.com>
> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
> ---
>  include/linux/swap.h | 24 +++++++++++++++++++++++-
>  mm/swapfile.c        | 18 ++++++++++++------
>  2 files changed, 35 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 75aad24..bc0a84d 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -399,7 +399,7 @@ static inline long get_nr_swap_pages(void)
>  }
>  
>  extern void si_swapinfo(struct sysinfo *);
> -extern swp_entry_t get_swap_page(void);
> +extern swp_entry_t __get_swap_page(bool huge);
>  extern swp_entry_t get_swap_page_of_type(int);
>  extern int add_swap_count_continuation(swp_entry_t, gfp_t);
>  extern void swap_shmem_alloc(swp_entry_t);
> @@ -419,6 +419,23 @@ extern bool reuse_swap_page(struct page *, int *);
>  extern int try_to_free_swap(struct page *);
>  struct backing_dev_info;
>  
> +static inline swp_entry_t get_swap_page(void)
> +{
> +	return __get_swap_page(false);
> +}
> +
> +#ifdef CONFIG_THP_SWAP_CLUSTER
> +static inline swp_entry_t get_huge_swap_page(void)
> +{
> +	return __get_swap_page(true);
> +}
> +#else
> +static inline swp_entry_t get_huge_swap_page(void)
> +{
> +	return (swp_entry_t) {0};
> +}
> +#endif
> +
>  #else /* CONFIG_SWAP */
>  
>  #define swap_address_space(entry)		(NULL)
> @@ -525,6 +542,11 @@ static inline swp_entry_t get_swap_page(void)
>  	return entry;
>  }
>  
> +static inline swp_entry_t get_huge_swap_page(void)
> +{
> +	return (swp_entry_t) {0};
> +}
> +
>  #endif /* CONFIG_SWAP */
>  
>  #ifdef CONFIG_MEMCG
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 0132e8c..3d2bd1f 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -760,14 +760,15 @@ static inline unsigned long swap_alloc_huge_cluster(struct swap_info_struct *si)
>  }
>  #endif
>  
> -swp_entry_t get_swap_page(void)
> +swp_entry_t __get_swap_page(bool huge)
>  {
>  	struct swap_info_struct *si, *next;
>  	pgoff_t offset;
> +	int nr_pages = huge_cluster_nr_entries(huge);
>  
> -	if (atomic_long_read(&nr_swap_pages) <= 0)
> +	if (atomic_long_read(&nr_swap_pages) < nr_pages)
>  		goto noswap;
> -	atomic_long_dec(&nr_swap_pages);
> +	atomic_long_sub(nr_pages, &nr_swap_pages);
>  
>  	spin_lock(&swap_avail_lock);
>  
> @@ -795,10 +796,15 @@ start_over:
>  		}
>  
>  		/* This is called for allocating swap entry for cache */
> -		offset = scan_swap_map(si, SWAP_HAS_CACHE);
> +		if (likely(nr_pages == 1))
> +			offset = scan_swap_map(si, SWAP_HAS_CACHE);
> +		else
> +			offset = swap_alloc_huge_cluster(si);
>  		spin_unlock(&si->lock);
>  		if (offset)
>  			return swp_entry(si->type, offset);
> +		else if (unlikely(nr_pages != 1))
> +			goto fail_alloc;
>  		pr_debug("scan_swap_map of si %d failed to find offset\n",
>  		       si->type);
>  		spin_lock(&swap_avail_lock);
> @@ -818,8 +824,8 @@ nextsi:
>  	}
>  
>  	spin_unlock(&swap_avail_lock);
> -
> -	atomic_long_inc(&nr_swap_pages);
> +fail_alloc:
> +	atomic_long_add(nr_pages, &nr_swap_pages);
>  noswap:
>  	return (swp_entry_t) {0};
>  }
> -- 
> 2.8.1
> 
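
For readers following the nr_swap_pages changes in the hunks above, the
accounting pattern can be sketched in userspace with C11 atomics. This is
only an illustration under stated assumptions: model_nr_swap_pages,
model_reserve and model_unreserve are hypothetical names, and the request
size is passed in directly rather than derived from a helper such as
huge_cluster_nr_entries(), whose body is not part of this diff.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the global free-slot counter. */
static atomic_long model_nr_swap_pages;

/*
 * Reserve nr_pages slots up front, before any device is scanned.
 * As in the patch, the check and the subtraction are two separate
 * atomic operations rather than one combined step.
 */
static bool model_reserve(long nr_pages)
{
	assert(nr_pages > 0);

	if (atomic_load(&model_nr_swap_pages) < nr_pages)
		return false;		/* corresponds to "goto noswap" */
	atomic_fetch_sub(&model_nr_swap_pages, nr_pages);
	return true;
}

/* Give the reservation back when no slot or cluster was found. */
static void model_unreserve(long nr_pages)
{
	assert(nr_pages > 0);
	atomic_fetch_add(&model_nr_swap_pages, nr_pages);
}
```

A caller would reserve 1 or 512 slots, attempt the allocation, and call
model_unreserve() with the same count on failure, which is what the
fail_alloc label does with atomic_long_add().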

-- 
 Kirill A. Shutemov
