Subject: Re: [PATCH -mm -v6 4/9] mm, THP, swap: Add get_huge_swap_page()
From: Tim Chen
To: "Huang, Ying", Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrea Arcangeli,
	"Kirill A. Shutemov", Hugh Dickins, Shaohua Li, Minchan Kim,
	Rik van Riel
Date: Tue, 14 Mar 2017 16:40:21 -0700
Message-ID: <1489534821.2733.47.camel@linux.intel.com>
In-Reply-To: <20170308072613.17634-5-ying.huang@intel.com>
References: <20170308072613.17634-1-ying.huang@intel.com>
	 <20170308072613.17634-5-ying.huang@intel.com>

On Wed, 2017-03-08 at 15:26 +0800, Huang, Ying wrote:
> From: Huang Ying
> 
> A variation of get_swap_page(), get_huge_swap_page(), is added to
> allocate a swap cluster (HPAGE_PMD_NR swap slots) based on the swap
> cluster allocation function.  A fairly simple algorithm is used: only
> the first swap device in the priority list is tried for the cluster
> allocation.  The function fails if that attempt is not successful, and
> the caller falls back to allocating a single swap slot instead.  This
> works well enough for normal cases.
> 
> This will be used for the THP (Transparent Huge Page) swap support,
> where get_huge_swap_page() will be used to allocate one swap cluster
> for each THP swapped out.
> 
> Because of the algorithm adopted, if the difference in the number of
> free swap clusters among multiple swap devices is significant, it is
> possible that some THPs are split earlier than necessary.  For example,
> this could be caused by a big size difference among multiple swap
> devices.
> 
> Cc: Andrea Arcangeli
> Cc: Kirill A. Shutemov
> Cc: Hugh Dickins
> Cc: Shaohua Li
> Cc: Minchan Kim
> Cc: Rik van Riel
> Signed-off-by: "Huang, Ying"
> ---
>  include/linux/swap.h | 19 ++++++++++++++++++-
>  mm/swap_slots.c      |  5 +++--
>  mm/swapfile.c        | 16 ++++++++++++----
>  3 files changed, 33 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 278e1349a424..e3a7609a8989 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -388,7 +388,7 @@ static inline long get_nr_swap_pages(void)
>  extern void si_swapinfo(struct sysinfo *);
>  extern swp_entry_t get_swap_page(void);
>  extern swp_entry_t get_swap_page_of_type(int);
> -extern int get_swap_pages(int n, swp_entry_t swp_entries[]);
> +extern int get_swap_pages(int n, swp_entry_t swp_entries[], bool huge);
>  extern int add_swap_count_continuation(swp_entry_t, gfp_t);
>  extern void swap_shmem_alloc(swp_entry_t);
>  extern int swap_duplicate(swp_entry_t);
> @@ -527,6 +527,23 @@ static inline swp_entry_t get_swap_page(void)
>  
>  #endif /* CONFIG_SWAP */
>  
> +#ifdef CONFIG_THP_SWAP_CLUSTER
> +static inline swp_entry_t get_huge_swap_page(void)
> +{
> +	swp_entry_t entry;
> +
> +	if (get_swap_pages(1, &entry, true))
> +		return entry;
> +	else
> +		return (swp_entry_t) {0};
> +}
> +#else
> +static inline swp_entry_t get_huge_swap_page(void)
> +{
> +	return (swp_entry_t) {0};
> +}
> +#endif
> +
>  #ifdef CONFIG_MEMCG
>  static inline int mem_cgroup_swappiness(struct mem_cgroup *memcg)
>  {
> diff --git a/mm/swap_slots.c b/mm/swap_slots.c
> index 9b5bc86f96ad..075bb39e03c5 100644
> --- a/mm/swap_slots.c
> +++ b/mm/swap_slots.c
> @@ -258,7 +258,8 @@ static int refill_swap_slots_cache(struct swap_slots_cache *cache)
>  
>  	cache->cur = 0;
>  	if (swap_slot_cache_active)
> -		cache->nr = get_swap_pages(SWAP_SLOTS_CACHE_SIZE, cache->slots);
> +		cache->nr = get_swap_pages(SWAP_SLOTS_CACHE_SIZE, cache->slots,
> +					   false);
>  
>  	return cache->nr;
>  }
> @@ -334,7 +335,7 @@ swp_entry_t get_swap_page(void)
>  		return entry;
>  	}
>  
> -	get_swap_pages(1, &entry);
> +	get_swap_pages(1, &entry, false);
>  
>  	return entry;
>  }
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 91876c33114b..7241c937e52b 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -904,11 +904,12 @@ static unsigned long scan_swap_map(struct swap_info_struct *si,
>  
>  }
>  
> -int get_swap_pages(int n_goal, swp_entry_t swp_entries[])
> +int get_swap_pages(int n_goal, swp_entry_t swp_entries[], bool huge)
>  {
>  	struct swap_info_struct *si, *next;
>  	long avail_pgs;
>  	int n_ret = 0;
> +	int nr_pages = huge_cluster_nr_entries(huge);
>  
>  	avail_pgs = atomic_long_read(&nr_swap_pages);
>  	if (avail_pgs <= 0)
> @@ -920,6 +921,10 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[])
>  	if (n_goal > avail_pgs)
>  		n_goal = avail_pgs;
>  
> +	n_goal *= nr_pages;

I think if (n_goal > 1) when huge is true, n_goal should be set to
huge_cluster_nr_entries(huge) here, or we could have an invalid check
below.

We probably should add a comment to get_swap_pages on how we treat
n_goal when huge is true.  Maybe say we will always treat n_goal as
SWAPFILE_CLUSTER when huge is true.
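Something like the following is what I have in mind (just a sketch of
the clamp, assuming huge_cluster_nr_entries(true) == SWAPFILE_CLUSTER
as in this series):

	/* sketch only: a huge request always means exactly one cluster,
	 * regardless of the n_goal the caller passed in */
	if (huge)
		n_goal = SWAPFILE_CLUSTER;

	if (avail_pgs < n_goal)
		goto noswap;
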
> +	if (avail_pgs < n_goal)
> +		goto noswap;
> +
>  	atomic_long_sub(n_goal, &nr_swap_pages);
>  
>  	spin_lock(&swap_avail_lock);
> @@ -946,10 +951,13 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[])
>  			spin_unlock(&si->lock);
>  			goto nextsi;
>  		}
> -		n_ret = scan_swap_map_slots(si, SWAP_HAS_CACHE,
> -					    n_goal, swp_entries);
> +		if (likely(nr_pages == 1))

if (likely(!huge)) is probably more readable.

> +			n_ret = scan_swap_map_slots(si, SWAP_HAS_CACHE,
> +						    n_goal, swp_entries);
> +		else
> +			n_ret = swap_alloc_huge_cluster(si, swp_entries);
>  		spin_unlock(&si->lock);

Thanks.

Tim