* [PATCH 0/5] close various race windows for swap
@ 2021-04-08 13:08 Miaohe Lin
From: Miaohe Lin @ 2021-04-08 13:08 UTC
  To: akpm
  Cc: hannes, mhocko, iamjoonsoo.kim, vbabka, alex.shi, willy, minchan,
	richard.weiyang, ying.huang, hughd, tim.c.chen, linux-kernel,
	linux-mm, linmiaohe

Hi all,
When I was investigating the swap code, I found several possible race
windows. This series aims to fix all of them. However, using the current
get/put_swap_device() to guard swap_readpage() against a concurrent
swapoff is a poor fit, because swap_readpage() may take a very long time.
To keep the overhead on the hot path as low as possible, we can instead
use a percpu_ref to close this race window (as suggested by Huang, Ying).
Patch 1 adds percpu_ref support for swap, and the remaining patches use
it to close the various race windows. More details can be found in the
respective changelogs.
Thanks!

Miaohe Lin (5):
  mm/swapfile: add percpu_ref support for swap
  swap: fix do_swap_page() race with swapoff
  mm/swap_state: fix get_shadow_from_swap_cache() race with swapoff
  mm/swap_state: fix potential faulted in race in swap_ra_info()
  mm/swap_state: fix swap_cluster_readahead() race with swapoff

 include/linux/swap.h |  4 +++-
 mm/memory.c          | 10 +++++++++
 mm/swap_state.c      | 33 +++++++++++++++++++++--------
 mm/swapfile.c        | 50 +++++++++++++++++++++++++++-----------------
 4 files changed, 68 insertions(+), 29 deletions(-)

-- 
2.19.1


* Re: [PATCH 2/5] swap: fix do_swap_page() race with swapoff
@ 2021-04-08 20:46 kernel test robot
From: kernel test robot @ 2021-04-08 20:46 UTC
  To: kbuild

[-- Attachment #1: Type: text/plain, Size: 13284 bytes --]

CC: kbuild-all@lists.01.org
In-Reply-To: <20210408130820.48233-3-linmiaohe@huawei.com>
References: <20210408130820.48233-3-linmiaohe@huawei.com>
TO: Miaohe Lin <linmiaohe@huawei.com>
TO: akpm@linux-foundation.org
CC: hannes@cmpxchg.org
CC: mhocko@suse.com
CC: iamjoonsoo.kim@lge.com
CC: vbabka@suse.cz
CC: alex.shi@linux.alibaba.com
CC: willy@infradead.org
CC: minchan@kernel.org
CC: richard.weiyang@gmail.com
CC: ying.huang@intel.com

Hi Miaohe,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linux/master]
[also build test WARNING on linus/master hnaz-linux-mm/master v5.12-rc6 next-20210408]
[If your patch is applied to the wrong git tree, kindly drop us a note.
When submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Miaohe-Lin/close-various-race-windows-for-swap/20210408-211224
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 5e46d1b78a03d52306f21f77a4e4a144b6d31486
:::::: branch date: 8 hours ago
:::::: commit date: 8 hours ago
config: powerpc-randconfig-s032-20210408 (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 9.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # apt-get install sparse
        # sparse version: v0.6.3-279-g6d5d9b42-dirty
        # https://github.com/0day-ci/linux/commit/56e65e21c8c9858e36c3bca84006a15fe9b85efd
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Miaohe-Lin/close-various-race-windows-for-swap/20210408-211224
        git checkout 56e65e21c8c9858e36c3bca84006a15fe9b85efd
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=powerpc 

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
   mm/swapfile.c:488:35: sparse: sparse: context imbalance in 'swap_do_scheduled_discard' - different lock contexts for basic block
   mm/swapfile.c:664:9: sparse: sparse: context imbalance in 'scan_swap_map_try_ssd_cluster' - different lock contexts for basic block
   mm/swapfile.c:954:20: sparse: sparse: context imbalance in 'scan_swap_map_slots' - unexpected unlock
   mm/swapfile.c:1037:23: sparse: sparse: context imbalance in 'swap_free_cluster' - different lock contexts for basic block
   mm/swapfile.c:1218:9: sparse: sparse: context imbalance in 'swap_info_get' - wrong count at exit
   mm/swapfile.c:1230:36: sparse: sparse: context imbalance in 'swap_info_get_cont' - unexpected unlock
   mm/swapfile.c:384:9: sparse: sparse: context imbalance in '__swap_entry_free' - different lock contexts for basic block
   mm/swapfile.c:1361:23: sparse: sparse: context imbalance in 'swap_entry_free' - different lock contexts for basic block
   mm/swapfile.c:1418:34: sparse: sparse: context imbalance in 'put_swap_page' - different lock contexts for basic block
   mm/swapfile.c:1479:28: sparse: sparse: context imbalance in 'swapcache_free_entries' - unexpected unlock
   mm/swapfile.c:384:9: sparse: sparse: context imbalance in 'page_swapcount' - different lock contexts for basic block
   mm/swapfile.c:384:9: sparse: sparse: context imbalance in 'swap_swapcount' - different lock contexts for basic block
   mm/swapfile.c:384:9: sparse: sparse: context imbalance in 'swp_swapcount' - different lock contexts for basic block
   mm/swapfile.c:384:9: sparse: sparse: context imbalance in 'swap_page_trans_huge_swapped' - different lock contexts for basic block
   mm/swapfile.c:1737:44: sparse: sparse: context imbalance in 'reuse_swap_page' - unexpected unlock
   mm/swapfile.c:384:9: sparse: sparse: context imbalance in '__swap_duplicate' - different lock contexts for basic block
>> mm/swapfile.c:3673:23: sparse: sparse: context imbalance in 'add_swap_count_continuation' - different lock contexts for basic block

vim +/add_swap_count_continuation +3673 mm/swapfile.c

f981c5950fa859 Mel Gorman   2012-07-31  3563  
570a335b8e2257 Hugh Dickins 2009-12-14  3564  /*
570a335b8e2257 Hugh Dickins 2009-12-14  3565   * add_swap_count_continuation - called when a swap count is duplicated
570a335b8e2257 Hugh Dickins 2009-12-14  3566   * beyond SWAP_MAP_MAX, it allocates a new page and links that to the entry's
570a335b8e2257 Hugh Dickins 2009-12-14  3567   * page of the original vmalloc'ed swap_map, to hold the continuation count
570a335b8e2257 Hugh Dickins 2009-12-14  3568   * (for that entry and for its neighbouring PAGE_SIZE swap entries).  Called
570a335b8e2257 Hugh Dickins 2009-12-14  3569   * again when count is duplicated beyond SWAP_MAP_MAX * SWAP_CONT_MAX, etc.
570a335b8e2257 Hugh Dickins 2009-12-14  3570   *
570a335b8e2257 Hugh Dickins 2009-12-14  3571   * These continuation pages are seldom referenced: the common paths all work
570a335b8e2257 Hugh Dickins 2009-12-14  3572   * on the original swap_map, only referring to a continuation page when the
570a335b8e2257 Hugh Dickins 2009-12-14  3573   * low "digit" of a count is incremented or decremented through SWAP_MAP_MAX.
570a335b8e2257 Hugh Dickins 2009-12-14  3574   *
570a335b8e2257 Hugh Dickins 2009-12-14  3575   * add_swap_count_continuation(, GFP_ATOMIC) can be called while holding
570a335b8e2257 Hugh Dickins 2009-12-14  3576   * page table locks; if it fails, add_swap_count_continuation(, GFP_KERNEL)
570a335b8e2257 Hugh Dickins 2009-12-14  3577   * can be called after dropping locks.
570a335b8e2257 Hugh Dickins 2009-12-14  3578   */
570a335b8e2257 Hugh Dickins 2009-12-14  3579  int add_swap_count_continuation(swp_entry_t entry, gfp_t gfp_mask)
570a335b8e2257 Hugh Dickins 2009-12-14  3580  {
570a335b8e2257 Hugh Dickins 2009-12-14  3581  	struct swap_info_struct *si;
235b62176712b9 Huang, Ying  2017-02-22  3582  	struct swap_cluster_info *ci;
570a335b8e2257 Hugh Dickins 2009-12-14  3583  	struct page *head;
570a335b8e2257 Hugh Dickins 2009-12-14  3584  	struct page *page;
570a335b8e2257 Hugh Dickins 2009-12-14  3585  	struct page *list_page;
570a335b8e2257 Hugh Dickins 2009-12-14  3586  	pgoff_t offset;
570a335b8e2257 Hugh Dickins 2009-12-14  3587  	unsigned char count;
eb085574a7526c Huang Ying   2019-07-11  3588  	int ret = 0;
570a335b8e2257 Hugh Dickins 2009-12-14  3589  
570a335b8e2257 Hugh Dickins 2009-12-14  3590  	/*
570a335b8e2257 Hugh Dickins 2009-12-14  3591  	 * When debugging, it's easier to use __GFP_ZERO here; but it's better
570a335b8e2257 Hugh Dickins 2009-12-14  3592  	 * for latency not to zero a page while GFP_ATOMIC and holding locks.
570a335b8e2257 Hugh Dickins 2009-12-14  3593  	 */
570a335b8e2257 Hugh Dickins 2009-12-14  3594  	page = alloc_page(gfp_mask | __GFP_HIGHMEM);
570a335b8e2257 Hugh Dickins 2009-12-14  3595  
eb085574a7526c Huang Ying   2019-07-11  3596  	si = get_swap_device(entry);
570a335b8e2257 Hugh Dickins 2009-12-14  3597  	if (!si) {
570a335b8e2257 Hugh Dickins 2009-12-14  3598  		/*
570a335b8e2257 Hugh Dickins 2009-12-14  3599  		 * An acceptable race has occurred since the failing
eb085574a7526c Huang Ying   2019-07-11  3600  		 * __swap_duplicate(): the swap device may be swapoff
570a335b8e2257 Hugh Dickins 2009-12-14  3601  		 */
570a335b8e2257 Hugh Dickins 2009-12-14  3602  		goto outer;
570a335b8e2257 Hugh Dickins 2009-12-14  3603  	}
eb085574a7526c Huang Ying   2019-07-11  3604  	spin_lock(&si->lock);
570a335b8e2257 Hugh Dickins 2009-12-14  3605  
570a335b8e2257 Hugh Dickins 2009-12-14  3606  	offset = swp_offset(entry);
235b62176712b9 Huang, Ying  2017-02-22  3607  
235b62176712b9 Huang, Ying  2017-02-22  3608  	ci = lock_cluster(si, offset);
235b62176712b9 Huang, Ying  2017-02-22  3609  
d8aa24e04fb2a7 Miaohe Lin   2020-12-14  3610  	count = swap_count(si->swap_map[offset]);
570a335b8e2257 Hugh Dickins 2009-12-14  3611  
570a335b8e2257 Hugh Dickins 2009-12-14  3612  	if ((count & ~COUNT_CONTINUED) != SWAP_MAP_MAX) {
570a335b8e2257 Hugh Dickins 2009-12-14  3613  		/*
570a335b8e2257 Hugh Dickins 2009-12-14  3614  		 * The higher the swap count, the more likely it is that tasks
570a335b8e2257 Hugh Dickins 2009-12-14  3615  		 * will race to add swap count continuation: we need to avoid
570a335b8e2257 Hugh Dickins 2009-12-14  3616  		 * over-provisioning.
570a335b8e2257 Hugh Dickins 2009-12-14  3617  		 */
570a335b8e2257 Hugh Dickins 2009-12-14  3618  		goto out;
570a335b8e2257 Hugh Dickins 2009-12-14  3619  	}
570a335b8e2257 Hugh Dickins 2009-12-14  3620  
570a335b8e2257 Hugh Dickins 2009-12-14  3621  	if (!page) {
eb085574a7526c Huang Ying   2019-07-11  3622  		ret = -ENOMEM;
eb085574a7526c Huang Ying   2019-07-11  3623  		goto out;
570a335b8e2257 Hugh Dickins 2009-12-14  3624  	}
570a335b8e2257 Hugh Dickins 2009-12-14  3625  
570a335b8e2257 Hugh Dickins 2009-12-14  3626  	/*
570a335b8e2257 Hugh Dickins 2009-12-14  3627  	 * We are fortunate that although vmalloc_to_page uses pte_offset_map,
570a335b8e2257 Hugh Dickins 2009-12-14  3628  	 * no architecture is using highmem pages for kernel page tables: so it
570a335b8e2257 Hugh Dickins 2009-12-14  3629  	 * will not corrupt the GFP_ATOMIC caller's atomic page table kmaps.
570a335b8e2257 Hugh Dickins 2009-12-14  3630  	 */
570a335b8e2257 Hugh Dickins 2009-12-14  3631  	head = vmalloc_to_page(si->swap_map + offset);
570a335b8e2257 Hugh Dickins 2009-12-14  3632  	offset &= ~PAGE_MASK;
570a335b8e2257 Hugh Dickins 2009-12-14  3633  
2628bd6fc052bd Huang Ying   2017-11-02  3634  	spin_lock(&si->cont_lock);
570a335b8e2257 Hugh Dickins 2009-12-14  3635  	/*
570a335b8e2257 Hugh Dickins 2009-12-14  3636  	 * Page allocation does not initialize the page's lru field,
570a335b8e2257 Hugh Dickins 2009-12-14  3637  	 * but it does always reset its private field.
570a335b8e2257 Hugh Dickins 2009-12-14  3638  	 */
570a335b8e2257 Hugh Dickins 2009-12-14  3639  	if (!page_private(head)) {
570a335b8e2257 Hugh Dickins 2009-12-14  3640  		BUG_ON(count & COUNT_CONTINUED);
570a335b8e2257 Hugh Dickins 2009-12-14  3641  		INIT_LIST_HEAD(&head->lru);
570a335b8e2257 Hugh Dickins 2009-12-14  3642  		set_page_private(head, SWP_CONTINUED);
570a335b8e2257 Hugh Dickins 2009-12-14  3643  		si->flags |= SWP_CONTINUED;
570a335b8e2257 Hugh Dickins 2009-12-14  3644  	}
570a335b8e2257 Hugh Dickins 2009-12-14  3645  
570a335b8e2257 Hugh Dickins 2009-12-14  3646  	list_for_each_entry(list_page, &head->lru, lru) {
570a335b8e2257 Hugh Dickins 2009-12-14  3647  		unsigned char *map;
570a335b8e2257 Hugh Dickins 2009-12-14  3648  
570a335b8e2257 Hugh Dickins 2009-12-14  3649  		/*
570a335b8e2257 Hugh Dickins 2009-12-14  3650  		 * If the previous map said no continuation, but we've found
570a335b8e2257 Hugh Dickins 2009-12-14  3651  		 * a continuation page, free our allocation and use this one.
570a335b8e2257 Hugh Dickins 2009-12-14  3652  		 */
570a335b8e2257 Hugh Dickins 2009-12-14  3653  		if (!(count & COUNT_CONTINUED))
2628bd6fc052bd Huang Ying   2017-11-02  3654  			goto out_unlock_cont;
570a335b8e2257 Hugh Dickins 2009-12-14  3655  
9b04c5fec43c0d Cong Wang    2011-11-25  3656  		map = kmap_atomic(list_page) + offset;
570a335b8e2257 Hugh Dickins 2009-12-14  3657  		count = *map;
9b04c5fec43c0d Cong Wang    2011-11-25  3658  		kunmap_atomic(map);
570a335b8e2257 Hugh Dickins 2009-12-14  3659  
570a335b8e2257 Hugh Dickins 2009-12-14  3660  		/*
570a335b8e2257 Hugh Dickins 2009-12-14  3661  		 * If this continuation count now has some space in it,
570a335b8e2257 Hugh Dickins 2009-12-14  3662  		 * free our allocation and use this one.
570a335b8e2257 Hugh Dickins 2009-12-14  3663  		 */
570a335b8e2257 Hugh Dickins 2009-12-14  3664  		if ((count & ~COUNT_CONTINUED) != SWAP_CONT_MAX)
2628bd6fc052bd Huang Ying   2017-11-02  3665  			goto out_unlock_cont;
570a335b8e2257 Hugh Dickins 2009-12-14  3666  	}
570a335b8e2257 Hugh Dickins 2009-12-14  3667  
570a335b8e2257 Hugh Dickins 2009-12-14  3668  	list_add_tail(&page->lru, &head->lru);
570a335b8e2257 Hugh Dickins 2009-12-14  3669  	page = NULL;			/* now it's attached, don't free it */
2628bd6fc052bd Huang Ying   2017-11-02  3670  out_unlock_cont:
2628bd6fc052bd Huang Ying   2017-11-02  3671  	spin_unlock(&si->cont_lock);
570a335b8e2257 Hugh Dickins 2009-12-14  3672  out:
235b62176712b9 Huang, Ying  2017-02-22 @3673  	unlock_cluster(ci);
ec8acf20afb853 Shaohua Li   2013-02-22  3674  	spin_unlock(&si->lock);
eb085574a7526c Huang Ying   2019-07-11  3675  	put_swap_device(si);
570a335b8e2257 Hugh Dickins 2009-12-14  3676  outer:
570a335b8e2257 Hugh Dickins 2009-12-14  3677  	if (page)
570a335b8e2257 Hugh Dickins 2009-12-14  3678  		__free_page(page);
eb085574a7526c Huang Ying   2019-07-11  3679  	return ret;
570a335b8e2257 Hugh Dickins 2009-12-14  3680  }
570a335b8e2257 Hugh Dickins 2009-12-14  3681  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 30996 bytes --]


Thread overview: 73+ messages
2021-04-08 13:08 [PATCH 0/5] close various race windows for swap Miaohe Lin
2021-04-08 13:08 ` [PATCH 1/5] mm/swapfile: add percpu_ref support " Miaohe Lin
2021-04-12  3:30   ` Huang, Ying
2021-04-12  6:59     ` Miaohe Lin
2021-04-12  7:24     ` Huang, Ying
2021-04-13 12:39       ` Miaohe Lin
2021-04-14  1:17         ` Huang, Ying
2021-04-14  1:58           ` Miaohe Lin
2021-04-14  2:06             ` Huang, Ying
2021-04-14  3:44               ` Dennis Zhou
2021-04-14  3:59                 ` Huang, Ying
2021-04-14  4:05                   ` Dennis Zhou
2021-04-14  5:44                     ` Huang, Ying
2021-04-14 14:53                       ` Dennis Zhou
2021-04-15  3:16                         ` Miaohe Lin
2021-04-15  4:20                           ` Dennis Zhou
2021-04-15  9:17                             ` Miaohe Lin
2021-04-15  5:24                         ` Huang, Ying
2021-04-15 14:31                           ` Dennis Zhou
2021-04-16  0:54                             ` Huang, Ying
2021-04-16  2:27                             ` Miaohe Lin
2021-04-16  6:25                               ` Huang, Ying
2021-04-16  8:30                                 ` Miaohe Lin
2021-04-08 13:08 ` [PATCH 2/5] swap: fix do_swap_page() race with swapoff Miaohe Lin
2021-04-08 21:34   ` Tim Chen
2021-04-09  8:42     ` Miaohe Lin
2021-04-09 17:17       ` Tim Chen
2021-04-10  3:17         ` Miaohe Lin
2021-04-12  1:44           ` Huang, Ying
2021-04-12  3:24             ` Miaohe Lin
2021-04-08 21:37   ` kernel test robot
2021-04-09  8:46     ` Miaohe Lin
2021-04-08 22:56   ` kernel test robot
2021-04-13  1:27   ` Huang, Ying
2021-04-13 19:24     ` Tim Chen
2021-04-14  1:04       ` Huang, Ying
2021-04-14  2:20         ` Miaohe Lin
2021-04-14 16:13         ` Tim Chen
2021-04-15  3:19           ` Miaohe Lin
2021-04-14  2:55     ` Miaohe Lin
2021-04-14  3:07       ` Huang, Ying
2021-04-14  3:27         ` Miaohe Lin
2021-04-08 13:08 ` [PATCH 3/5] mm/swap_state: fix get_shadow_from_swap_cache() " Miaohe Lin
2021-04-13  1:33   ` Huang, Ying
2021-04-14  2:42     ` Miaohe Lin
2021-04-08 13:08 ` [PATCH 4/5] mm/swap_state: fix potential faulted in race in swap_ra_info() Miaohe Lin
2021-04-09  8:50   ` Huang, Ying
2021-04-09  9:00     ` Miaohe Lin
2021-04-12  0:55       ` Huang, Ying
2021-04-12  3:17         ` Miaohe Lin
2021-04-08 13:08 ` [PATCH 5/5] mm/swap_state: fix swap_cluster_readahead() race with swapoff Miaohe Lin
2021-04-13  1:36   ` Huang, Ying
2021-04-14  2:43     ` Miaohe Lin
2021-04-08 14:55 ` [PATCH 0/5] close various race windows for swap riteshh
2021-04-09  8:01   ` Miaohe Lin
2021-04-08 20:46 [PATCH 2/5] swap: fix do_swap_page() race with swapoff kernel test robot
