linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: "Huang\, Ying" <ying.huang@intel.com>
To: Rafael Aquini <aquini@redhat.com>
Cc: linux-mm@kvack.org,  linux-kernel@vger.kernel.org,
	 akpm@linux-foundation.org
Subject: Re: [PATCH] mm: swapfile: avoid split_swap_cluster() NULL pointer dereference
Date: Wed, 23 Sep 2020 10:21:36 +0800	[thread overview]
Message-ID: <878sd1qllb.fsf@yhuang-dev.intel.com> (raw)
In-Reply-To: <20200922184838.978540-1-aquini@redhat.com> (Rafael Aquini's message of "Tue, 22 Sep 2020 14:48:38 -0400")

Hi, Rafael,

Rafael Aquini <aquini@redhat.com> writes:

> The swap area descriptor only gets struct swap_cluster_info *cluster_info
> allocated if the swapfile is backed by non-rotational storage.
> When the swap area is laid on top of ordinary disk spindles, lock_cluster()
> will naturally return NULL.

Thanks for reporting.  But the bug looks strange.  Because in a system
with only HDD swap devices, during THP swap out, the swap cluster
shouldn't be allocated, as in

shrink_page_list()
  add_to_swap()
    get_swap_page()
      get_swap_pages()
        swap_alloc_cluster()

Where si->free_clusters is checked, and it should be empty for HDD.  So
in shrink_page_list(), the THP should have been split.  While in
split_huge_page_to_list(), PageSwapCache() is checked before calling
split_swap_cluster().  So this appears strange.

All in all, it appears that we need to find the real root cause of the
bug.

Did you test with the latest upstream kernel?  Can you help trace the
return value of swap_alloc_cluster()?  Can you share the swap device
information?

Best Regards,
Huang, Ying

> CONFIG_THP_SWAP exposes cluster_info infrastructure to a broader number of
> use cases, and split_swap_cluster(), which is the counterpart of split_huge_page()
> for the THPs in the swapcache, misses checking the return of lock_cluster before
> operating on the cluster_info pointer.
>
> This patch addresses that issue by adding a proper check for the pointer
> not being NULL in the wrappers cluster_{is,clear}_huge(), in order to avoid
> crashes similar to the one below:
>
> [ 5758.157556] BUG: kernel NULL pointer dereference, address: 0000000000000007
> [ 5758.165331] #PF: supervisor write access in kernel mode
> [ 5758.171161] #PF: error_code(0x0002) - not-present page
> [ 5758.176894] PGD 0 P4D 0
> [ 5758.179721] Oops: 0002 [#1] SMP PTI
> [ 5758.183614] CPU: 10 PID: 316 Comm: kswapd1 Kdump: loaded Tainted: G S               --------- ---  5.9.0-0.rc3.1.tst.el8.x86_64 #1
> [ 5758.196717] Hardware name: Intel Corporation S2600CP/S2600CP, BIOS SE5C600.86B.02.01.0002.082220131453 08/22/2013
> [ 5758.208176] RIP: 0010:split_swap_cluster+0x47/0x60
> [ 5758.213522] Code: c1 e3 06 48 c1 eb 0f 48 8d 1c d8 48 89 df e8 d0 20 6a 00 80 63 07 fb 48 85 db 74 16 48 89 df c6 07 00 66 66 66 90 31 c0 5b c3 <80> 24 25 07 00 00 00 fb 31 c0 5b c3 b8 f0 ff ff ff 5b c3 66 0f 1f
> [ 5758.234478] RSP: 0018:ffffb147442d7af0 EFLAGS: 00010246
> [ 5758.240309] RAX: 0000000000000000 RBX: 000000000014b217 RCX: ffffb14779fd9000
> [ 5758.248281] RDX: 000000000014b217 RSI: ffff9c52f2ab1400 RDI: 000000000014b217
> [ 5758.256246] RBP: ffffe00c51168080 R08: ffffe00c5116fe08 R09: ffff9c52fffd3000
> [ 5758.264208] R10: ffffe00c511537c8 R11: ffff9c52fffd3c90 R12: 0000000000000000
> [ 5758.272172] R13: ffffe00c51170000 R14: ffffe00c51170000 R15: ffffe00c51168040
> [ 5758.280134] FS:  0000000000000000(0000) GS:ffff9c52f2a80000(0000) knlGS:0000000000000000
> [ 5758.289163] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 5758.295575] CR2: 0000000000000007 CR3: 0000000022a0e003 CR4: 00000000000606e0
> [ 5758.303538] Call Trace:
> [ 5758.306273]  split_huge_page_to_list+0x88b/0x950
> [ 5758.311433]  deferred_split_scan+0x1ca/0x310
> [ 5758.316202]  do_shrink_slab+0x12c/0x2a0
> [ 5758.320491]  shrink_slab+0x20f/0x2c0
> [ 5758.324482]  shrink_node+0x240/0x6c0
> [ 5758.328469]  balance_pgdat+0x2d1/0x550
> [ 5758.332652]  kswapd+0x201/0x3c0
> [ 5758.336157]  ? finish_wait+0x80/0x80
> [ 5758.340147]  ? balance_pgdat+0x550/0x550
> [ 5758.344525]  kthread+0x114/0x130
> [ 5758.348126]  ? kthread_park+0x80/0x80
> [ 5758.352214]  ret_from_fork+0x22/0x30
> [ 5758.356203] Modules linked in: fuse zram rfkill sunrpc intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp mgag200 iTCO_wdt crct10dif_pclmul iTCO_vendor_support drm_kms_helper crc32_pclmul ghash_clmulni_intel syscopyarea sysfillrect sysimgblt fb_sys_fops cec rapl joydev intel_cstate ipmi_si ipmi_devintf drm intel_uncore i2c_i801 ipmi_msghandler pcspkr lpc_ich mei_me i2c_smbus mei ioatdma ip_tables xfs libcrc32c sr_mod sd_mod cdrom t10_pi sg igb ahci libahci i2c_algo_bit crc32c_intel libata dca wmi dm_mirror dm_region_hash dm_log dm_mod
> [ 5758.412673] CR2: 0000000000000007
> [    0.000000] Linux version 5.9.0-0.rc3.1.tst.el8.x86_64 (mockbuild@x86-vm-15.build.eng.bos.redhat.com) (gcc (GCC) 8.3.1 20191121 (Red Hat 8.3.1-5), GNU ld version 2.30-79.el8) #1 SMP Wed Sep 9 16:03:34 EDT 2020
>
> Fixes: 59807685a7e77 ("mm, THP, swap: support splitting THP for THP swap out")
> Signed-off-by: Rafael Aquini <aquini@redhat.com>
> ---
>  mm/swapfile.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 12f59e641b5e..37ddf5e5c53b 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -324,14 +324,15 @@ static inline void cluster_set_null(struct swap_cluster_info *info)
>  
>  static inline bool cluster_is_huge(struct swap_cluster_info *info)
>  {
> -	if (IS_ENABLED(CONFIG_THP_SWAP))
> +	if (IS_ENABLED(CONFIG_THP_SWAP) && info)
>  		return info->flags & CLUSTER_FLAG_HUGE;
>  	return false;
>  }
>  
>  static inline void cluster_clear_huge(struct swap_cluster_info *info)
>  {
> -	info->flags &= ~CLUSTER_FLAG_HUGE;
> +	if (IS_ENABLED(CONFIG_THP_SWAP) && info)
> +		info->flags &= ~CLUSTER_FLAG_HUGE;
>  }
>  
>  static inline struct swap_cluster_info *lock_cluster(struct swap_info_struct *si,


  parent reply	other threads:[~2020-09-23  2:21 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-22 18:48 [PATCH] mm: swapfile: avoid split_swap_cluster() NULL pointer dereference Rafael Aquini
2020-09-22 19:47 ` Andrew Morton
2020-09-23 13:42   ` Rafael Aquini
2020-09-25  2:59     ` Andrew Morton
2020-09-25  3:06       ` Huang, Ying
2020-09-25  3:10         ` Andrew Morton
2020-09-23  2:21 ` Huang, Ying [this message]
2020-09-23  4:34   ` Rafael Aquini
2020-09-23  5:13     ` Huang, Ying
2020-09-23 13:01       ` Rafael Aquini
2020-09-24  0:59         ` Huang, Ying
2020-09-24  2:09           ` Rafael Aquini
2020-09-24  3:51             ` Huang, Ying
2020-09-24  6:30               ` Rafael Aquini
2020-09-24  6:57                 ` Huang, Ying
2020-09-24  7:45                 ` Huang, Ying
2020-09-24 15:08                   ` Rafael Aquini
2020-09-25  3:21                     ` Huang, Ying
2020-09-26 15:16                       ` Rafael Aquini
2020-09-27  5:33                         ` Huang, Ying
2020-10-01 14:31                       ` Rafael Aquini
2020-10-05 13:39                         ` Rafael Aquini
2020-10-09  0:18                           ` Huang, Ying

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=878sd1qllb.fsf@yhuang-dev.intel.com \
    --to=ying.huang@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=aquini@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).