From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Henry Burns <henryburns@google.com>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Henry Burns <henrywolfeburns@gmail.com>,
	Minchan Kim <minchan@kernel.org>,
	Shakeel Butt <shakeelb@google.com>,
	Jonathan Adams <jwadams@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 4.14 59/62] mm/zsmalloc.c: fix race condition in zs_destroy_pool
Date: Tue, 27 Aug 2019 09:51:04 +0200
Message-ID: <20190827072703.875523743@linuxfoundation.org>
In-Reply-To: <20190827072659.803647352@linuxfoundation.org>

From: Henry Burns <henryburns@google.com>

commit 701d678599d0c1623aaf4139c03eea260a75b027 upstream.

In zs_destroy_pool() we call flush_work(&pool->free_work).  However, we
have no guarantee that migration isn't happening in the background at
that time.

Since migration can't directly free pages, it relies on free_work being
scheduled to free the pages.  But there's nothing preventing an
in-progress migrate from queuing the work *after*
zs_unregister_migration() has called flush_work(), which would leave
pages still pointing at the inode when we free it.

Since we know at destroy time all objects should be free, no new
migrations can come in (since zs_page_isolate() fails for fully-free
zspages).  This means it is sufficient to track a "# isolated zspages"
count by class, and have the destroy logic ensure all such pages have
drained before proceeding.  Keeping that state under the class spinlock
keeps the logic straightforward.
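
As a rough userspace illustration of the drain idea described above (not
the kernel code), the sketch below counts isolated items and makes the
destroy path wait until that count reaches zero.  It swaps in a pthread
mutex and condition variable where the patch uses an atomic counter,
wait_event() and smp_mb(); struct demo_pool and every demo_* function
are hypothetical names invented for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the fields the patch adds to zs_pool. */
struct demo_pool {
	pthread_mutex_t lock;
	pthread_cond_t drained;	/* plays the role of migration_wait */
	long isolated;		/* plays the role of isolated_pages */
	bool destroying;
};

/* Analogue of the increment done when a zspage is isolated. */
static void demo_isolate(struct demo_pool *p)
{
	pthread_mutex_lock(&p->lock);
	p->isolated++;
	pthread_mutex_unlock(&p->lock);
}

/* Analogue of zs_pool_dec_isolated(): wake the destroyer on the last put. */
static void demo_putback(struct demo_pool *p)
{
	pthread_mutex_lock(&p->lock);
	assert(p->isolated > 0);
	if (--p->isolated == 0 && p->destroying)
		pthread_cond_broadcast(&p->drained);
	pthread_mutex_unlock(&p->lock);
}

/* Analogue of setting pool->destroying and then waiting for the drain. */
static void demo_wait_for_drain(struct demo_pool *p)
{
	pthread_mutex_lock(&p->lock);
	p->destroying = true;
	while (p->isolated != 0)
		pthread_cond_wait(&p->drained, &p->lock);
	pthread_mutex_unlock(&p->lock);
}

/* Thread body standing in for an in-flight migration finishing up. */
static void *demo_migrator(void *arg)
{
	demo_putback(arg);
	return NULL;
}

/* Returns the isolated count left after the drain; 0 means fully drained. */
static long demo_run(void)
{
	struct demo_pool p = {
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.drained = PTHREAD_COND_INITIALIZER,
	};
	pthread_t t;

	demo_isolate(&p);	/* one migration is in flight */
	if (pthread_create(&t, NULL, demo_migrator, &p) != 0)
		return -1;
	demo_wait_for_drain(&p);	/* destroy path blocks until putback */
	pthread_join(t, NULL);
	return p.isolated;
}
```

Because the flag and the counter are both updated under one lock here,
the sketch needs no explicit memory barrier; the patch instead pairs an
atomic counter with smp_mb() to get the same "either the count is already
zero, or the waker sees destroying" guarantee.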

In this case a memory leak could lead to an eventual crash if compaction
hits the leaked page.  This crash would only occur if people are
changing their zswap backend at runtime (which eventually starts
destruction).

Link: http://lkml.kernel.org/r/20190809181751.219326-2-henryburns@google.com
Fixes: 48b4800a1c6a ("zsmalloc: page migration support")
Signed-off-by: Henry Burns <henryburns@google.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Henry Burns <henrywolfeburns@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Adams <jwadams@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/zsmalloc.c |   61 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 59 insertions(+), 2 deletions(-)

--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -52,6 +52,7 @@
 #include <linux/zpool.h>
 #include <linux/mount.h>
 #include <linux/migrate.h>
+#include <linux/wait.h>
 #include <linux/pagemap.h>
 
 #define ZSPAGE_MAGIC	0x58
@@ -267,6 +268,10 @@ struct zs_pool {
 #ifdef CONFIG_COMPACTION
 	struct inode *inode;
 	struct work_struct free_work;
+	/* A wait queue for when migration races with async_free_zspage() */
+	struct wait_queue_head migration_wait;
+	atomic_long_t isolated_pages;
+	bool destroying;
 #endif
 };
 
@@ -1890,6 +1895,19 @@ static void putback_zspage_deferred(stru
 
 }
 
+static inline void zs_pool_dec_isolated(struct zs_pool *pool)
+{
+	VM_BUG_ON(atomic_long_read(&pool->isolated_pages) <= 0);
+	atomic_long_dec(&pool->isolated_pages);
+	/*
+	 * There's no possibility of racing, since wait_for_isolated_drain()
+	 * checks the isolated count under &class->lock after enqueuing
+	 * on migration_wait.
+	 */
+	if (atomic_long_read(&pool->isolated_pages) == 0 && pool->destroying)
+		wake_up_all(&pool->migration_wait);
+}
+
 static void replace_sub_page(struct size_class *class, struct zspage *zspage,
 				struct page *newpage, struct page *oldpage)
 {
@@ -1959,6 +1977,7 @@ bool zs_page_isolate(struct page *page,
 	 */
 	if (!list_empty(&zspage->list) && !is_zspage_isolated(zspage)) {
 		get_zspage_mapping(zspage, &class_idx, &fullness);
+		atomic_long_inc(&pool->isolated_pages);
 		remove_zspage(class, zspage, fullness);
 	}
 
@@ -2058,8 +2077,16 @@ int zs_page_migrate(struct address_space
 	 * Page migration is done so let's putback isolated zspage to
 	 * the list if @page is final isolated subpage in the zspage.
 	 */
-	if (!is_zspage_isolated(zspage))
+	if (!is_zspage_isolated(zspage)) {
+		/*
+		 * We cannot race with zs_destroy_pool() here because we wait
+		 * for isolation to hit zero before we start destroying.
+		 * Also, we ensure that everyone can see pool->destroying before
+		 * we start waiting.
+		 */
 		putback_zspage_deferred(pool, class, zspage);
+		zs_pool_dec_isolated(pool);
+	}
 
 	reset_page(page);
 	put_page(page);
@@ -2110,8 +2137,8 @@ void zs_page_putback(struct page *page)
 		 * so let's defer.
 		 */
 		putback_zspage_deferred(pool, class, zspage);
+		zs_pool_dec_isolated(pool);
 	}
-
 	spin_unlock(&class->lock);
 }
 
@@ -2134,8 +2161,36 @@ static int zs_register_migration(struct
 	return 0;
 }
 
+static bool pool_isolated_are_drained(struct zs_pool *pool)
+{
+	return atomic_long_read(&pool->isolated_pages) == 0;
+}
+
+/* Function for resolving migration */
+static void wait_for_isolated_drain(struct zs_pool *pool)
+{
+
+	/*
+	 * We're in the process of destroying the pool, so there are no
+	 * active allocations. zs_page_isolate() fails for completely free
+	 * zspages, so we need only wait for the zs_pool's isolated
+	 * count to hit zero.
+	 */
+	wait_event(pool->migration_wait,
+		   pool_isolated_are_drained(pool));
+}
+
 static void zs_unregister_migration(struct zs_pool *pool)
 {
+	pool->destroying = true;
+	/*
+	 * We need a memory barrier here to ensure global visibility of
+	 * pool->destroying. Thus pool->isolated_pages will either be 0 in which
+	 * case we don't care, or it will be > 0 and pool->destroying will
+	 * ensure that we wake up once isolation hits 0.
+	 */
+	smp_mb();
+	wait_for_isolated_drain(pool); /* This can block */
 	flush_work(&pool->free_work);
 	iput(pool->inode);
 }
@@ -2376,6 +2431,8 @@ struct zs_pool *zs_create_pool(const cha
 	if (!pool->name)
 		goto err;
 
+	init_waitqueue_head(&pool->migration_wait);
+
 	if (create_cache(pool))
 		goto err;
 


