From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Tue, 28 Sep 2021 10:44:19 +0200
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: Minchan Kim
Cc: linux-mm@kvack.org, Andrew Morton, Thomas Gleixner, Mike Galbraith,
 Mike Galbraith
Subject: [PATCH] mm/zsmalloc: Replace bit spinlock and get_cpu_var() usage.
Message-ID: <20210928084419.mkfu62barwrsvflq@linutronix.de>
References: <20210923170121.1860133-1-bigeasy@linutronix.de>
In-Reply-To: <20210923170121.1860133-1-bigeasy@linutronix.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline

From: Mike Galbraith

For efficiency reasons, zsmalloc uses a slim `handle'. Its value is the
address of a memory allocation of 4 or 8 bytes, depending on the size of
the long data type. The lowest bit of that allocated memory is used as a
bit spin lock.

The usage of the bit spin lock is problematic because, with the bit spin
lock held, zsmalloc acquires a rwlock_t and a spinlock_t, both of which
are sleeping locks on PREEMPT_RT and therefore must not be acquired with
preemption disabled.

Extend the handle to a struct zsmalloc_handle which holds the old handle
as `addr' and a spinlock_t which replaces the bit spin lock. Replace all
the wrapper functions accordingly.

The usage of get_cpu_var() in zs_map_object() is problematic because it
disables preemption and so makes it impossible to acquire any sleeping
lock on PREEMPT_RT, such as a spinlock_t. Replace the get_cpu_var()
usage with a local_lock_t which is embedded in struct mapping_area. It
ensures that access to the struct is synchronized against all users on
the same CPU.

This survived LTP testing.

Signed-off-by: Mike Galbraith
Signed-off-by: Thomas Gleixner
[bigeasy: replaced the bit_spin_lock() with a mutex and get_locked_var(),
 and wrote the patch description. Mike then fixed the size magic and made
 the handle lock a spinlock_t.]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
Minchan, this is the replacement we have so far. There is something
similar for block/zram. By disabling zsmalloc we also have zram
disabled, so there is no fallout.
To my best knowledge, the only usage/testing is LTP based.
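Not part of the patch, just background for review: the sketch below
models the two handle layouts in plain userspace C. pthread_spin_*
stands in for the kernel's spinlock_t, a GCC/Clang __atomic builtin
models bit_spin_lock(), and pin_tag_bitlock()/pin_tag_rt() as well as
the demo main() are invented names for illustration only.

/* Illustrative userspace model of the two zsmalloc handle layouts. */
#include <pthread.h>
#include <stdio.h>

#define OBJ_TAG_BITS 1  /* as in zsmalloc: the lowest bit is the tag */

/*
 * !PREEMPT_RT layout: the handle is the address of a bare unsigned
 * long; bit 0 of the pointed-to word doubles as the pin lock.
 */
static void pin_tag_bitlock(unsigned long handle)
{
        unsigned long *p = (unsigned long *)handle;

        /* models bit_spin_lock(HANDLE_PIN_BIT, p): spin on bit 0 */
        while (__atomic_fetch_or(p, 1UL, __ATOMIC_ACQUIRE) & 1UL)
                ;       /* in the kernel this spins with preemption off */
}

/*
 * PREEMPT_RT layout: the allocation is enlarged so a real lock sits
 * next to the object address; nothing hides in the low bit anymore.
 */
struct zsmalloc_handle {
        unsigned long addr;             /* the encoded object location */
        pthread_spinlock_t lock;        /* kernel: spinlock_t */
};

static void pin_tag_rt(unsigned long handle)
{
        struct zsmalloc_handle *zh =
                (void *)(handle & ~((1UL << OBJ_TAG_BITS) - 1));

        pthread_spin_lock(&zh->lock);   /* kernel: spin_lock(&zh->lock) */
}

int main(void)
{
        unsigned long word = 0xbeef0UL;         /* bit 0 clear: unlocked */
        struct zsmalloc_handle zh = { .addr = 0xbeef0UL };

        pthread_spin_init(&zh.lock, PTHREAD_PROCESS_PRIVATE);
        pin_tag_bitlock((unsigned long)&word);
        pin_tag_rt((unsigned long)&zh);
        printf("pinned: old=%#lx rt=%#lx\n", word & ~1UL, zh.addr);
        return 0;
}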
 mm/zsmalloc.c | 84 +++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 78 insertions(+), 6 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 68e8831068f4b..22c18ac605b5f 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -57,6 +57,7 @@
 #include <linux/wait.h>
 #include <linux/pagemap.h>
 #include <linux/fs.h>
+#include <linux/local_lock.h>
 
 #define ZSPAGE_MAGIC	0x58
 
@@ -77,6 +78,20 @@
 
 #define ZS_HANDLE_SIZE (sizeof(unsigned long))
 
+#ifdef CONFIG_PREEMPT_RT
+
+struct zsmalloc_handle {
+	unsigned long addr;
+	spinlock_t lock;
+};
+
+#define ZS_HANDLE_ALLOC_SIZE (sizeof(struct zsmalloc_handle))
+
+#else
+
+#define ZS_HANDLE_ALLOC_SIZE (sizeof(unsigned long))
+#endif
+
 /*
  * Object location (<PFN>, <obj_idx>) is encoded as
  * a single (unsigned long) handle value.
@@ -293,6 +308,7 @@ struct zspage {
 };
 
 struct mapping_area {
+	local_lock_t lock;
 	char *vm_buf; /* copy buffer for objects that span pages */
 	char *vm_addr; /* address of kmap_atomic()'ed pages */
 	enum zs_mapmode vm_mm; /* mapping mode */
@@ -322,7 +338,7 @@ static void SetZsPageMovable(struct zs_pool *pool, struct zspage *zspage) {}
 
 static int create_cache(struct zs_pool *pool)
 {
-	pool->handle_cachep = kmem_cache_create("zs_handle", ZS_HANDLE_SIZE,
+	pool->handle_cachep = kmem_cache_create("zs_handle", ZS_HANDLE_ALLOC_SIZE,
 					0, 0, NULL);
 	if (!pool->handle_cachep)
 		return 1;
@@ -346,10 +362,27 @@ static void destroy_cache(struct zs_pool *pool)
 
 static unsigned long cache_alloc_handle(struct zs_pool *pool, gfp_t gfp)
 {
-	return (unsigned long)kmem_cache_alloc(pool->handle_cachep,
-			gfp & ~(__GFP_HIGHMEM|__GFP_MOVABLE));
+	void *p;
+
+	p = kmem_cache_alloc(pool->handle_cachep,
+			     gfp & ~(__GFP_HIGHMEM|__GFP_MOVABLE));
+#ifdef CONFIG_PREEMPT_RT
+	if (p) {
+		struct zsmalloc_handle *zh = p;
+
+		spin_lock_init(&zh->lock);
+	}
+#endif
+	return (unsigned long)p;
 }
 
+#ifdef CONFIG_PREEMPT_RT
+static struct zsmalloc_handle *zs_get_pure_handle(unsigned long handle)
+{
+	return (void *)(handle & ~((1 << OBJ_TAG_BITS) - 1));
+}
+#endif
+
 static void cache_free_handle(struct zs_pool *pool, unsigned long handle)
 {
 	kmem_cache_free(pool->handle_cachep, (void *)handle);
@@ -368,12 +401,18 @@ static void cache_free_zspage(struct zs_pool *pool, struct zspage *zspage)
 
 static void record_obj(unsigned long handle, unsigned long obj)
 {
+#ifdef CONFIG_PREEMPT_RT
+	struct zsmalloc_handle *zh = zs_get_pure_handle(handle);
+
+	WRITE_ONCE(zh->addr, obj);
+#else
 	/*
 	 * lsb of @obj represents handle lock while other bits
 	 * represent object value the handle is pointing so
 	 * updating shouldn't do store tearing.
 	 */
 	WRITE_ONCE(*(unsigned long *)handle, obj);
+#endif
 }
 
 /* zpool driver */
@@ -455,7 +494,9 @@ MODULE_ALIAS("zpool-zsmalloc");
 #endif /* CONFIG_ZPOOL */
 
 /* per-cpu VM mapping areas for zspage accesses that cross page boundaries */
-static DEFINE_PER_CPU(struct mapping_area, zs_map_area);
+static DEFINE_PER_CPU(struct mapping_area, zs_map_area) = {
+	.lock	= INIT_LOCAL_LOCK(lock),
+};
 
 static bool is_zspage_isolated(struct zspage *zspage)
 {
@@ -862,7 +903,13 @@ static unsigned long location_to_obj(struct page *page, unsigned int obj_idx)
 
 static unsigned long handle_to_obj(unsigned long handle)
 {
+#ifdef CONFIG_PREEMPT_RT
+	struct zsmalloc_handle *zh = zs_get_pure_handle(handle);
+
+	return zh->addr;
+#else
 	return *(unsigned long *)handle;
+#endif
 }
 
 static unsigned long obj_to_head(struct page *page, void *obj)
@@ -876,22 +923,46 @@ static unsigned long obj_to_head(struct page *page, void *obj)
 
 static inline int testpin_tag(unsigned long handle)
 {
+#ifdef CONFIG_PREEMPT_RT
+	struct zsmalloc_handle *zh = zs_get_pure_handle(handle);
+
+	return spin_is_locked(&zh->lock);
+#else
 	return bit_spin_is_locked(HANDLE_PIN_BIT, (unsigned long *)handle);
+#endif
 }
 
 static inline int trypin_tag(unsigned long handle)
 {
+#ifdef CONFIG_PREEMPT_RT
+	struct zsmalloc_handle *zh = zs_get_pure_handle(handle);
+
+	return spin_trylock(&zh->lock);
+#else
 	return bit_spin_trylock(HANDLE_PIN_BIT, (unsigned long *)handle);
+#endif
 }
 
 static void pin_tag(unsigned long handle) __acquires(bitlock)
 {
+#ifdef CONFIG_PREEMPT_RT
+	struct zsmalloc_handle *zh = zs_get_pure_handle(handle);
+
+	return spin_lock(&zh->lock);
+#else
 	bit_spin_lock(HANDLE_PIN_BIT, (unsigned long *)handle);
+#endif
 }
 
 static void unpin_tag(unsigned long handle) __releases(bitlock)
 {
+#ifdef CONFIG_PREEMPT_RT
+	struct zsmalloc_handle *zh = zs_get_pure_handle(handle);
+
+	return spin_unlock(&zh->lock);
+#else
 	bit_spin_unlock(HANDLE_PIN_BIT, (unsigned long *)handle);
+#endif
 }
 
 static void reset_page(struct page *page)
@@ -1274,7 +1345,8 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 	class = pool->size_class[class_idx];
 	off = (class->size * obj_idx) & ~PAGE_MASK;
 
-	area = &get_cpu_var(zs_map_area);
+	local_lock(&zs_map_area.lock);
+	area = this_cpu_ptr(&zs_map_area);
 	area->vm_mm = mm;
 	if (off + class->size <= PAGE_SIZE) {
 		/* this object is contained entirely within a page */
@@ -1328,7 +1400,7 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
 		__zs_unmap_object(area, pages, off, class->size);
 	}
-	put_cpu_var(zs_map_area);
+	local_unlock(&zs_map_area.lock);
 
 	migrate_read_unlock(zspage);
 	unpin_tag(handle);
 }
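One more note on the zs_map_object() side: the change is the standard
get_cpu_var() -> local_lock() conversion. Condensed from the hunks
above into a schematic kernel-style fragment (not a stand-alone
program):

	/* before: preemption is disabled for the whole mapping window,
	 * so no spinlock_t may be taken inside it on PREEMPT_RT */
	area = &get_cpu_var(zs_map_area);
	/* ... map/copy the object ... */
	put_cpu_var(zs_map_area);

	/* after: local_lock() is preempt_disable() on !RT but a per-CPU
	 * spinlock on RT, so the section stays preemptible there while
	 * the per-CPU mapping_area still has a single user at a time */
	local_lock(&zs_map_area.lock);
	area = this_cpu_ptr(&zs_map_area);
	/* ... map/copy the object ... */
	local_unlock(&zs_map_area.lock);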
-- 
2.33.0