Date: Sat, 30 Nov 2019 17:54:47 -0800
From: akpm@linux-foundation.org
To: akpm@linux-foundation.org, hdanton@sina.com, linux-mm@kvack.org,
 mhocko@suse.com, mm-commits@vger.kernel.org,
 oleksiy.avramchenko@sonymobile.com, rostedt@goodmis.org,
 torvalds@linux-foundation.org, urezki@gmail.com, willy@infradead.org
Subject: [patch 092/158] mm/vmalloc: rework vmap_area_lock
Message-ID: <20191201015447.bPXjuzW7E%akpm@linux-foundation.org>
User-Agent: s-nail v14.8.16

From: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Subject: mm/vmalloc: rework vmap_area_lock

With the new allocation approach introduced in the 5.2 kernel, it becomes
possible to get rid of
one global spinlock.  By doing that we can further improve the KVA allocator
from a performance point of view.

Basically we can have two independent locks, one for the allocation path and
another one for deallocation, because they protect two different entities:
"free data structures" and "busy data structures".  Allocation and
deallocation operations can still interfere with each other when running
simultaneously on different CPUs, so some dependency remains, but with two
locks the contention becomes lower.

Summarizing:

  - it reduces the high lock contention;
  - it allows operations on the "free" and "busy" trees to be performed in
    parallel on different CPUs.

Please note it does not solve the scalability issue.

Test results:

In order to evaluate this patch, we can run the "vmalloc test driver" to see
how many CPU cycles it takes to complete all test cases running sequentially.
All online CPUs run it, so it causes high lock contention.

HiKey 960, ARM64, 8xCPUs, big.LITTLE:

sudo ./test_vmalloc.sh sequential_test_order=1

Default:
[  390.950557] All test took CPU0=457126382 cycles
[  391.046690] All test took CPU1=454763452 cycles
[  391.128586] All test took CPU2=454539334 cycles
[  391.222669] All test took CPU3=455649517 cycles
[  391.313946] All test took CPU4=388272196 cycles
[  391.410425] All test took CPU5=384036264 cycles
[  391.492219] All test took CPU6=387432964 cycles
[  391.578433] All test took CPU7=387201996 cycles

Patched:
[  304.721224] All test took CPU0=391521310 cycles
[  304.821219] All test took CPU1=393533002 cycles
[  304.917120] All test took CPU2=392243032 cycles
[  305.008986] All test took CPU3=392353853 cycles
[  305.108944] All test took CPU4=297630721 cycles
[  305.196406] All test took CPU5=297548736 cycles
[  305.288602] All test took CPU6=297092392 cycles
[  305.381088] All test took CPU7=297293597 cycles

The patched variant is ~14%-23% better.

Link: http://lkml.kernel.org/r/20191022155800.20468-1-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/vmalloc.c |   80 ++++++++++++++++++++++++++++++-------------------
 1 file changed, 50 insertions(+), 30 deletions(-)

--- a/mm/vmalloc.c~mm-vmalloc-rework-vmap_area_lock
+++ a/mm/vmalloc.c
@@ -331,6 +331,7 @@ EXPORT_SYMBOL(vmalloc_to_pfn);
 
 
 static DEFINE_SPINLOCK(vmap_area_lock);
+static DEFINE_SPINLOCK(free_vmap_area_lock);
 /* Export for kexec only */
 LIST_HEAD(vmap_area_list);
 static LLIST_HEAD(vmap_purge_list);
@@ -1114,7 +1115,7 @@ retry:
 	 */
 	pva = kmem_cache_alloc_node(vmap_area_cachep, gfp_mask, node);
 
-	spin_lock(&vmap_area_lock);
+	spin_lock(&free_vmap_area_lock);
 
 	if (pva && __this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva))
 		kmem_cache_free(vmap_area_cachep, pva);
@@ -1124,14 +1125,17 @@ retry:
 	 * returned. Therefore trigger the overflow path.
 	 */
 	addr = __alloc_vmap_area(size, align, vstart, vend);
+	spin_unlock(&free_vmap_area_lock);
+
 	if (unlikely(addr == vend))
 		goto overflow;
 
 	va->va_start = addr;
 	va->va_end = addr + size;
 	va->vm = NULL;
-	insert_vmap_area(va, &vmap_area_root, &vmap_area_list);
 
+	spin_lock(&vmap_area_lock);
+	insert_vmap_area(va, &vmap_area_root, &vmap_area_list);
 	spin_unlock(&vmap_area_lock);
 
 	BUG_ON(!IS_ALIGNED(va->va_start, align));
@@ -1141,7 +1145,6 @@ retry:
 	return va;
 
 overflow:
-	spin_unlock(&vmap_area_lock);
 	if (!purged) {
 		purge_vmap_area_lazy();
 		purged = 1;
@@ -1177,28 +1180,25 @@ int unregister_vmap_purge_notifier(struc
 }
 EXPORT_SYMBOL_GPL(unregister_vmap_purge_notifier);
 
-static void __free_vmap_area(struct vmap_area *va)
+/*
+ * Free a region of KVA allocated by alloc_vmap_area
+ */
+static void free_vmap_area(struct vmap_area *va)
 {
 	/*
 	 * Remove from the busy tree/list.
 	 */
+	spin_lock(&vmap_area_lock);
 	unlink_va(va, &vmap_area_root);
+	spin_unlock(&vmap_area_lock);
 
 	/*
-	 * Merge VA with its neighbors, otherwise just add it.
+	 * Insert/Merge it back to the free tree/list.
 	 */
+	spin_lock(&free_vmap_area_lock);
 	merge_or_add_vmap_area(va,
 		&free_vmap_area_root, &free_vmap_area_list);
-}
-
-/*
- * Free a region of KVA allocated by alloc_vmap_area
- */
-static void free_vmap_area(struct vmap_area *va)
-{
-	spin_lock(&vmap_area_lock);
-	__free_vmap_area(va);
-	spin_unlock(&vmap_area_lock);
+	spin_unlock(&free_vmap_area_lock);
 }
 
 /*
@@ -1291,7 +1291,7 @@ static bool __purge_vmap_area_lazy(unsig
 	flush_tlb_kernel_range(start, end);
 	resched_threshold = lazy_max_pages() << 1;
 
-	spin_lock(&vmap_area_lock);
+	spin_lock(&free_vmap_area_lock);
 	llist_for_each_entry_safe(va, n_va, valist, purge_list) {
 		unsigned long nr = (va->va_end - va->va_start) >> PAGE_SHIFT;
 
@@ -1306,9 +1306,9 @@ static bool __purge_vmap_area_lazy(unsig
 		atomic_long_sub(nr, &vmap_lazy_nr);
 
 		if (atomic_long_read(&vmap_lazy_nr) < resched_threshold)
-			cond_resched_lock(&vmap_area_lock);
+			cond_resched_lock(&free_vmap_area_lock);
 	}
-	spin_unlock(&vmap_area_lock);
+	spin_unlock(&free_vmap_area_lock);
 
 	return true;
 }
@@ -2030,15 +2030,21 @@ int map_vm_area(struct vm_struct *area,
 }
 EXPORT_SYMBOL_GPL(map_vm_area);
 
-static void setup_vmalloc_vm(struct vm_struct *vm, struct vmap_area *va,
-			      unsigned long flags, const void *caller)
+static inline void setup_vmalloc_vm_locked(struct vm_struct *vm,
+	struct vmap_area *va, unsigned long flags, const void *caller)
 {
-	spin_lock(&vmap_area_lock);
 	vm->flags = flags;
 	vm->addr = (void *)va->va_start;
 	vm->size = va->va_end - va->va_start;
 	vm->caller = caller;
 	va->vm = vm;
+}
+
+static void setup_vmalloc_vm(struct vm_struct *vm, struct vmap_area *va,
+			      unsigned long flags, const void *caller)
+{
+	spin_lock(&vmap_area_lock);
+	setup_vmalloc_vm_locked(vm, va, flags, caller);
 	spin_unlock(&vmap_area_lock);
 }
 
@@ -3298,7 +3304,7 @@ struct vm_struct **pcpu_get_vm_areas(con
 			goto err_free;
 	}
 retry:
-	spin_lock(&vmap_area_lock);
+	spin_lock(&free_vmap_area_lock);
 
 	/* start scanning - we scan from the top, begin with the last area */
 	area = term_area = last_area;
@@ -3380,29 +3386,38 @@ retry:
 		va = vas[area];
 		va->va_start = start;
 		va->va_end = start + size;
-
-		insert_vmap_area(va, &vmap_area_root, &vmap_area_list);
 	}
 
-	spin_unlock(&vmap_area_lock);
+	spin_unlock(&free_vmap_area_lock);
 
 	/* insert all vm's */
-	for (area = 0; area < nr_vms; area++)
-		setup_vmalloc_vm(vms[area], vas[area], VM_ALLOC,
+	spin_lock(&vmap_area_lock);
+	for (area = 0; area < nr_vms; area++) {
+		insert_vmap_area(vas[area], &vmap_area_root, &vmap_area_list);
+
+		setup_vmalloc_vm_locked(vms[area], vas[area], VM_ALLOC,
 				 pcpu_get_vm_areas);
+	}
+	spin_unlock(&vmap_area_lock);
 
 	kfree(vas);
 	return vms;
 
 recovery:
-	/* Remove previously inserted areas. */
+	/*
+	 * Remove previously allocated areas. There is no
+	 * need in removing these areas from the busy tree,
+	 * because they are inserted only on the final step
+	 * and when pcpu_get_vm_areas() is success.
+	 */
 	while (area--) {
-		__free_vmap_area(vas[area]);
+		merge_or_add_vmap_area(vas[area],
+			&free_vmap_area_root, &free_vmap_area_list);
 		vas[area] = NULL;
 	}
 
 overflow:
-	spin_unlock(&vmap_area_lock);
+	spin_unlock(&free_vmap_area_lock);
 	if (!purged) {
 		purge_vmap_area_lazy();
 		purged = true;
@@ -3453,9 +3468,12 @@ void pcpu_free_vm_areas(struct vm_struct
 
 #ifdef CONFIG_PROC_FS
 static void *s_start(struct seq_file *m, loff_t *pos)
+	__acquires(&vmap_purge_lock)
 	__acquires(&vmap_area_lock)
 {
+	mutex_lock(&vmap_purge_lock);
 	spin_lock(&vmap_area_lock);
+
 	return seq_list_start(&vmap_area_list, *pos);
 }
 
@@ -3465,8 +3483,10 @@ static void *s_next(struct seq_file *m,
 }
 
 static void s_stop(struct seq_file *m, void *p)
+	__releases(&vmap_purge_lock)
 	__releases(&vmap_area_lock)
 {
+	mutex_unlock(&vmap_purge_lock);
 	spin_unlock(&vmap_area_lock);
 }
 
_
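
[Editor's note] For readers who want to experiment with the idea outside the kernel,
here is a minimal user-space sketch (not part of the patch) of the locking split the
changelog describes: one lock protects the "free" structure used by the allocation
path, a second lock protects the "busy" structure used when regions are handed out
and released.  All names here (free_lock, busy_lock, struct region, region_alloc,
region_free) are made up for illustration; the kernel code above uses red-black
trees, lazy purging and per-CPU preloading that this sketch omits.

/* Illustrative only -- not kernel code.  Build with: cc -pthread sketch.c */
#include <pthread.h>
#include <stddef.h>

struct region {
        unsigned long start;
        unsigned long size;
        struct region *next;
};

/* Two independent locks, mirroring free_vmap_area_lock/vmap_area_lock. */
static pthread_mutex_t free_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t busy_lock = PTHREAD_MUTEX_INITIALIZER;

static struct region *free_head;        /* regions available for allocation */
static struct region *busy_head;        /* regions currently handed out */

/* Allocation path: take a region from the "free" structure under free_lock,
 * then publish it in the "busy" structure under busy_lock.  A concurrent
 * release on another CPU only contends on the lock of the structure it is
 * touching at that moment. */
static struct region *region_alloc(unsigned long size)
{
        struct region *r;

        pthread_mutex_lock(&free_lock);
        r = free_head;                  /* simplified: first-fit on the head */
        if (r && r->size >= size)
                free_head = r->next;
        else
                r = NULL;
        pthread_mutex_unlock(&free_lock);

        if (!r)
                return NULL;

        pthread_mutex_lock(&busy_lock);
        r->next = busy_head;
        busy_head = r;
        pthread_mutex_unlock(&busy_lock);

        return r;
}

/* Release path: unlink from the "busy" structure, then return the region to
 * the "free" structure, taking each lock separately. */
static void region_free(struct region *r)
{
        struct region **pp;

        pthread_mutex_lock(&busy_lock);
        for (pp = &busy_head; *pp; pp = &(*pp)->next) {
                if (*pp == r) {
                        *pp = r->next;
                        break;
                }
        }
        pthread_mutex_unlock(&busy_lock);

        pthread_mutex_lock(&free_lock);
        r->next = free_head;
        free_head = r;
        pthread_mutex_unlock(&free_lock);
}

int main(void)
{
        static struct region pool = { .start = 0, .size = 1 << 20 };
        struct region *r;

        free_head = &pool;              /* seed the free structure */
        r = region_alloc(4096);
        if (r)
                region_free(r);
        return 0;
}

The point mirrored here is the one made in the changelog: allocation and release
still serialize briefly when they touch the same structure, so contention is
reduced rather than eliminated.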