From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1B6DCC433EF for ; Tue, 14 Sep 2021 19:15:44 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id B351661029 for ; Tue, 14 Sep 2021 19:15:43 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org B351661029 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 4F9D26B0072; Tue, 14 Sep 2021 15:15:43 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 4A8CE6B0073; Tue, 14 Sep 2021 15:15:43 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 398ED900002; Tue, 14 Sep 2021 15:15:43 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0113.hostedemail.com [216.40.44.113]) by kanga.kvack.org (Postfix) with ESMTP id 2DC7A6B0072 for ; Tue, 14 Sep 2021 15:15:43 -0400 (EDT) Received: from smtpin15.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id D2ACC18023435 for ; Tue, 14 Sep 2021 19:15:42 +0000 (UTC) X-FDA: 78587133324.15.869B64C Received: from mail-ej1-f42.google.com (mail-ej1-f42.google.com [209.85.218.42]) by imf29.hostedemail.com (Postfix) with ESMTP id 67CF7900013E for ; Tue, 14 Sep 2021 19:15:42 +0000 (UTC) Received: by mail-ej1-f42.google.com with SMTP id qq21so634120ejb.10 for ; Tue, 14 Sep 2021 12:15:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=hHstVMUN/G0pYs2uKRn5zA+XIUh1l56ApnQSUa8rs7Q=; b=X9BKcGWEQsJOCiBp1Ia5outl6HBqUsFiJyV+A27LeZnuVpZom5iTey2FEMlQG2aCLu D721MGvcH2zCSIJtQ5WVYv3ac6gexRNarfzsdjEfVggvRMWuygxSUNCsPEE4aerEbsId 6KB+9doaXfKO2JmIoM86aTYAP81xYTL0GecNWwFKuiiaEtTmvmSEy0xqZY05IWg9O+Qa xMSVM50gB5yPAd0kj9hPh/SrESido7M8W5DzEd0ZFxp/Su1H/zy61RwdLq14lIrg8vbD R+zt+5iv7Y8fPimhyn6qiZXRixcVbNIoKruoPB/uk4I39oNH9bjlwhD2EXNeZe0gx9+E Yx4A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=hHstVMUN/G0pYs2uKRn5zA+XIUh1l56ApnQSUa8rs7Q=; b=raYOjgfJf7UQPVUMNWSSjkCPV4fCHPl3AQpyW5xTgFZci7OincVMYR4s24MY3G7zau MwyM+PJ/wv8m2UQ9yS64IOT2b+6ZqPrw3BDlC8WGo6lFibGvcy9bwraJiCuMXHD+F28g SQypb5//X8NWpf5WAeJnms4NMKOwkP7ZTZ4mSuUQW9Envn2unP8yAY2twRuXKu7CgLG+ 0UxYPN3yao5UlL8AcmOtAdxE79yHrSPJAJU0A++qploU/idLPz5Sajs9CldfVHXlMRsr VaVPy4Wu3XX6BzPH+aZzZf+qDc/UvBStw7aJ/IgMgUyYMRynfOwjrTFS63EeJG95k0DI Yzjg== X-Gm-Message-State: AOAM533IOD5lz9aILAdPHsq8m5Bvc1LyORa8FHPLeH45LdjfyhWpLXnz 00lin8kW5MAWmhjRclWqfPl089zHwSDPBD1VG+Q= X-Google-Smtp-Source: ABdhPJxS8Orxy4055O/Xwy/Ud7P0uMfay8d4QTLnHeVOhO5HWQLVrjeeueVZRp+XAvOJMHTIKqCtnSNuVjdyG2nJ+ws= X-Received: by 2002:a17:906:c055:: with SMTP id bm21mr20429680ejb.350.1631646941112; Tue, 14 Sep 2021 12:15:41 -0700 (PDT) MIME-Version: 1.0 References: <20210908132727.16165-1-david@redhat.com> In-Reply-To: From: Uladzislau Rezki Date: Tue, 14 Sep 2021 21:15:30 +0200 Message-ID: Subject: Re: [PATCH v1] mm/vmalloc: fix exact allocations with an alignment > 1 To: David Hildenbrand Cc: LKML , Ping Fang , Andrew Morton , Roman Gushchin , Michal Hocko , Oscar Salvador , Linux Memory Management List Content-Type: text/plain; charset="UTF-8" X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 67CF7900013E X-Stat-Signature: n5atsu5tgf31d5nguw11wz8n7iatgfbt Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=X9BKcGWE; spf=pass (imf29.hostedemail.com: domain of urezki@gmail.com designates 209.85.218.42 as permitted sender) smtp.mailfrom=urezki@gmail.com; dmarc=pass (policy=none) header.from=gmail.com X-HE-Tag: 1631646942-715187 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: > Let me have a look at it. > > Vlad Rezki > > On Wed, Sep 8, 2021 at 3:27 PM David Hildenbrand wrote: > > > > find_vmap_lowest_match() is imprecise such that it won't always > > find "the first free block ... that will accomplish the request" if > > an alignment > 1 was specified: especially also when the alignment is > > PAGE_SIZE. Unfortuantely, the way the vmalloc data structures were > > designed, propagating the max size without alignment information through > > the tree, it's hard to make it precise again when an alignment > 1 is > > specified. > > > > The issue is that in order to be able to eventually align later, > > find_vmap_lowest_match() has to search for a slightly bigger area and > > might skip some applicable subtrees just by lookign at the result of > > get_subtree_max_size(). While this usually doesn't matter, it matters for > > exact allocations as performed by KASAN when onlining a memory block, > > when the free block exactly matches the request. > > (mm/kasn/shadow.c:kasan_mem_notifier()). > > > > In case we online memory blocks out of order (not lowest to highest > > address), find_vmap_lowest_match() with PAGE_SIZE alignment will reject > > an exact allocation if it corresponds exactly to a free block. (there are > > some corner cases where it would still work, if we're lucky and hit the > > first is_within_this_va() inside the while loop) > > > > [root@vm-0 fedora]# echo online > /sys/devices/system/memory/memory82/state > > [root@vm-0 fedora]# echo online > /sys/devices/system/memory/memory83/state > > [root@vm-0 fedora]# echo online > /sys/devices/system/memory/memory85/state > > [root@vm-0 fedora]# echo online > /sys/devices/system/memory/memory84/state > > [ 223.858115] vmap allocation for size 16777216 failed: use vmalloc= to increase size > > [ 223.859415] bash: vmalloc: allocation failure: 16777216 bytes, mode:0x6000c0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0 > > [ 223.860992] CPU: 4 PID: 1644 Comm: bash Kdump: loaded Not tainted 4.18.0-339.el8.x86_64+debug #1 > > [ 223.862149] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 > > [ 223.863580] Call Trace: > > [ 223.863946] dump_stack+0x8e/0xd0 > > [ 223.864420] warn_alloc.cold.90+0x8a/0x1b2 > > [ 223.864990] ? zone_watermark_ok_safe+0x300/0x300 > > [ 223.865626] ? slab_free_freelist_hook+0x85/0x1a0 > > [ 223.866264] ? __get_vm_area_node+0x240/0x2c0 > > [ 223.866858] ? kfree+0xdd/0x570 > > [ 223.867309] ? kmem_cache_alloc_node_trace+0x157/0x230 > > [ 223.868028] ? notifier_call_chain+0x90/0x160 > > [ 223.868625] __vmalloc_node_range+0x465/0x840 > > [ 223.869230] ? mark_held_locks+0xb7/0x120 > > > > While we could fix this in kasan_mem_notifier() by passing an alignment > > of "1", this is actually an implementation detail of vmalloc and to be > > handled internally. > > > > So use an alignment of 1 when calling find_vmap_lowest_match() for exact > > allocations that are expected to succeed -- if the given range can > > satisfy the alignment requirements. > > > > Fixes: 68ad4a330433 ("mm/vmalloc.c: keep track of free blocks for vmap allocation") > > Reported-by: Ping Fang > > Cc: Andrew Morton > > Cc: Uladzislau Rezki (Sony) > > Cc: Roman Gushchin > > Cc: Michal Hocko > > Cc: Oscar Salvador > > Cc: linux-mm@kvack.org > > Signed-off-by: David Hildenbrand > > --- > > mm/vmalloc.c | 13 +++++++++++-- > > 1 file changed, 11 insertions(+), 2 deletions(-) > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c > > index d5cd52805149..c6071f5f8de3 100644 > > --- a/mm/vmalloc.c > > +++ b/mm/vmalloc.c > > @@ -1153,7 +1153,8 @@ is_within_this_va(struct vmap_area *va, unsigned long size, > > /* > > * Find the first free block(lowest start address) in the tree, > > * that will accomplish the request corresponding to passing > > - * parameters. > > + * parameters. Note that with an alignment > 1, this function > > + * can be imprecise and skip applicable free blocks. > > */ > > static __always_inline struct vmap_area * > > find_vmap_lowest_match(unsigned long size, > > @@ -1396,7 +1397,15 @@ __alloc_vmap_area(unsigned long size, unsigned long align, > > enum fit_type type; > > int ret; > > > > - va = find_vmap_lowest_match(size, align, vstart); > > + /* > > + * For exact allocations, ignore the alignment, such that > > + * find_vmap_lowest_match() won't search for a bigger free area just > > + * able to align later and consequently fail the search. > > + */ > > + if (vend - vstart == size && IS_ALIGNED(vstart, align)) > > + va = find_vmap_lowest_match(size, 1, vstart); > > + else > > + va = find_vmap_lowest_match(size, align, vstart); > > if (unlikely(!va)) > > return vend; > > > > > > base-commit: 7d2a07b769330c34b4deabeed939325c77a7ec2f > > -- > > 2.31.1 > > This patch looks like a KASAN specific to me. So i would like to avoid such changes to the vmalloc internals in order to simplify further maintenance and keeping it generic instead. Currently the find_vmap_lowest_match() adjusts the search size criteria for any alignment values even for PAGE_SIZE alignment. That is not optimal. Because a PAGE_SIZE or one page is a minimum granularity we operate on vmalloc allocations thus all free blocks are page aligned. All that means that we should adjust the search length only if an alignment request is bigger than one page, in case of one page, that corresponds to PAGE_SIZE value, there is no reason in such extra adjustment because all free->va_start have a page boundary anyway. Could you please test below patch that is a generic improvement? urezki@pc638:~/data/raid0/coding/linux-next.git$ git diff diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 24a3955d5a36..9d219ab5ae57 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -1153,7 +1153,9 @@ is_within_this_va(struct vmap_area *va, unsigned long size, /* * Find the first free block(lowest start address) in the tree, * that will accomplish the request corresponding to passing - * parameters. + * parameters. Note that with an alignment > PAGE_SIZE, this + * function can skip applicable free blocks with lower start + * addresses which may fit for an alignment request. */ static __always_inline struct vmap_area * find_vmap_lowest_match(unsigned long size, @@ -1167,7 +1169,7 @@ find_vmap_lowest_match(unsigned long size, node = free_vmap_area_root.rb_node; /* Adjust the search size for alignment overhead. */ - length = size + align - 1; + length = (align > PAGE_SIZE) ? size + align:size; while (node) { va = rb_entry(node, struct vmap_area, rb_node); @@ -2251,7 +2253,7 @@ void __init vm_area_register_early(struct vm_struct *vm, size_t align) static void vmap_init_free_space(void) { - unsigned long vmap_start = 1; + unsigned long vmap_start = PAGE_ALIGN(1); const unsigned long vmap_end = ULONG_MAX; struct vmap_area *busy, *free; urezki@pc638:~/data/raid0/coding/linux-next.git$ Thanks! -- Vlad Rezki