From: David Hildenbrand <david@redhat.com>
Organization: Red Hat
To: Uladzislau Rezki
Cc: LKML, Ping Fang, Andrew Morton, Roman Gushchin, Michal Hocko,
 Oscar Salvador, Linux Memory Management List
Subject: Re: [PATCH v1] mm/vmalloc: fix exact allocations with an alignment > 1
Date: Wed, 15 Sep 2021 11:02:33 +0200
In-Reply-To: <20210908132727.16165-1-david@redhat.com>

On 14.09.21 21:15, Uladzislau Rezki wrote:
>> Let me have a look at it.
>>
>> Vlad Rezki
>>
>> On Wed, Sep 8, 2021 at 3:27 PM David Hildenbrand wrote:
>>>
>>> find_vmap_lowest_match() is imprecise such that it won't always
>>> find "the first free block ...
>>> that will accomplish the request" if an alignment > 1 was
>>> specified: especially when the alignment is PAGE_SIZE.
>>> Unfortunately, the way the vmalloc data structures were designed,
>>> propagating the max size without alignment information through the
>>> tree, it's hard to make it precise again when an alignment > 1 is
>>> specified.
>>>
>>> The issue is that in order to be able to eventually align later,
>>> find_vmap_lowest_match() has to search for a slightly bigger area
>>> and might skip some applicable subtrees just by looking at the
>>> result of get_subtree_max_size(). While this usually doesn't
>>> matter, it matters for exact allocations as performed by KASAN
>>> when onlining a memory block, when the free block exactly matches
>>> the request (mm/kasan/shadow.c:kasan_mem_notifier()).
>>>
>>> In case we online memory blocks out of order (not lowest to highest
>>> address), find_vmap_lowest_match() with PAGE_SIZE alignment will
>>> reject an exact allocation if it corresponds exactly to a free
>>> block.
>>> (There are some corner cases where it would still work, if we're
>>> lucky and hit the first is_within_this_va() inside the while loop.)
>>>
>>> [root@vm-0 fedora]# echo online > /sys/devices/system/memory/memory82/state
>>> [root@vm-0 fedora]# echo online > /sys/devices/system/memory/memory83/state
>>> [root@vm-0 fedora]# echo online > /sys/devices/system/memory/memory85/state
>>> [root@vm-0 fedora]# echo online > /sys/devices/system/memory/memory84/state
>>> [ 223.858115] vmap allocation for size 16777216 failed: use vmalloc=<size> to increase size
>>> [ 223.859415] bash: vmalloc: allocation failure: 16777216 bytes, mode:0x6000c0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0
>>> [ 223.860992] CPU: 4 PID: 1644 Comm: bash Kdump: loaded Not tainted 4.18.0-339.el8.x86_64+debug #1
>>> [ 223.862149] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
>>> [ 223.863580] Call Trace:
>>> [ 223.863946]  dump_stack+0x8e/0xd0
>>> [ 223.864420]  warn_alloc.cold.90+0x8a/0x1b2
>>> [ 223.864990]  ? zone_watermark_ok_safe+0x300/0x300
>>> [ 223.865626]  ? slab_free_freelist_hook+0x85/0x1a0
>>> [ 223.866264]  ? __get_vm_area_node+0x240/0x2c0
>>> [ 223.866858]  ? kfree+0xdd/0x570
>>> [ 223.867309]  ? kmem_cache_alloc_node_trace+0x157/0x230
>>> [ 223.868028]  ? notifier_call_chain+0x90/0x160
>>> [ 223.868625]  __vmalloc_node_range+0x465/0x840
>>> [ 223.869230]  ? mark_held_locks+0xb7/0x120
>>>
>>> While we could fix this in kasan_mem_notifier() by passing an
>>> alignment of "1", this is actually an implementation detail of
>>> vmalloc and to be handled internally.
>>>
>>> So use an alignment of 1 when calling find_vmap_lowest_match() for
>>> exact allocations that are expected to succeed -- if the given
>>> range can satisfy the alignment requirements.
>>>
>>> Fixes: 68ad4a330433 ("mm/vmalloc.c: keep track of free blocks for vmap allocation")
>>> Reported-by: Ping Fang
>>> Cc: Andrew Morton
>>> Cc: Uladzislau Rezki (Sony)
>>> Cc: Roman Gushchin
>>> Cc: Michal Hocko
>>> Cc: Oscar Salvador
>>> Cc: linux-mm@kvack.org
>>> Signed-off-by: David Hildenbrand
>>> ---
>>>  mm/vmalloc.c | 13 +++++++++++--
>>>  1 file changed, 11 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>>> index d5cd52805149..c6071f5f8de3 100644
>>> --- a/mm/vmalloc.c
>>> +++ b/mm/vmalloc.c
>>> @@ -1153,7 +1153,8 @@ is_within_this_va(struct vmap_area *va, unsigned long size,
>>>  /*
>>>   * Find the first free block(lowest start address) in the tree,
>>>   * that will accomplish the request corresponding to passing
>>> - * parameters.
>>> + * parameters. Note that with an alignment > 1, this function
>>> + * can be imprecise and skip applicable free blocks.
>>>   */
>>>  static __always_inline struct vmap_area *
>>>  find_vmap_lowest_match(unsigned long size,
>>> @@ -1396,7 +1397,15 @@ __alloc_vmap_area(unsigned long size, unsigned long align,
>>>  	enum fit_type type;
>>>  	int ret;
>>>
>>> -	va = find_vmap_lowest_match(size, align, vstart);
>>> +	/*
>>> +	 * For exact allocations, ignore the alignment, such that
>>> +	 * find_vmap_lowest_match() won't search for a bigger free area just
>>> +	 * able to align later and consequently fail the search.
>>> +	 */
>>> +	if (vend - vstart == size && IS_ALIGNED(vstart, align))
>>> +		va = find_vmap_lowest_match(size, 1, vstart);
>>> +	else
>>> +		va = find_vmap_lowest_match(size, align, vstart);
>>> 	if (unlikely(!va))
>>> 		return vend;
>>>
>>>
>>> base-commit: 7d2a07b769330c34b4deabeed939325c77a7ec2f
>>> --
>>> 2.31.1
>>>
> This patch looks KASAN-specific to me, so I would like to avoid such
> changes to the vmalloc internals, in order to simplify further
> maintenance and keep it generic instead.

There is nothing KASAN specific in there :) It's specific to exact
allocations -- and KASAN may be one of the only users for now.

> Currently find_vmap_lowest_match() adjusts the search size criteria
> for any alignment value, even for PAGE_SIZE alignment. That is not
> optimal, because one page (PAGE_SIZE) is the minimum granularity we
> operate on for vmalloc allocations, thus all free blocks are page
> aligned.
>
> All that means that we should adjust the search length only if an
> alignment request is bigger than one page; in the PAGE_SIZE case
> there is no reason for such extra adjustment, because all
> free->va_start have a page boundary anyway.
>
> Could you please test the patch below, which is a generic improvement?

I played with the exact approach below (almost exactly the same code in
find_vmap_lowest_match()), and it might make sense as a general
improvement -- if we can guarantee that start+end of ranges are always
PAGE-aligned; I was not able to come to that conclusion, and your change
to vmap_init_free_space() shows me that our existing alignment
code/checks might be quite fragile.

But I mainly decided to go with my patch instead because it will also
work for exact allocations with align > PAGE_SIZE: most notably, when we
try population of huge pages instead of base pages in
__vmalloc_node_range() by increasing the alignment. If the exact range
allows for using huge pages, placing huge pages will work just fine with
my modifications, while it won't with your approach.

Long story short: my approach handles exact allocations also for bigger
alignments, while your optimization makes sense as a general improvement
for any vmalloc allocation.

Thanks for having a look!

>
> urezki@pc638:~/data/raid0/coding/linux-next.git$ git diff
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 24a3955d5a36..9d219ab5ae57 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1153,7 +1153,9 @@ is_within_this_va(struct vmap_area *va, unsigned long size,
>  /*
>   * Find the first free block(lowest start address) in the tree,
>   * that will accomplish the request corresponding to passing
> - * parameters.
> + * parameters. Note that with an alignment > PAGE_SIZE, this
> + * function can skip applicable free blocks with lower start
> + * addresses which may fit for an alignment request.
>   */
>  static __always_inline struct vmap_area *
>  find_vmap_lowest_match(unsigned long size,
> @@ -1167,7 +1169,7 @@ find_vmap_lowest_match(unsigned long size,
>  	node = free_vmap_area_root.rb_node;
>
>  	/* Adjust the search size for alignment overhead. */
> -	length = size + align - 1;
> +	length = (align > PAGE_SIZE) ? size + align : size;
>
>  	while (node) {
>  		va = rb_entry(node, struct vmap_area, rb_node);
> @@ -2251,7 +2253,7 @@ void __init vm_area_register_early(struct vm_struct *vm, size_t align)
>
>  static void vmap_init_free_space(void)
>  {
> -	unsigned long vmap_start = 1;
> +	unsigned long vmap_start = PAGE_ALIGN(1);
>  	const unsigned long vmap_end = ULONG_MAX;
>  	struct vmap_area *busy, *free;
>
> urezki@pc638:~/data/raid0/coding/linux-next.git$
>
> Thanks!
>
> --
> Vlad Rezki

-- 
Thanks,
David / dhildenb