From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from list by lists.gnu.org with archive (Exim 4.90_1) id 1mM8EA-00075f-1n for mharc-grub-devel@gnu.org; Fri, 03 Sep 2021 08:23:26 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:34936) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mM8E9-00072c-3a for grub-devel@gnu.org; Fri, 03 Sep 2021 08:23:25 -0400 Received: from mail-pl1-x631.google.com ([2607:f8b0:4864:20::631]:37853) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1mM8E7-0008Qi-13 for grub-devel@gnu.org; Fri, 03 Sep 2021 08:23:24 -0400 Received: by mail-pl1-x631.google.com with SMTP id q3so3201301plx.4 for ; Fri, 03 Sep 2021 05:23:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=axtens.net; s=google; h=from:to:cc:subject:in-reply-to:references:date:message-id :mime-version; bh=MsoTPU22WyhPw2zZXur/xo7JTFd65mROrZxMyR339RY=; b=gj5gWh+luTXefEH+gqHEMR+EFFq9LboTMTVIj4AhAgG6wKwRy2hrGNsQixX8v3sozB FfhevooJmyW9t2Ky/GdEVGIb6OSiRCiARYIuaT0QE3V7Pt8RRbEYVPq8vAcnTpEUtnUo T8K20kuOMhUHrsfPpMcgAlEfyf8cdfne8pcp4= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:in-reply-to:references:date :message-id:mime-version; bh=MsoTPU22WyhPw2zZXur/xo7JTFd65mROrZxMyR339RY=; b=OhakAk4N4cO4FF0qR71SVjNPBl7Po0g/Upcyyzfq0ANsyW20Sv8OrEBP6/QUBpvE+A sJBm6sSxzm3qgKrHNK7N6JZ+BCY+vyR+aLEXH6qBQmM7Ck/Nu3qKa/UqTBE5Jo/9gaFZ O3xMm7SmMHG3H+elXEPs/LMlpFQRe5bFQnaWxQ/02R6Q80yvDSv2Vcm6noaeNBlccyvD 8rctlMgO/Ugw7eO+57ZtKtPbnsaR+U97DyipsMAk2Vv8AyH9aq2Mz8FR4tdYt7dL7+HO XvVso4KenA3kdz0aZmuUvux+SoL1Wv+Vbg9c/3jTEhDP4+++/KdBWvSbvBwh1G1lJf2I wNcw== X-Gm-Message-State: AOAM533ZEOvoFSzpxWfKappZqY/li0tfr5DRmDxixGcLcFhe46D8xs4V jsTPsEOWe6M51UcT/UfldcU38Q== X-Google-Smtp-Source: ABdhPJwhWdPHr/p4jHHzxoFMiC3DvlmD1G+hvx/NcNEan3jVoWAqLnQMxnEuTxPLufgo3Wgfh7GfGQ== X-Received: by 2002:a17:90b:1bce:: with SMTP id oa14mr608453pjb.237.1630671799654; Fri, 03 Sep 2021 05:23:19 -0700 (PDT) Received: from localhost ([2001:4479:e200:df00:434b:863f:e723:e3ca]) by smtp.gmail.com with ESMTPSA id d5sm5069840pjs.53.2021.09.03.05.23.18 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 03 Sep 2021 05:23:19 -0700 (PDT) From: Daniel Axtens To: Daniel Kiper Cc: Patrick Steinhardt , grub-devel@gnu.org, Leif Lindholm , Stefan Berger Subject: Re: [PATCH v3 2/6] mm: Allow dynamically requesting additional memory regions In-Reply-To: <20210902124012.jg672of3psba4ie3@tomti.i.net-space.pl> References: <3f0ec2a76c1b3aa722efdd540b4d3ecce1789750.1629025332.git.ps@pks.im> <87sfyo5p3b.fsf@dja-thinkpad.axtens.net> <20210902124012.jg672of3psba4ie3@tomti.i.net-space.pl> Date: Fri, 03 Sep 2021 22:23:15 +1000 Message-ID: <871r656e6k.fsf@dja-thinkpad.axtens.net> MIME-Version: 1.0 Content-Type: text/plain Received-SPF: pass client-ip=2607:f8b0:4864:20::631; envelope-from=dja@axtens.net; helo=mail-pl1-x631.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: grub-devel@gnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: The development of GNU GRUB List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 03 Sep 2021 12:23:25 -0000 Daniel Kiper writes: > On Thu, Sep 02, 2021 at 12:48:24AM +1000, Daniel Axtens wrote: >> Patrick Steinhardt writes: >> >> > Currently, all platforms will set up their heap on initialization of the >> > platform code. While this works mostly fine, it poses some limitations >> > on memory management on us. Most notably, allocating big chunks of >> > memory in the gigabyte range would require us to pre-request this many >> > bytes from the firmware and add it to the heap from the beginning on >> > some platforms like EFI. As this isn't needed for most configurations, >> > it is inefficient and may even negatively impact some usecases when, >> > e.g., chainloading. Nonetheless, allocating big chunks of memory is >> > required sometimes, where one example is the upcoming support for the >> > Argon2 key derival function in LUKS2. >> > >> > In order to avoid pre-allocating big chunks of memory, this commit >> > implements a runtime mechanism to add more pages to the system. When a >> > given allocation cannot be currently satisfied, we'll call a given >> > callback set up by the platform's own memory management subsystem, >> > asking it to add a memory area with at least `n` bytes. If this >> > succeeds, we retry searching for a valid memory region, which should now >> > succeed. >> > >> >> I implemented this for ieee1275-powerpc. I set the initial memory claim >> to 1MB to match EFI and to exercise the code. >> >> Thoughts as I progressed: >> >> - You probably need to think about how to satisfy requests with >> particular alignments: currently there is no way to specify that with >> the current interface, and I saw powerpc-ieee1275 return bunch of >> allocations at e.g 0x2a561e which is not particularly well aligned! > > I think at "firmware memory allocation" level we could always allocate > page aligned regions. Of course this may not satisfy allocations > aligned at larger values than a page. Though I think we can solve this > by allocating larger regions from firmware. Please look below for more > details... > >> - You haven't included in the calculations the extra space required for >> mm housekeeping. For example, I'm seeing an allocation for 32kB be >> requested via grub_mm_add_region_fn. I dutifully allocate 32kB, with >> no alignment requirements, and pass that pointer to >> grub_mm_init_region. grub_mm_init_region throws away bytes at the >> start to get to GRUB_MM_ALIGN, then uses some bytes for the >> grub_mm_header_t, then any actual allocation uses bytes for the >> malloc metadata. So the actual underlying allocation cannot be >> satisfied. >> >> I think you get away with this on EFI because you use BYTES_TO_PAGES >> and get page-aligned memory, but I think you should probably round up >> to the next power of 2 for smaller allocations or to the next page or >> so for larger allocations. > > I think we could allocate at least e.g. 128 MiB from firmware if there is > not enough memory available in the GRUB mm. This way we would avoid frequent > calls to firmware and could satisfy requests for larger alignments. > >> - After fixing that in the ieee1275 code, all_functional_test >> hangs trying to run the cmdline_cat test. I think this is from a slow >> algorithm somewhere - the grub allocator isn't exactly optimised for >> a proliferation of regions. > > Could you try the solution proposed above? Maybe it will solve problem of > frequent additions of memory to the GRUB mm. > >> - I noticed that nearly all the allocations were under 1MB. This seems >> inefficient for a trip out to firmware. So I made the ieee1275 code >> allocate at least max(4MB, (size of allocation rounded up nearest >> 1MB) + 4kB). This makes the tests run with only the usual failures, >> at least on pseries with debug on... still chasing some bugs beyond >> that. > > Yeah, this is similar to what I proposed above. Though I would want to see > larger numbers tested as I said earlier. > >> - The speed impact depends on the allocation size. I'll post something >> on that tomorrow, hopefully, but larger minimum allocations work >> noticably better. >> >> - We only have 4GB max to play with because (at least) powerpc-ieee1275 >> is technically defined to be 32 bit. So I'm a bit nervous about >> further large allocations unless we have a way to release them back >> to _firmware_, not just to grub. > > Ugh... This can be difficult. I am not sure the GRUB mm is smart enough > to release memory regions if they are not used anymore by it. I didn't explain this well. What I'm trying to say is this: Say you require 1GB of memory temporarily. You do this: { void *mem = grub_large_malloc(1024 * 1024 * 1024); operate_upon(mem); grub_large_free(mem); } large_malloc and large_free go directly to firmware. We bypass the existing grub mm infrastructure completely. This way, people who need this sort of very large allocation do it in a way that can return the memory to firmware. The reason I like this approach more than integrating the memory into the grub mm infrastructure is that it would be very possible to get yourself into an unbootable situation by requesting so much memory from firmware that there's no memory left to load the kernel and initrd into. In the case of powerpc-ieee1275 there's another failure mode where can load the kernel but then the kernel can't claim memory from FW for it's purposes (this is before it quiesces openfirmware and takes control of memory management itself) and so fails to boot. I'll try your suggestions around 128MB allocations next week - I think they will mostly work but I'm just a bit worried about the implications of letting grub take arbitrarily large chunks of memory. Kind regards, Daniel > >> I would think a better overall approach would be to allocate the 1/4 of >> ram when grub starts, and create a whole new interface for large slabs > > I am not very happy with allocating 1/4 of memory at start of the day. > I think allocating larger chunks of memory from firmware should be > enough to make things working as expected. > >> of memory that are directly allocated from, and directly returned to, >> the firmware. > > Daniel