From: Sudarshan Rajagopalan
To: David Hildenbrand
Cc: Anshuman Khandual, Mark Rutland, Steven Price, Mike Rapoport,
 linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
 Catalin Marinas, Will Deacon, Suren Baghdasaryan, Greg Kroah-Hartman,
 Pratik Patel
Subject: Re: mm/memblock: export memblock_{start/end}_of_DRAM
Date: Mon, 02 Nov 2020 18:15:43 -0800
Message-ID: <8ddfc74f0bd529ad78bd2e2d0b95a6c6@codeaurora.org>
In-Reply-To: <7b50c0fa-bbeb-1041-c05c-2667134044b6@redhat.com>
References: <7b50c0fa-bbeb-1041-c05c-2667134044b6@redhat.com>

On 2020-10-29 23:41, David Hildenbrand wrote:
> On 29.10.20 22:29, Sudarshan Rajagopalan wrote:
>> Hello all,
>>
>
> Hi!
>

Hi David.. thanks for the response as always.

>> We have a usecase where a module driver adds certain memory blocks using
>> add_memory_driver_managed(), so that it can perform memory hotplug
>> operations on these blocks. In general, these memory blocks aren't
>> something that gets physically added later, but are part of the actual
>> RAM that the system booted up with. Meaning - we set the 'mem=' cmdline
>> parameter to limit the memory and later add the remaining ones using
>> add_memory*() variants.
>>
>> The basic idea is to have the driver have ownership of and manage
>> certain memory blocks for hotplug operations.
>
> So, in summary, you're still abusing the memory hot(un)plug
> infrastructure from your driver - just not in as severe a way as before.
> And I'll tell you why, so you might understand why exposing this API
> is not really a good idea and why your driver wouldn't - for example -
> be upstream material.
>
> Don't get me wrong, what you are doing might be ok in your context,
> but it's simply not universally applicable in our current model.
>
> Ordinary system RAM works differently than many other devices (like PCI
> devices) whereby *something* senses the device and exposes it to the
> system, and some available driver binds to it and owns the memory.
>
> Memory is detected by a driver and added to the system via e.g.,
> add_memory_driver_managed(). Memory devices are created and the memory
> is directly handed off to the system, to be used as system RAM as soon
> as memory devices are onlined. There is no driver that "binds" memory
> like other devices - it's rather the core (buddy) that uses/owns that
> memory immediately after device creation.
>

I see.. and I agree that drivers are meant to *sense* that something has
changed or was newly added, so that the driver can check if it's the one
responsible or compatible for handling this entity and bind to it. So I
guess what it boils down to is - a driver that uses memory hotplug
_cannot_ add/remove or have ownership of memblock boot memory, only of
RAM blocks that are newly added later on.

I was trying to mimic the detecting and adding of extra RAM by limiting
the System RAM with "mem=XGB", as though the system booted with XGB of
boot memory, and later adding the remaining blocks (force detection and
adding) using add_memory_driver_managed(). These remaining blocks are
calculated as 'physical end addr of boot memory' - 'memblock_end_of_DRAM'.
The 'physical end addr of boot memory', i.e. the actual RAM that the
bootloader informs the kernel about, can be obtained by scanning the
'memory' DT node.
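
To make that flow concrete, here is a rough, hypothetical sketch of what
such a driver-side helper could look like: scan the 'memory' DT nodes for
the true physical end of boot RAM, compute the delta against
memblock_end_of_DRAM(), and hand the stripped-off range back via
add_memory_driver_managed(). The helper name and the choice of nid are
made up, the exact add_memory_driver_managed() signature and flag names
vary between kernel versions, and memblock_end_of_DRAM() is of course the
very symbol that is not currently exported to modules - which is the point
of this thread.

#include <linux/of.h>
#include <linux/of_address.h>
#include <linux/memblock.h>
#include <linux/memory_hotplug.h>

/* Hypothetical helper: hot-add the RAM that "mem=" stripped off. */
static int my_add_stripped_memory(int nid)
{
	struct device_node *np;
	struct resource res;
	phys_addr_t dt_end = 0, mb_end;
	int i;

	/* Max {addr + size} over all 'memory' DT nodes = physical end of boot RAM */
	for_each_node_by_type(np, "memory") {
		for (i = 0; of_address_to_resource(np, i, &res) == 0; i++)
			if (res.end + 1 > dt_end)
				dt_end = res.end + 1;
	}

	/* End of detected memory after "mem=" was applied (not exported today) */
	mb_end = memblock_end_of_DRAM();
	if (dt_end <= mb_end)
		return 0;	/* nothing was stripped off */

	/*
	 * Hand the stripped-off range back as driver-managed System RAM.
	 * The range must be aligned to the memory block size for hot-add
	 * to succeed - see the discussion about holes further below.
	 */
	return add_memory_driver_managed(nid, mb_end, dt_end - mb_end,
					 "System RAM (my-driver)", MHP_NONE);
}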

>> For the driver to be able to know how much memory was limited and how
>> much is actually present, we take the delta of 'bootmem physical end
>> address' and 'memblock_end_of_DRAM'. The 'bootmem physical end address'
>> is obtained by scanning the reg values in the 'memory' DT node and
>> determining the max {addr, size}. Since our driver is getting
>> modularized, we won't have access to memblock_end_of_DRAM (i.e. the end
>> address of all memory blocks after 'mem=' is applied).
>
> What you do with "mem=" is force memory detection to ignore some of
> its detected memory.
>
>>
>> So checking if the memblock_{start/end}_of_DRAM() symbols can be
>> exported? Also, this information can be obtained by userspace by doing
>> 'cat /proc/iomem' and grepping for 'System RAM'. So wondering if
>> userspace can
>
> Not correct: with "mem=", cat /proc/iomem only shows *detected* +
> added system RAM, not the unmodified detection.
>

That's correct - I meant that 'memblock_end_of_DRAM' along with "mem=" can
be calculated using 'cat /proc/iomem', which shows the "detected plus
added" System RAM, and not the remaining undetected part which got
stripped off due to "mem=XGB". Basically, the 'memblock_end_of_DRAM'
address with 'mem=XGB' is {end addr of boot RAM - XGB}.. which would be
the same as the end address of "System RAM" shown in /proc/iomem (see the
small userspace sketch further below). The reasoning for this is - if
userspace can have access to such info and calculate the memblock end
address, why not let drivers have this info using memblock_end_of_DRAM()?

>> have access to such info, can we allow kernel module drivers to have
>> access by exporting memblock_{start/end}_of_DRAM()?
>>
>> Or are there any other ways where a module driver can get the end
>> address of the system memory block?
>
> And here is our problem: You disabled *detection* of that memory by
> the responsible driver (here: core). Now your driver wants to know
> what would have been detected. Assume you have a memory hole in that
> region - it would not work by simply looking at start/end. Your
> driver is not the one doing the detection.
>

Regarding the memory hole - the driver can inspect the 'memory' DT node
(which the kernel gets from the ABL, based on the RAM partition table) to
see whether any such holes exist. I agree that if such holes exist, hot
adding will fail, since memory has to be added in memory-block-size units.
The same issue will arise if a RAM slot is added and a driver senses it
but only knows the start/end of this RAM slot (though such holes generally
don't exist in RAM slots). This is again something specific to our target,
where we make sure there are no such holes in the topmost memory which is
stripped off by "mem=" and later added by the driver. I agree this is not
universally applicable upstream material, but it's a method that drivers
can utilize.

> Another issue is: when using such memory for KVM guests, there is no
> mechanism that tracks ownership of that memory - imagine another
> driver wanting to use that memory. This really only works in special
> environments.
>
> Yet another issue: you cannot assume that memblock data will stay
> around after boot. While we do it right now for arm64, that might
> change at some point. This is also one of the reasons why we don't
> export any real memblock data to drivers.
>
>
> When using "mem=" you have to know the exact layout of your system RAM
> and communicate to the right places, manually, how that layout looks:
> here, to your driver.
>

I agree the issues mentioned here with this approach are valid from an
upstream POV, but we aren't trying to make a generic driver for this
usecase and upstream it, but rather have it tailor-made for our usecase
alone, where we know the layout of the System RAM (max boot memory, no
holes, etc.) and we utilize "mem=" and memory hotplug so that the driver
can add and have ownership of the remaining memory for later hotplug
operations.

> The clean way of doing things today is to allocate RAM and use it for
> guests - e.g., using hugetlb/gigantic pages. As I said, there are
> other techniques coming up to deal with minimizing struct page
> overhead - if that's what you're concerned with (I still don't know
> why you're removing the memory from the host when giving it to the
> guest).

The point about struct page overhead with hugetlb is valid, but we have
other usecases outside of inter-VM sharing where we rely on memory
hotplugging. In general, we want a way to be able to add/remove and
offline/online memory which is part of boot memory.
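
As a rough illustration of the /proc/iomem calculation mentioned above,
a minimal userspace sketch could look like the following. This is a
hypothetical helper, not part of any existing tool; it assumes it runs as
root (so /proc/iomem shows real addresses) and, exactly as David points
out, it only ever sees the detected-plus-added System RAM, not what "mem="
stripped off.

#include <stdio.h>
#include <string.h>
#include <inttypes.h>

/* Derive the end address of detected "System RAM" from /proc/iomem.
 * With "mem=XGB" this corresponds to memblock_end_of_DRAM(), not the
 * full physical end of boot RAM. */
int main(void)
{
	FILE *f = fopen("/proc/iomem", "r");	/* run as root for real addresses */
	char line[256];
	uint64_t start, end, max_end = 0;

	if (!f)
		return 1;

	while (fgets(line, sizeof(line), f)) {
		if (!strstr(line, "System RAM"))
			continue;
		if (sscanf(line, "%" SCNx64 "-%" SCNx64, &start, &end) == 2 &&
		    end > max_end)
			max_end = end;
	}
	fclose(f);

	printf("end of detected System RAM: 0x%" PRIx64 "\n", max_end + 1);
	return 0;
}

A userspace service along these lines could then hand the value to the
driver, e.g. through a sysfs node, as suggested below.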

With all the tools available - "mem=", /proc/iomem, the 'memory' DT node
and the memory hotplug framework - a driver is still able to achieve this,
and the tools that are present today do allow it.

Keeping the inter-VM memory sharing aside, would it be okay for
memblock_end_of_DRAM() to be exported? Like I mentioned before, there can
be a userspace service that calculates this using 'cat /proc/iomem' and
delivers it to the driver via a sysfs node. So I don't see any harm in
exporting this info to the driver. I agree that other memblock info
shouldn't be exposed to drivers, but I see no harm in exporting
memblock_end_of_DRAM() (a sketch of what that export would amount to is
appended below).

I will be glad to share more info about the usecase where we use this
approach if that would help, and I can check and get back on how much we
can share, since this is a proprietary usecase for Qualcomm.

Sudarshan

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
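
For reference, the export being asked for would amount to roughly the
following in mm/memblock.c - a sketch of the idea, not an actual posted
patch. Note David's caveat above: memblock data (and functions marked
__init_memblock) is only guaranteed to stay around after boot on some
configurations, e.g. arm64 with ARCH_KEEP_MEMBLOCK.

/* mm/memblock.c - existing function, shown with the proposed export added */
phys_addr_t __init_memblock memblock_end_of_DRAM(void)
{
	int idx = memblock.memory.cnt - 1;

	return (memblock.memory.regions[idx].base +
		memblock.memory.regions[idx].size);
}
EXPORT_SYMBOL_GPL(memblock_end_of_DRAM);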