From: Catalin Marinas <catalin.marinas@arm.com> To: Anshuman Khandual <anshuman.khandual@arm.com> Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, akpm@linux-foundation.org, will@kernel.org, mark.rutland@arm.com, mhocko@suse.com, ira.weiny@intel.com, david@redhat.com, cai@lca.pw, logang@deltatee.com, cpandya@codeaurora.org, arunks@codeaurora.org, dan.j.williams@intel.com, mgorman@techsingularity.net, osalvador@suse.de, ard.biesheuvel@arm.com, steve.capper@arm.com, broonie@kernel.org, valentin.schneider@arm.com, Robin.Murphy@arm.com, steven.price@arm.com, suzuki.poulose@arm.com Subject: Re: [PATCH V7 3/3] arm64/mm: Enable memory hot remove Date: Tue, 17 Sep 2019 16:08:53 +0100 [thread overview] Message-ID: <20190917150852.GC7305@arrakis.emea.arm.com> (raw) In-Reply-To: <a1962cde-b4df-e4a0-de61-252c0d0a25b2@arm.com> On Tue, Sep 17, 2019 at 10:06:11AM +0530, Anshuman Khandual wrote: > On 09/13/2019 03:39 PM, Catalin Marinas wrote: > > On Fri, Sep 13, 2019 at 11:28:01AM +0530, Anshuman Khandual wrote: > >> The problem (race) is not because of the inability to deal with partially > >> filled table. We can handle that correctly as explained below [1]. The > >> problem is with inadequate kernel page table locking during vmalloc() > >> which might be accessing intermediate kernel page table pointers which is > >> being freed with free_empty_tables() concurrently. Hence we cannot free > >> any page table page which can ever have entries from vmalloc() range. > > > > The way you deal with the partially filled table in this patch is to > > avoid freeing if there is a non-empty entry (!p*d_none()). This is what > > causes the race with vmalloc. If you simply avoid freeing a pmd page, > > for example, if the range floor/ceiling is not aligned to PUD_SIZE, > > irrespective of whether the other entries are empty or not, you > > shouldn't have this problem. You do free the pte page if the range is [...] > > We may have some pgtable pages not freed at both ends of the range > > (maximum 6 in total) but I don't really see this an issue. They could be > > reused if something else gets mapped in that range. > > I assume that the number 6 for maximum page possibility came from > > (floor edge + ceiling edge) * (PTE table + PMD table + PUD table) Yes. > >> Though not completely sure, whether I really understood the suggestion above > >> with respect to the floor-ceiling mechanism as in free_pgd_range(). Are you > >> suggesting that we should only attempt to free up those vmemmap range page > >> table pages which *definitely* could never overlap with vmalloc by working > >> on a modified (i.e cut down with floor-ceiling while avoiding vmalloc range > >> at each level) vmemmap range instead ? > > > > You can ignore the overlap check altogether, only free the page tables > > with floor/ceiling set to the start/size passed to arch_remove_memory() > > and vmemmap_free(). > > Wondering if it will be better to use [VMEMMAP_START - VMEMMAP_END] and > [PAGE_OFFSET - PAGE_END] as floor/ceiling respectively with vmemmap_free() > and arch_remove_memory(). Not only it is safe to free all page table pages > which span over these maximum possible mapping range but also it reduces > the risk for alignment related wastage. That's indeed better. You pass the floor/ceiling as the enclosing range and start/end as the actual range to unmap is. We avoid the potential "leak" around the edges when falling within the floor/ceiling range (I think that's close to what free_pgd_range() does). > >> This can be one restrictive version of the function > >> free_empty_tables() called in case there is an overlap. So we will > >> maintain two versions for free_empty_tables(). Please correct me if > >> any the above assumptions or understanding is wrong. > > > > I'd rather have a single version of free_empty_tables(). As I said > > above, the only downside is that a partially filled pgtable page would > > not be freed even though the other entries are empty. > > Sure. Also practically the limitation will be applicable only for vmemmap > mapping but not for linear mappings where the chances of overlap might be > negligible as it covers half kernel virtual address space. If you have a common set of functions, it doesn't heart to pass the correct floor/ceiling in both cases. -- Catalin
WARNING: multiple messages have this Message-ID (diff)
From: Catalin Marinas <catalin.marinas@arm.com> To: Anshuman Khandual <anshuman.khandual@arm.com> Cc: mark.rutland@arm.com, mhocko@suse.com, david@redhat.com, linux-mm@kvack.org, arunks@codeaurora.org, cpandya@codeaurora.org, ira.weiny@intel.com, will@kernel.org, steven.price@arm.com, valentin.schneider@arm.com, suzuki.poulose@arm.com, Robin.Murphy@arm.com, broonie@kernel.org, cai@lca.pw, ard.biesheuvel@arm.com, dan.j.williams@intel.com, linux-arm-kernel@lists.infradead.org, osalvador@suse.de, steve.capper@arm.com, logang@deltatee.com, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, mgorman@techsingularity.net Subject: Re: [PATCH V7 3/3] arm64/mm: Enable memory hot remove Date: Tue, 17 Sep 2019 16:08:53 +0100 [thread overview] Message-ID: <20190917150852.GC7305@arrakis.emea.arm.com> (raw) In-Reply-To: <a1962cde-b4df-e4a0-de61-252c0d0a25b2@arm.com> On Tue, Sep 17, 2019 at 10:06:11AM +0530, Anshuman Khandual wrote: > On 09/13/2019 03:39 PM, Catalin Marinas wrote: > > On Fri, Sep 13, 2019 at 11:28:01AM +0530, Anshuman Khandual wrote: > >> The problem (race) is not because of the inability to deal with partially > >> filled table. We can handle that correctly as explained below [1]. The > >> problem is with inadequate kernel page table locking during vmalloc() > >> which might be accessing intermediate kernel page table pointers which is > >> being freed with free_empty_tables() concurrently. Hence we cannot free > >> any page table page which can ever have entries from vmalloc() range. > > > > The way you deal with the partially filled table in this patch is to > > avoid freeing if there is a non-empty entry (!p*d_none()). This is what > > causes the race with vmalloc. If you simply avoid freeing a pmd page, > > for example, if the range floor/ceiling is not aligned to PUD_SIZE, > > irrespective of whether the other entries are empty or not, you > > shouldn't have this problem. You do free the pte page if the range is [...] > > We may have some pgtable pages not freed at both ends of the range > > (maximum 6 in total) but I don't really see this an issue. They could be > > reused if something else gets mapped in that range. > > I assume that the number 6 for maximum page possibility came from > > (floor edge + ceiling edge) * (PTE table + PMD table + PUD table) Yes. > >> Though not completely sure, whether I really understood the suggestion above > >> with respect to the floor-ceiling mechanism as in free_pgd_range(). Are you > >> suggesting that we should only attempt to free up those vmemmap range page > >> table pages which *definitely* could never overlap with vmalloc by working > >> on a modified (i.e cut down with floor-ceiling while avoiding vmalloc range > >> at each level) vmemmap range instead ? > > > > You can ignore the overlap check altogether, only free the page tables > > with floor/ceiling set to the start/size passed to arch_remove_memory() > > and vmemmap_free(). > > Wondering if it will be better to use [VMEMMAP_START - VMEMMAP_END] and > [PAGE_OFFSET - PAGE_END] as floor/ceiling respectively with vmemmap_free() > and arch_remove_memory(). Not only it is safe to free all page table pages > which span over these maximum possible mapping range but also it reduces > the risk for alignment related wastage. That's indeed better. You pass the floor/ceiling as the enclosing range and start/end as the actual range to unmap is. We avoid the potential "leak" around the edges when falling within the floor/ceiling range (I think that's close to what free_pgd_range() does). > >> This can be one restrictive version of the function > >> free_empty_tables() called in case there is an overlap. So we will > >> maintain two versions for free_empty_tables(). Please correct me if > >> any the above assumptions or understanding is wrong. > > > > I'd rather have a single version of free_empty_tables(). As I said > > above, the only downside is that a partially filled pgtable page would > > not be freed even though the other entries are empty. > > Sure. Also practically the limitation will be applicable only for vmemmap > mapping but not for linear mappings where the chances of overlap might be > negligible as it covers half kernel virtual address space. If you have a common set of functions, it doesn't heart to pass the correct floor/ceiling in both cases. -- Catalin _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2019-09-17 15:09 UTC|newest] Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top 2019-09-03 9:45 [PATCH V7 0/3] arm64/mm: Enable memory hot remove Anshuman Khandual 2019-09-03 9:45 ` Anshuman Khandual 2019-09-03 9:45 ` [PATCH V7 1/3] mm/hotplug: Reorder memblock_[free|remove]() calls in try_remove_memory() Anshuman Khandual 2019-09-03 9:45 ` Anshuman Khandual 2019-09-04 8:16 ` David Hildenbrand 2019-09-04 8:16 ` David Hildenbrand 2019-09-05 4:27 ` Anshuman Khandual 2019-09-05 4:27 ` Anshuman Khandual 2019-09-16 1:44 ` Balbir Singh 2019-09-16 1:44 ` Balbir Singh 2019-09-18 9:28 ` Anshuman Khandual 2019-09-18 9:28 ` Anshuman Khandual 2019-09-03 9:45 ` [PATCH V7 2/3] arm64/mm: Hold memory hotplug lock while walking for kernel page table dump Anshuman Khandual 2019-09-03 9:45 ` Anshuman Khandual 2019-09-15 2:35 ` Balbir Singh 2019-09-15 2:35 ` Balbir Singh 2019-09-18 9:12 ` Anshuman Khandual 2019-09-18 9:12 ` Anshuman Khandual 2019-09-03 9:45 ` [PATCH V7 3/3] arm64/mm: Enable memory hot remove Anshuman Khandual 2019-09-03 9:45 ` Anshuman Khandual 2019-09-10 16:17 ` Catalin Marinas 2019-09-10 16:17 ` Catalin Marinas 2019-09-11 10:31 ` David Hildenbrand 2019-09-11 10:31 ` David Hildenbrand 2019-09-12 4:28 ` Anshuman Khandual 2019-09-12 4:28 ` Anshuman Khandual 2019-09-12 8:37 ` Anshuman Khandual 2019-09-12 8:37 ` Anshuman Khandual 2019-09-12 20:15 ` Catalin Marinas 2019-09-12 20:15 ` Catalin Marinas 2019-09-13 5:58 ` Anshuman Khandual 2019-09-13 5:58 ` Anshuman Khandual 2019-09-13 10:09 ` Catalin Marinas 2019-09-13 10:09 ` Catalin Marinas 2019-09-17 4:36 ` Anshuman Khandual 2019-09-17 4:36 ` Anshuman Khandual 2019-09-17 15:08 ` Catalin Marinas [this message] 2019-09-17 15:08 ` Catalin Marinas
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20190917150852.GC7305@arrakis.emea.arm.com \ --to=catalin.marinas@arm.com \ --cc=Robin.Murphy@arm.com \ --cc=akpm@linux-foundation.org \ --cc=anshuman.khandual@arm.com \ --cc=ard.biesheuvel@arm.com \ --cc=arunks@codeaurora.org \ --cc=broonie@kernel.org \ --cc=cai@lca.pw \ --cc=cpandya@codeaurora.org \ --cc=dan.j.williams@intel.com \ --cc=david@redhat.com \ --cc=ira.weiny@intel.com \ --cc=linux-arm-kernel@lists.infradead.org \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-mm@kvack.org \ --cc=logang@deltatee.com \ --cc=mark.rutland@arm.com \ --cc=mgorman@techsingularity.net \ --cc=mhocko@suse.com \ --cc=osalvador@suse.de \ --cc=steve.capper@arm.com \ --cc=steven.price@arm.com \ --cc=suzuki.poulose@arm.com \ --cc=valentin.schneider@arm.com \ --cc=will@kernel.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.