From: Nicholas Piggin <npiggin@gmail.com>
Date: Fri, 04 Dec 2020 18:12:58 +1000
Subject: Re: [PATCH v8 11/12] mm/vmalloc: Hugepage vmalloc mappings
To: akpm@linux-foundation.org, linux-mm@kvack.org, "Edgecombe, Rick P"
Cc: christophe.leroy@csgroup.eu, hch@infradead.org,
    Jonathan.Cameron@Huawei.com, linux-arch@vger.kernel.org,
    linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
    lizefan@huawei.com
Message-Id: <1607068679.lfd133za4h.astroid@bobo.none>
References: <20201128152559.999540-1-npiggin@gmail.com>
 <20201128152559.999540-12-npiggin@gmail.com>

Excerpts from Edgecombe, Rick P's message of December 1, 2020 6:21 am:
> On Sun, 2020-11-29 at 01:25 +1000, Nicholas Piggin wrote:
>> Support huge page vmalloc mappings. Config option HAVE_ARCH_HUGE_VMALLOC
>> enables support on architectures that define HAVE_ARCH_HUGE_VMAP and
>> supports PMD sized vmap mappings.
>>
>> vmalloc will attempt to allocate PMD-sized pages if allocating PMD size
>> or larger, and fall back to small pages if that was unsuccessful.
>>
>> Allocations that do not use PAGE_KERNEL prot are not permitted to use
>> huge pages, because not all callers expect this (e.g., module
>> allocations vs strict module rwx).
>
> Several architectures (x86, arm64, others?) allocate modules initially
> with PAGE_KERNEL and so I think this test will not exclude module
> allocations in those cases.

Ah, thanks. I guess archs must additionally ensure that their PAGE_KERNEL
allocations are suitable for huge page mappings before enabling the option.

If there is interest from those archs to support this, I have an early
(un-posted) patch that adds an explicit VM_HUGE flag that could override
the pessimistic arch default (a rough sketch of the idea is in the PS
below). It's not much trouble to add this to the large system hash
allocations. It's very out of date now but I can at least give what I
have to anyone doing an arch support that wants it.

>
> [snip]
>
>> @@ -2400,6 +2453,7 @@ static inline void set_area_direct_map(const struct vm_struct *area,
>>  {
>>  	int i;
>>
>> +	/* HUGE_VMALLOC passes small pages to set_direct_map */
>>  	for (i = 0; i < area->nr_pages; i++)
>>  		if (page_address(area->pages[i]))
>>  			set_direct_map(area->pages[i]);
>> @@ -2433,11 +2487,12 @@ static void vm_remove_mappings(struct vm_struct *area, int deallocate_pages)
>>  	 * map. Find the start and end range of the direct mappings to make sure
>>  	 * the vm_unmap_aliases() flush includes the direct map.
>>  	 */
>> -	for (i = 0; i < area->nr_pages; i++) {
>> +	for (i = 0; i < area->nr_pages; i += 1U << area->page_order) {
>>  		unsigned long addr = (unsigned long)page_address(area->pages[i]);
>>  		if (addr) {
>> +			unsigned long page_size = PAGE_SIZE << area->page_order;
>>  			start = min(addr, start);
>> -			end = max(addr + PAGE_SIZE, end);
>> +			end = max(addr + page_size, end);
>>  			flush_dmap = 1;
>>  		}
>>  	}
>
> The logic around this is a bit tangled. The reset of the direct map has
> to succeed, but if the set_direct_map_() functions require a split they
> could fail. For x86, set_memory_ro() calls on a vmalloc alias will
> mirror the page size and permission on the direct map and so the direct
> map will be broken to 4k pages if it's a RO vmalloc allocation.
>
> But after this, module vmalloc()'s could have large pages which would
> result in large RO pages on the direct map.
> Then it could possibly fail
> when trying to reset a 4k page out of a large RO direct map mapping.
>
> I think either module allocations need to be actually excluded from
> having large pages (seems like you might have seen other issues as
> well?), or another option could be to use the changes here:
> https://lore.kernel.org/lkml/20201125092208.12544-4-rppt@kernel.org/
> to reset the direct map for a large page range at a time for large
> vmalloc pages.

Right, x86 would have to do something about that before enabling. A
VM_HUGE flag might be quick and easy but maybe other options are not
too difficult.

Thanks,
Nick
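PS: for anyone skimming the thread without the full series in front of
them, the allocation policy described in the quoted changelog boils down
to roughly the shape below. This is only a simplified sketch (PMD-sized
attempt gated on PAGE_KERNEL, quiet fallback to small pages), not the
actual mm/vmalloc.c code from the patch; the helper name
vmalloc_try_huge_order() is a made-up placeholder.

/*
 * Sketch only: decide whether a vmalloc allocation should attempt
 * PMD-sized pages.  Callers that do not map with PAGE_KERNEL (e.g.
 * modules with strict rwx) always get small pages.
 */
static unsigned int vmalloc_try_huge_order(unsigned long size, pgprot_t prot)
{
	if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC))
		return 0;
	if (pgprot_val(prot) != pgprot_val(PAGE_KERNEL))
		return 0;
	if (size < PMD_SIZE)
		return 0;
	return PMD_SHIFT - PAGE_SHIFT;	/* try PMD-sized pages first */
}
/*
 * ... and if the higher-order allocation fails at runtime, the series
 * retries with order 0 (small pages) rather than failing the vmalloc.
 */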
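The VM_HUGE idea mentioned above would, in rough terms, invert that
default from "huge whenever the prot looks safe" to "small unless the
caller explicitly opts in". The snippet below only illustrates that
shape and is not the unposted patch itself; the flag value and the
__vmalloc_node_range() call site for the system hash tables are
assumptions.

#define VM_HUGE		0x00000800	/* assumed free vm_flags bit, illustrative */

static unsigned int vmalloc_try_huge_order_optin(unsigned long size,
						 unsigned long vm_flags)
{
	if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC))
		return 0;
	if (!(vm_flags & VM_HUGE))	/* nothing gets huge pages by accident */
		return 0;
	if (size < PMD_SIZE)
		return 0;
	return PMD_SHIFT - PAGE_SHIFT;
}

/*
 * A caller such as the large system hash allocations could then opt in
 * with something along the lines of:
 *
 *	p = __vmalloc_node_range(size, PMD_SIZE, VMALLOC_START, VMALLOC_END,
 *				 GFP_KERNEL, PAGE_KERNEL, VM_HUGE,
 *				 NUMA_NO_NODE, __builtin_return_address(0));
 */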