From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 7 Jul 2023 21:06:48 +0200
From: David Hildenbrand
Organization: Red Hat
To: Ryan Roberts, Matthew Wilcox
Cc: "Huang, Ying", Andrew Morton, "Kirill A. Shutemov", Yin Fengwei,
 Yu Zhao, Catalin Marinas, Will Deacon, Anshuman Khandual, Yang Shi,
 linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org
Subject: Re: [PATCH v2 4/5] mm: FLEXIBLE_THP for improved performance
References: <20230703135330.1865927-1-ryan.roberts@arm.com>
 <20230703135330.1865927-5-ryan.roberts@arm.com>
 <87edlkgnfa.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <44e60630-5e9d-c8df-ab79-cb0767de680e@arm.com>
 <524bacd2-4a47-2b8b-6685-c46e31a01631@redhat.com>
 <1e406f04-78ef-6573-e1f1-f0d0e0d5246a@redhat.com>
 <9dd036a8-9ba3-0cc4-b791-cb3178237728@arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

>>> I still feel that it would be better for the thp and large anon folio controls
>>> to be independent though - what's the argument for tying them together?
>>
>> Thinking about desired 2 MiB flexible THP on aarch64 (64k kernel) vs, 2 MiB PMD
>> THP on aarch64 (4k kernel), how are they any different? Just the way they are
>> mapped ...
>
> The last patch in the series shows my current approach to that:
>
> int arch_wants_pte_order(struct vm_area_struct *vma)
> {
> 	if (hugepage_vma_check(vma, vma->vm_flags, false, true, true))
> 		return CONFIG_ARM64_PTE_ORDER_THP; <<< always the contpte size
> 	else
> 		return CONFIG_ARM64_PTE_ORDER_NOTHP; <<< limited to 64K
> }
>
> But Yu has raised concerns that this type of policy needs to be in the core mm.
> So we could have the arch blindly return the preferred order from HW perspective
> (which would be contpte size for arm64). Then for !hugepage_vma_check(), mm
> could take the min of that value and some determined "acceptable" limit (which
> in my mind is 64K ;-).

Yeah, it's really tricky. Because why should arm64 with 64k base pages
*not* return 2MiB (which is one possible cont-pte size IIRC) ?
I share the idea that 64k might *currently* on *some platforms* be a
reasonable choice. But that's where the "fun" begins.

>
>>
>> It's easy to say "64k vs. 2 MiB" is a difference and we want separate controls,
>> but how is "2MiB vs. 2 MiB" different?
>>
>> Having that said, I think we have to make up our mind how much control we want
>> to give user space. Again, the "2MiB vs. 2 MiB" case nicely shows that it's not
>> trivial: memory waste is a real issue on some systems where we limit THP to
>> madvise().
>>
>>
>> Just throwing it out for discussing:
>>
>> What about keeping the "all / madvise / never" semantics (and MADV_NOHUGEPAGE
>> ...) but having an additional config knob that specifies in which cases we
>> *still* allow flexible THP even though the system was configured for "madvise".
>>
>> I can't come up with a good name for that, but something like
>> "max_auto_size=64k" could be something reasonable to set. We could have an
>> arch+hw specific default.
>
> Ahha, yes, that's essentially what I have above. I personally also like the idea
> of the limit being an absolute value rather than an order. Although I know Yu
> feels differently (see [1]).

Exposed to user space I think it should be a human-readable value.
Inside the kernel, I don't particularly care.

(Having databases/VMs on aarch64 with 64k in mind) I think it might be
interesting to have something like the following:

thp=madvise
max_auto_size=64k/128k/256k


So in MADV_HUGEPAGE VMAs (such as under QEMU), we'd happily take any
flexible THP, especially ones < PMD THP (512 MiB) as well. 2 MiB or 4 MiB
THP? sure, give them to my VM. You're barely going to find 512 MiB THP
either way in practice ....

But for the remainder of my system, just do something reasonable and
don't go crazy on the memory waste.


I'll try reading all the previous discussions next week.
-- 
Cheers,

David / dhildenb
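
For readers following the thread, here is a minimal sketch of the policy floated above: the arch reports its preferred order from a HW perspective, and for VMAs that did not opt in via MADV_HUGEPAGE the core mm clamps that order to a human-readable byte limit. Every name below (thp_mode, size_to_order, anon_folio_order, max_auto_size as a function parameter) is hypothetical and only illustrates the idea; none of it is kernel API or part of the posted series. It also assumes that "never" disables large anon folios entirely, which the thread does not settle.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the "all / madvise / never" THP setting. */
enum thp_mode { THP_NEVER, THP_MADVISE, THP_ALWAYS };

/*
 * Convert a byte limit into the largest folio order that still fits
 * within it, i.e. the largest order with (page_size << order) <= bytes.
 * A limit smaller than one page yields order 0.
 */
static unsigned int size_to_order(unsigned long bytes, unsigned long page_size)
{
	unsigned int order = 0;

	while ((page_size << (order + 1)) <= bytes)
		order++;
	return order;
}

/*
 * Pick the anon folio order for a fault (illustrative policy only).
 * arch_order is whatever the arch prefers from a HW perspective,
 * e.g. the contpte order on arm64.
 */
static unsigned int anon_folio_order(enum thp_mode mode, bool vma_madv_hugepage,
				     unsigned int arch_order,
				     unsigned long max_auto_size,
				     unsigned long page_size)
{
	unsigned int limit;

	/* Assumption: "never" means no large anon folios at all. */
	if (mode == THP_NEVER)
		return 0;

	/* Opted-in VMAs (or "always") take the arch preference as is. */
	if (mode == THP_ALWAYS || vma_madv_hugepage)
		return arch_order;

	/* Everything else is clamped to the max_auto_size limit. */
	limit = size_to_order(max_auto_size, page_size);
	return arch_order < limit ? arch_order : limit;
}
```

With 64k base pages, an arch preference of order 5 (2 MiB) and max_auto_size=64k, a plain VMA under "madvise" would get order 0, while a MADV_HUGEPAGE VMA would get order 5. With 4k base pages and an arch preference of order 4 (64k), the same limit leaves the arch preference untouched. Taking the limit in bytes and converting it internally matches the point made above that the user-facing value should be human-readable while the kernel is free to work in orders.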