From: David Hildenbrand
Organization: Red Hat
Date: Fri, 7 Jul 2023 13:40:53 +0200
To: Ryan Roberts, Andrew Morton, Matthew Wilcox, "Kirill A. Shutemov", Yin Fengwei, Yu Zhao, Catalin Marinas, Will Deacon, Anshuman Khandual, Yang Shi
Cc: linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v2 0/5] variable-order, large folios for anonymous memory
In-Reply-To: <4d4c45a2-0037-71de-b182-f516fee07e67@arm.com>
References: <20230703135330.1865927-1-ryan.roberts@arm.com> <78159ed0-a233-9afb-712f-2df1a4858b22@redhat.com> <4d4c45a2-0037-71de-b182-f516fee07e67@arm.com>

On 06.07.23 10:02, Ryan Roberts wrote:
> On 05/07/2023 20:38, David Hildenbrand wrote:
>> On 03.07.23 15:53, Ryan Roberts wrote:
>>> Hi All,
>>>
>>> This is v2 of a series to implement variable order, large folios for anonymous
>>> memory. The objective of this is to improve performance by allocating larger
>>> chunks of memory during anonymous page faults. See [1] for background.
>>>
>
> [...]
>
>>> Thanks,
>>> Ryan
>>
>> Hi Ryan,
>>
>> is page migration already working as expected (what about page compaction?), and
>> do we handle migration -ENOMEM when allocating a target page: do we split and
>> fall back to 4k page migration?
>>
>
> Hi David, All,

Hi Ryan,

thanks a lot for the list.

But can you comment on the page migration part (IOW, did you try it already)?

For example, memory hotunplug, CMA, MCE handling and compaction all rely on page migration actually working for memory that was allocated using GFP_MOVABLE.

Compaction seems to skip any higher-order folios, but the question is whether the underlying migration itself works.

If it already works: great! If not, this really has to be tackled early, because otherwise we'll be breaking the GFP_MOVABLE semantics.

>
> This series aims to be the bare minimum to demonstrate allocation of large anon
> folios.
> As such, there is a laundry list of things that need to be done for this
> feature to play nicely with other features. My preferred route is to merge this
> with its Kconfig defaulted to disabled, and its Kconfig description clearly
> shouting that it's EXPERIMENTAL with an explanation of why (similar to
> READ_ONLY_THP_FOR_FS).

As long as we are not sure about the user space control and as long as basic functionality is not working (for example, page migration), I would tend not to merge this early just for the sake of it.

But yes, something like mlock can eventually be tackled later: as long as there is a runtime interface to disable it ;)

>
> That said, I've put together a table of the items that I'm aware of that need
> attention. It would be great if people can review and add any missing items.
> Then we can hopefully parallelize the implementation work. David, I don't think
> the items you raised are covered - would you mind providing a bit more detail so
> I can add them to the list? (or just add them to the list yourself, if you prefer).
>
> ---
>
> - item:
>     mlock
>
>   description: >-
>     Large, pte-mapped folios are ignored when mlock is requested. Code comment
>     for mlock_vma_folio() says "...filter out pte mappings of THPs, which
>     cannot be consistently counted: a pte mapping of the THP head cannot be
>     distinguished by the page alone."
>
>   location:
>     - mlock_pte_range()
>     - mlock_vma_folio()
>
>   assignee:
>     Yin, Fengwei
>
>
> - item:
>     numa balancing
>
>   description: >-
>     Large, pte-mapped folios are ignored by numa-balancing code. Commit
>     comment (e81c480): "We're going to have THP mapped with PTEs. It will
>     confuse numabalancing. Let's skip them for now."
>
>   location:
>     - do_numa_page()
>
>   assignee:
>
>
>
> - item:
>     madvise
>
>   description: >-
>     MADV_COLD, MADV_PAGEOUT, MADV_FREE: For large folios, code assumes
>     exclusive only if mapcount==1, else skips remainder of operation.
>     For large, pte-mapped folios, exclusive folios can have mapcount up to
>     nr_pages and still be exclusive. Even better: don't split the folio if it
>     fits entirely within the range? Discussion at
>
>     https://lore.kernel.org/linux-mm/6cec6f68-248e-63b4-5615-9e0f3f819a0a@redhat.com/
>     talks about changing folio mapcounting - may help determine if exclusive
>     without pgtable scan?
>
>   location:
>     - madvise_cold_or_pageout_pte_range()
>     - madvise_free_pte_range()
>
>   assignee:
>
>
>
> - item:
>     shrink_folio_list
>
>   description: >-
>     Raised by Yu Zhao; I can't see the problem in the code - need
>     clarification
>
>   location:
>     - shrink_folio_list()
>
>   assignee:
>
>
>
> - item:
>     compaction
>
>   description: >-
>     Raised at LSFMM: Compaction skips non-order-0 pages. Already a problem for
>     page-cache pages today. Is my understanding correct?
>
>   location:
>     -
>
>   assignee:
>

I'm still thinking about the whole mapcount thingy (and I burned way too much time on that yesterday), which is a big item for such a list and affects some of these items.

A pagetable scan is pretty much irrelevant for order-2 pages. But once we're talking about higher orders we really don't want to do that.

I'm preparing a writeup with users and challenges.


Is swapping working as expected? zswap?
--
Cheers,

David / dhildenb