From: David Hildenbrand
Organization: Red Hat GmbH
To: Dan Williams
CC: Andrew Morton, Ard Biesheuvel, Mike Rapoport, Borislav Petkov,
 David Airlie, Will Deacon, Catalin Marinas, Ard Biesheuvel, Joao Martins,
 Tom Lendacky, "Rafael J. Wysocki", Jonathan Cameron, X86 ML,
 "H. Peter Anvin", Thomas Gleixner, Greg Kroah-Hartman, Pavel Tatashin,
 Peter Zijlstra, Ben Skeggs, Benjamin Herrenschmidt, Jason Gunthorpe,
 Jia He, Ingo Molnar, Dave Hansen, Paul Mackerras, Brice Goglin,
 Michael Ellerman, "Rafael J. Wysocki", Daniel Vetter, Andy Lutomirski,
 "Rafael J. Wysocki", Linux MM, linux-nvdimm, Linux Kernel Mailing List,
 Linux ACPI, Maling list - DRI developers
Subject: Re: [PATCH v4 00/23] device-dax: Support sub-dividing soft-reserved ranges
Date: Fri, 21 Aug 2020 12:15:03 +0200
Message-ID: <6af3de0d-ffdc-8942-3922-ebaeef20dd63@redhat.com>
References: <159643094279.4062302.17779410714418721328.stgit@dwillia2-desk3.amr.corp.intel.com>
List-Id: "Linux-nvdimm developer list."

>>
>> 1. On x86-64, e820 indicates "soft-reserved" memory. This memory is not
>> automatically used in the buddy during boot, but remains untouched
>> (similar to pmem). But as it involves ACPI as well, it could also be
>> used on arm64 (-e820), correct?
>
> Correct, arm64 also gets the EFI support for enumerating memory this
> way. However, I would clarify that whether soft-reserved is given to
> the buddy allocator by default or not is the kernel's policy choice,
> "buddy-by-default" is ok and is what will happen anyways with older
> kernels on platforms that enumerate a memory range this way.

Is "soft-reserved" then the right terminology for that? It sounds very
x86-64/e820 specific. Maybe a compressed form of "performance
differentiated memory" might be a better fit to expose to user space, no?

>
>> 2. Soft-reserved memory is volatile RAM with differing performance
>> characteristics ("performance differentiated memory"). What would be
>> examples of such memory?
>
> Likely the most prominent one that drove the creation of the "EFI
> Specific Purpose" attribute bit is high-bandwidth memory. One concrete
> example of that was a platform called Knights Landing [1] that ended
> up shipping firmware that lied to the OS about the latency
> characteristics of the memory to try to reverse engineer OS behavior
> to not allocate from that memory range by default. With the EFI
> attribute, firmware performance tables can tell the truth about the
> performance characteristics of the memory range *and* indicate that
> the OS not use it for general purpose allocations by default.

Thanks for clarifying!

>
> [1]: https://software.intel.com/content/www/us/en/develop/blogs/an-intro-to-mcdram-high-bandwidth-memory-on-knights-landing.html
>
>> Like, memory that is faster than RAM (scratch
>> pad), or slower (pmem)? Or both?
>> :)
>
> Both, but note that PMEM is already hard-reserved by default.
> Soft-reserved is about a memory range that, for example, an
> administrator may want to reserve 100% for a weather simulation where
> if even a small amount of memory was stolen for the page cache the
> application may not meet its performance targets. It could also be a
> memory range that is so slow that only applications with higher
> latency tolerances would be prepared to consume it.
>
> In other words, soft-reserved memory can be used to indicate memory
> that is either too precious, or too slow for general purpose OS
> allocations.

Right, so actually performance-differentiated in any way :)

>
>> Is it a valid use case to use pmem
>> in a hypervisor to back this memory?
>
> Depends on the pmem. That performance capability is indicated by the
> ACPI HMAT, not the EFI soft-reserved designation.
>
>> 3. There seem to be use cases where "soft-reserved" memory is used via
>> DAX. What is an example use case? I assume it's *not* to treat it like
>> PMEM but instead e.g., use it as a fast buffer inside applications or
>> similar.
>
> Right, in that weather-simulation example that application could just
> mmap /dev/daxX.Y and never worry about contending for the "fast
> memory" resource on the platform. Alternatively, if that resource needs
> to be shared and/or over-committed then kernel memory-management
> services are needed and that dax-device can be assigned to kmem.
>
>> 4. There seem to be use cases where some part of "soft-reserved" memory
>> is used via DAX, some other is given to the buddy. What is an example
>> use case? Is this really necessary or only some theoretical use case?
>
> It's as necessary as pmem namespace partitioning, or the inclusion of
> dax-kmem upstream in the first place. In that kmem case the motivation
> was that some users want a portion of pmem provisioned for storage and
> some for volatile usage.
> The motivation is similar here, platform
> firmware can only identify memory attributes on coarse boundaries,
> finer grained provisioning decisions are up to the administrator /
> platform-owner and the kernel is just a facilitator of that policy.
>
>>
>> 5. The "provisioned along performance relevant address boundaries." part
>> is unclear to me. Can you give an example of what this would look like
>> from user space? Like, split that memory in blocks of size X with
>> alignment Y and give them to separate applications?
>
> One example of platform address boundaries is the memory address
> ranges that alias in a direct-mapped memory-side-cache. In the
> direct-map-cache aliasing may repeat every N GBs where N is the ratio
> of far-to-near memory. ("Near memory" == cache, "Far memory" ==
> backing memory). Also refer back to the background in the page
> allocator shuffling patches [2]. With this partitioning mechanism you
> could, for one example use case, assign different VMs to exclusive
> colors in the memory side cache.

Interesting, thanks!

>
> [2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e900a918b098
>
>> 6. If you add such memory to the buddy, is there any way the system can
>> differentiate it from other memory? E.g., via fake/other NUMA nodes?
>
> Numa node numbers / are how performance differentiated memory ranges
> are enumerated. The expectation is that all distinct performance
> memory targets have unique ACPI proximity domains and Linux numa node
> numbers as a result.

Makes sense to me (although it's somewhat weird, because memory of the
same socket/node would be represented via different NUMA nodes), thanks!

>
>> Also, can you give examples of how kmem-added memory is represented in
>> /proc/iomem for a) pmem and b) soft-reserved memory after this series
>> (skimming over the patches, I think there is a change for pmem, right?)?
>
> I don't expect a change.
> The only difference is the parent resource
> will be marked "Soft Reserved" instead of "Persistent Memory".

Right, I misread patch #11 while skimming - I thought the device
resource would be dropped.

>
>> I am really wondering if it's the right approach to squeeze this into
>> our pmem/nvdimm infrastructure just because it's easy to do. E.g., man
>> "ndctl" - "ndctl - Manage "libnvdimm" subsystem devices (Non-volatile
>> Memory)" speaks explicitly about non-volatile memory.
>
> In fact it's not squeezed into PMEM infrastructure. dax-kmem and
> device-dax are independent of PMEM. PMEM is one source of potential
> device-dax instances, soft-reserved memory is another orthogonal
> source. This is why device-dax needs its own userspace policy directed
> partitioning mechanism because there is no PMEM to store the
> configuration for partitioned high-bandwidth memory. The userspace
> tooling for this mechanism is targeted for a tool called daxctl that
> has no PMEM dependencies. Look to Joao's use case that is using this
> infrastructure independent of PMEM with manual soft-reservations
> specified on the kernel command-line.

Thanks for clarifying, I was under the impression we would be reusing
libnvdimm to manage that memory.
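
As a rough illustration of point 1 (soft-reserved vs. "buddy-by-default"
being a kernel policy choice), here is a minimal Python sketch. It only
assumes that firmware tags the range with the EFI_MEMORY_SP ("specific
purpose") attribute, bit 0x40000 in UEFI 2.8; the function name and its
policy switch are hypothetical and do not mirror the kernel's actual code.

```python
# EFI_MEMORY_SP ("specific purpose") memory-descriptor attribute bit, as
# defined in UEFI 2.8. Everything else here is illustrative only.
EFI_MEMORY_SP = 0x0000000000040000


def initial_owner(attribute: int, soft_reserve: bool = True) -> str:
    """Decide who gets a conventional-memory range at boot (hypothetical).

    soft_reserve=False models a kernel that ignores the attribute (an
    older kernel, or one booted with efi=nosoftreserve); the range then
    simply becomes ordinary System RAM, i.e. "buddy-by-default".
    """
    if soft_reserve and attribute & EFI_MEMORY_SP:
        # Kept out of the buddy allocator; device-dax can claim it, and
        # dax-kmem can still hand it to the buddy later if desired.
        return "Soft Reserved"
    return "System RAM"
```

The point of the switch is that the same firmware table yields either
outcome; only the kernel's policy differs.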
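
For point 3, a minimal sketch of an application that maps a device-dax
instance directly instead of going through kmem. Assumptions: the device
node is something like /dev/dax0.0 (a placeholder for the /dev/daxX.Y
above), the length is a multiple of the device's alignment, and the
mapping is shared, since device-dax refuses private mappings. The helper
name is hypothetical.

```python
import mmap
import os


def map_dax(path: str, length: int) -> mmap.mmap:
    """Map `length` bytes of a device-dax character device.

    `length` should be a multiple of the device alignment (commonly 2 MiB,
    an assumption; check what daxctl reports for the device).
    """
    fd = os.open(path, os.O_RDWR)
    try:
        # device-dax rejects private mappings, so MAP_SHARED is required.
        return mmap.mmap(fd, length, flags=mmap.MAP_SHARED,
                         prot=mmap.PROT_READ | mmap.PROT_WRITE)
    finally:
        # The mmap object keeps its own duplicate of the descriptor.
        os.close(fd)
```

On a system where such a device exists, usage would look like
`buf = map_dax("/dev/dax0.0", 2 << 20)`; loads and stores on `buf` then
hit the reserved memory without any page-cache involvement.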
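
To make the memory-side-cache coloring from point 5 a bit more concrete,
a minimal sketch under a deliberately simplified model (an assumption;
real platforms may index or hash the cache differently): the near-memory
cache is direct-mapped and indexed by physical address modulo its size,
and it is split into equal, contiguous "colors". The helpers are
hypothetical; they only compute which pieces of a soft-reserved range
share a color, i.e. the kind of boundary an administrator might carve
device-dax instances along.

```python
MiB = 1 << 20
GiB = 1 << 30


def cache_color(phys: int, near_size: int, colors: int) -> int:
    """Color of a physical address in a direct-mapped memory-side cache.

    Simplified model: the cache slot is phys % near_size, and the cache
    is divided into `colors` equal, contiguous slices.
    """
    if near_size % colors:
        raise ValueError("near_size must be a multiple of colors")
    return (phys % near_size) // (near_size // colors)


def ranges_for_color(base: int, size: int, near_size: int, colors: int,
                     color: int):
    """Yield (start, end) pieces of [base, base + size) with `color`.

    Pieces repeat every `near_size` bytes, because addresses that far
    apart alias to the same cache slots in this model.
    """
    if near_size % colors:
        raise ValueError("near_size must be a multiple of colors")
    span = near_size // colors
    end = base + size
    # Start of this color's slice in the aliasing period containing base.
    start = base - base % near_size + color * span
    while start < end:
        lo, hi = max(start, base), min(start + span, end)
        if lo < hi:
            yield lo, hi
        start += near_size
```

For example, with 16 GiB of near memory split into 4 colors,
`ranges_for_color(base, size, 16 * GiB, 4, 0)` would list the pieces a VM
pinned to color 0 could be given.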
-- Thanks, David / dhildenb _______________________________________________ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-leave@lists.01.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.4 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,MENTIONS_GIT_HOSTING,NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS, USER_AGENT_SANE_1 autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AE924C433E4 for ; Fri, 21 Aug 2020 10:15:31 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7BE6620FC3 for ; Fri, 21 Aug 2020 10:15:31 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="PZCvgQWW" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728489AbgHUKP2 (ORCPT ); Fri, 21 Aug 2020 06:15:28 -0400 Received: from us-smtp-1.mimecast.com ([207.211.31.81]:35767 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726983AbgHUKP2 (ORCPT ); Fri, 21 Aug 2020 06:15:28 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1598004925; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:autocrypt:autocrypt; bh=PmsTrWC7QL0R7Dl8wMRdta4a1oPLZaekFOdsDnhXsJY=; b=PZCvgQWWHNEWMwZwNdpqdF44SZotdvW7gDo1cNgjmaAsSfsAL7Q84dROxAsTG5iuu0/vGN 
1Zs2qm0nmEEbUSf3+0T6vo2EoEQoY2d2/JmULxFoHvUSIQC2CeJcePaLZyzjpJy+1lwO8L C3lyuaBvYpcDj63jqStPBRg14Z0yo8c= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-85-UcZwKt8kPgaJ_9wxSbWINA-1; Fri, 21 Aug 2020 06:15:20 -0400 X-MC-Unique: UcZwKt8kPgaJ_9wxSbWINA-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id A8B2A80733B; Fri, 21 Aug 2020 10:15:14 +0000 (UTC) Received: from [10.36.114.87] (ovpn-114-87.ams2.redhat.com [10.36.114.87]) by smtp.corp.redhat.com (Postfix) with ESMTP id 1404219C78; Fri, 21 Aug 2020 10:15:03 +0000 (UTC) Subject: Re: [PATCH v4 00/23] device-dax: Support sub-dividing soft-reserved ranges To: Dan Williams Cc: Andrew Morton , Ira Weiny , Ard Biesheuvel , Mike Rapoport , Borislav Petkov , Vishal Verma , David Airlie , Will Deacon , Catalin Marinas , Ard Biesheuvel , Joao Martins , Tom Lendacky , Dave Jiang , "Rafael J. Wysocki" , Jonathan Cameron , Wei Yang , X86 ML , "H. Peter Anvin" , Thomas Gleixner , Greg Kroah-Hartman , Pavel Tatashin , Peter Zijlstra , Ben Skeggs , Benjamin Herrenschmidt , Jason Gunthorpe , Jia He , Ingo Molnar , Dave Hansen , Paul Mackerras , Brice Goglin , Jeff Moyer , Michael Ellerman , "Rafael J. Wysocki" , Daniel Vetter , Andy Lutomirski , "Rafael J. 
Wysocki" , Linux MM , linux-nvdimm , Linux Kernel Mailing List , Linux ACPI , Maling list - DRI developers References: <159643094279.4062302.17779410714418721328.stgit@dwillia2-desk3.amr.corp.intel.com> From: David Hildenbrand Autocrypt: addr=david@redhat.com; prefer-encrypt=mutual; keydata= mQINBFXLn5EBEAC+zYvAFJxCBY9Tr1xZgcESmxVNI/0ffzE/ZQOiHJl6mGkmA1R7/uUpiCjJ dBrn+lhhOYjjNefFQou6478faXE6o2AhmebqT4KiQoUQFV4R7y1KMEKoSyy8hQaK1umALTdL QZLQMzNE74ap+GDK0wnacPQFpcG1AE9RMq3aeErY5tujekBS32jfC/7AnH7I0v1v1TbbK3Gp XNeiN4QroO+5qaSr0ID2sz5jtBLRb15RMre27E1ImpaIv2Jw8NJgW0k/D1RyKCwaTsgRdwuK Kx/Y91XuSBdz0uOyU/S8kM1+ag0wvsGlpBVxRR/xw/E8M7TEwuCZQArqqTCmkG6HGcXFT0V9 PXFNNgV5jXMQRwU0O/ztJIQqsE5LsUomE//bLwzj9IVsaQpKDqW6TAPjcdBDPLHvriq7kGjt WhVhdl0qEYB8lkBEU7V2Yb+SYhmhpDrti9Fq1EsmhiHSkxJcGREoMK/63r9WLZYI3+4W2rAc UucZa4OT27U5ZISjNg3Ev0rxU5UH2/pT4wJCfxwocmqaRr6UYmrtZmND89X0KigoFD/XSeVv jwBRNjPAubK9/k5NoRrYqztM9W6sJqrH8+UWZ1Idd/DdmogJh0gNC0+N42Za9yBRURfIdKSb B3JfpUqcWwE7vUaYrHG1nw54pLUoPG6sAA7Mehl3nd4pZUALHwARAQABtCREYXZpZCBIaWxk ZW5icmFuZCA8ZGF2aWRAcmVkaGF0LmNvbT6JAlgEEwEIAEICGwMGCwkIBwMCBhUIAgkKCwQW AgMBAh4BAheAAhkBFiEEG9nKrXNcTDpGDfzKTd4Q9wD/g1oFAl8Ox4kFCRKpKXgACgkQTd4Q 9wD/g1oHcA//a6Tj7SBNjFNM1iNhWUo1lxAja0lpSodSnB2g4FCZ4R61SBR4l/psBL73xktp rDHrx4aSpwkRP6Epu6mLvhlfjmkRG4OynJ5HG1gfv7RJJfnUdUM1z5kdS8JBrOhMJS2c/gPf wv1TGRq2XdMPnfY2o0CxRqpcLkx4vBODvJGl2mQyJF/gPepdDfcT8/PY9BJ7FL6Hrq1gnAo4 3Iv9qV0JiT2wmZciNyYQhmA1V6dyTRiQ4YAc31zOo2IM+xisPzeSHgw3ONY/XhYvfZ9r7W1l pNQdc2G+o4Di9NPFHQQhDw3YTRR1opJaTlRDzxYxzU6ZnUUBghxt9cwUWTpfCktkMZiPSDGd KgQBjnweV2jw9UOTxjb4LXqDjmSNkjDdQUOU69jGMUXgihvo4zhYcMX8F5gWdRtMR7DzW/YE BgVcyxNkMIXoY1aYj6npHYiNQesQlqjU6azjbH70/SXKM5tNRplgW8TNprMDuntdvV9wNkFs 9TyM02V5aWxFfI42+aivc4KEw69SE9KXwC7FSf5wXzuTot97N9Phj/Z3+jx443jo2NR34XgF 89cct7wJMjOF7bBefo0fPPZQuIma0Zym71cP61OP/i11ahNye6HGKfxGCOcs5wW9kRQEk8P9 M/k2wt3mt/fCQnuP/mWutNPt95w9wSsUyATLmtNrwccz63W5Ag0EVcufkQEQAOfX3n0g0fZz Bgm/S2zF/kxQKCEKP8ID+Vz8sy2GpDvveBq4H2Y34XWsT1zLJdvqPI4af4ZSMxuerWjXbVWb 
T6d4odQIG0fKx4F8NccDqbgHeZRNajXeeJ3R7gAzvWvQNLz4piHrO/B4tf8svmRBL0ZB5P5A 2uhdwLU3NZuK22zpNn4is87BPWF8HhY0L5fafgDMOqnf4guJVJPYNPhUFzXUbPqOKOkL8ojk CXxkOFHAbjstSK5Ca3fKquY3rdX3DNo+EL7FvAiw1mUtS+5GeYE+RMnDCsVFm/C7kY8c2d0G NWkB9pJM5+mnIoFNxy7YBcldYATVeOHoY4LyaUWNnAvFYWp08dHWfZo9WCiJMuTfgtH9tc75 7QanMVdPt6fDK8UUXIBLQ2TWr/sQKE9xtFuEmoQGlE1l6bGaDnnMLcYu+Asp3kDT0w4zYGsx 5r6XQVRH4+5N6eHZiaeYtFOujp5n+pjBaQK7wUUjDilPQ5QMzIuCL4YjVoylWiBNknvQWBXS lQCWmavOT9sttGQXdPCC5ynI+1ymZC1ORZKANLnRAb0NH/UCzcsstw2TAkFnMEbo9Zu9w7Kv AxBQXWeXhJI9XQssfrf4Gusdqx8nPEpfOqCtbbwJMATbHyqLt7/oz/5deGuwxgb65pWIzufa N7eop7uh+6bezi+rugUI+w6DABEBAAGJAjwEGAEIACYCGwwWIQQb2cqtc1xMOkYN/MpN3hD3 AP+DWgUCXw7HsgUJEqkpoQAKCRBN3hD3AP+DWrrpD/4qS3dyVRxDcDHIlmguXjC1Q5tZTwNB boaBTPHSy/Nksu0eY7x6HfQJ3xajVH32Ms6t1trDQmPx2iP5+7iDsb7OKAb5eOS8h+BEBDeq 3ecsQDv0fFJOA9ag5O3LLNk+3x3q7e0uo06XMaY7UHS341ozXUUI7wC7iKfoUTv03iO9El5f XpNMx/YrIMduZ2+nd9Di7o5+KIwlb2mAB9sTNHdMrXesX8eBL6T9b+MZJk+mZuPxKNVfEQMQ a5SxUEADIPQTPNvBewdeI80yeOCrN+Zzwy/Mrx9EPeu59Y5vSJOx/z6OUImD/GhX7Xvkt3kq Er5KTrJz3++B6SH9pum9PuoE/k+nntJkNMmQpR4MCBaV/J9gIOPGodDKnjdng+mXliF3Ptu6 3oxc2RCyGzTlxyMwuc2U5Q7KtUNTdDe8T0uE+9b8BLMVQDDfJjqY0VVqSUwImzTDLX9S4g/8 kC4HRcclk8hpyhY2jKGluZO0awwTIMgVEzmTyBphDg/Gx7dZU1Xf8HFuE+UZ5UDHDTnwgv7E th6RC9+WrhDNspZ9fJjKWRbveQgUFCpe1sa77LAw+XFrKmBHXp9ZVIe90RMe2tRL06BGiRZr jPrnvUsUUsjRoRNJjKKA/REq+sAnhkNPPZ/NNMjaZ5b8Tovi8C0tmxiCHaQYqj7G2rgnT0kt WNyWQQ== Organization: Red Hat GmbH Message-ID: <6af3de0d-ffdc-8942-3922-ebaeef20dd63@redhat.com> Date: Fri, 21 Aug 2020 12:15:03 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 Sender: linux-acpi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org >> >> 1. On x86-64, e820 indicates "soft-reserved" memory. 
This memory is not >> automatically used in the buddy during boot, but remains untouched >> (similar to pmem). But as it involves ACPI as well, it could also be >> used on arm64 (-e820), correct? > > Correct, arm64 also gets the EFI support for enumerating memory this > way. However, I would clarify that whether soft-reserved is given to > the buddy allocator by default or not is the kernel's policy choice, > "buddy-by-default" is ok and is what will happen anyways with older > kernels on platforms that enumerate a memory range this way. Is "soft-reserved" then the right terminology for that? It sounds very x86-64/e820 specific. Maybe a compressed for of "performance differentiated memory" might be a better fit to expose to user space, no? > >> 2. Soft-reserved memory is volatile RAM with differing performance >> characteristics ("performance differentiated memory"). What would be >> examples of such memory? > > Likely the most prominent one that drove the creation of the "EFI > Specific Purpose" attribute bit is high-bandwidth memory. One concrete > example of that was a platform called Knights Landing [1] that ended > up shipping firmware that lied to the OS about the latency > characteristics of the memory to try to reverse engineer OS behavior > to not allocate from that memory range by default. With the EFI > attribute firmware performance tables can tell the truth about the > performance characteristics of the memory range *and* indicate that > the OS not use it for general purpose allocations by default. Thanks for clarifying! > > [1]: https://software.intel.com/content/www/us/en/develop/blogs/an-intro-to-mcdram-high-bandwidth-memory-on-knights-landing.html > >> Like, memory that is faster than RAM (scratch >> pad), or slower (pmem)? Or both? :) > > Both, but note that PMEM is already hard-reserved by default. 
> Soft-reserved is about a memory range that, for example, an > administrator may want to reserve 100% for a weather simulation where > if even a small amount of memory was stolen for the page cache the > application may not meet its performance targets. It could also be a > memory range that is so slow that only applications with higher > latency tolerances would be prepared to consume it. > > In other words the soft-reserved memory can be used to indicate memory > that is either too precious, or too slow for general purpose OS > allocations. Right, so actually performance-differentiated in any way :) > >> Is it a valid use case to use pmem >> in a hypervisor to back this memory? > > Depends on the pmem. That performance capability is indicated by the > ACPI HMAT, not the EFI soft-reserved designation. > >> 3. There seem to be use cases where "soft-reserved" memory is used via >> DAX. What is an example use case? I assume it's *not* to treat it like >> PMEM but instead e.g., use it as a fast buffer inside applications or >> similar. > > Right, in that weather-simulation example that application could just > mmap /dev/daxX.Y and never worry about contending for the "fast > memory" resource on the platform. Alternatively if that resource needs > to be shared and/or over-commited then kernel memory-management > services are needed and that dax-device can be assigned to kmem. > >> 4. There seem to be use cases where some part of "soft-reserved" memory >> is used via DAX, some other is given to the buddy. What is an example >> use case? Is this really necessary or only some theoretical use case? > > It's as necessary as pmem namespace partitioning, or the inclusion of > dax-kmem upstream in the first place. In that kmem case the motivation > was that some users want a portion of pmem provisioned for storage and > some for volatile usage. 
The motivation is similar here, platform > firmware can only identify memory attributes on coarse boundaries, > finer grained provisioning decisions are up to the administrator / > platform-owner and the kernel is a just a facilitator of that policy. > >> >> 5. The "provisioned along performance relevant address boundaries." part >> is unclear to me. Can you give an example of how this would look like >> from user space? Like, split that memory in blocks of size X with >> alignment Y and give them to separate applications? > > One example of platform address boundaries are the memory address > ranges that alias in a direct-mapped memory-side-cache. In the > direct-map-cache aliasing may repeat every N GBs where N is the ratio > of far-to-near memory. ("Near memory" == cache "Far memory" == > backing memory). Also refer back to the background in the page > allocator shuffling patches [2]. With this partitioning mechanism you > could, for one example use case, assign different VMs to exclusive > colors in the memory side cache. Interesting, thanks! > > [2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e900a918b098 > >> 6. If you add such memory to the buddy, is there any way the system can >> differentiate it from other memory? E.g., via fake/other NUMA nodes? > > Numa node numbers / are how performance differentiated memory ranges > are enumerated. The expectation is that all distinct performance > memory targets have unique ACPI proximity domains and Linux numa node > numbers as a result. Makes sense to me (although it's somehow weird, because memory of the same socket/node would be represented via different NUMA nodes), thanks! > >> Also, can you give examples of how kmem-added memory is represented in >> /proc/iomem for a) pmem and b) soft-resered memory after this series >> (skimming over the patches, I think there is a change for pmem, right?)? > > I don't expect a change. 
The only difference is the parent resource > will be marked "Soft Reserved" instead of "Persistent Memory". Right, I misread patch #11 while skimming - I thought the device resource would be dropped. > >> I am really wondering if it's the right approach to squeeze this into >> our pmem/nvdimm infrastructure just because it's easy to do. E.g., man >> "ndctl" - "ndctl - Manage "libnvdimm" subsystem devices (Non-volatile >> Memory)" speaks explicitly about non-volatile memory. > > In fact it's not squeezed into PMEM infrastructure. dax-kmem and > device-dax are independent of PMEM. PMEM is one source of potential > device-dax instances, soft-reserved memory is another orthogonal > source. This is why device-dax needs its own userspace policy directed > partitioning mechanism because there is no PMEM to store the > configuration for partitioned higph-bandwidth memory. The userspace > tooling for this mechanism is targeted for a tool called daxctl that > has no PMEM dependencies. Look to Joao's use case that is using this > infrastructure independent of PMEM with manual soft-reservations > specified on the kernel command-line. Thanks for clarifying, I was under the impression we would be reusing libnvdimm to manage that memory. 
-- Thanks, David / dhildenb From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.1 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, MENTIONS_GIT_HOSTING,NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id F3E3AC433E1 for ; Fri, 21 Aug 2020 10:15:28 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id A64D4207DA for ; Fri, 21 Aug 2020 10:15:28 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="PZCvgQWW" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A64D4207DA Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 4720A6B0088; Fri, 21 Aug 2020 06:15:28 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 447376B0089; Fri, 21 Aug 2020 06:15:28 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3376B6B008C; Fri, 21 Aug 2020 06:15:28 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0242.hostedemail.com [216.40.44.242]) by kanga.kvack.org (Postfix) with ESMTP id 1DF0D6B0088 for ; Fri, 21 Aug 2020 06:15:28 -0400 (EDT) Received: from smtpin10.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id CF9BF1F10 for ; Fri, 21 Aug 2020 10:15:27 +0000 (UTC) X-FDA: 