From: Dave Hansen
Date: Mon, 29 Aug 2022 09:19:27 -0700
Subject: Re: [PATCHv7 02/14] mm: Add support for unaccepted memory
To: Dionna Amalie Glaze, Tom Lendacky
Cc: Mel Gorman, Vlastimil Babka, "Kirill A. Shutemov", Borislav Petkov,
 Andy Lutomirski, Sean Christopherson, Andrew Morton, Joerg Roedel,
 Ard Biesheuvel, Andi Kleen, Kuppuswamy Sathyanarayanan, David Rientjes,
 Thomas Gleixner, Peter Zijlstra, Paolo Bonzini, Ingo Molnar,
 Dario Faggioli, Mike Rapoport, David Hildenbrand, Marcelo Cerri,
 tim.gardner@canonical.com, Khalid ElMously, philip.cox@canonical.com,
 the arch/x86 maintainers, Linux Memory Management List,
 linux-coco@lists.linux.dev, linux-efi, LKML, Mike Rapoport
Message-ID: <984e07ed-914f-93ca-a141-3fc8677878e0@intel.com>
References: <20220614120231.48165-1-kirill.shutemov@linux.intel.com>
 <20220614120231.48165-3-kirill.shutemov@linux.intel.com>
 <8cf143e7-2b62-1a1e-de84-e3dcc6c027a4@suse.cz>
 <20220810141959.ictqchz7josyd7pt@techsingularity.net>
 <2981e25e-9cda-518a-9750-b8694f2356b5@amd.com>

On 8/29/22 09:02, Dionna Amalie Glaze wrote:
>>> The stack track is in mm/page_alloc.c.
>>> I've done a little
>>> investigation, but I can't account for why there's a hard cutoff of
>>> correctness at 256GB
>>>
>>> [ 0.065563] RIP: 0010:memmap_init_range+0x108/0x173
>>> [ 0.066309] Code: 77 16 f6 42 10 02 74 10 48 03 42 08 48 c1 e8 0c
>>> 48 89 c3 e9 3a ff ff ff 48 89 df 48 c1 e7 06 48 03 3d d9 a2 66 ff 48
>>> 8d 47 08 47 34 01 00 00 00 48 c7 47 38 00 00 00 00 c7 47 30 ff ff
>>> ff ff
>>> [ 0.069108] RSP: 0000:ffffffffad603dc8 EFLAGS: 00010082 ORIG_RAX:
>>> 0000000000000404
>>> [ 0.070193] RAX: ffffdba740000048 RBX: 0000000000000001 RCX: 0000000000000000
>>> [ 0.071170] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffdba740000040
>>> [ 0.072224] RBP: 0000000000000000 R08: 0000000000001000 R09: 0000000000000000
>>> [ 0.073283] R10: 0000000000000001 R11: ffffffffad645c60 R12: 0000000000000000
>>> [ 0.074304] R13: 00000000000000a0 R14: 0000000000000000 R15: 0000000000000000
>>> [ 0.075285] FS: 0000000000000000(0000) GS:ffffffffadd6c000(0000)
>>> knlGS:0000000000000000
>>> [ 0.076365] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 0.077194] CR2: ffffdba740000074 CR3: 0008001ee3a0c000 CR4: 00000000000606b0
>>> [ 0.078209] Call Trace:
>>> [ 0.078524]
>>> [ 0.078887] ? free_area_init+0x5c1/0x66c
>>> [ 0.079417] ? zone_sizes_init+0x52/0x6c
>>> [ 0.079934] ? setup_arch+0xa55/0xb6d
>>> [ 0.080417] ? start_kernel+0x64/0x65a
>>> [ 0.080897] ? secondary_startup_64_no_verify+0xd6/0xdb
>>> [ 0.081620]
>> Note that there is a bug in Brijesh's version of the patch and it will
>> almost exclusively use the MSR protocol. Please try the version of the
>> patch that I recently sent up based on the current unaccepted memory tree
>> from Kirill.
>>
> I've now tested this patch set with Tom's new patch set, and it
> appears to be that the problem with 256GB is more likely to be due to
> this unaccepted memory patch set rather than something AMD-specific.
>
> Kirill, do you see any problems with 256GB on TDX?
> It seems there is
> some unaccepted memory getting touched in memmap_init_range when the
> VM's memory size is at least 256GB

It really helps with this kind of stuff if you can post the *actual*
error. I assume this was a page fault, so there should have been some
other stuff before the RIP:...

Another thing that's really nice is to do the disassembly of the "Code:"
or share disassembly of memmap_init_range. Even nicer would be to give
an faddr2line of the RIP value and track down which C code was actually
at fault.

It's *possible* to look into these things from what you posted, but it's
just slow. I'm sure Kirill will appreciate the help.
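
As a rough illustration of the steps suggested above, here is a minimal Python sketch that pulls the RIP location and the "Code:" bytes out of an oops so they can be handed to the kernel tree's scripts/faddr2line or to a disassembler. The helper names (rip_location, code_bytes, faddr2line_cmd) and the vmlinux path are hypothetical, the sample text holds only the RIP line, the first line of the Code: dump, and the CR2 line from the quoted trace, and the command is built but not run. Treat it as a sketch under those assumptions, not as the established workflow.

```python
import re
from typing import List, Optional

# Excerpt of the quoted oops (quote markers dropped; only the first
# line of the "Code:" dump is included here).
OOPS = """\
[ 0.065563] RIP: 0010:memmap_init_range+0x108/0x173
[ 0.066309] Code: 77 16 f6 42 10 02 74 10 48 03 42 08 48 c1 e8 0c
[ 0.077194] CR2: ffffdba740000074 CR3: 0008001ee3a0c000 CR4: 00000000000606b0
"""


def rip_location(oops: str) -> Optional[str]:
    """Return the 'func+0xoff/0xsize' part of the RIP line, if any."""
    m = re.search(r"RIP:\s+[0-9a-f]{4}:(\S+\+0x[0-9a-f]+/0x[0-9a-f]+)", oops)
    return m.group(1) if m else None


def code_bytes(oops: str) -> bytes:
    """Return the raw bytes listed after 'Code:'.

    A byte wrapped in <..> (the marker some oopses use for the faulting
    instruction) is accepted and unwrapped if present.
    """
    m = re.search(r"Code:((?:\s+<?[0-9a-f]{2}>?)+)", oops)
    if not m:
        return b""
    return bytes.fromhex("".join(tok.strip("<>") for tok in m.group(1).split()))


def faddr2line_cmd(vmlinux: str, location: str) -> List[str]:
    """Build (but do not run) a scripts/faddr2line invocation.

    The vmlinux path is an assumption; it must be the image, built with
    debug info, that produced the oops.
    """
    return ["./scripts/faddr2line", vmlinux, location]


if __name__ == "__main__":
    loc = rip_location(OOPS)
    print(loc)
    if loc:
        print(" ".join(faddr2line_cmd("vmlinux", loc)))
    print(code_bytes(OOPS).hex(" "))
```

The bytes returned by code_bytes could be written to a file and disassembled as raw x86-64; alternatively, the kernel tree's scripts/decodecode reads an oops on standard input and does the disassembly itself.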