From: Pavel Tatashin
Date: Fri, 29 Jan 2021 11:24:21 -0500
Subject: Re: dax alignment problem on arm64 (and other achitectures)
To: David Hildenbrand
Cc: Anshuman Khandual, linux-mm, LKML, Sasha Levin, Tyler Hicks, Andrew Morton, Dan Williams, Michal Hocko, Oscar Salvador, Vlastimil Babka, Joonsoo Kim, Jason Gunthorpe, Marc Zyngier, Linux ARM, Will Deacon, James Morse, James Morris

On Fri, Jan 29, 2021 at 8:19 AM David Hildenbrand wrote:
>
> On 29.01.21 03:06, Pavel Tatashin wrote:
> >>> Might be related to the broken custom pfn_valid() implementation for
> >>> ZONE_DEVICE.
> >>>
> >>> https://lkml.kernel.org/r/1608621144-4001-1-git-send-email-anshuman.khandual@arm.com
> >>>
> >>> And essentially ignoring sub-section data in there for now as well (but
> >>> might not be that relevant yet). In addition, this might also be related to
> >>>
> >>> https://lkml.kernel.org/r/161058499000.1840162.702316708443239771.stgit@dwillia2-desk3.amr.corp.intel.com
> >>
> >> I will check it, and see what I find. I saw that panic almost a year
> >> ago, things might have changed since then.
> >
> > Hi David,
> >
> > There is no panic anymore, but I also can't offset by 2M anymore, the
> > minimum that works now is 16M, and if alignment is less than 16M
> > creating devdax device fails.
>
> I wonder why we get such different namespace sizes? Where do the
> differences come from? This looks very weird.
>
> >
> > So, I tried the new ARM64 patch that reduces section sizes, and two
> > alignments for pmem: regular 2G alignment, and 2G+16M alignment.
> > (subtracted 16M from the bottom)
> >
> > ***** 4K page, 6G RAM, 2G PRAM *****
> > BOOT:
> > 40000000-1bfffffff : System RAM
> > 1c0000000-23fffffff : namespace0.0
> > DEVDAX:
> > 40000000-1bfffffff : System RAM
> > 1c0000000-1c21fffff : namespace0.0
> > 1c2200000-23fffffff : dax0.0
> > HOTPLUG:
> > 40000000-1bfffffff : System RAM
> > 1c0000000-1c21fffff : namespace0.0
> > 1c8000000-23fffffff : dax0.0
> > 1c8000000-23fffffff : System RAM (kmem) 128M Wasted (Expected)
>
> The namespace spans 34MB??
>
> >
> > ***** 4K page, 6G-16M RAM, 2G+16M PRAM *****
> > BOOT:
> > 40000000-1beffffff : System RAM
> > 1bf000000-23fffffff : namespace0.0
> > DEVDAX:
> > 40000000-1beffffff : System RAM
> > 1bf000000-1c11fffff : namespace0.0
> > 1c1200000-23fffffff : dax0.0
> > HOTPLUG:
> > 40000000-1beffffff : System RAM
> > 1bf000000-1c11fffff : namespace0.0
> > 1c8000000-23fffffff : dax0.0
> > 1c8000000-23fffffff : System RAM (kmem) 144M Wasted (????)
>
> The namespace spans 34MB??

Right, this seems like a bug.

>
> >
> > ***** 64K page, 6G RAM, 2G PRAM *****
> > BOOT:
> > 40000000-1bfffffff : System RAM
> > 1c0000000-23fffffff : namespace0.0
> > DEVDAX:
> > 40000000-1bfffffff : System RAM
> > 1c0000000-1dfffffff : namespace0.0
> > 1e0000000-23fffffff : dax0.0
> > HOTPLUG:
> > 40000000-1bfffffff : System RAM
> > 1c0000000-1dfffffff : namespace0.0
>
> The namespace spans 512MB ?!? What?

This is because the section size is 512M with 64K pages.

>
> > 1e0000000-23fffffff : dax0.0
> > 1e0000000-23fffffff : System RAM (kmem) 512M Wasted (Expected)
> >
> > ***** 64K page, 6G-16M RAM, 2G+16M PRAM *****
> > BOOT:
> > 40000000-1beffffff : System RAM
> > 1bf000000-23fffffff : namespace0.0
> > DEVDAX:
> > 40000000-1beffffff : System RAM
> > 1bf000000-1bf3fffff : namespace0.0
> > 1bf400000-23fffffff : dax0.0
> > HOTPLUG:
> > 40000000-1beffffff : System RAM
> > 1bf000000-1bf3fffff : namespace0.0
>
> The namespace now consumes 4MB ?!?
>
> > 1c0000000-23fffffff : dax0.0
> > 1c0000000-23fffffff : System RAM (kmem) 16M Wasted (Optimal)
>
> Good :) I guess more optimal would be 2MB/0MB :)

Agreed, but for a 16M offset this is optimal, because 16M is smaller than
the section size.

>
> >
> > In all three cases only System RAM, namespace0.0, and dax0.0 were
> > printed from /proc/iomem.
> > BOOT: content of iomem right after boot
> > DEVDAX: content of iomem after devdax is created:
> > ndctl create-namespace --mode devdax -e namespace0.0
> > HOTPLUG: content of iomem after dax0.0 is hotplugged:
> > echo dax0.0 > /sys/bus/dax/drivers/device_dax/unbind
> > echo dax0.0 > /sys/bus/dax/drivers/kmem/new_id
> >
> >
> > The most surprising part is why, with 4K pages and a 16M offset, 144M is
> > wasted. For whatever reason, when devdax is created, 34M is lost to the
> > label? Something is wrong here. However, I am happy with the 64K pages
> > result, where only 16M is wasted. Optimally, of course, we should not be
> > wasting any memory here, but it is still much better than what we have now.
>
> Definitely, but we should try figuring out what's going on here. I
> assume on x86-64 it behaves differently?

Yes, we should find the root cause. I strongly suspect that an alignment
miscalculation somewhere causes this memory waste with the 16M offset. I am
also not sure why the 2M label size was increased, and why 16M is now an
alignment requirement.

I tested on x86 and got pretty much the same results as on ARM64: a 2M
offset is no longer allowed (16M is the minimum), and even with a 16M
offset, 144M is wasted.
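The "Wasted" figures in the ARM64 listings can be reproduced with simple arithmetic. This is a minimal sketch, assuming (inferred from the listings, not verified against kernel code) that the hotplugged kmem range begins at the dax0.0 start rounded up to the memory section size, taken here as 128M with 4K pages and 512M with 64K pages, and that the waste is the distance from the namespace start to that boundary. The helpers align_up, range_mib and wasted_mib are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: recompute the "Wasted" figures from the arm64 /proc/iomem listings.
# Assumption (inferred from the listings, not taken from kernel code): the
# hotplugged kmem range starts at the dax0.0 start rounded up to the section
# size (assumed 128M with 4K pages, 512M with 64K pages).

MiB=$((1 << 20))

# align_up ADDR ALIGN_MIB: round ADDR up to a multiple of ALIGN_MIB MiB, print hex.
align_up() {
    local addr=$(( $1 )) align=$(( $2 * MiB ))
    printf '0x%x\n' $(( (addr + align - 1) / align * align ))
}

# range_mib START END: size of an inclusive iomem range "START-END", in MiB.
range_mib() {
    echo $(( ($2 - $1 + 1) / MiB ))
}

# wasted_mib NS_START KMEM_START: pmem that never becomes System RAM, in MiB.
wasted_mib() {
    echo $(( ($2 - $1) / MiB ))
}

# 4K pages, 2G+16M PRAM: namespace0.0 at 1bf000000, dax0.0 at 1c1200000.
echo "namespace0.0: $(range_mib 0x1bf000000 0x1c11fffff)M"        # 34M
kmem=$(align_up 0x1c1200000 128)                                  # 0x1c8000000
echo "kmem start: $kmem, wasted: $(wasted_mib 0x1bf000000 "$kmem")M"   # 144M

# 64K pages, 2G+16M PRAM: namespace0.0 at 1bf000000, dax0.0 at 1bf400000.
echo "namespace0.0: $(range_mib 0x1bf000000 0x1bf3fffff)M"        # 4M
kmem=$(align_up 0x1bf400000 512)                                  # 0x1c0000000
echo "kmem start: $kmem, wasted: $(wasted_mib 0x1bf000000 "$kmem")M"   # 16M
```

Under this model, with 64K pages the 4M namespace fits inside the 16M offset, so kmem starts at the very next 512M boundary and only the offset itself is lost; with 4K pages the 34M namespace overruns the 16M offset and pushes kmem to the following 128M boundary, giving 16M + 128M = 144M.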
Here is the full QEMU command if anyone wants to reproduce it:

KERNEL_PARAM='console=ttyS0 ip=dhcp'
KERNEL_PARAM+=' memmap=2G!8G'
#KERNEL_PARAM+=' memmap=2064M!8176M'

qemu-system-x86_64 \
        -m 8G -smp 1 \
        -machine q35 \
        -nographic \
        -enable-kvm \
        -kernel pmem/native/arch/x86/boot/bzImage \
        -initrd ../poky/build/tmp/deploy/images/qemux86-64/core-image-minimal-qemux86-64.cpio.gz \
        -chardev stdio,id=console,signal=off,mux=on \
        -mon chardev=console \
        -serial chardev:console \
        -netdev user,hostfwd=tcp::5000-:22,id=netdev0 \
        -device virtio-net-pci,netdev=netdev0 \
        -append "$KERNEL_PARAM"

Also, I am using the current master branch tip for the ndctl command:
root@qemux86-64:~# ndctl --version
71.2.gea014c0

***** 4K page, 6G RAM, 2G PRAM: kernel parameter memmap=2G!8G *****
BOOT:
100000000-1ffffffff : System RAM
200000000-27fffffff : Persistent Memory (legacy)
200000000-27fffffff : namespace0.0
DEVDAX:
100000000-1ffffffff : System RAM
200000000-27fffffff : Persistent Memory (legacy)
200000000-2021fffff : namespace0.0
202200000-27fffffff : dax0.0
HOTPLUG:
100000000-1ffffffff : System RAM
200000000-27fffffff : Persistent Memory (legacy)
200000000-2021fffff : namespace0.0
208000000-27fffffff : dax0.0
208000000-27fffffff : System RAM (kmem)  (128M Wasted)

***** 4K page, 6G-16M RAM, 2G+16M PRAM: kernel parameter memmap=2064M!8176M *****
BOOT:
100000000-1feffffff : System RAM
1ff000000-27fffffff : Persistent Memory (legacy)
1ff000000-27fffffff : namespace0.0
DEVDAX:
100000000-1feffffff : System RAM
1ff000000-27fffffff : Persistent Memory (legacy)
1ff000000-2011fffff : namespace0.0
201200000-27fffffff : dax0.0
HOTPLUG:
100000000-1feffffff : System RAM
1ff000000-27fffffff : Persistent Memory (legacy)
1ff000000-2011fffff : namespace0.0
208000000-27fffffff : dax0.0
208000000-27fffffff : System RAM (kmem)  (144M Wasted)

The least amount of wasted memory I can get on x86 with this experiment is
with an offset that is larger than 34M and 16M-aligned, i.e. 48M
(memmap=2096M!8144M):

root@qemux86-64:~# cat /proc/iomem | grep 'dax\|namespace\|System\|Pers'
100000000-1fcffffff : System RAM
1fd000000-27fffffff : Persistent Memory (legacy)
1fd000000-1ff1fffff : namespace0.0
200000000-27fffffff : dax0.0
200000000-27fffffff : System RAM (kmem)  (48M Wasted)

Pasha

>
> Thanks
>
>
> --
> Thanks,
>
> David / dhildenb
>
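The x86 observation that the smallest waste comes from an offset larger than 34M and 16M-aligned (48M) can be checked against a simplified model. This is a minimal sketch under assumptions taken only from the listings: the pmem region starts OFFSET MiB below a 128M section boundary (128M is the x86-64 section size), namespace0.0 always keeps the first 34M, and kmem starts at the first section boundary at or after that 34M. Offsets are scanned in 16M steps because 16M was the smallest alignment that worked; predicted_waste and best_offset are hypothetical helpers, not kernel or ndctl interfaces.

```shell
#!/usr/bin/env bash
# Simplified model (assumption, inferred from the x86 listings only):
#   - the pmem region starts OFFSET MiB below a 128M section boundary,
#   - namespace0.0 keeps the first 34 MiB,
#   - kmem starts at the first section boundary at or after that overhead.
SECTION_MIB=128
NS_OVERHEAD_MIB=34

# predicted_waste OFFSET: MiB of pmem that would not become System RAM.
predicted_waste() {
    local base=1024                      # any multiple of SECTION_MIB larger than OFFSET
    local ns_start=$(( base - $1 ))
    local data_start=$(( ns_start + NS_OVERHEAD_MIB ))
    local kmem_start=$(( (data_start + SECTION_MIB - 1) / SECTION_MIB * SECTION_MIB ))
    echo $(( kmem_start - ns_start ))
}

# best_offset: scan 16M-aligned offsets below one section, print "OFFSET WASTE".
best_offset() {
    local off w best=0 best_waste=-1
    for (( off = 0; off < SECTION_MIB; off += 16 )); do
        w=$(predicted_waste "$off")
        if (( best_waste < 0 || w < best_waste )); then
            best=$off
            best_waste=$w
        fi
    done
    echo "$best $best_waste"
}

# The three offsets tried on x86: 0M -> 128M, 16M -> 144M, 48M -> 48M wasted.
for off in 0 16 48; do
    echo "offset ${off}M: $(predicted_waste "$off")M wasted"
done
best_offset    # prints "48 48"
```

In this model every 16M-aligned offset from 48M to 112M loses exactly the offset itself, so 48M is simply the smallest offset that clears the 34M namespace, which matches the 48M result in the last listing.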