linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Pavel Tatashin <pasha.tatashin@soleen.com>
To: David Hildenbrand <david@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>,
	linux-mm <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Sasha Levin <sashal@kernel.org>,
	Tyler Hicks <tyhicks@linux.microsoft.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dan Williams <dan.j.williams@intel.com>,
	Michal Hocko <mhocko@suse.com>,
	Oscar Salvador <osalvador@suse.de>,
	Vlastimil Babka <vbabka@suse.cz>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Jason Gunthorpe <jgg@ziepe.ca>, Marc Zyngier <maz@kernel.org>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>,
	Will Deacon <will.deacon@arm.com>,
	James Morse <james.morse@arm.com>,
	James Morris <jmorris@namei.org>
Subject: Re: dax alignment problem on arm64 (and other achitectures)
Date: Fri, 29 Jan 2021 11:24:21 -0500	[thread overview]
Message-ID: <CA+CK2bDJ3hrWoE91L2wpAk+Yu0_=GtYw=4gLDDD7mxs321b_aA@mail.gmail.com> (raw)
In-Reply-To: <92912784-f3a3-b5a5-2d45-4c86ae26315f@redhat.com>

On Fri, Jan 29, 2021 at 8:19 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 29.01.21 03:06, Pavel Tatashin wrote:
> >>> Might be related to the broken custom pfn_valid() implementation for
> >>> ZONE_DEVICE.
> >>>
> >>> https://lkml.kernel.org/r/1608621144-4001-1-git-send-email-anshuman.khandual@arm.com
> >>>
> >>> And essentially ignoring sub-section data in there for now as well (but
> >>> might not be that relevant yet). In addition, this might also be related to
> >>>
> >>> https://lkml.kernel.org/r/161058499000.1840162.702316708443239771.stgit@dwillia2-desk3.amr.corp.intel.com
> >>
> >> I will check it, and see what I find. I saw that panic almost a year
> >> ago, things might have changed since then.
> >
> > Hi David,
> >
> > There is no panic anymore, but I also can't offset by 2M anymore, the
> > minimum that works now is 16M, and if alignment is less than 16M
> > creating devdax device fails.
>
> I wonder why we get such different namespace sizes? Where do the
> differences come from? This looks very weird.
>
> >
> > So, I tried the new ARM64 patch that reduces section sizes, and two
> > alignments for pmem: regular 2G alignment, and 2G+16M alignment.
> > (subtracted 16M from the bottom)
> >
> > ***** 4K page, 6G RAM, 2G PRAM  *****
> > BOOT:
> > 40000000-1bfffffff : System RAM
> > 1c0000000-23fffffff : namespace0.0
> > DEVDAX:
> > 40000000-1bfffffff : System RAM
> > 1c0000000-1c21fffff : namespace0.0
> > 1c2200000-23fffffff : dax0.0
> > HOTPLUG:
> > 40000000-1bfffffff : System RAM
> > 1c0000000-1c21fffff : namespace0.0
> > 1c8000000-23fffffff : dax0.0
> >    1c8000000-23fffffff : System RAM (kmem)               128M Wasted (Expected)
>
> The namespace spans 34MB??
>
> >
> > ***** 4K page, 6G-16M RAM, 2G+16M PRAM  *****
> > BOOT:
> > 40000000-1beffffff : System RAM
> > 1bf000000-23fffffff : namespace0.0
> > DEVDAX:
> > 40000000-1beffffff : System RAM
> > 1bf000000-1c11fffff : namespace0.0
> > 1c1200000-23fffffff : dax0.0
> > HOTPLUG:
> > 40000000-1beffffff : System RAM
> > 1bf000000-1c11fffff : namespace0.0
> > 1c8000000-23fffffff : dax0.0
> >    1c8000000-23fffffff : System RAM (kmem)               144M Wasted (????)
>
> The namespace spans 34MB??

Right, this seems like a bug

>
> >
> > ***** 64K page, 6G RAM, 2G PRAM  *****
> > BOOT:
> > 40000000-1bfffffff : System RAM
> > 1c0000000-23fffffff : namespace0.0
> > DEVDAX:
> > 40000000-1bfffffff : System RAM
> > 1c0000000-1dfffffff : namespace0.0
> > 1e0000000-23fffffff : dax0.0
> > HOTPLUG:
> > 40000000-1bfffffff : System RAM
> > 1c0000000-1dfffffff : namespace0.0
>
> The namespace spans 512MB ?!? What?

This is because section size is 512M with 64K pages.

>
> > 1e0000000-23fffffff : dax0.0
> >    1e0000000-23fffffff : System RAM (kmem)               512M Wasted (Expected)
> >
> > ***** 64K page, 6G-16M RAM, 2G+16M PRAM  *****
> > BOOT:
> > 40000000-1beffffff : System RAM
> > 1bf000000-23fffffff : namespace0.0
> > DEVDAX:
> > 40000000-1beffffff : System RAM
> > 1bf000000-1bf3fffff : namespace0.0
> > 1bf400000-23fffffff : dax0.0
> > HOTPLUG:
> > 40000000-1beffffff : System RAM
> > 1bf000000-1bf3fffff : namespace0.0
>
> The namespace now consumes 4MB ?!?
>
> > 1c0000000-23fffffff : dax0.0
> >    1c0000000-23fffffff : System RAM (kmem)               16M Wasted (Optimal)
>
> Good :) I guess more optimal would be 2MB/0MB :)

Agree, but for the offset 16M this is optimal, because 16M is smaller
than section size.

>
> >
> > In all three cases only System RAM, namespace0.0, and dax0.0 were
> > printed from /proc/iomem.
> > BOOT    content of iomem right after boot
> > DEVDAX  content of iomem after devdax is created
> >             ndctl create-namespace --mode devdax  -e namespace0.0"
> > HOTPLUG content of imem after dax0.0 is hotplugged:
> >             echo dax0.0 > /sys/bus/dax/drivers/device_dax/unbind
> >             echo dax0.0 > /sys/bus/dax/drivers/kmem/new_id
> >
> >
> > The most surprising part is why with 4K pages and 16M offset 144M is
> > wasted? For whatever reason, when devdax is created 34 goes wasted to
> > the label? Something is wrong here.. However, I am happy with 64K
> > pages result, and that only 16M is wasted, of course optimally, we
> > should be using any memory here, but it is still much better than what
> > we have now.
>
> Definitely, but we should try figuring out what's going on here. I
> assume on x86-64 it behaves differently?

Yes, we should root cause. I highly suspect that there is somewhere
alignment miscalculations happen that cause this memory waste with the
offset 16M. I am also not sure why the 2M label size was increased,
and  why 16M is now an alignment requirement.

I tested on x86, and got pretty much the same results as on ARM64: 2M
offset is not allowed anymore 16M minimum, and even with 16M offset,
144M is wasted. Here is full QEMU command if anyone wants to repro it:


KERNEL_PARAM='console=ttyS0 ip=dhcp'
KERNEL_PARAM+=' memmap=2G!8G'
#KERNEL_PARAM+=' memmap=2064M!8176M'

qemu-system-x86_64
                                 \
        -m 8G -smp 1
                                 \
        -machine q35
                                 \
        -nographic
                                 \
        -enable-kvm
                                 \
        -kernel pmem/native/arch/x86/boot/bzImage
                                 \
        -initrd
../poky/build/tmp/deploy/images/qemux86-64/core-image-minimal-qemux86-64.cpio.gz
       \
        -chardev stdio,id=console,signal=off,mux=on
                                 \
        -mon chardev=console
                                 \
        -serial chardev:console
                                 \
        -netdev user,hostfwd=tcp::5000-:22,id=netdev0
                                 \
        -device virtio-net-pci,netdev=netdev0
                                 \
        -append "$KERNEL_PARAM"

Also, I am using current master branch tip for ndctl command:
root@qemux86-64:~# ndctl --version
71.2.gea014c0

***** 4K page, 6G RAM, 2G PRAM:  kernel parameter memmap=2G!8G *****
BOOT:
100000000-1ffffffff : System RAM
200000000-27fffffff : Persistent Memory (legacy)
  200000000-27fffffff : namespace0.0

DEVDAX:
100000000-1ffffffff : System RAM
200000000-27fffffff : Persistent Memory (legacy)
  200000000-2021fffff : namespace0.0
  202200000-27fffffff : dax0.0

HOTPLUG:
100000000-1ffffffff : System RAM
200000000-27fffffff : Persistent Memory (legacy)
  200000000-2021fffff : namespace0.0
  208000000-27fffffff : dax0.0
    208000000-27fffffff : System RAM (kmem)        (128M Wasted)

***** 4K page, 6G-16M RAM, 2G+16M PRAM: kernel parameter
memmap=2064M!8176M *****
BOOT:
100000000-1feffffff : System RAM
1ff000000-27fffffff : Persistent Memory (legacy)
  1ff000000-27fffffff : namespace0.0

DEVDAX:
100000000-1feffffff : System RAM
1ff000000-27fffffff : Persistent Memory (legacy)
  1ff000000-2011fffff : namespace0.0
  201200000-27fffffff : dax0.0

HOTPLUG:
100000000-1feffffff : System RAM
1ff000000-27fffffff : Persistent Memory (legacy)
  1ff000000-2011fffff : namespace0.0
  208000000-27fffffff : dax0.0
    208000000-27fffffff : System RAM (kmem)  (144M Wasted)

The least amount of wasted memory I can get on x86 with this
experiment is with offset that is larger than 34M, and 16M aligned:
48M: memmap=2096M!8144M

root@qemux86-64:~# cat /proc/iomem | grep 'dax\|namespace\|System\|Pers'
100000000-1fcffffff : System RAM
1fd000000-27fffffff : Persistent Memory (legacy)
  1fd000000-1ff1fffff : namespace0.0
  200000000-27fffffff : dax0.0
    200000000-27fffffff : System RAM (kmem) (48M Wasted)

Pasha


>
> Thanks
>
>
> --
> Thanks,
>
> David / dhildenb
>

  parent reply	other threads:[~2021-01-29 16:32 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-27 20:43 dax alignment problem on arm64 (and other achitectures) Pavel Tatashin
2021-01-27 21:09 ` David Hildenbrand
2021-01-27 21:49   ` Pavel Tatashin
2021-01-27 22:18     ` David Hildenbrand
2021-01-27 23:33       ` Pavel Tatashin
2021-01-28 15:03         ` David Hildenbrand
2021-01-29  2:06           ` Pavel Tatashin
     [not found]             ` <92912784-f3a3-b5a5-2d45-4c86ae26315f@redhat.com>
2021-01-29 16:24               ` Pavel Tatashin [this message]
2021-01-29 19:06                 ` Pavel Tatashin
2021-01-29 19:12                   ` Pavel Tatashin
2021-01-29 19:41                     ` Pavel Tatashin
2021-01-29  2:55     ` Dan Williams
     [not found]       ` <CA+CK2bBJmnTVF8ZfwLyLqgjgo63G-rVQTYwUqgmx8wXFtRH9-g@mail.gmail.com>
     [not found]         ` <c5fe4330-55e2-48b8-1961-ca9eb879354e@oracle.com>
2021-01-29 16:32           ` Pavel Tatashin
2021-01-29 17:22             ` Joao Martins
2021-01-29 20:26         ` Dan Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CA+CK2bDJ3hrWoE91L2wpAk+Yu0_=GtYw=4gLDDD7mxs321b_aA@mail.gmail.com' \
    --to=pasha.tatashin@soleen.com \
    --cc=akpm@linux-foundation.org \
    --cc=anshuman.khandual@arm.com \
    --cc=dan.j.williams@intel.com \
    --cc=david@redhat.com \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=james.morse@arm.com \
    --cc=jgg@ziepe.ca \
    --cc=jmorris@namei.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=maz@kernel.org \
    --cc=mhocko@suse.com \
    --cc=osalvador@suse.de \
    --cc=sashal@kernel.org \
    --cc=tyhicks@linux.microsoft.com \
    --cc=vbabka@suse.cz \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).