LKML Archive on lore.kernel.org
From: Wei Yang <richard.weiyang@gmail.com>
To: Borislav Petkov <bp@alien8.de>
Cc: Wei Yang <richard.weiyang@gmail.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	Tejun Heo <tj@kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [Patch V2 2/2] x86/mm/numa: remove the numa_nodemask_from_meminfo()
Date: Tue, 11 Apr 2017 00:39:14 +0800
Message-ID: <20170410163914.GA4404@WeideMacBook-Pro.local>
In-Reply-To: <20170410124320.fq5sw4lt2imztiyl@pd.tnic>



On Mon, Apr 10, 2017 at 02:43:20PM +0200, Borislav Petkov wrote:
>On Sun, Apr 09, 2017 at 11:12:14AM +0800, Wei Yang wrote:
>> Oops, sorry for introducing a regression with my cleanup.
>> I hadn't noticed the kernel command line option "numa=fake", which
>> I think is the cause of the crash.
>
>Of course it is, didn't you see my debugging upthread?
>
>> So from my understanding, I am going to do these tests:
>> 
>> 1. all fake numa scenarios with Kirill's qemu command line
>
>It is enough if you boot the kernel with "numa=fake..."
>
>> 2. Real numa scenarios with following qemu command option
>
>Not qemu command option but a kernel cmdline option.
>
>> 3. Baremetal
>> 
>> One more question: on the baremetal machine I can't change the
>> numa configuration, so there would be only one case. Do you have
>> any specific requirements?
>
>numa=fake on baremetal too.
>
>> Well, if I missed something, just let me know :-)
>> 
>> > Qemu can emulate real numa too, for example you can boot with:
>> >
>> > -smp 64 \
>> > -numa node,nodeid=0,cpus=1-8 \
>> > -numa node,nodeid=1,cpus=9-16 \
>> > -numa node,nodeid=2,cpus=17-24 \
>> > -numa node,nodeid=3,cpus=25-32 \
>> > -numa node,nodeid=4,cpus=0 \
>> > -numa node,nodeid=4,cpus=33-39 \
>> > -numa node,nodeid=5,cpus=40-47 \
>> > -numa node,nodeid=6,cpus=48-55 \
>> > -numa node,nodeid=7,cpus=56-63
>
>Also, do this in kvm. kvm can emulate a lot of numa configurations, do
>experiment with those too.
>
>Basically, try to break your "cleanup". Stuff one should do for every
>patch one sends anyway.

Hi, Borislav

I have tested several fake numa combinations, and all of them passed.

A result marked P (Passed) means the system booted up and a simple
kernel build test succeeded.

# test matrix and result

## Qemu

With qemu, I tried every combination of phys_node in (1, 4) and emu_node in
(0, 2, 4, 8):

  +----------------+--------+--------+
  |      phys_node |   1    |   4    |
  |emu_node        |        |        |
  +----------------+--------+--------+
  |        0       |   P    |   P    |
  +----------------+--------+--------+
  |        2       |   P    |   P    |
  +----------------+--------+--------+
  |        4       |   P    |   P    |
  +----------------+--------+--------+
  |        8       |   P    |   P    |
  +----------------+--------+--------+

phys_node = 4 is emulated with the qemu command line:

    -numa node,nodeid=0,cpus=1-2 \
    -numa node,nodeid=1,cpus=3-4 \
    -numa node,nodeid=2,cpus=0 \
    -numa node,nodeid=2,cpus=5 \
    -numa node,nodeid=3,cpus=6-7

emu_node is set with the kernel command line option:

    "numa=fake=N"

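The qemu part of the matrix could be scripted roughly as follows. This is a hypothetical sketch; the mail does not say how the runs were driven. Only the -numa specs and numa=fake=N come from this mail. The qemu binary, the -m/-smp values, the kernel path, the console option and the helper names (qemu_argv, matrix_cells) are assumptions, and the phys_node = 1 case is assumed to pass no -numa option at all.

```python
from itertools import product
import shlex

PHYS_NODES = (1, 4)
EMU_NODES = (0, 2, 4, 8)

# -numa specs from the mail for the 4-node guest (CPUs 0-7).
FOUR_NODE_LAYOUT = (
    "node,nodeid=0,cpus=1-2",
    "node,nodeid=1,cpus=3-4",
    "node,nodeid=2,cpus=0",
    "node,nodeid=2,cpus=5",
    "node,nodeid=3,cpus=6-7",
)


def qemu_argv(phys_node, emu_node, kernel="arch/x86/boot/bzImage"):
    """Build the qemu argument list for one (phys_node, emu_node) cell."""
    # Assumed base options; adjust for the local qemu version and image.
    argv = ["qemu-system-x86_64", "-enable-kvm", "-m", "4G", "-smp", "8",
            "-kernel", kernel]
    if phys_node == 4:
        for spec in FOUR_NODE_LAYOUT:
            argv += ["-numa", spec]
    # phys_node == 1: assumed to use no -numa option (a single node).
    argv += ["-append", f"console=ttyS0 numa=fake={emu_node}"]
    return argv


def matrix_cells():
    """Return every cell of the matrix as (phys_node, emu_node, argv)."""
    return [(p, e, qemu_argv(p, e)) for p, e in product(PHYS_NODES, EMU_NODES)]


if __name__ == "__main__":
    for phys, emu, argv in matrix_cells():
        print(f"# phys_node={phys} emu_node={emu}")
        print(shlex.join(argv))
```

Each printed line is one boot; a cell counts as P only if that boot comes up and the kernel build inside the guest succeeds.
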
## Baremetal

My machine has only one numa node, so I could only test phys_node = 1.

  +----------------+--------+
  |      phys_node |   1    |
  |emu_node        |        |
  +----------------+--------+
  |        0       |   P    |
  +----------------+--------+
  |        2       |   P    |
  +----------------+--------+
  |        4       |   P    |
  +----------------+--------+
  |        8       |   P    |
  +----------------+--------+


emu_node is set with the kernel command line option:

    "numa=fake=N"

# Other things I observed

In the qemu guest everything looks good, but I saw two things on the
baremetal machine.

First, I want to emphasize that I saw the same behavior with and without
my "cleanup".

## Only 3 nodes when fake=4

[    0.000000] Faking a node at [mem 0x0000000000000000-0x000000022f5fffff]
[    0.000000] Faking node 0 at [mem 0x0000000000000000-0x000000007fffffff] (2048MB)
[    0.000000] Faking node 1 at [mem 0x0000000080000000-0x0000000133ffffff] (2880MB)
[    0.000000] Faking node 2 at [mem 0x0000000134000000-0x000000022f5fffff] (4022MB)
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000001000-0x000000000009cfff]
[    0.000000]   node   0: [mem 0x0000000000100000-0x000000007fffffff]
[    0.000000]   node   1: [mem 0x0000000080000000-0x00000000ba5b1fff]
[    0.000000]   node   1: [mem 0x00000000ba5b9000-0x00000000bad8dfff]
[    0.000000]   node   1: [mem 0x00000000bafb6000-0x00000000ca8a1fff]
[    0.000000]   node   1: [mem 0x00000000ca93a000-0x00000000ca977fff]
[    0.000000]   node   1: [mem 0x00000000cafff000-0x00000000caffffff]
[    0.000000]   node   1: [mem 0x0000000100000000-0x0000000133ffffff]
[    0.000000]   node   2: [mem 0x0000000134000000-0x000000022f5fffff]
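To see where the memory went, the "Early memory node ranges" lines can be summed per node. This is a minimal sketch; the helper name usable_mib_per_node is hypothetical, and the ranges are treated as inclusive, as they are printed in the log.

```python
import re

# "Early memory node ranges" lines copied from the baremetal log above.
EARLY_RANGES = """\
[    0.000000]   node   0: [mem 0x0000000000001000-0x000000000009cfff]
[    0.000000]   node   0: [mem 0x0000000000100000-0x000000007fffffff]
[    0.000000]   node   1: [mem 0x0000000080000000-0x00000000ba5b1fff]
[    0.000000]   node   1: [mem 0x00000000ba5b9000-0x00000000bad8dfff]
[    0.000000]   node   1: [mem 0x00000000bafb6000-0x00000000ca8a1fff]
[    0.000000]   node   1: [mem 0x00000000ca93a000-0x00000000ca977fff]
[    0.000000]   node   1: [mem 0x00000000cafff000-0x00000000caffffff]
[    0.000000]   node   1: [mem 0x0000000100000000-0x0000000133ffffff]
[    0.000000]   node   2: [mem 0x0000000134000000-0x000000022f5fffff]
"""

RANGE_RE = re.compile(r"node\s+(\d+): \[mem (0x[0-9a-f]+)-(0x[0-9a-f]+)\]")
MIB = 1 << 20


def usable_mib_per_node(log_text):
    """Sum each node's inclusive [start, end] ranges and return MiB per node."""
    totals = {}
    for nid, start, end in RANGE_RE.findall(log_text):
        size = int(end, 16) - int(start, 16) + 1
        totals[int(nid)] = totals.get(int(nid), 0) + size
    return {nid: size / MIB for nid, size in sorted(totals.items())}


if __name__ == "__main__":
    usable = usable_mib_per_node(EARLY_RANGES)
    for nid, mib in usable.items():
        print(f"node {nid}: {mib:.1f} MiB usable")
    total = sum(usable.values())
    print(f"total: {total:.1f} MiB, even 4-way share: {total / 4:.1f} MiB")
```

On the lines above this gives about 2047.6 MiB for node 0, 2022.7 MiB for node 1 and 4022.0 MiB for node 2, roughly 8092.3 MiB in total. An even four-way split would be about 2023.1 MiB per node, so node 2 appears to hold about two nodes' worth of memory; the arithmetic alone does not show why the fourth node was not created.
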

## Some warnings

I don't see these two warnings without "numa=fake=N".

[    0.004000] sched: CPU #1's llc-sibling CPU #0 is not on the same node!  [node: 1 != 0]. Ignoring dependency.
[    0.004000] ------------[ cut here ]------------
[    0.004000] WARNING: CPU: 1 PID: 0 at arch/x86/kernel/smpboot.c:424 topology_sane.isra.5+0x6c/0x70

[    8.594469] sysfs: cannot create duplicate filename '/devices/platform/coretemp.0/hwmon/hwmon2/temp2_label'
[    8.594478] ------------[ cut here ]------------
[    8.594482] WARNING: CPU: 4 PID: 34 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x56/0x70
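The first warning reports CPU 0 and CPU 1 as last-level-cache siblings that ended up on different nodes. A simplified model of that kind of sanity check is sketched below. It is not the kernel's topology_sane(); the helper names and the four-CPU, single-LLC layout with its CPU-to-node mapping are hypothetical, chosen only to illustrate how spreading CPUs that share one cache over several fake nodes can trip such a check.

```python
def llc_node_conflicts(cpu_to_node, llc_groups):
    """Return (cpu, sibling) pairs that share an LLC but sit on different nodes.

    cpu_to_node: dict mapping CPU number -> node id
    llc_groups:  iterable of sets of CPUs sharing one last-level cache
    """
    conflicts = []
    for group in llc_groups:
        cpus = sorted(group)
        for cpu in cpus:
            for sib in cpus:
                # Report each pair once, from the higher-numbered CPU.
                if sib < cpu and cpu_to_node[cpu] != cpu_to_node[sib]:
                    conflicts.append((cpu, sib))
    return conflicts


def format_warning(cpu, sib, cpu_to_node):
    """Mirror the wording of the sched: message in the log above."""
    return (f"sched: CPU #{cpu}'s llc-sibling CPU #{sib} is not on the same "
            f"node!  [node: {cpu_to_node[cpu]} != {cpu_to_node[sib]}]. "
            f"Ignoring dependency.")


if __name__ == "__main__":
    one_llc = [{0, 1, 2, 3}]
    # Hypothetical mapping of four CPUs onto two fake nodes.
    fake = {0: 0, 1: 1, 2: 0, 3: 1}
    for cpu, sib in llc_node_conflicts(fake, one_llc):
        print(format_warning(cpu, sib, fake))
    # Without numa=fake every CPU is on node 0, so nothing is reported.
    print(llc_node_conflicts({c: 0 for c in range(4)}, one_llc))
```

With every CPU on node 0 the model reports no conflicts, which is consistent with the warning only showing up with numa=fake=N.
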

# Some thoughts on the code

After going through numa_emulation(), I suggest rebuilding
numa_nodes_parsed from the emulated nodes, instead of setting
numa_nodes_parsed directly in emu_setup_memblk().

Two cases I have in mind are problematic:
1. split_nodes_size_interleave()/split_nodes_interleave() may fail, or a
later step may fail.
2. There may be fewer fake nodes than physical nodes.

Both of them may leave numa_nodes_parsed inaccurate, so I have a patch that
rebuilds it from the emulated node info.
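The idea can be illustrated with a toy model. This is a hypothetical Python sketch, not kernel code and not the patch itself: MemBlk, nodes_from_meminfo() and run_emulation() are illustrative names, and "eager" marking stands in for setting numa_nodes_parsed directly in emu_setup_memblk(). It covers the two cases above: an emulation that fails part-way, and fewer fake nodes than physical nodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemBlk:
    """One memory block [start, end) assigned to node nid."""
    start: int
    end: int
    nid: int


def nodes_from_meminfo(blks):
    """Proposed approach: derive the parsed-node set from a final layout."""
    return {b.nid for b in blks if b.end > b.start}


def run_emulation(phys_blks, emu_blks, fail_at=None):
    """Toy model of one emulation run.

    Returns (eager_mask, rebuilt_mask, layout):
      eager_mask   - starts from the physical nodes and marks each emulated
                     node as its block is set up;
      rebuilt_mask - computed only from whichever layout survives;
      layout       - the emulated blocks on success, else the physical ones.
    """
    eager = nodes_from_meminfo(phys_blks)
    staged = []
    for i, blk in enumerate(emu_blks):
        if i == fail_at:
            # Emulation aborts part-way and falls back to the physical layout.
            return eager, nodes_from_meminfo(phys_blks), phys_blks
        eager.add(blk.nid)
        staged.append(blk)
    return eager, nodes_from_meminfo(staged), staged


if __name__ == "__main__":
    GIB = 1 << 30
    # Case 1: one physical node; splitting into four aborts at the third block.
    phys1 = [MemBlk(0, 4 * GIB, 0)]
    emu4 = [MemBlk(i * GIB, (i + 1) * GIB, i) for i in range(4)]
    print(run_emulation(phys1, emu4, fail_at=2)[:2])
    # Case 2: four physical nodes emulated as two fake nodes.
    phys4 = [MemBlk(i * GIB, (i + 1) * GIB, i) for i in range(4)]
    emu2 = [MemBlk(0, 2 * GIB, 0), MemBlk(2 * GIB, 4 * GIB, 1)]
    print(run_emulation(phys4, emu2)[:2])
```

In both cases the eager mask keeps nodes that do not exist in the surviving layout ({0, 1} instead of {0}, and {0, 1, 2, 3} instead of {0, 1}), while the rebuilt mask matches it.
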

Will send it soon.

>
>-- 
>Regards/Gruss,
>    Boris.
>
>Good mailing practices for 400: avoid top-posting and trim the reply.

-- 
Wei Yang
Help you, Help me


Thread overview: 13+ messages
2017-03-14  3:08 [Patch V2 1/2] x86/mm/numa: trivial fix on typo and error message Wei Yang
2017-03-14  3:08 ` [Patch V2 2/2] x86/mm/numa: remove the numa_nodemask_from_meminfo() Wei Yang
2017-04-03  9:58   ` [tip:x86/mm] x86/mm/numa: Remove numa_nodemask_from_meminfo() tip-bot for Wei Yang
2017-04-06 12:44   ` [Patch V2 2/2] x86/mm/numa: remove the numa_nodemask_from_meminfo() Kirill A. Shutemov
2017-04-06 14:59     ` Borislav Petkov
2017-04-06 15:42       ` Kirill A. Shutemov
2017-04-06 18:01         ` Borislav Petkov
2017-04-06 18:21           ` Kirill A. Shutemov
2017-04-06 18:48             ` Borislav Petkov
2017-04-09  3:12               ` Wei Yang
2017-04-10 12:43                 ` Borislav Petkov
2017-04-10 16:39                   ` Wei Yang [this message]
2017-04-03  9:57 ` [tip:x86/mm] x86/mm/numa: Improve alloc_node_data() error path message tip-bot for Wei Yang
