All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: Yinghai Lu <yinghai@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Tony Luck <tony.luck@gmail.com>,
	Daniel J Blueman <daniel@numascale.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	"the arch/x86 maintainers" <x86@kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	Steffen Persvold <sp@numascale.com>
Subject: Re: [PATCH v4 4/4] Use 2GB memory block size on large-memory x86-64 systems
Date: Wed, 26 Aug 2015 13:49:23 -0700	[thread overview]
Message-ID: <20150826134923.9d8fad571de4be237a84ff50@linux-foundation.org> (raw)
In-Reply-To: <CAE9FiQXFHWNR41yEDjj=tAteanbngXQNQcwZvvVVexRmAQTTeA@mail.gmail.com>

On Tue, 25 Aug 2015 22:42:05 -0700 Yinghai Lu <yinghai@kernel.org> wrote:

> On Tue, Aug 25, 2015 at 9:17 PM, Ingo Molnar <mingo@kernel.org> wrote:
> > NAK due to lack of cleanliness: the two loops look almost identical - this sure
> > can be factored out...
> 
> Please check complete version at
> 
> https://patchwork.kernel.org/patch/7074341/

That doesn't do what Ingo suggested: "can be factored out...".

Please review this?

--- a/drivers/base/node.c~mm-check-if-section-present-during-memory-block-unregistering-v2-fix
+++ a/drivers/base/node.c
@@ -375,6 +375,22 @@ static int __init_refok get_nid_for_pfn(
 	return pfn_to_nid(pfn);
 }
 
+/*
+ * A memory block can have several absent sections.  A helper function for
+ * skipping over these holes.
+ *
+ * If an absent section is detected, skip_absent_section() will advance *pfn
+ * to the final page in that section and will return true.
+ */
+static bool skip_absent_section(unsigned long *pfn)
+{
+	if (present_section_nr(pfn_to_section_nr(*pfn)))
+		return false;
+
+	*pfn = round_down(*pfn + PAGES_PER_SECTION, PAGES_PER_SECTION) - 1;
+	return true;
+}
+
 /* register memory section under specified node if it spans that node */
 int register_mem_sect_under_node(struct memory_block *mem_blk, int nid)
 {
@@ -390,18 +406,10 @@ int register_mem_sect_under_node(struct
 	sect_end_pfn = section_nr_to_pfn(mem_blk->end_section_nr);
 	sect_end_pfn += PAGES_PER_SECTION - 1;
 	for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
-		int page_nid, scn_nr;
+		int page_nid;
 
-		/*
-		 * memory block could have several absent sections from start.
-		 * skip pfn range from absent section
-		 */
-		scn_nr = pfn_to_section_nr(pfn);
-		if (!present_section_nr(scn_nr)) {
-			pfn = round_down(pfn + PAGES_PER_SECTION,
-					 PAGES_PER_SECTION) - 1;
+		if (skip_absent_section(&pfn))
 			continue;
-		}
 
 		page_nid = get_nid_for_pfn(pfn);
 		if (page_nid < 0)
@@ -441,18 +449,10 @@ int unregister_mem_sect_under_nodes(stru
 	sect_end_pfn = section_nr_to_pfn(mem_blk->end_section_nr);
 	sect_end_pfn += PAGES_PER_SECTION - 1;
 	for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
-		int nid, scn_nr;
+		int nid;
 
-		/*
-		 * memory block could have several absent sections from start.
-		 * skip pfn range from absent section
-		 */
-		scn_nr = pfn_to_section_nr(pfn);
-		if (!present_section_nr(scn_nr)) {
-			pfn = round_down(pfn + PAGES_PER_SECTION,
-					 PAGES_PER_SECTION) - 1;
+		if (skip_absent_section(&pfn))
 			continue;
-		}
 
 		nid = get_nid_for_pfn(pfn);
 		if (nid < 0)
_


> Andrew,
> Ingo NAKed raw version of this patch, so you may need to remove it
> from -mm tree.

I don't know what that means.  We have multiple patches under at least
two different Subject:s.  Please be very careful and very specific when
identifying patches.  Otherwise mistakes will be made.


I presently have three patches:

mm-check-if-section-present-during-memory-block-unregistering.patch
mm-check-if-section-present-during-memory-block-unregistering-v2.patch
mm-check-if-section-present-during-memory-block-unregistering-v2-fix.patch

When these are consolidated together, this is the result:


From: Yinghai Lu <yinghai@kernel.org>
Subject: mm: check if section present during memory block (un)registering

Tony Luck found on his setup, if memory block size 512M will cause crash
during booting.

 BUG: unable to handle kernel paging request at ffffea0074000020
 IP: [<ffffffff81670527>] get_nid_for_pfn+0x17/0x40
 PGD 128ffcb067 PUD 128ffc9067 PMD 0
 Oops: 0000 [#1] SMP
 Modules linked in:
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.2.0-rc8 #1
...
 Call Trace:
  [<ffffffff81453b56>] ? register_mem_sect_under_node+0x66/0xe0
  [<ffffffff81453eeb>] register_one_node+0x17b/0x240
  [<ffffffff81b1f1ed>] ? pci_iommu_alloc+0x6e/0x6e
  [<ffffffff81b1f229>] topology_init+0x3c/0x95
  [<ffffffff8100213d>] do_one_initcall+0xcd/0x1f0

The system has non continuous RAM address:
 BIOS-e820: [mem 0x0000001300000000-0x0000001cffffffff] usable
 BIOS-e820: [mem 0x0000001d70000000-0x0000001ec7ffefff] usable
 BIOS-e820: [mem 0x0000001f00000000-0x0000002bffffffff] usable
 BIOS-e820: [mem 0x0000002c18000000-0x0000002d6fffefff] usable
 BIOS-e820: [mem 0x0000002e00000000-0x00000039ffffffff] usable

So there are start sections in memory block not present.
For example:
memory block : [0x2c18000000, 0x2c20000000) 512M
first three sections are not present.

Current register_mem_sect_under_node() assume first section is present,
but memory block section number range [start_section_nr, end_section_nr]
would include not present section.

For arch that support vmemmap, we don't setup memmap for struct page area
within not present sections area.

So skip the pfn range that belong to not present section.

Also fixes unregister_mem_sect_under_nodes().

Fixes: bdee237c0343 ("x86: mm: Use 2GB memory block size on large memory x86-64 systems")
Fixes: 982792c782ef ("x86, mm: probe memory block size for generic x86 64bit")
[akpm@linux-foundation.org: factor out common code]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Reported-by: Tony Luck <tony.luck@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Cc: Greg KH <greg@kroah.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: <stable@vger.kernel.org>	[3.15+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/base/node.c |   27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff -puN drivers/base/node.c~mm-check-if-section-present-during-memory-block-unregistering drivers/base/node.c
--- a/drivers/base/node.c~mm-check-if-section-present-during-memory-block-unregistering
+++ a/drivers/base/node.c
@@ -375,6 +375,22 @@ static int __init_refok get_nid_for_pfn(
 	return pfn_to_nid(pfn);
 }
 
+/*
+ * A memory block can have several absent sections.  A helper function for
+ * skipping over these holes.
+ *
+ * If an absent section is detected, skip_absent_section() will advance *pfn
+ * to the final page in that section and will return true.
+ */
+static bool skip_absent_section(unsigned long *pfn)
+{
+	if (present_section_nr(pfn_to_section_nr(*pfn)))
+		return false;
+
+	*pfn = round_down(*pfn + PAGES_PER_SECTION, PAGES_PER_SECTION) - 1;
+	return true;
+}
+
 /* register memory section under specified node if it spans that node */
 int register_mem_sect_under_node(struct memory_block *mem_blk, int nid)
 {
@@ -392,6 +408,9 @@ int register_mem_sect_under_node(struct
 	for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
 		int page_nid;
 
+		if (skip_absent_section(&pfn))
+			continue;
+
 		page_nid = get_nid_for_pfn(pfn);
 		if (page_nid < 0)
 			continue;
@@ -426,11 +445,15 @@ int unregister_mem_sect_under_nodes(stru
 		return -ENOMEM;
 	nodes_clear(*unlinked_nodes);
 
-	sect_start_pfn = section_nr_to_pfn(phys_index);
-	sect_end_pfn = sect_start_pfn + PAGES_PER_SECTION - 1;
+	sect_start_pfn = section_nr_to_pfn(mem_blk->start_section_nr);
+	sect_end_pfn = section_nr_to_pfn(mem_blk->end_section_nr);
+	sect_end_pfn += PAGES_PER_SECTION - 1;
 	for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
 		int nid;
 
+		if (skip_absent_section(&pfn))
+			continue;
+
 		nid = get_nid_for_pfn(pfn);
 		if (nid < 0)
 			continue;
_


  reply	other threads:[~2015-08-26 20:49 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-11-04  8:29 [PATCH v4 1/4] Numachip: Fix 16-bit APIC ID truncation Daniel J Blueman
2014-11-04  8:29 ` [PATCH v4 2/4] Numachip: Elide self-IPI ICR polling Daniel J Blueman
2014-11-04 17:21   ` [tip:x86/platform] x86: numachip: " tip-bot for Daniel J Blueman
2014-11-04  8:29 ` [PATCH v4 3/4] Numachip: APIC driver cleanups Daniel J Blueman
2014-11-04 17:22   ` [tip:x86/platform] x86: numachip: " tip-bot for Daniel J Blueman
2014-11-04  8:29 ` [PATCH v4 4/4] Use 2GB memory block size on large-memory x86-64 systems Daniel J Blueman
2014-11-04 17:22   ` [tip:x86/mm] x86: mm: " tip-bot for Daniel J Blueman
2014-11-05 22:10     ` Yinghai Lu
2015-08-21 18:19   ` [PATCH v4 4/4] " Luck, Tony
2015-08-21 18:38     ` Yinghai Lu
2015-08-21 20:27       ` Luck, Tony
2015-08-21 20:50         ` Yinghai Lu
2015-08-21 23:54           ` Tony Luck
2015-08-24 17:46             ` Yinghai Lu
2015-08-24 20:41               ` Tony Luck
2015-08-24 21:25                 ` Yinghai Lu
2015-08-24 22:39                   ` Tony Luck
2015-08-24 23:41                     ` Yinghai Lu
2015-08-24 23:59                       ` Yinghai Lu
     [not found]                         ` <CA+8MBbKur4SLh-7EKhU16_ra7gbvnOARg-ZWScJWH9q1hKufZQ@mail.gmail.com>
2015-08-25 19:01                           ` Yinghai Lu
2015-08-25 22:06                             ` Tony Luck
2015-08-26  4:17                         ` Ingo Molnar
2015-08-26  5:42                           ` Yinghai Lu
2015-08-26 20:49                             ` Andrew Morton [this message]
2015-08-26 21:15                               ` Yinghai Lu
2014-11-04 17:21 ` [tip:x86/platform] x86: numachip: Fix 16-bit APIC ID truncation tip-bot for Daniel J Blueman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150826134923.9d8fad571de4be237a84ff50@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=bhelgaas@google.com \
    --cc=daniel@numascale.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=mingo@redhat.com \
    --cc=sp@numascale.com \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@gmail.com \
    --cc=x86@kernel.org \
    --cc=yinghai@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.