* [PATCH -v5 00/19] x86: Use BRK to pre mapping page table to make xen happy
@ 2012-10-18 20:50 Yinghai Lu
  2012-10-18 20:50 ` [PATCH 1/3] ACPI: Introduce a new acpi handle to determine HID match Yinghai Lu
                   ` (21 more replies)
  0 siblings, 22 replies; 63+ messages in thread
From: Yinghai Lu @ 2012-10-18 20:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin, Tejun Heo
  Cc: Stefano Stabellini, linux-kernel, Yinghai Lu

This series applies on top of current linus/master and tip/x86/mm2, but please zap the last patch in that branch.

1. Use BRK to map the first PMD_SIZE range under the end of RAM.
2. Initialize the page table top-down, range by range.
3. Get rid of calculate_table_space_size() and find_early_table_space().
4. Remove the early_memremap workaround for page table accessing.
5. Remove the workaround in Xen that marks page table pages RO.

v2: Update the Xen pagetable_reserve interface so that Xen code does not
   use pgt_buf_* directly.
v3: Initialize the page table top-down, range by range, so the early table
   space calculation/search is no longer needed.
   Also reorder the patch sequence.
v4: Add mapping_mark_page_ro() to fix Xen, move pgt_buf_* to init.c,
    merge alloc_low_page(), and add alloc_low_pages() to fix the 32-bit
    kmap setting.
v5: Remove the mark_page_ro workaround.
    Add another five cleanup patches.

The series can be found at:
        git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-x86-mm

Yinghai Lu (19):
  x86, mm: Align start address to correct big page size
  x86, mm: Use big page size for small memory range
  x86, mm: Don't clear page table if range is ram
  x86, mm: only keep initial mapping for ram
  x86, mm: Break down init_all_memory_mapping
  x86, mm: setup page table in top-down
  x86, mm: Remove early_memremap workaround for page table accessing on 64bit
  x86, mm: Remove parameter in alloc_low_page for 64bit
  x86, mm: Merge alloc_low_page between 64bit and 32bit
  x86, mm: Move min_pfn_mapped back to mm/init.c
  x86, mm, xen: Remove mapping_pagetable_reserve
  x86, mm: Add alloc_low_pages(num)
  x86, mm: only call early_ioremap_page_table_range_init() once
  x86, mm: Move back pgt_buf_* to mm/init.c
  x86, mm: Move init_gbpages() out of setup.c
  x86, mm: change low/highmem_pfn_init to static on 32bit
  x86, mm: Move function declaration into mm_internal.h
  x86, mm: Let "memmap=" take more entries one time
  x86, mm: Add check before clear pte above max_low_pfn on 32bit

 arch/x86/include/asm/init.h          |   20 +--
 arch/x86/include/asm/pgtable.h       |    1 +
 arch/x86/include/asm/pgtable_types.h |    1 -
 arch/x86/include/asm/x86_init.h      |   12 --
 arch/x86/kernel/e820.c               |   16 ++-
 arch/x86/kernel/setup.c              |   17 +--
 arch/x86/kernel/x86_init.c           |    4 -
 arch/x86/mm/init.c                   |  355 +++++++++++++++++-----------------
 arch/x86/mm/init_32.c                |   85 ++++++---
 arch/x86/mm/init_64.c                |  119 +++--------
 arch/x86/mm/mm_internal.h            |   17 ++
 arch/x86/xen/mmu.c                   |   28 ---
 12 files changed, 308 insertions(+), 367 deletions(-)
 create mode 100644 arch/x86/mm/mm_internal.h

-- 
1.7.7



* [PATCH 1/3] ACPI: Introduce a new acpi handle to determine HID match.
  2012-10-18 20:50 [PATCH -v5 00/19] x86: Use BRK to pre mapping page table to make xen happy Yinghai Lu
@ 2012-10-18 20:50 ` Yinghai Lu
  2012-11-02 12:23   ` Rafael J. Wysocki
  2012-10-18 20:50 ` [PATCH 01/19] x86, mm: Align start address to correct big page size Yinghai Lu
                   ` (20 subsequent siblings)
  21 siblings, 1 reply; 63+ messages in thread
From: Yinghai Lu @ 2012-10-18 20:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin, Tejun Heo
  Cc: Stefano Stabellini, linux-kernel, Tang Chen, Yinghai Lu,
	Len Brown, linux-acpi

From: Tang Chen <tangchen@cn.fujitsu.com>

We need to find out whether a handle refers to a PCI root bridge, so that
a notify handler can be installed on it to handle PCI root bus hot-add.
At that time, the root bridge ACPI device has not been created yet.

So acpi_match_device_ids() will not work.

This patch adds a function that checks whether an ACPI handle's HID matches
a list of IDs.  The new API takes an acpi_device_info instead of an
acpi_device.
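
For illustration only, a minimal usage sketch of the new helper (the caller
name below is hypothetical; a real user appears later in this series as
acpi_is_root_bridge_object()):

static bool handle_matches_ids(acpi_handle handle,
			       const struct acpi_device_id *ids)
{
	struct acpi_device_info *info;
	bool match;

	/* works even before an acpi_device exists for the handle */
	if (ACPI_FAILURE(acpi_get_object_info(handle, &info)))
		return false;

	/* acpi_match_object_info_ids() returns 0 on a HID/CID match */
	match = !acpi_match_object_info_ids(info, ids);
	kfree(info);

	return match;
}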

-v2: updated changelog; also check the length of the string info, and
     change the checking sequence by moving the string comparison closer
     to the for loop.
					- Yinghai

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Len Brown <lenb@kernel.org>
Cc: linux-acpi@vger.kernel.org
---
 drivers/acpi/scan.c     |   33 +++++++++++++++++++++++++++++++++
 include/acpi/acpi_bus.h |    2 ++
 2 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index 5dfec09..33ca993 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -312,6 +312,39 @@ int acpi_match_device_ids(struct acpi_device *device,
 }
 EXPORT_SYMBOL(acpi_match_device_ids);
 
+int acpi_match_object_info_ids(struct acpi_device_info *info,
+			       const struct acpi_device_id *ids)
+{
+	const struct acpi_device_id *id;
+	char *str;
+	u32 len;
+	int i;
+
+	len = info->hardware_id.length;
+	if (len) {
+		str = info->hardware_id.string;
+		if (str)
+			for (id = ids; id->id[0]; id++)
+				if (!strcmp((char *)id->id, str))
+					return 0;
+	}
+
+	for (i = 0; i < info->compatible_id_list.count; i++) {
+		len = info->compatible_id_list.ids[i].length;
+		if (!len)
+			continue;
+		str = info->compatible_id_list.ids[i].string;
+		if (!str)
+			continue;
+		for (id = ids; id->id[0]; id++)
+			if (!strcmp((char *)id->id, str))
+				return 0;
+	}
+
+	return -ENOENT;
+}
+EXPORT_SYMBOL(acpi_match_object_info_ids);
+
 static void acpi_free_ids(struct acpi_device *device)
 {
 	struct acpi_hardware_id *id, *tmp;
diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
index 608f92f..6ac415c 100644
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -374,6 +374,8 @@ int acpi_bus_start(struct acpi_device *device);
 acpi_status acpi_bus_get_ejd(acpi_handle handle, acpi_handle * ejd);
 int acpi_match_device_ids(struct acpi_device *device,
 			  const struct acpi_device_id *ids);
+int acpi_match_object_info_ids(struct acpi_device_info *info,
+			       const struct acpi_device_id *ids);
 int acpi_create_dir(struct acpi_device *);
 void acpi_remove_dir(struct acpi_device *);
 
-- 
1.7.7



* [PATCH 01/19] x86, mm: Align start address to correct big page size
  2012-10-18 20:50 [PATCH -v5 00/19] x86: Use BRK to pre mapping page table to make xen happy Yinghai Lu
  2012-10-18 20:50 ` [PATCH 1/3] ACPI: Introduce a new acpi handle to determine HID match Yinghai Lu
@ 2012-10-18 20:50 ` Yinghai Lu
  2012-10-22 14:16   ` Konrad Rzeszutek Wilk
  2012-10-18 20:50 ` [PATCH 2/3] PCI: correctly detect ACPI PCI host bridge objects Yinghai Lu
                   ` (19 subsequent siblings)
  21 siblings, 1 reply; 63+ messages in thread
From: Yinghai Lu @ 2012-10-18 20:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin, Tejun Heo
  Cc: Stefano Stabellini, linux-kernel, Yinghai Lu

We are going to use a buffer in BRK to pre-map the page table buffer.

The page table buffer may only be page aligned, but the range around it is
RAM too, so we can map it with a bigger page size to avoid falling back to
small pages.

The next patch will adjust page_size_mask so that the big page size is used
for small RAM ranges.

Before that, this patch aligns the start address down according to the
bigger page size; otherwise the page table entry will not hold the correct
value.
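
A standalone illustration of why the alignment matters (not part of the
patch; the address below is made up): a 2M PSE entry covers a whole 2M
frame, so the pfn written into it must sit on a PMD boundary.

#include <stdio.h>

#define PAGE_SHIFT	12
#define PMD_SHIFT	21

int main(void)
{
	unsigned long addr = 0x0063f000;	/* hypothetical, only 4K aligned */
	unsigned long pfn = addr >> PAGE_SHIFT;
	/* same effect as "pfn &= PMD_MASK >> PAGE_SHIFT" in the patch */
	unsigned long aligned_pfn = pfn & ~((1UL << (PMD_SHIFT - PAGE_SHIFT)) - 1);

	printf("raw pfn %#lx, pfn stored in the 2M entry %#lx\n",
	       pfn, aligned_pfn);
	return 0;
}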

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/mm/init_32.c |    1 +
 arch/x86/mm/init_64.c |    5 +++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 11a5800..27f7fc6 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -310,6 +310,7 @@ repeat:
 					__pgprot(PTE_IDENT_ATTR |
 						 _PAGE_PSE);
 
+				pfn &= PMD_MASK >> PAGE_SHIFT;
 				addr2 = (pfn + PTRS_PER_PTE-1) * PAGE_SIZE +
 					PAGE_OFFSET + PAGE_SIZE-1;
 
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index ab558eb..f40f383 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -461,7 +461,7 @@ phys_pmd_init(pmd_t *pmd_page, unsigned long address, unsigned long end,
 			pages++;
 			spin_lock(&init_mm.page_table_lock);
 			set_pte((pte_t *)pmd,
-				pfn_pte(address >> PAGE_SHIFT,
+				pfn_pte((address & PMD_MASK) >> PAGE_SHIFT,
 					__pgprot(pgprot_val(prot) | _PAGE_PSE)));
 			spin_unlock(&init_mm.page_table_lock);
 			last_map_addr = next;
@@ -536,7 +536,8 @@ phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end,
 			pages++;
 			spin_lock(&init_mm.page_table_lock);
 			set_pte((pte_t *)pud,
-				pfn_pte(addr >> PAGE_SHIFT, PAGE_KERNEL_LARGE));
+				pfn_pte((addr & PUD_MASK) >> PAGE_SHIFT,
+					PAGE_KERNEL_LARGE));
 			spin_unlock(&init_mm.page_table_lock);
 			last_map_addr = next;
 			continue;
-- 
1.7.7



* [PATCH 2/3] PCI: correctly detect ACPI PCI host bridge objects
  2012-10-18 20:50 [PATCH -v5 00/19] x86: Use BRK to pre mapping page table to make xen happy Yinghai Lu
  2012-10-18 20:50 ` [PATCH 1/3] ACPI: Introduce a new acpi handle to determine HID match Yinghai Lu
  2012-10-18 20:50 ` [PATCH 01/19] x86, mm: Align start address to correct big page size Yinghai Lu
@ 2012-10-18 20:50 ` Yinghai Lu
  2012-10-26  9:10   ` Bjorn Helgaas
  2012-10-18 20:50 ` [PATCH 02/19] x86, mm: Use big page size for small memory range Yinghai Lu
                   ` (18 subsequent siblings)
  21 siblings, 1 reply; 63+ messages in thread
From: Yinghai Lu @ 2012-10-18 20:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin, Tejun Heo
  Cc: Stefano Stabellini, linux-kernel, Jiang Liu, Yinghai Lu,
	Len Brown, linux-acpi

From: Jiang Liu <jiang.liu@huawei.com>

The code in pci_root_hp.c depends on function acpi_is_root_bridge()
to check whether an ACPI object is a PCI host bridge or not.
If an ACPI device hasn't been created for the ACPI object yet,
function acpi_is_root_bridge() will return false even if the object
is a PCI host bridge object. That behavior will cause two issues:
1) No ACPI notification handler is installed for PCI host bridges that are
   absent at startup, so hotplug events for those bridges won't be handled.
2) rescan_root_bridge() can't re-enumerate offlined PCI host bridges
   because their ACPI devices have already been destroyed.

So use acpi_match_object_info_ids() to correctly detect PCI host bridges.

-v2: update to use acpi_match_object_info_ids() from Tang Chen  - Yinghai

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Len Brown <lenb@kernel.org>
Cc: linux-acpi@vger.kernel.org
---
 drivers/acpi/pci_root_hp.c |   25 ++++++++++++++++++++++++-
 1 files changed, 24 insertions(+), 1 deletions(-)

diff --git a/drivers/acpi/pci_root_hp.c b/drivers/acpi/pci_root_hp.c
index 2aebf78..3edec7f 100644
--- a/drivers/acpi/pci_root_hp.c
+++ b/drivers/acpi/pci_root_hp.c
@@ -19,6 +19,12 @@ struct acpi_root_bridge {
 	u32 flags;
 };
 
+static const struct acpi_device_id root_device_ids[] = {
+	{"PNP0A03", 0},
+	{"PNP0A08", 0},
+	{"", 0},
+};
+
 /* bridge flags */
 #define ROOT_BRIDGE_HAS_EJ0	(0x00000002)
 #define ROOT_BRIDGE_HAS_PS3	(0x00000080)
@@ -256,6 +262,23 @@ static void handle_hotplug_event_root(acpi_handle handle, u32 type,
 				_handle_hotplug_event_root);
 }
 
+static bool acpi_is_root_bridge_object(acpi_handle handle)
+{
+	struct acpi_device_info *info = NULL;
+	acpi_status status;
+	bool ret;
+
+	status = acpi_get_object_info(handle, &info);
+	if (ACPI_FAILURE(status))
+		return false;
+
+	ret = !acpi_match_object_info_ids(info, root_device_ids);
+
+	kfree(info);
+
+	return ret;
+}
+
 static acpi_status __init
 find_root_bridges(acpi_handle handle, u32 lvl, void *context, void **rv)
 {
@@ -264,7 +287,7 @@ find_root_bridges(acpi_handle handle, u32 lvl, void *context, void **rv)
 				      .pointer = objname };
 	int *count = (int *)context;
 
-	if (!acpi_is_root_bridge(handle))
+	if (!acpi_is_root_bridge_object(handle))
 		return AE_OK;
 
 	(*count)++;
-- 
1.7.7


* [PATCH 02/19] x86, mm: Use big page size for small memory range
  2012-10-18 20:50 [PATCH -v5 00/19] x86: Use BRK to pre mapping page table to make xen happy Yinghai Lu
                   ` (2 preceding siblings ...)
  2012-10-18 20:50 ` [PATCH 2/3] PCI: correctly detect ACPI PCI host bridge objects Yinghai Lu
@ 2012-10-18 20:50 ` Yinghai Lu
  2012-10-22 14:21   ` Konrad Rzeszutek Wilk
  2012-10-18 20:50 ` [PATCH 3/3] PCI, ACPI: debug print for installation of acpi root bridge's notifier Yinghai Lu
                   ` (17 subsequent siblings)
  21 siblings, 1 reply; 63+ messages in thread
From: Yinghai Lu @ 2012-10-18 20:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin, Tejun Heo
  Cc: Stefano Stabellini, linux-kernel, Yinghai Lu

We may map a small range in the middle of a big range first, so we should
use the big page size for it up front, rather than breaking the page table
down into small pages.

The big page bit can only be set when the area around that range is RAM as
well.
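
A standalone sketch of the widening check (addresses are made up; the
actual code below uses round_down()/round_up() to PMD_SIZE and
memblock_is_region_memory()):

#include <stdio.h>

#define PMD_SIZE	(2UL << 20)

int main(void)
{
	/* hypothetical sub-2M request sitting inside a larger RAM block */
	unsigned long start = 0x1ff000, end = 0x26a000;
	unsigned long big_start = start & ~(PMD_SIZE - 1);
	unsigned long big_end = (end + PMD_SIZE - 1) & ~(PMD_SIZE - 1);

	printf("if [%#lx, %#lx) is all RAM, allow 2M pages for [%#lx, %#lx)\n",
	       big_start, big_end, start, end);
	return 0;
}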

-v2: fix 32-bit boundary checking. We cannot count RAM above max_low_pfn
	on 32 bit.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/mm/init.c |   37 +++++++++++++++++++++++++++++++++++++
 1 files changed, 37 insertions(+), 0 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index c12dfd5..09ce38f 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -88,6 +88,40 @@ static int __meminit save_mr(struct map_range *mr, int nr_range,
 	return nr_range;
 }
 
+/*
+ * adjust the page_size_mask for small range to go with
+ *	big page size instead small one if nearby are ram too.
+ */
+static void __init_refok adjust_range_page_size_mask(struct map_range *mr,
+							 int nr_range)
+{
+	int i;
+
+	for (i = 0; i < nr_range; i++) {
+		if ((page_size_mask & (1<<PG_LEVEL_2M)) &&
+		    !(mr[i].page_size_mask & (1<<PG_LEVEL_2M))) {
+			unsigned long start = round_down(mr[i].start, PMD_SIZE);
+			unsigned long end = round_up(mr[i].end, PMD_SIZE);
+
+#ifdef CONFIG_X86_32
+			if ((end >> PAGE_SHIFT) > max_low_pfn)
+				continue;
+#endif
+
+			if (memblock_is_region_memory(start, end - start))
+				mr[i].page_size_mask |= 1<<PG_LEVEL_2M;
+		}
+		if ((page_size_mask & (1<<PG_LEVEL_1G)) &&
+		    !(mr[i].page_size_mask & (1<<PG_LEVEL_1G))) {
+			unsigned long start = round_down(mr[i].start, PUD_SIZE);
+			unsigned long end = round_up(mr[i].end, PUD_SIZE);
+
+			if (memblock_is_region_memory(start, end - start))
+				mr[i].page_size_mask |= 1<<PG_LEVEL_1G;
+		}
+	}
+}
+
 static int __meminit split_mem_range(struct map_range *mr, int nr_range,
 				     unsigned long start,
 				     unsigned long end)
@@ -182,6 +216,9 @@ static int __meminit split_mem_range(struct map_range *mr, int nr_range,
 		nr_range--;
 	}
 
+	if (!after_bootmem)
+		adjust_range_page_size_mask(mr, nr_range);
+
 	for (i = 0; i < nr_range; i++)
 		printk(KERN_DEBUG " [mem %#010lx-%#010lx] page %s\n",
 				mr[i].start, mr[i].end - 1,
-- 
1.7.7



* [PATCH 3/3] PCI, ACPI: debug print for installation of acpi root bridge's notifier
  2012-10-18 20:50 [PATCH -v5 00/19] x86: Use BRK to pre mapping page table to make xen happy Yinghai Lu
                   ` (3 preceding siblings ...)
  2012-10-18 20:50 ` [PATCH 02/19] x86, mm: Use big page size for small memory range Yinghai Lu
@ 2012-10-18 20:50 ` Yinghai Lu
  2012-10-18 20:50 ` [PATCH 03/19] x86, mm: Don't clear page table if range is ram Yinghai Lu
                   ` (16 subsequent siblings)
  21 siblings, 0 replies; 63+ messages in thread
From: Yinghai Lu @ 2012-10-18 20:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin, Tejun Heo
  Cc: Stefano Stabellini, linux-kernel, Tang Chen

From: Tang Chen <tangchen@cn.fujitsu.com>

acpi_install_notify_handler() could fail, so check the returned status and
give better debug info.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/acpi/pci_root_hp.c |   12 +++++++++---
 1 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/acpi/pci_root_hp.c b/drivers/acpi/pci_root_hp.c
index 3edec7f..01e71f6 100644
--- a/drivers/acpi/pci_root_hp.c
+++ b/drivers/acpi/pci_root_hp.c
@@ -282,6 +282,7 @@ static bool acpi_is_root_bridge_object(acpi_handle handle)
 static acpi_status __init
 find_root_bridges(acpi_handle handle, u32 lvl, void *context, void **rv)
 {
+	acpi_status status;
 	char objname[64];
 	struct acpi_buffer buffer = { .length = sizeof(objname),
 				      .pointer = objname };
@@ -294,9 +295,14 @@ find_root_bridges(acpi_handle handle, u32 lvl, void *context, void **rv)
 
 	acpi_get_name(handle, ACPI_FULL_PATHNAME, &buffer);
 
-	acpi_install_notify_handler(handle, ACPI_SYSTEM_NOTIFY,
-				handle_hotplug_event_root, NULL);
-	printk(KERN_DEBUG "acpi root: %s notify handler installed\n", objname);
+	status = acpi_install_notify_handler(handle, ACPI_SYSTEM_NOTIFY,
+					handle_hotplug_event_root, NULL);
+	if (ACPI_FAILURE(status))
+		printk(KERN_DEBUG "acpi root: %s notify handler is not installed, exit status: %u\n",
+				  objname, (unsigned int)status);
+	else
+		printk(KERN_DEBUG "acpi root: %s notify handler is installed\n",
+				 objname);
 
 	add_acpi_root_bridge(handle);
 
-- 
1.7.7



* [PATCH 03/19] x86, mm: Don't clear page table if range is ram
  2012-10-18 20:50 [PATCH -v5 00/19] x86: Use BRK to pre mapping page table to make xen happy Yinghai Lu
                   ` (4 preceding siblings ...)
  2012-10-18 20:50 ` [PATCH 3/3] PCI, ACPI: debug print for installation of acpi root bridge's notifier Yinghai Lu
@ 2012-10-18 20:50 ` Yinghai Lu
  2012-10-22 14:28   ` Konrad Rzeszutek Wilk
  2012-10-18 20:50 ` [PATCH 04/19] x86, mm: only keep initial mapping for ram Yinghai Lu
                   ` (15 subsequent siblings)
  21 siblings, 1 reply; 63+ messages in thread
From: Yinghai Lu @ 2012-10-18 20:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin, Tejun Heo
  Cc: Stefano Stabellini, linux-kernel, Yinghai Lu

After we add code that uses a buffer in BRK to pre-map the page table,
it should be safe to remove early_memremap for page table accessing.
Instead, we get a panic with that change.

It turns out we wrongly clear the initial page table for the next range
when the ranges are separated by holes, and it only happens when we map
the ranges one by one.

We need to check whether the range is RAM before clearing the page table
entries.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/mm/init_64.c |   37 ++++++++++++++++---------------------
 1 files changed, 16 insertions(+), 21 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index f40f383..61b3c44 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -363,20 +363,19 @@ static unsigned long __meminit
 phys_pte_init(pte_t *pte_page, unsigned long addr, unsigned long end,
 	      pgprot_t prot)
 {
-	unsigned pages = 0;
+	unsigned long pages = 0, next;
 	unsigned long last_map_addr = end;
 	int i;
 
 	pte_t *pte = pte_page + pte_index(addr);
 
-	for(i = pte_index(addr); i < PTRS_PER_PTE; i++, addr += PAGE_SIZE, pte++) {
-
+	for (i = pte_index(addr); i < PTRS_PER_PTE; i++, addr = next, pte++) {
+		next = (addr & PAGE_MASK) + PAGE_SIZE;
 		if (addr >= end) {
-			if (!after_bootmem) {
-				for(; i < PTRS_PER_PTE; i++, pte++)
-					set_pte(pte, __pte(0));
-			}
-			break;
+			if (!after_bootmem &&
+			    !e820_any_mapped(addr & PAGE_MASK, next, 0))
+				set_pte(pte, __pte(0));
+			continue;
 		}
 
 		/*
@@ -418,16 +417,14 @@ phys_pmd_init(pmd_t *pmd_page, unsigned long address, unsigned long end,
 		pte_t *pte;
 		pgprot_t new_prot = prot;
 
+		next = (address & PMD_MASK) + PMD_SIZE;
 		if (address >= end) {
-			if (!after_bootmem) {
-				for (; i < PTRS_PER_PMD; i++, pmd++)
-					set_pmd(pmd, __pmd(0));
-			}
-			break;
+			if (!after_bootmem &&
+			    !e820_any_mapped(address & PMD_MASK, next, 0))
+				set_pmd(pmd, __pmd(0));
+			continue;
 		}
 
-		next = (address & PMD_MASK) + PMD_SIZE;
-
 		if (pmd_val(*pmd)) {
 			if (!pmd_large(*pmd)) {
 				spin_lock(&init_mm.page_table_lock);
@@ -494,13 +491,11 @@ phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end,
 		pmd_t *pmd;
 		pgprot_t prot = PAGE_KERNEL;
 
-		if (addr >= end)
-			break;
-
 		next = (addr & PUD_MASK) + PUD_SIZE;
-
-		if (!after_bootmem && !e820_any_mapped(addr, next, 0)) {
-			set_pud(pud, __pud(0));
+		if (addr >= end) {
+			if (!after_bootmem &&
+			    !e820_any_mapped(addr & PUD_MASK, next, 0))
+				set_pud(pud, __pud(0));
 			continue;
 		}
 
-- 
1.7.7



* [PATCH 04/19] x86, mm: only keep initial mapping for ram
  2012-10-18 20:50 [PATCH -v5 00/19] x86: Use BRK to pre mapping page table to make xen happy Yinghai Lu
                   ` (5 preceding siblings ...)
  2012-10-18 20:50 ` [PATCH 03/19] x86, mm: Don't clear page table if range is ram Yinghai Lu
@ 2012-10-18 20:50 ` Yinghai Lu
  2012-10-22 14:33   ` Konrad Rzeszutek Wilk
  2012-10-18 20:50 ` [PATCH 05/19] x86, mm: Break down init_all_memory_mapping Yinghai Lu
                   ` (14 subsequent siblings)
  21 siblings, 1 reply; 63+ messages in thread
From: Yinghai Lu @ 2012-10-18 20:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin, Tejun Heo
  Cc: Stefano Stabellini, linux-kernel, Yinghai Lu

Passing 0 means any e820 type: for any range that overlaps any entry in the
e820 map, the kernel will keep its initial page table mapping.

What we want is to keep the initial page table only for RAM ranges.

Change the checks to E820_RAM and E820_RESERVED_KERN.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/mm/init_64.c |    9 ++++++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 61b3c44..4898e80 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -373,7 +373,8 @@ phys_pte_init(pte_t *pte_page, unsigned long addr, unsigned long end,
 		next = (addr & PAGE_MASK) + PAGE_SIZE;
 		if (addr >= end) {
 			if (!after_bootmem &&
-			    !e820_any_mapped(addr & PAGE_MASK, next, 0))
+			    !e820_any_mapped(addr & PAGE_MASK, next, E820_RAM) &&
+			    !e820_any_mapped(addr & PAGE_MASK, next, E820_RESERVED_KERN))
 				set_pte(pte, __pte(0));
 			continue;
 		}
@@ -420,7 +421,8 @@ phys_pmd_init(pmd_t *pmd_page, unsigned long address, unsigned long end,
 		next = (address & PMD_MASK) + PMD_SIZE;
 		if (address >= end) {
 			if (!after_bootmem &&
-			    !e820_any_mapped(address & PMD_MASK, next, 0))
+			    !e820_any_mapped(address & PMD_MASK, next, E820_RAM) &&
+			    !e820_any_mapped(address & PMD_MASK, next, E820_RESERVED_KERN))
 				set_pmd(pmd, __pmd(0));
 			continue;
 		}
@@ -494,7 +496,8 @@ phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end,
 		next = (addr & PUD_MASK) + PUD_SIZE;
 		if (addr >= end) {
 			if (!after_bootmem &&
-			    !e820_any_mapped(addr & PUD_MASK, next, 0))
+			    !e820_any_mapped(addr & PUD_MASK, next, E820_RAM) &&
+			    !e820_any_mapped(addr & PUD_MASK, next, E820_RESERVED_KERN))
 				set_pud(pud, __pud(0));
 			continue;
 		}
-- 
1.7.7



* [PATCH 05/19] x86, mm: Break down init_all_memory_mapping
  2012-10-18 20:50 [PATCH -v5 00/19] x86: Use BRK to pre mapping page table to make xen happy Yinghai Lu
                   ` (6 preceding siblings ...)
  2012-10-18 20:50 ` [PATCH 04/19] x86, mm: only keep initial mapping for ram Yinghai Lu
@ 2012-10-18 20:50 ` Yinghai Lu
  2012-10-18 20:50 ` [PATCH 06/19] x86, mm: setup page table in top-down Yinghai Lu
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 63+ messages in thread
From: Yinghai Lu @ 2012-10-18 20:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin, Tejun Heo
  Cc: Stefano Stabellini, linux-kernel, Yinghai Lu

We will replace init_all_memory_mapping() with top-down page table
initialization. The new API needs to take a range: init_range_memory_mapping().

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/mm/init.c |   45 +++++++++++++++++++++------------------------
 1 files changed, 21 insertions(+), 24 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 09ce38f..dbb2916 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -398,40 +398,30 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,
  * Depending on the alignment of E820 ranges, this may possibly result in using
  * smaller size (i.e. 4K instead of 2M or 1G) page tables.
  */
-static void __init init_all_memory_mapping(void)
+static void __init init_range_memory_mapping(unsigned long range_start,
+					   unsigned long range_end)
 {
 	unsigned long start_pfn, end_pfn;
 	int i;
 
-	/* the ISA range is always mapped regardless of memory holes */
-	init_memory_mapping(0, ISA_END_ADDRESS);
-
 	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, NULL) {
-		u64 start = start_pfn << PAGE_SHIFT;
-		u64 end = end_pfn << PAGE_SHIFT;
+		u64 start = (u64)start_pfn << PAGE_SHIFT;
+		u64 end = (u64)end_pfn << PAGE_SHIFT;
 
-		if (end <= ISA_END_ADDRESS)
+		if (end <= range_start)
 			continue;
 
-		if (start < ISA_END_ADDRESS)
-			start = ISA_END_ADDRESS;
-#ifdef CONFIG_X86_32
-		/* on 32 bit, we only map up to max_low_pfn */
-		if ((start >> PAGE_SHIFT) >= max_low_pfn)
+		if (start < range_start)
+			start = range_start;
+
+		if (start >= range_end)
 			continue;
 
-		if ((end >> PAGE_SHIFT) > max_low_pfn)
-			end = max_low_pfn << PAGE_SHIFT;
-#endif
-		init_memory_mapping(start, end);
-	}
+		if (end > range_end)
+			end = range_end;
 
-#ifdef CONFIG_X86_64
-	if (max_pfn > max_low_pfn) {
-		/* can we preseve max_low_pfn ?*/
-		max_low_pfn = max_pfn;
+		init_memory_mapping(start, end);
 	}
-#endif
 }
 
 void __init init_mem_mapping(void)
@@ -461,8 +451,15 @@ void __init init_mem_mapping(void)
 		(pgt_buf_top << PAGE_SHIFT) - 1);
 
 	max_pfn_mapped = 0; /* will get exact value next */
-	init_all_memory_mapping();
-
+	/* the ISA range is always mapped regardless of memory holes */
+	init_memory_mapping(0, ISA_END_ADDRESS);
+	init_range_memory_mapping(ISA_END_ADDRESS, end);
+#ifdef CONFIG_X86_64
+	if (max_pfn > max_low_pfn) {
+		/* can we preseve max_low_pfn ?*/
+		max_low_pfn = max_pfn;
+	}
+#endif
 	/*
 	 * Reserve the kernel pagetable pages we used (pgt_buf_start -
 	 * pgt_buf_end) and free the other ones (pgt_buf_end - pgt_buf_top)
-- 
1.7.7



* [PATCH 06/19] x86, mm: setup page table in top-down
  2012-10-18 20:50 [PATCH -v5 00/19] x86: Use BRK to pre mapping page table to make xen happy Yinghai Lu
                   ` (7 preceding siblings ...)
  2012-10-18 20:50 ` [PATCH 05/19] x86, mm: Break down init_all_memory_mapping Yinghai Lu
@ 2012-10-18 20:50 ` Yinghai Lu
  2012-10-19 16:24   ` Stefano Stabellini
  2012-10-22 15:06   ` Konrad Rzeszutek Wilk
  2012-10-18 20:50 ` [PATCH 07/19] x86, mm: Remove early_memremap workaround for page table accessing on 64bit Yinghai Lu
                   ` (12 subsequent siblings)
  21 siblings, 2 replies; 63+ messages in thread
From: Yinghai Lu @ 2012-10-18 20:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin, Tejun Heo
  Cc: Stefano Stabellini, linux-kernel, Yinghai Lu

Get pgt_buf early from BRK, and use it to map PMD_SIZE of memory from the
top of RAM first. Then use the freshly mapped pages to map more ranges
below, and keep looping until all pages are mapped.

alloc_low_page() will use pages from BRK first; after that buffer is used
up, it will use memblock to find and reserve pages for page table usage.

In the end we can get rid of the code that calculates and finds the early
page table space.
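
A standalone simulation of the top-down stepping (illustration only; the
real init_mem_mapping() below also tracks min_pfn_mapped and only widens
step_size when the previous pass mapped more RAM than before):

#include <stdio.h>

#define ISA_END_ADDRESS	0x100000UL
#define PMD_SIZE	(2UL << 20)

int main(void)
{
	unsigned long end = 0x40000000UL;	/* pretend 1G of RAM */
	unsigned long last_start = end, step = PMD_SIZE;

	while (last_start > ISA_END_ADDRESS) {
		unsigned long start = last_start > step ?
			(last_start - 1) & ~(step - 1) : ISA_END_ADDRESS;

		if (start < ISA_END_ADDRESS)
			start = ISA_END_ADDRESS;
		printf("map [%#lx, %#lx)\n", start, last_start);
		last_start = start;
		step <<= 5;	/* each pass leaves room to map bigger steps */
	}
	return 0;
}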

-v2: update to apply after the fix_xen change,
     also use a macro for the initial pgt_buf size and add comments for it.
-v3: skip the big reserved range in memblock.reserved near the end of RAM.
-v4: the fix_xen change is not needed anymore.

Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/include/asm/page_types.h |    1 +
 arch/x86/include/asm/pgtable.h    |    1 +
 arch/x86/kernel/setup.c           |    3 +
 arch/x86/mm/init.c                |  207 ++++++++++--------------------------
 arch/x86/mm/init_32.c             |   17 +++-
 arch/x86/mm/init_64.c             |   17 +++-
 6 files changed, 91 insertions(+), 155 deletions(-)

diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
index 54c9787..9f6f3e6 100644
--- a/arch/x86/include/asm/page_types.h
+++ b/arch/x86/include/asm/page_types.h
@@ -45,6 +45,7 @@ extern int devmem_is_allowed(unsigned long pagenr);
 
 extern unsigned long max_low_pfn_mapped;
 extern unsigned long max_pfn_mapped;
+extern unsigned long min_pfn_mapped;
 
 static inline phys_addr_t get_max_mapped(void)
 {
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index dd1a888..6991a3e 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -603,6 +603,7 @@ static inline int pgd_none(pgd_t pgd)
 
 extern int direct_gbpages;
 void init_mem_mapping(void);
+void early_alloc_pgt_buf(void);
 
 /* local pte updates need not use xchg for locking */
 static inline pte_t native_local_ptep_get_and_clear(pte_t *ptep)
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index e72e4c6..73cb7ba 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -124,6 +124,7 @@
  */
 unsigned long max_low_pfn_mapped;
 unsigned long max_pfn_mapped;
+unsigned long min_pfn_mapped;
 
 #ifdef CONFIG_DMI
 RESERVE_BRK(dmi_alloc, 65536);
@@ -897,6 +898,8 @@ void __init setup_arch(char **cmdline_p)
 
 	reserve_ibft_region();
 
+	early_alloc_pgt_buf();
+
 	/*
 	 * Need to conclude brk, before memblock_x86_fill()
 	 *  it could use memblock_find_in_range, could overlap with
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index dbb2916..9ff29c1 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -21,6 +21,21 @@ unsigned long __initdata pgt_buf_start;
 unsigned long __meminitdata pgt_buf_end;
 unsigned long __meminitdata pgt_buf_top;
 
+/* need 4 4k for initial PMD_SIZE, 4k for 0-ISA_END_ADDRESS */
+#define INIT_PGT_BUF_SIZE	(5 * PAGE_SIZE)
+RESERVE_BRK(early_pgt_alloc, INIT_PGT_BUF_SIZE);
+void  __init early_alloc_pgt_buf(void)
+{
+	unsigned long tables = INIT_PGT_BUF_SIZE;
+	phys_addr_t base;
+
+	base = __pa(extend_brk(tables, PAGE_SIZE));
+
+	pgt_buf_start = base >> PAGE_SHIFT;
+	pgt_buf_end = pgt_buf_start;
+	pgt_buf_top = pgt_buf_start + (tables >> PAGE_SHIFT);
+}
+
 int after_bootmem;
 
 int direct_gbpages
@@ -228,105 +243,6 @@ static int __meminit split_mem_range(struct map_range *mr, int nr_range,
 	return nr_range;
 }
 
-static unsigned long __init calculate_table_space_size(unsigned long start,
-					  unsigned long end)
-{
-	unsigned long puds = 0, pmds = 0, ptes = 0, tables;
-	struct map_range mr[NR_RANGE_MR];
-	int nr_range, i;
-
-	pr_info("calculate_table_space_size: [mem %#010lx-%#010lx]\n",
-	       start, end - 1);
-
-	memset(mr, 0, sizeof(mr));
-	nr_range = 0;
-	nr_range = split_mem_range(mr, nr_range, start, end);
-
-	for (i = 0; i < nr_range; i++) {
-		unsigned long range, extra;
-
-		range = mr[i].end - mr[i].start;
-		puds += (range + PUD_SIZE - 1) >> PUD_SHIFT;
-
-		if (mr[i].page_size_mask & (1 << PG_LEVEL_1G)) {
-			extra = range - ((range >> PUD_SHIFT) << PUD_SHIFT);
-			pmds += (extra + PMD_SIZE - 1) >> PMD_SHIFT;
-		} else
-			pmds += (range + PMD_SIZE - 1) >> PMD_SHIFT;
-
-		if (mr[i].page_size_mask & (1 << PG_LEVEL_2M)) {
-			extra = range - ((range >> PMD_SHIFT) << PMD_SHIFT);
-#ifdef CONFIG_X86_32
-			extra += PMD_SIZE;
-#endif
-			/* The first 2/4M doesn't use large pages. */
-			if (mr[i].start < PMD_SIZE)
-				extra += PMD_SIZE - mr[i].start;
-
-			ptes += (extra + PAGE_SIZE - 1) >> PAGE_SHIFT;
-		} else
-			ptes += (range + PAGE_SIZE - 1) >> PAGE_SHIFT;
-	}
-
-	tables = roundup(puds * sizeof(pud_t), PAGE_SIZE);
-	tables += roundup(pmds * sizeof(pmd_t), PAGE_SIZE);
-	tables += roundup(ptes * sizeof(pte_t), PAGE_SIZE);
-
-#ifdef CONFIG_X86_32
-	/* for fixmap */
-	tables += roundup(__end_of_fixed_addresses * sizeof(pte_t), PAGE_SIZE);
-#endif
-
-	return tables;
-}
-
-static unsigned long __init calculate_all_table_space_size(void)
-{
-	unsigned long start_pfn, end_pfn;
-	unsigned long tables;
-	int i;
-
-	/* the ISA range is always mapped regardless of memory holes */
-	tables = calculate_table_space_size(0, ISA_END_ADDRESS);
-
-	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, NULL) {
-		u64 start = start_pfn << PAGE_SHIFT;
-		u64 end = end_pfn << PAGE_SHIFT;
-
-		if (end <= ISA_END_ADDRESS)
-			continue;
-
-		if (start < ISA_END_ADDRESS)
-			start = ISA_END_ADDRESS;
-#ifdef CONFIG_X86_32
-		/* on 32 bit, we only map up to max_low_pfn */
-		if ((start >> PAGE_SHIFT) >= max_low_pfn)
-			continue;
-
-		if ((end >> PAGE_SHIFT) > max_low_pfn)
-			end = max_low_pfn << PAGE_SHIFT;
-#endif
-		tables += calculate_table_space_size(start, end);
-	}
-
-	return tables;
-}
-
-static void __init find_early_table_space(unsigned long start,
-					  unsigned long good_end,
-					  unsigned long tables)
-{
-	phys_addr_t base;
-
-	base = memblock_find_in_range(start, good_end, tables, PAGE_SIZE);
-	if (!base)
-		panic("Cannot find space for the kernel page tables");
-
-	pgt_buf_start = base >> PAGE_SHIFT;
-	pgt_buf_end = pgt_buf_start;
-	pgt_buf_top = pgt_buf_start + (tables >> PAGE_SHIFT);
-}
-
 static struct range pfn_mapped[E820_X_MAX];
 static int nr_pfn_mapped;
 
@@ -391,17 +307,14 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,
 }
 
 /*
- * Iterate through E820 memory map and create direct mappings for only E820_RAM
- * regions. We cannot simply create direct mappings for all pfns from
- * [0 to max_low_pfn) and [4GB to max_pfn) because of possible memory holes in
- * high addresses that cannot be marked as UC by fixed/variable range MTRRs.
- * Depending on the alignment of E820 ranges, this may possibly result in using
- * smaller size (i.e. 4K instead of 2M or 1G) page tables.
+ * this one could take range with hole in it
  */
-static void __init init_range_memory_mapping(unsigned long range_start,
+static unsigned long __init init_range_memory_mapping(
+					   unsigned long range_start,
 					   unsigned long range_end)
 {
 	unsigned long start_pfn, end_pfn;
+	unsigned long mapped_ram_size = 0;
 	int i;
 
 	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, NULL) {
@@ -421,71 +334,67 @@ static void __init init_range_memory_mapping(unsigned long range_start,
 			end = range_end;
 
 		init_memory_mapping(start, end);
+
+		mapped_ram_size += end - start;
 	}
+
+	return mapped_ram_size;
 }
 
 void __init init_mem_mapping(void)
 {
-	unsigned long tables, good_end, end;
+	unsigned long end, real_end, start, last_start;
+	unsigned long step_size;
+	unsigned long addr;
+	unsigned long mapped_ram_size = 0;
+	unsigned long new_mapped_ram_size;
 
 	probe_page_size_mask();
 
-	/*
-	 * Find space for the kernel direct mapping tables.
-	 *
-	 * Later we should allocate these tables in the local node of the
-	 * memory mapped. Unfortunately this is done currently before the
-	 * nodes are discovered.
-	 */
 #ifdef CONFIG_X86_64
 	end = max_pfn << PAGE_SHIFT;
-	good_end = end;
 #else
 	end = max_low_pfn << PAGE_SHIFT;
-	good_end = max_pfn_mapped << PAGE_SHIFT;
 #endif
-	tables = calculate_all_table_space_size();
-	find_early_table_space(0, good_end, tables);
-	printk(KERN_DEBUG "kernel direct mapping tables up to %#lx @ [mem %#010lx-%#010lx] prealloc\n",
-		end - 1, pgt_buf_start << PAGE_SHIFT,
-		(pgt_buf_top << PAGE_SHIFT) - 1);
 
-	max_pfn_mapped = 0; /* will get exact value next */
 	/* the ISA range is always mapped regardless of memory holes */
 	init_memory_mapping(0, ISA_END_ADDRESS);
-	init_range_memory_mapping(ISA_END_ADDRESS, end);
+
+	/* xen has big range in reserved near end of ram, skip it at first */
+	addr = memblock_find_in_range(ISA_END_ADDRESS, end, PMD_SIZE,
+			 PAGE_SIZE);
+	real_end = addr + PMD_SIZE;
+
+	/* step_size need to be small so pgt_buf from BRK could cover it */
+	step_size = PMD_SIZE;
+	max_pfn_mapped = 0; /* will get exact value next */
+	min_pfn_mapped = real_end >> PAGE_SHIFT;
+	last_start = start = real_end;
+	while (last_start > ISA_END_ADDRESS) {
+		if (last_start > step_size) {
+			start = round_down(last_start - 1, step_size);
+			if (start < ISA_END_ADDRESS)
+				start = ISA_END_ADDRESS;
+		} else
+			start = ISA_END_ADDRESS;
+		new_mapped_ram_size = init_range_memory_mapping(start,
+							last_start);
+		last_start = start;
+		min_pfn_mapped = last_start >> PAGE_SHIFT;
+		if (new_mapped_ram_size > mapped_ram_size)
+			step_size <<= 5;
+		mapped_ram_size += new_mapped_ram_size;
+	}
+
+	if (real_end < end)
+		init_range_memory_mapping(real_end, end);
+
 #ifdef CONFIG_X86_64
 	if (max_pfn > max_low_pfn) {
 		/* can we preseve max_low_pfn ?*/
 		max_low_pfn = max_pfn;
 	}
 #endif
-	/*
-	 * Reserve the kernel pagetable pages we used (pgt_buf_start -
-	 * pgt_buf_end) and free the other ones (pgt_buf_end - pgt_buf_top)
-	 * so that they can be reused for other purposes.
-	 *
-	 * On native it just means calling memblock_reserve, on Xen it also
-	 * means marking RW the pagetable pages that we allocated before
-	 * but that haven't been used.
-	 *
-	 * In fact on xen we mark RO the whole range pgt_buf_start -
-	 * pgt_buf_top, because we have to make sure that when
-	 * init_memory_mapping reaches the pagetable pages area, it maps
-	 * RO all the pagetable pages, including the ones that are beyond
-	 * pgt_buf_end at that time.
-	 */
-	if (pgt_buf_end > pgt_buf_start) {
-		printk(KERN_DEBUG "kernel direct mapping tables up to %#lx @ [mem %#010lx-%#010lx] final\n",
-			end - 1, pgt_buf_start << PAGE_SHIFT,
-			(pgt_buf_end << PAGE_SHIFT) - 1);
-		x86_init.mapping.pagetable_reserve(PFN_PHYS(pgt_buf_start),
-				PFN_PHYS(pgt_buf_end));
-	}
-
-	/* stop the wrong using */
-	pgt_buf_top = 0;
-
 	early_memtest(0, max_pfn_mapped << PAGE_SHIFT);
 }
 
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 27f7fc6..7bb1106 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -61,11 +61,22 @@ bool __read_mostly __vmalloc_start_set = false;
 
 static __init void *alloc_low_page(void)
 {
-	unsigned long pfn = pgt_buf_end++;
+	unsigned long pfn;
 	void *adr;
 
-	if (pfn >= pgt_buf_top)
-		panic("alloc_low_page: ran out of memory");
+	if ((pgt_buf_end + 1) >= pgt_buf_top) {
+		unsigned long ret;
+		if (min_pfn_mapped >= max_pfn_mapped)
+			panic("alloc_low_page: ran out of memory");
+		ret = memblock_find_in_range(min_pfn_mapped << PAGE_SHIFT,
+					max_pfn_mapped << PAGE_SHIFT,
+					PAGE_SIZE, PAGE_SIZE);
+		if (!ret)
+			panic("alloc_low_page: can not alloc memory");
+		memblock_reserve(ret, PAGE_SIZE);
+		pfn = ret >> PAGE_SHIFT;
+	} else
+		pfn = pgt_buf_end++;
 
 	adr = __va(pfn * PAGE_SIZE);
 	clear_page(adr);
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 4898e80..7dfa69b 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -316,7 +316,7 @@ void __init cleanup_highmap(void)
 
 static __ref void *alloc_low_page(unsigned long *phys)
 {
-	unsigned long pfn = pgt_buf_end++;
+	unsigned long pfn;
 	void *adr;
 
 	if (after_bootmem) {
@@ -326,8 +326,19 @@ static __ref void *alloc_low_page(unsigned long *phys)
 		return adr;
 	}
 
-	if (pfn >= pgt_buf_top)
-		panic("alloc_low_page: ran out of memory");
+	if ((pgt_buf_end + 1) >= pgt_buf_top) {
+		unsigned long ret;
+		if (min_pfn_mapped >= max_pfn_mapped)
+			panic("alloc_low_page: ran out of memory");
+		ret = memblock_find_in_range(min_pfn_mapped << PAGE_SHIFT,
+					max_pfn_mapped << PAGE_SHIFT,
+					PAGE_SIZE, PAGE_SIZE);
+		if (!ret)
+			panic("alloc_low_page: can not alloc memory");
+		memblock_reserve(ret, PAGE_SIZE);
+		pfn = ret >> PAGE_SHIFT;
+	} else
+		pfn = pgt_buf_end++;
 
 	adr = early_memremap(pfn * PAGE_SIZE, PAGE_SIZE);
 	clear_page(adr);
-- 
1.7.7



* [PATCH 07/19] x86, mm: Remove early_memremap workaround for page table accessing on 64bit
  2012-10-18 20:50 [PATCH -v5 00/19] x86: Use BRK to pre mapping page table to make xen happy Yinghai Lu
                   ` (8 preceding siblings ...)
  2012-10-18 20:50 ` [PATCH 06/19] x86, mm: setup page table in top-down Yinghai Lu
@ 2012-10-18 20:50 ` Yinghai Lu
  2012-10-22 15:07   ` Konrad Rzeszutek Wilk
  2012-10-18 20:50 ` [PATCH 08/19] x86, mm: Remove parameter in alloc_low_page for 64bit Yinghai Lu
                   ` (11 subsequent siblings)
  21 siblings, 1 reply; 63+ messages in thread
From: Yinghai Lu @ 2012-10-18 20:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin, Tejun Heo
  Cc: Stefano Stabellini, linux-kernel, Yinghai Lu

We do not need that workaround anymore, now that the preceding patches
pre-map the page table buffer and no longer wrongly clear the initial page
table.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/mm/init_64.c |   38 ++++----------------------------------
 1 files changed, 4 insertions(+), 34 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 7dfa69b..4e6873f 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -340,36 +340,12 @@ static __ref void *alloc_low_page(unsigned long *phys)
 	} else
 		pfn = pgt_buf_end++;
 
-	adr = early_memremap(pfn * PAGE_SIZE, PAGE_SIZE);
+	adr = __va(pfn * PAGE_SIZE);
 	clear_page(adr);
 	*phys  = pfn * PAGE_SIZE;
 	return adr;
 }
 
-static __ref void *map_low_page(void *virt)
-{
-	void *adr;
-	unsigned long phys, left;
-
-	if (after_bootmem)
-		return virt;
-
-	phys = __pa(virt);
-	left = phys & (PAGE_SIZE - 1);
-	adr = early_memremap(phys & PAGE_MASK, PAGE_SIZE);
-	adr = (void *)(((unsigned long)adr) | left);
-
-	return adr;
-}
-
-static __ref void unmap_low_page(void *adr)
-{
-	if (after_bootmem)
-		return;
-
-	early_iounmap((void *)((unsigned long)adr & PAGE_MASK), PAGE_SIZE);
-}
-
 static unsigned long __meminit
 phys_pte_init(pte_t *pte_page, unsigned long addr, unsigned long end,
 	      pgprot_t prot)
@@ -441,10 +417,9 @@ phys_pmd_init(pmd_t *pmd_page, unsigned long address, unsigned long end,
 		if (pmd_val(*pmd)) {
 			if (!pmd_large(*pmd)) {
 				spin_lock(&init_mm.page_table_lock);
-				pte = map_low_page((pte_t *)pmd_page_vaddr(*pmd));
+				pte = (pte_t *)pmd_page_vaddr(*pmd);
 				last_map_addr = phys_pte_init(pte, address,
 								end, prot);
-				unmap_low_page(pte);
 				spin_unlock(&init_mm.page_table_lock);
 				continue;
 			}
@@ -480,7 +455,6 @@ phys_pmd_init(pmd_t *pmd_page, unsigned long address, unsigned long end,
 
 		pte = alloc_low_page(&pte_phys);
 		last_map_addr = phys_pte_init(pte, address, end, new_prot);
-		unmap_low_page(pte);
 
 		spin_lock(&init_mm.page_table_lock);
 		pmd_populate_kernel(&init_mm, pmd, __va(pte_phys));
@@ -515,10 +489,9 @@ phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end,
 
 		if (pud_val(*pud)) {
 			if (!pud_large(*pud)) {
-				pmd = map_low_page(pmd_offset(pud, 0));
+				pmd = pmd_offset(pud, 0);
 				last_map_addr = phys_pmd_init(pmd, addr, end,
 							 page_size_mask, prot);
-				unmap_low_page(pmd);
 				__flush_tlb_all();
 				continue;
 			}
@@ -555,7 +528,6 @@ phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end,
 		pmd = alloc_low_page(&pmd_phys);
 		last_map_addr = phys_pmd_init(pmd, addr, end, page_size_mask,
 					      prot);
-		unmap_low_page(pmd);
 
 		spin_lock(&init_mm.page_table_lock);
 		pud_populate(&init_mm, pud, __va(pmd_phys));
@@ -591,17 +563,15 @@ kernel_physical_mapping_init(unsigned long start,
 			next = end;
 
 		if (pgd_val(*pgd)) {
-			pud = map_low_page((pud_t *)pgd_page_vaddr(*pgd));
+			pud = (pud_t *)pgd_page_vaddr(*pgd);
 			last_map_addr = phys_pud_init(pud, __pa(start),
 						 __pa(end), page_size_mask);
-			unmap_low_page(pud);
 			continue;
 		}
 
 		pud = alloc_low_page(&pud_phys);
 		last_map_addr = phys_pud_init(pud, __pa(start), __pa(next),
 						 page_size_mask);
-		unmap_low_page(pud);
 
 		spin_lock(&init_mm.page_table_lock);
 		pgd_populate(&init_mm, pgd, __va(pud_phys));
-- 
1.7.7



* [PATCH 08/19] x86, mm: Remove parameter in alloc_low_page for 64bit
  2012-10-18 20:50 [PATCH -v5 00/19] x86: Use BRK to pre mapping page table to make xen happy Yinghai Lu
                   ` (9 preceding siblings ...)
  2012-10-18 20:50 ` [PATCH 07/19] x86, mm: Remove early_memremap workaround for page table accessing on 64bit Yinghai Lu
@ 2012-10-18 20:50 ` Yinghai Lu
  2012-10-22 15:09   ` Konrad Rzeszutek Wilk
  2012-10-18 20:50 ` [PATCH 09/19] x86, mm: Merge alloc_low_page between 64bit and 32bit Yinghai Lu
                   ` (10 subsequent siblings)
  21 siblings, 1 reply; 63+ messages in thread
From: Yinghai Lu @ 2012-10-18 20:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin, Tejun Heo
  Cc: Stefano Stabellini, linux-kernel, Yinghai Lu

Now all page table buffers are pre-mapped and can be accessed through their
virtual addresses directly, so we no longer need to remember the physical
address.

Remove the phys pointer from alloc_low_page(); that will allow us to merge
alloc_low_page() between 64-bit and 32-bit.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/mm/init_64.c |   19 +++++++------------
 1 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 4e6873f..cbf8dbe 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -314,14 +314,13 @@ void __init cleanup_highmap(void)
 	}
 }
 
-static __ref void *alloc_low_page(unsigned long *phys)
+static __ref void *alloc_low_page(void)
 {
 	unsigned long pfn;
 	void *adr;
 
 	if (after_bootmem) {
 		adr = (void *)get_zeroed_page(GFP_ATOMIC | __GFP_NOTRACK);
-		*phys = __pa(adr);
 
 		return adr;
 	}
@@ -342,7 +341,6 @@ static __ref void *alloc_low_page(unsigned long *phys)
 
 	adr = __va(pfn * PAGE_SIZE);
 	clear_page(adr);
-	*phys  = pfn * PAGE_SIZE;
 	return adr;
 }
 
@@ -400,7 +398,6 @@ phys_pmd_init(pmd_t *pmd_page, unsigned long address, unsigned long end,
 	int i = pmd_index(address);
 
 	for (; i < PTRS_PER_PMD; i++, address = next) {
-		unsigned long pte_phys;
 		pmd_t *pmd = pmd_page + pmd_index(address);
 		pte_t *pte;
 		pgprot_t new_prot = prot;
@@ -453,11 +450,11 @@ phys_pmd_init(pmd_t *pmd_page, unsigned long address, unsigned long end,
 			continue;
 		}
 
-		pte = alloc_low_page(&pte_phys);
+		pte = alloc_low_page();
 		last_map_addr = phys_pte_init(pte, address, end, new_prot);
 
 		spin_lock(&init_mm.page_table_lock);
-		pmd_populate_kernel(&init_mm, pmd, __va(pte_phys));
+		pmd_populate_kernel(&init_mm, pmd, pte);
 		spin_unlock(&init_mm.page_table_lock);
 	}
 	update_page_count(PG_LEVEL_2M, pages);
@@ -473,7 +470,6 @@ phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end,
 	int i = pud_index(addr);
 
 	for (; i < PTRS_PER_PUD; i++, addr = next) {
-		unsigned long pmd_phys;
 		pud_t *pud = pud_page + pud_index(addr);
 		pmd_t *pmd;
 		pgprot_t prot = PAGE_KERNEL;
@@ -525,12 +521,12 @@ phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end,
 			continue;
 		}
 
-		pmd = alloc_low_page(&pmd_phys);
+		pmd = alloc_low_page();
 		last_map_addr = phys_pmd_init(pmd, addr, end, page_size_mask,
 					      prot);
 
 		spin_lock(&init_mm.page_table_lock);
-		pud_populate(&init_mm, pud, __va(pmd_phys));
+		pud_populate(&init_mm, pud, pmd);
 		spin_unlock(&init_mm.page_table_lock);
 	}
 	__flush_tlb_all();
@@ -555,7 +551,6 @@ kernel_physical_mapping_init(unsigned long start,
 
 	for (; start < end; start = next) {
 		pgd_t *pgd = pgd_offset_k(start);
-		unsigned long pud_phys;
 		pud_t *pud;
 
 		next = (start + PGDIR_SIZE) & PGDIR_MASK;
@@ -569,12 +564,12 @@ kernel_physical_mapping_init(unsigned long start,
 			continue;
 		}
 
-		pud = alloc_low_page(&pud_phys);
+		pud = alloc_low_page();
 		last_map_addr = phys_pud_init(pud, __pa(start), __pa(next),
 						 page_size_mask);
 
 		spin_lock(&init_mm.page_table_lock);
-		pgd_populate(&init_mm, pgd, __va(pud_phys));
+		pgd_populate(&init_mm, pgd, pud);
 		spin_unlock(&init_mm.page_table_lock);
 		pgd_changed = true;
 	}
-- 
1.7.7



* [PATCH 09/19] x86, mm: Merge alloc_low_page between 64bit and 32bit
  2012-10-18 20:50 [PATCH -v5 00/19] x86: Use BRK to pre mapping page table to make xen happy Yinghai Lu
                   ` (10 preceding siblings ...)
  2012-10-18 20:50 ` [PATCH 08/19] x86, mm: Remove parameter in alloc_low_page for 64bit Yinghai Lu
@ 2012-10-18 20:50 ` Yinghai Lu
  2012-10-22 15:11   ` Konrad Rzeszutek Wilk
  2012-10-18 20:50 ` [PATCH 10/19] x86, mm: Move min_pfn_mapped back to mm/init.c Yinghai Lu
                   ` (9 subsequent siblings)
  21 siblings, 1 reply; 63+ messages in thread
From: Yinghai Lu @ 2012-10-18 20:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin, Tejun Heo
  Cc: Stefano Stabellini, linux-kernel, Yinghai Lu

They are almost the same, except that 64-bit needs to handle the
after_bootmem case.

Add mm_internal.h so that the alloc_low_page() declaration stays private to
arch/x86/mm/init*.c.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/mm/init.c        |   34 ++++++++++++++++++++++++++++++++++
 arch/x86/mm/init_32.c     |   26 ++------------------------
 arch/x86/mm/init_64.c     |   32 ++------------------------------
 arch/x86/mm/mm_internal.h |    6 ++++++
 4 files changed, 44 insertions(+), 54 deletions(-)
 create mode 100644 arch/x86/mm/mm_internal.h

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 9ff29c1..c398b2c 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -17,10 +17,44 @@
 #include <asm/proto.h>
 #include <asm/dma.h>		/* for MAX_DMA_PFN */
 
+#include "mm_internal.h"
+
 unsigned long __initdata pgt_buf_start;
 unsigned long __meminitdata pgt_buf_end;
 unsigned long __meminitdata pgt_buf_top;
 
+__ref void *alloc_low_page(void)
+{
+	unsigned long pfn;
+	void *adr;
+
+#ifdef CONFIG_X86_64
+	if (after_bootmem) {
+		adr = (void *)get_zeroed_page(GFP_ATOMIC | __GFP_NOTRACK);
+
+		return adr;
+	}
+#endif
+
+	if ((pgt_buf_end + 1) >= pgt_buf_top) {
+		unsigned long ret;
+		if (min_pfn_mapped >= max_pfn_mapped)
+			panic("alloc_low_page: ran out of memory");
+		ret = memblock_find_in_range(min_pfn_mapped << PAGE_SHIFT,
+					max_pfn_mapped << PAGE_SHIFT,
+					PAGE_SIZE, PAGE_SIZE);
+		if (!ret)
+			panic("alloc_low_page: can not alloc memory");
+		memblock_reserve(ret, PAGE_SIZE);
+		pfn = ret >> PAGE_SHIFT;
+	} else
+		pfn = pgt_buf_end++;
+
+	adr = __va(pfn * PAGE_SIZE);
+	clear_page(adr);
+	return adr;
+}
+
 /* need 4 4k for initial PMD_SIZE, 4k for 0-ISA_END_ADDRESS */
 #define INIT_PGT_BUF_SIZE	(5 * PAGE_SIZE)
 RESERVE_BRK(early_pgt_alloc, INIT_PGT_BUF_SIZE);
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 7bb1106..a7f2df1 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -53,36 +53,14 @@
 #include <asm/page_types.h>
 #include <asm/init.h>
 
+#include "mm_internal.h"
+
 unsigned long highstart_pfn, highend_pfn;
 
 static noinline int do_test_wp_bit(void);
 
 bool __read_mostly __vmalloc_start_set = false;
 
-static __init void *alloc_low_page(void)
-{
-	unsigned long pfn;
-	void *adr;
-
-	if ((pgt_buf_end + 1) >= pgt_buf_top) {
-		unsigned long ret;
-		if (min_pfn_mapped >= max_pfn_mapped)
-			panic("alloc_low_page: ran out of memory");
-		ret = memblock_find_in_range(min_pfn_mapped << PAGE_SHIFT,
-					max_pfn_mapped << PAGE_SHIFT,
-					PAGE_SIZE, PAGE_SIZE);
-		if (!ret)
-			panic("alloc_low_page: can not alloc memory");
-		memblock_reserve(ret, PAGE_SIZE);
-		pfn = ret >> PAGE_SHIFT;
-	} else
-		pfn = pgt_buf_end++;
-
-	adr = __va(pfn * PAGE_SIZE);
-	clear_page(adr);
-	return adr;
-}
-
 /*
  * Creates a middle page table and puts a pointer to it in the
  * given global directory entry. This only returns the gd entry
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index cbf8dbe..aabe8ff 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -54,6 +54,8 @@
 #include <asm/uv/uv.h>
 #include <asm/setup.h>
 
+#include "mm_internal.h"
+
 static int __init parse_direct_gbpages_off(char *arg)
 {
 	direct_gbpages = 0;
@@ -314,36 +316,6 @@ void __init cleanup_highmap(void)
 	}
 }
 
-static __ref void *alloc_low_page(void)
-{
-	unsigned long pfn;
-	void *adr;
-
-	if (after_bootmem) {
-		adr = (void *)get_zeroed_page(GFP_ATOMIC | __GFP_NOTRACK);
-
-		return adr;
-	}
-
-	if ((pgt_buf_end + 1) >= pgt_buf_top) {
-		unsigned long ret;
-		if (min_pfn_mapped >= max_pfn_mapped)
-			panic("alloc_low_page: ran out of memory");
-		ret = memblock_find_in_range(min_pfn_mapped << PAGE_SHIFT,
-					max_pfn_mapped << PAGE_SHIFT,
-					PAGE_SIZE, PAGE_SIZE);
-		if (!ret)
-			panic("alloc_low_page: can not alloc memory");
-		memblock_reserve(ret, PAGE_SIZE);
-		pfn = ret >> PAGE_SHIFT;
-	} else
-		pfn = pgt_buf_end++;
-
-	adr = __va(pfn * PAGE_SIZE);
-	clear_page(adr);
-	return adr;
-}
-
 static unsigned long __meminit
 phys_pte_init(pte_t *pte_page, unsigned long addr, unsigned long end,
 	      pgprot_t prot)
diff --git a/arch/x86/mm/mm_internal.h b/arch/x86/mm/mm_internal.h
new file mode 100644
index 0000000..b3f993a
--- /dev/null
+++ b/arch/x86/mm/mm_internal.h
@@ -0,0 +1,6 @@
+#ifndef __X86_MM_INTERNAL_H
+#define __X86_MM_INTERNAL_H
+
+void *alloc_low_page(void);
+
+#endif	/* __X86_MM_INTERNAL_H */
-- 
1.7.7



* [PATCH 10/19] x86, mm: Move min_pfn_mapped back to mm/init.c
  2012-10-18 20:50 [PATCH -v5 00/19] x86: Use BRK to pre mapping page table to make xen happy Yinghai Lu
                   ` (11 preceding siblings ...)
  2012-10-18 20:50 ` [PATCH 09/19] x86, mm: Merge alloc_low_page between 64bit and 32bit Yinghai Lu
@ 2012-10-18 20:50 ` Yinghai Lu
  2012-10-18 20:50 ` [PATCH 11/19] x86, mm, xen: Remove mapping_pagetable_reserve Yinghai Lu
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 63+ messages in thread
From: Yinghai Lu @ 2012-10-18 20:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin, Tejun Heo
  Cc: Stefano Stabellini, linux-kernel, Yinghai Lu

Also change it to static.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/include/asm/page_types.h |    1 -
 arch/x86/kernel/setup.c           |    1 -
 arch/x86/mm/init.c                |    2 ++
 3 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
index 9f6f3e6..54c9787 100644
--- a/arch/x86/include/asm/page_types.h
+++ b/arch/x86/include/asm/page_types.h
@@ -45,7 +45,6 @@ extern int devmem_is_allowed(unsigned long pagenr);
 
 extern unsigned long max_low_pfn_mapped;
 extern unsigned long max_pfn_mapped;
-extern unsigned long min_pfn_mapped;
 
 static inline phys_addr_t get_max_mapped(void)
 {
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 73cb7ba..9cb2e27 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -124,7 +124,6 @@
  */
 unsigned long max_low_pfn_mapped;
 unsigned long max_pfn_mapped;
-unsigned long min_pfn_mapped;
 
 #ifdef CONFIG_DMI
 RESERVE_BRK(dmi_alloc, 65536);
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index c398b2c..2257727 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -23,6 +23,8 @@ unsigned long __initdata pgt_buf_start;
 unsigned long __meminitdata pgt_buf_end;
 unsigned long __meminitdata pgt_buf_top;
 
+static unsigned long min_pfn_mapped;
+
 __ref void *alloc_low_page(void)
 {
 	unsigned long pfn;
-- 
1.7.7



* [PATCH 11/19] x86, mm, xen: Remove mapping_pagetable_reserve
  2012-10-18 20:50 [PATCH -v5 00/19] x86: Use BRK to pre mapping page table to make xen happy Yinghai Lu
                   ` (12 preceding siblings ...)
  2012-10-18 20:50 ` [PATCH 10/19] x86, mm: Move min_pfn_mapped back to mm/init.c Yinghai Lu
@ 2012-10-18 20:50 ` Yinghai Lu
  2012-10-22 15:14   ` Konrad Rzeszutek Wilk
  2012-10-18 20:50 ` [PATCH 12/19] x86, mm: Add alloc_low_pages(num) Yinghai Lu
                   ` (7 subsequent siblings)
  21 siblings, 1 reply; 63+ messages in thread
From: Yinghai Lu @ 2012-10-18 20:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin, Tejun Heo
  Cc: Stefano Stabellini, linux-kernel, Yinghai Lu

The page table area is pre-mapped now, and mark_page_ro is used to make it RO
for xen.

mapping_pagetable_reserve is not used anymore, so remove it.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/include/asm/pgtable_types.h |    1 -
 arch/x86/include/asm/x86_init.h      |   12 ------------
 arch/x86/kernel/x86_init.c           |    4 ----
 arch/x86/mm/init.c                   |    4 ----
 arch/x86/xen/mmu.c                   |   28 ----------------------------
 5 files changed, 0 insertions(+), 49 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index ec8a1fc..79738f2 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -301,7 +301,6 @@ int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn,
 /* Install a pte for a particular vaddr in kernel space. */
 void set_pte_vaddr(unsigned long vaddr, pte_t pte);
 
-extern void native_pagetable_reserve(u64 start, u64 end);
 #ifdef CONFIG_X86_32
 extern void native_pagetable_init(void);
 #else
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 5769349..3b2ce8f 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -69,17 +69,6 @@ struct x86_init_oem {
 };
 
 /**
- * struct x86_init_mapping - platform specific initial kernel pagetable setup
- * @pagetable_reserve:	reserve a range of addresses for kernel pagetable usage
- *
- * For more details on the purpose of this hook, look in
- * init_memory_mapping and the commit that added it.
- */
-struct x86_init_mapping {
-	void (*pagetable_reserve)(u64 start, u64 end);
-};
-
-/**
  * struct x86_init_paging - platform specific paging functions
  * @pagetable_init:	platform specific paging initialization call to setup
  *			the kernel pagetables and prepare accessors functions.
@@ -136,7 +125,6 @@ struct x86_init_ops {
 	struct x86_init_mpparse		mpparse;
 	struct x86_init_irqs		irqs;
 	struct x86_init_oem		oem;
-	struct x86_init_mapping		mapping;
 	struct x86_init_paging		paging;
 	struct x86_init_timers		timers;
 	struct x86_init_iommu		iommu;
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index 7a3d075..50cf83e 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -62,10 +62,6 @@ struct x86_init_ops x86_init __initdata = {
 		.banner			= default_banner,
 	},
 
-	.mapping = {
-		.pagetable_reserve		= native_pagetable_reserve,
-	},
-
 	.paging = {
 		.pagetable_init		= native_pagetable_init,
 	},
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 2257727..dd09d20 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -112,10 +112,6 @@ static void __init probe_page_size_mask(void)
 		__supported_pte_mask |= _PAGE_GLOBAL;
 	}
 }
-void __init native_pagetable_reserve(u64 start, u64 end)
-{
-	memblock_reserve(start, end - start);
-}
 
 #ifdef CONFIG_X86_32
 #define NR_RANGE_MR 3
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 6226c99..efc5260 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1178,20 +1178,6 @@ static void xen_exit_mmap(struct mm_struct *mm)
 
 static void xen_post_allocator_init(void);
 
-static __init void xen_mapping_pagetable_reserve(u64 start, u64 end)
-{
-	/* reserve the range used */
-	native_pagetable_reserve(start, end);
-
-	/* set as RW the rest */
-	printk(KERN_DEBUG "xen: setting RW the range %llx - %llx\n", end,
-			PFN_PHYS(pgt_buf_top));
-	while (end < PFN_PHYS(pgt_buf_top)) {
-		make_lowmem_page_readwrite(__va(end));
-		end += PAGE_SIZE;
-	}
-}
-
 #ifdef CONFIG_X86_64
 static void __init xen_cleanhighmap(unsigned long vaddr,
 				    unsigned long vaddr_end)
@@ -1484,19 +1470,6 @@ static pte_t __init mask_rw_pte(pte_t *ptep, pte_t pte)
 #else /* CONFIG_X86_64 */
 static pte_t __init mask_rw_pte(pte_t *ptep, pte_t pte)
 {
-	unsigned long pfn = pte_pfn(pte);
-
-	/*
-	 * If the new pfn is within the range of the newly allocated
-	 * kernel pagetable, and it isn't being mapped into an
-	 * early_ioremap fixmap slot as a freshly allocated page, make sure
-	 * it is RO.
-	 */
-	if (((!is_early_ioremap_ptep(ptep) &&
-			pfn >= pgt_buf_start && pfn < pgt_buf_top)) ||
-			(is_early_ioremap_ptep(ptep) && pfn != (pgt_buf_end - 1)))
-		pte = pte_wrprotect(pte);
-
 	return pte;
 }
 #endif /* CONFIG_X86_64 */
@@ -2178,7 +2151,6 @@ static const struct pv_mmu_ops xen_mmu_ops __initconst = {
 
 void __init xen_init_mmu_ops(void)
 {
-	x86_init.mapping.pagetable_reserve = xen_mapping_pagetable_reserve;
 	x86_init.paging.pagetable_init = xen_pagetable_init;
 	pv_mmu_ops = xen_mmu_ops;
 
-- 
1.7.7


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 12/19] x86, mm: Add alloc_low_pages(num)
  2012-10-18 20:50 [PATCH -v5 00/19] x86: Use BRK to pre mapping page table to make xen happy Yinghai Lu
                   ` (13 preceding siblings ...)
  2012-10-18 20:50 ` [PATCH 11/19] x86, mm, xen: Remove mapping_pagatable_reserve Yinghai Lu
@ 2012-10-18 20:50 ` Yinghai Lu
  2012-10-22 15:17   ` Konrad Rzeszutek Wilk
  2012-10-18 20:50 ` [PATCH 13/19] x86, mm: only call early_ioremap_page_table_range_init() once Yinghai Lu
                   ` (6 subsequent siblings)
  21 siblings, 1 reply; 63+ messages in thread
From: Yinghai Lu @ 2012-10-18 20:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin, Tejun Heo
  Cc: Stefano Stabellini, linux-kernel, Yinghai Lu

The 32bit kmap mapping needs its page tables to be handed out from low to high.

At this point those page tables still come from pgt_buf_* in BRK,
so the ordering is fine for now.
But we want to move early_ioremap_page_table_range_init() out of
init_memory_mapping() and only call it once later; that will make
page_table_range_init/page_table_kmap_check/alloc_low_page get their
pages from memblock instead.

memblock allocates page table pages from high to low.

So we would hit the panic in page_table_kmap_check(), which has a BUG_ON
that does the ordering check.

This patch adds alloc_low_pages() so that several pages can be allocated
up front and then handed out one by one from low to high.
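
To make the low-to-high hand-out concrete, here is a rough usage sketch
(it mirrors the page_table_range_init() change in the next patch; "count"
stands for however many pte pages the caller has determined it needs):

	void *adr = NULL;

	if (count)
		adr = alloc_low_pages(count);	/* mapped, zeroed, lowest page first */

	/* consume the chunk one page at a time, in ascending order */
	pte_t *newpte = adr;
	adr = (void *)((unsigned long)adr + PAGE_SIZE);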

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/mm/init.c        |   33 +++++++++++++++++++++------------
 arch/x86/mm/mm_internal.h |    6 +++++-
 2 files changed, 26 insertions(+), 13 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index dd09d20..de71c0d 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -25,36 +25,45 @@ unsigned long __meminitdata pgt_buf_top;
 
 static unsigned long min_pfn_mapped;
 
-__ref void *alloc_low_page(void)
+__ref void *alloc_low_pages(unsigned int num)
 {
 	unsigned long pfn;
-	void *adr;
+	int i;
 
 #ifdef CONFIG_X86_64
 	if (after_bootmem) {
-		adr = (void *)get_zeroed_page(GFP_ATOMIC | __GFP_NOTRACK);
+		unsigned int order;
 
-		return adr;
+		order = get_order((unsigned long)num << PAGE_SHIFT);
+		return (void *)__get_free_pages(GFP_ATOMIC | __GFP_NOTRACK |
+						__GFP_ZERO, order);
 	}
 #endif
 
-	if ((pgt_buf_end + 1) >= pgt_buf_top) {
+	if ((pgt_buf_end + num) >= pgt_buf_top) {
 		unsigned long ret;
 		if (min_pfn_mapped >= max_pfn_mapped)
 			panic("alloc_low_page: ran out of memory");
 		ret = memblock_find_in_range(min_pfn_mapped << PAGE_SHIFT,
 					max_pfn_mapped << PAGE_SHIFT,
-					PAGE_SIZE, PAGE_SIZE);
+					PAGE_SIZE * num , PAGE_SIZE);
 		if (!ret)
 			panic("alloc_low_page: can not alloc memory");
-		memblock_reserve(ret, PAGE_SIZE);
+		memblock_reserve(ret, PAGE_SIZE * num);
 		pfn = ret >> PAGE_SHIFT;
-	} else
-		pfn = pgt_buf_end++;
+	} else {
+		pfn = pgt_buf_end;
+		pgt_buf_end += num;
+	}
+
+	for (i = 0; i < num; i++) {
+		void *adr;
+
+		adr = __va((pfn + i) << PAGE_SHIFT);
+		clear_page(adr);
+	}
 
-	adr = __va(pfn * PAGE_SIZE);
-	clear_page(adr);
-	return adr;
+	return __va(pfn << PAGE_SHIFT);
 }
 
 /* need 4 4k for initial PMD_SIZE, 4k for 0-ISA_END_ADDRESS */
diff --git a/arch/x86/mm/mm_internal.h b/arch/x86/mm/mm_internal.h
index b3f993a..7e3b88e 100644
--- a/arch/x86/mm/mm_internal.h
+++ b/arch/x86/mm/mm_internal.h
@@ -1,6 +1,10 @@
 #ifndef __X86_MM_INTERNAL_H
 #define __X86_MM_INTERNAL_H
 
-void *alloc_low_page(void);
+void *alloc_low_pages(unsigned int num);
+static inline void *alloc_low_page(void)
+{
+	return alloc_low_pages(1);
+}
 
 #endif	/* __X86_MM_INTERNAL_H */
-- 
1.7.7


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 13/19] x86, mm: only call early_ioremap_page_table_range_init() once
  2012-10-18 20:50 [PATCH -v5 00/19] x86: Use BRK to pre mapping page table to make xen happy Yinghai Lu
                   ` (14 preceding siblings ...)
  2012-10-18 20:50 ` [PATCH 12/19] x86, mm: Add alloc_low_pages(num) Yinghai Lu
@ 2012-10-18 20:50 ` Yinghai Lu
  2012-10-22 15:24   ` Konrad Rzeszutek Wilk
  2012-10-18 20:50 ` [PATCH 14/19] x86, mm: Move back pgt_buf_* to mm/init.c Yinghai Lu
                   ` (5 subsequent siblings)
  21 siblings, 1 reply; 63+ messages in thread
From: Yinghai Lu @ 2012-10-18 20:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin, Tejun Heo
  Cc: Stefano Stabellini, linux-kernel, Yinghai Lu

On 32bit, we should not keep calling early_ioremap_page_table_range_init()
during every init_memory_mapping().

Update page_table_range_init() to count the pages needed for the kmap page
tables first, and use the newly added alloc_low_pages() to get those pages in
sequence. That satisfies the requirement that the kmap page tables be handed
out in low to high order.
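
Roughly, the resulting 32bit flow looks like this (sketch only, based on
the init.c hunk below):

	init_mem_mapping()
		init_memory_mapping(...)	/* per range, no kmap/fixmap work anymore */
		...
		early_ioremap_page_table_range_init();	/* called only once, here */
		load_cr3(swapper_pg_dir);
		__flush_tlb_all();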

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/mm/init.c    |   13 +++++--------
 arch/x86/mm/init_32.c |   47 +++++++++++++++++++++++++++++++++++++++++------
 2 files changed, 46 insertions(+), 14 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index de71c0d..4eece3c 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -334,14 +334,6 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,
 		ret = kernel_physical_mapping_init(mr[i].start, mr[i].end,
 						   mr[i].page_size_mask);
 
-#ifdef CONFIG_X86_32
-	early_ioremap_page_table_range_init();
-
-	load_cr3(swapper_pg_dir);
-#endif
-
-	__flush_tlb_all();
-
 	add_pfn_range_mapped(start >> PAGE_SHIFT, ret >> PAGE_SHIFT);
 
 	return ret >> PAGE_SHIFT;
@@ -435,7 +427,12 @@ void __init init_mem_mapping(void)
 		/* can we preseve max_low_pfn ?*/
 		max_low_pfn = max_pfn;
 	}
+#else
+	early_ioremap_page_table_range_init();
+	load_cr3(swapper_pg_dir);
+	__flush_tlb_all();
 #endif
+
 	early_memtest(0, max_pfn_mapped << PAGE_SHIFT);
 }
 
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index a7f2df1..ef7f0dc 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -135,8 +135,39 @@ pte_t * __init populate_extra_pte(unsigned long vaddr)
 	return one_page_table_init(pmd) + pte_idx;
 }
 
+static unsigned long __init
+page_table_range_init_count(unsigned long start, unsigned long end)
+{
+	unsigned long count = 0;
+#ifdef CONFIG_HIGHMEM
+	int pmd_idx_kmap_begin = fix_to_virt(FIX_KMAP_END) >> PMD_SHIFT;
+	int pmd_idx_kmap_end = fix_to_virt(FIX_KMAP_BEGIN) >> PMD_SHIFT;
+	int pgd_idx, pmd_idx;
+	unsigned long vaddr;
+
+	if (pmd_idx_kmap_begin == pmd_idx_kmap_end)
+		return count;
+
+	vaddr = start;
+	pgd_idx = pgd_index(vaddr);
+
+	for ( ; (pgd_idx < PTRS_PER_PGD) && (vaddr != end); pgd_idx++) {
+		for (; (pmd_idx < PTRS_PER_PMD) && (vaddr != end);
+							pmd_idx++) {
+			if ((vaddr >> PMD_SHIFT) >= pmd_idx_kmap_begin &&
+			    (vaddr >> PMD_SHIFT) <= pmd_idx_kmap_end)
+				count++;
+			vaddr += PMD_SIZE;
+		}
+		pmd_idx = 0;
+	}
+#endif
+	return count;
+}
+
 static pte_t *__init page_table_kmap_check(pte_t *pte, pmd_t *pmd,
-					   unsigned long vaddr, pte_t *lastpte)
+					   unsigned long vaddr, pte_t *lastpte,
+					   void **adr)
 {
 #ifdef CONFIG_HIGHMEM
 	/*
@@ -150,16 +181,15 @@ static pte_t *__init page_table_kmap_check(pte_t *pte, pmd_t *pmd,
 
 	if (pmd_idx_kmap_begin != pmd_idx_kmap_end
 	    && (vaddr >> PMD_SHIFT) >= pmd_idx_kmap_begin
-	    && (vaddr >> PMD_SHIFT) <= pmd_idx_kmap_end
-	    && ((__pa(pte) >> PAGE_SHIFT) < pgt_buf_start
-		|| (__pa(pte) >> PAGE_SHIFT) >= pgt_buf_end)) {
+	    && (vaddr >> PMD_SHIFT) <= pmd_idx_kmap_end) {
 		pte_t *newpte;
 		int i;
 
 		BUG_ON(after_bootmem);
-		newpte = alloc_low_page();
+		newpte = *adr;
 		for (i = 0; i < PTRS_PER_PTE; i++)
 			set_pte(newpte + i, pte[i]);
+		*adr = (void *)(((unsigned long)(*adr)) + PAGE_SIZE);
 
 		paravirt_alloc_pte(&init_mm, __pa(newpte) >> PAGE_SHIFT);
 		set_pmd(pmd, __pmd(__pa(newpte)|_PAGE_TABLE));
@@ -193,6 +223,11 @@ page_table_range_init(unsigned long start, unsigned long end, pgd_t *pgd_base)
 	pgd_t *pgd;
 	pmd_t *pmd;
 	pte_t *pte = NULL;
+	unsigned long count = page_table_range_init_count(start, end);
+	void *adr = NULL;
+
+	if (count)
+		adr = alloc_low_pages(count);
 
 	vaddr = start;
 	pgd_idx = pgd_index(vaddr);
@@ -205,7 +240,7 @@ page_table_range_init(unsigned long start, unsigned long end, pgd_t *pgd_base)
 		for (; (pmd_idx < PTRS_PER_PMD) && (vaddr != end);
 							pmd++, pmd_idx++) {
 			pte = page_table_kmap_check(one_page_table_init(pmd),
-			                            pmd, vaddr, pte);
+						    pmd, vaddr, pte, &adr);
 
 			vaddr += PMD_SIZE;
 		}
-- 
1.7.7


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 14/19] x86, mm: Move back pgt_buf_* to mm/init.c
  2012-10-18 20:50 [PATCH -v5 00/19] x86: Use BRK to pre mapping page table to make xen happy Yinghai Lu
                   ` (15 preceding siblings ...)
  2012-10-18 20:50 ` [PATCH 13/19] x86, mm: only call early_ioremap_page_table_range_init() once Yinghai Lu
@ 2012-10-18 20:50 ` Yinghai Lu
  2012-10-18 20:50 ` [PATCH 15/19] x86, mm: Move init_gbpages() out of setup.c Yinghai Lu
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 63+ messages in thread
From: Yinghai Lu @ 2012-10-18 20:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin, Tejun Heo
  Cc: Stefano Stabellini, linux-kernel, Yinghai Lu

Also change them to static.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/include/asm/init.h |    4 ----
 arch/x86/mm/init.c          |    6 +++---
 2 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
index 4f13998..626ea8d 100644
--- a/arch/x86/include/asm/init.h
+++ b/arch/x86/include/asm/init.h
@@ -12,8 +12,4 @@ kernel_physical_mapping_init(unsigned long start,
 			     unsigned long end,
 			     unsigned long page_size_mask);
 
-extern unsigned long __initdata pgt_buf_start;
-extern unsigned long __meminitdata pgt_buf_end;
-extern unsigned long __meminitdata pgt_buf_top;
-
 #endif /* _ASM_X86_INIT_32_H */
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 4eece3c..3fcdfa9 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -19,9 +19,9 @@
 
 #include "mm_internal.h"
 
-unsigned long __initdata pgt_buf_start;
-unsigned long __meminitdata pgt_buf_end;
-unsigned long __meminitdata pgt_buf_top;
+static unsigned long __initdata pgt_buf_start;
+static unsigned long __initdata pgt_buf_end;
+static unsigned long __initdata pgt_buf_top;
 
 static unsigned long min_pfn_mapped;
 
-- 
1.7.7


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 15/19] x86, mm: Move init_gbpages() out of setup.c
  2012-10-18 20:50 [PATCH -v5 00/19] x86: Use BRK to pre mapping page table to make xen happy Yinghai Lu
                   ` (16 preceding siblings ...)
  2012-10-18 20:50 ` [PATCH 14/19] x86, mm: Move back pgt_buf_* to mm/init.c Yinghai Lu
@ 2012-10-18 20:50 ` Yinghai Lu
  2012-10-18 20:50 ` [PATCH 16/19] x86, mm: change low/hignmem_pfn_init to static on 32bit Yinghai Lu
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 63+ messages in thread
From: Yinghai Lu @ 2012-10-18 20:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin, Tejun Heo
  Cc: Stefano Stabellini, linux-kernel, Yinghai Lu

Put it in mm/init.c, and call it from probe_page_size_mask().
init_mem_mapping() calls probe_page_size_mask() first, so the calling
sequence is not changed.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/kernel/setup.c |   15 +--------------
 arch/x86/mm/init.c      |   12 ++++++++++++
 2 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 9cb2e27..d5e0887 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -282,18 +282,7 @@ void * __init extend_brk(size_t size, size_t align)
 	return ret;
 }
 
-#ifdef CONFIG_X86_64
-static void __init init_gbpages(void)
-{
-	if (direct_gbpages && cpu_has_gbpages)
-		printk(KERN_INFO "Using GB pages for direct mapping\n");
-	else
-		direct_gbpages = 0;
-}
-#else
-static inline void init_gbpages(void)
-{
-}
+#ifdef CONFIG_X86_32
 static void __init cleanup_highmap(void)
 {
 }
@@ -930,8 +919,6 @@ void __init setup_arch(char **cmdline_p)
 
 	setup_real_mode();
 
-	init_gbpages();
-
 	init_mem_mapping();
 
 	memblock.current_limit = get_max_mapped();
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 3fcdfa9..3fb0848 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -89,6 +89,16 @@ int direct_gbpages
 #endif
 ;
 
+static void __init init_gbpages(void)
+{
+#ifdef CONFIG_X86_64
+	if (direct_gbpages && cpu_has_gbpages)
+		printk(KERN_INFO "Using GB pages for direct mapping\n");
+	else
+		direct_gbpages = 0;
+#endif
+}
+
 struct map_range {
 	unsigned long start;
 	unsigned long end;
@@ -99,6 +109,8 @@ static int page_size_mask;
 
 static void __init probe_page_size_mask(void)
 {
+	init_gbpages();
+
 #if !defined(CONFIG_DEBUG_PAGEALLOC) && !defined(CONFIG_KMEMCHECK)
 	/*
 	 * For CONFIG_DEBUG_PAGEALLOC, identity mapping will use small pages.
-- 
1.7.7


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 16/19] x86, mm: change low/hignmem_pfn_init to static on 32bit
  2012-10-18 20:50 [PATCH -v5 00/19] x86: Use BRK to pre mapping page table to make xen happy Yinghai Lu
                   ` (17 preceding siblings ...)
  2012-10-18 20:50 ` [PATCH 15/19] x86, mm: Move init_gbpages() out of setup.c Yinghai Lu
@ 2012-10-18 20:50 ` Yinghai Lu
  2012-10-18 20:50 ` [PATCH 17/19] x86, mm: Move function declaration into mm_internal.h Yinghai Lu
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 63+ messages in thread
From: Yinghai Lu @ 2012-10-18 20:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin, Tejun Heo
  Cc: Stefano Stabellini, linux-kernel, Yinghai Lu

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/mm/init_32.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index ef7f0dc..0f05742 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -575,7 +575,7 @@ early_param("highmem", parse_highmem);
  * artificially via the highmem=x boot parameter then create
  * it:
  */
-void __init lowmem_pfn_init(void)
+static void __init lowmem_pfn_init(void)
 {
 	/* max_low_pfn is 0, we already have early_res support */
 	max_low_pfn = max_pfn;
@@ -611,7 +611,7 @@ void __init lowmem_pfn_init(void)
  * We have more RAM than fits into lowmem - we try to put it into
  * highmem, also taking the highmem=x boot parameter into account:
  */
-void __init highmem_pfn_init(void)
+static void __init highmem_pfn_init(void)
 {
 	max_low_pfn = MAXMEM_PFN;
 
-- 
1.7.7


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 17/19] x86, mm: Move function declaration into mm_internal.h
  2012-10-18 20:50 [PATCH -v5 00/19] x86: Use BRK to pre mapping page table to make xen happy Yinghai Lu
                   ` (18 preceding siblings ...)
  2012-10-18 20:50 ` [PATCH 16/19] x86, mm: change low/hignmem_pfn_init to static on 32bit Yinghai Lu
@ 2012-10-18 20:50 ` Yinghai Lu
  2012-10-18 20:50 ` [PATCH 18/19] x86, mm: Let "memmap=" take more entries one time Yinghai Lu
  2012-10-18 20:50 ` [PATCH 19/19] x86, mm: Add check before clear pte above max_low_pfn on 32bit Yinghai Lu
  21 siblings, 0 replies; 63+ messages in thread
From: Yinghai Lu @ 2012-10-18 20:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin, Tejun Heo
  Cc: Stefano Stabellini, linux-kernel, Yinghai Lu

They are only for mm/init*.c.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/include/asm/init.h |   16 +++-------------
 arch/x86/mm/mm_internal.h   |    7 +++++++
 2 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
index 626ea8d..bac770b 100644
--- a/arch/x86/include/asm/init.h
+++ b/arch/x86/include/asm/init.h
@@ -1,15 +1,5 @@
-#ifndef _ASM_X86_INIT_32_H
-#define _ASM_X86_INIT_32_H
+#ifndef _ASM_X86_INIT_H
+#define _ASM_X86_INIT_H
 
-#ifdef CONFIG_X86_32
-extern void __init early_ioremap_page_table_range_init(void);
-#endif
 
-extern void __init zone_sizes_init(void);
-
-extern unsigned long __init
-kernel_physical_mapping_init(unsigned long start,
-			     unsigned long end,
-			     unsigned long page_size_mask);
-
-#endif /* _ASM_X86_INIT_32_H */
+#endif /* _ASM_X86_INIT_H */
diff --git a/arch/x86/mm/mm_internal.h b/arch/x86/mm/mm_internal.h
index 7e3b88e..dc79ac1 100644
--- a/arch/x86/mm/mm_internal.h
+++ b/arch/x86/mm/mm_internal.h
@@ -7,4 +7,11 @@ static inline void *alloc_low_page(void)
 	return alloc_low_pages(1);
 }
 
+void early_ioremap_page_table_range_init(void);
+
+unsigned long kernel_physical_mapping_init(unsigned long start,
+					     unsigned long end,
+					     unsigned long page_size_mask);
+void zone_sizes_init(void);
+
 #endif	/* __X86_MM_INTERNAL_H */
-- 
1.7.7


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 18/19] x86, mm: Let "memmap=" take more entries one time
  2012-10-18 20:50 [PATCH -v5 00/19] x86: Use BRK to pre mapping page table to make xen happy Yinghai Lu
                   ` (19 preceding siblings ...)
  2012-10-18 20:50 ` [PATCH 17/19] x86, mm: Move function declaration into mm_internal.h Yinghai Lu
@ 2012-10-18 20:50 ` Yinghai Lu
  2012-10-22 15:19   ` Konrad Rzeszutek Wilk
  2012-10-18 20:50 ` [PATCH 19/19] x86, mm: Add check before clear pte above max_low_pfn on 32bit Yinghai Lu
  21 siblings, 1 reply; 63+ messages in thread
From: Yinghai Lu @ 2012-10-18 20:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin, Tejun Heo
  Cc: Stefano Stabellini, linux-kernel, Yinghai Lu

Current "memmap=" only can take one entry every time.
when we have more entries, we have to use memmap= for each of them.

For pxe booting, we have command line length limitation, those extra
"memmap=" would waste too much space.

This patch make memmap= could take several entries one time,
and those entries will be split with ','
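
For example, instead of passing (addresses and sizes are made up):

	memmap=2G@0x100000000 memmap=512M@0x240000000

one can now pass:

	memmap=2G@0x100000000,512M@0x240000000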

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/kernel/e820.c |   16 +++++++++++++++-
 1 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index ed858e9..f281328 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -835,7 +835,7 @@ static int __init parse_memopt(char *p)
 }
 early_param("mem", parse_memopt);
 
-static int __init parse_memmap_opt(char *p)
+static int __init parse_memmap_one(char *p)
 {
 	char *oldp;
 	u64 start_at, mem_size;
@@ -877,6 +877,20 @@ static int __init parse_memmap_opt(char *p)
 
 	return *p == '\0' ? 0 : -EINVAL;
 }
+static int __init parse_memmap_opt(char *str)
+{
+	while (str) {
+		char *k = strchr(str, ',');
+
+		if (k)
+			*k++ = 0;
+
+		parse_memmap_one(str);
+		str = k;
+	}
+
+	return 0;
+}
 early_param("memmap", parse_memmap_opt);
 
 void __init finish_e820_parsing(void)
-- 
1.7.7


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 19/19] x86, mm: Add check before clear pte above max_low_pfn on 32bit
  2012-10-18 20:50 [PATCH -v5 00/19] x86: Use BRK to pre mapping page table to make xen happy Yinghai Lu
                   ` (20 preceding siblings ...)
  2012-10-18 20:50 ` [PATCH 18/19] x86, mm: Let "memmap=" take more entries one time Yinghai Lu
@ 2012-10-18 20:50 ` Yinghai Lu
  21 siblings, 0 replies; 63+ messages in thread
From: Yinghai Lu @ 2012-10-18 20:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin, Tejun Heo
  Cc: Stefano Stabellini, linux-kernel, Yinghai Lu

While testing a patch that adjusts page_size_mask to map small RAM ranges
with big page sizes, I found the page table is set up wrongly for 32bit:
native_pagetable_init() wrongly clears ptes for a pmd that uses a large page.

1. Add more comments about why we expect a pte there.

2. Add a BUG check, so that next time we mess up the page table setup
   we find the problem earlier.

3. max_low_pfn is an exclusive boundary for the low memory mapping.
   We should start checking from max_low_pfn instead of max_low_pfn + 1.

4. Print out when a pte really does get cleared; or should we use WARN()
   to find out why anything above max_low_pfn got mapped, so we could
   fix it?
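
For example (made-up number): with max_low_pfn = 0x37000, lowmem covers
pfns 0 through 0x36fff, so pfn 0x37000 itself is already above the low
mapping and is the first one to check; starting at max_low_pfn + 1 would
skip it.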

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/mm/init_32.c |   18 ++++++++++++++++--
 1 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 0f05742..49e9edf 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -480,9 +480,14 @@ void __init native_pagetable_init(void)
 
 	/*
 	 * Remove any mappings which extend past the end of physical
-	 * memory from the boot time page table:
+	 * memory from the boot time page table.
+	 * In virtual address space, we should have at least two pages
+	 * from VMALLOC_END to pkmap or fixmap according to VMALLOC_END
+	 * definition. And max_low_pfn is set to VMALLOC_END physical
+	 * address. If initial memory mapping is doing right job, we
+	 * should have pte used near max_low_pfn or one pmd is not present.
 	 */
-	for (pfn = max_low_pfn + 1; pfn < 1<<(32-PAGE_SHIFT); pfn++) {
+	for (pfn = max_low_pfn; pfn < 1<<(32-PAGE_SHIFT); pfn++) {
 		va = PAGE_OFFSET + (pfn<<PAGE_SHIFT);
 		pgd = base + pgd_index(va);
 		if (!pgd_present(*pgd))
@@ -493,10 +498,19 @@ void __init native_pagetable_init(void)
 		if (!pmd_present(*pmd))
 			break;
 
+		/* should not be large page here */
+		if (pmd_large(*pmd)) {
+			pr_warn("try to clear pte for ram above max_low_pfn: pfn: %lx pmd: %p pmd phys: %lx, but pmd is big page and is not using pte !\n",
+				pfn, pmd, __pa(pmd));
+			BUG_ON(1);
+		}
+
 		pte = pte_offset_kernel(pmd, va);
 		if (!pte_present(*pte))
 			break;
 
+		printk(KERN_DEBUG "clearing pte for ram above max_low_pfn: pfn: %lx pmd: %p pmd phys: %lx pte: %p pte phys: %lx\n",
+				pfn, pmd, __pa(pmd), pte, __pa(pte));
 		pte_clear(NULL, va, pte);
 	}
 	paravirt_alloc_pmd(&init_mm, __pa(base) >> PAGE_SHIFT);
-- 
1.7.7


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [PATCH 06/19] x86, mm: setup page table in top-down
  2012-10-18 20:50 ` [PATCH 06/19] x86, mm: setup page table in top-down Yinghai Lu
@ 2012-10-19 16:24   ` Stefano Stabellini
  2012-10-19 16:41     ` Yinghai Lu
  2012-10-22 15:06   ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 63+ messages in thread
From: Stefano Stabellini @ 2012-10-19 16:24 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin,
	Tejun Heo, Stefano Stabellini, linux-kernel

On Thu, 18 Oct 2012, Yinghai Lu wrote:
> Get pgt_buf early from BRK, and use it to map PMD_SIZE from top at first.
> then use mapped pages to map more range below, and keep looping until
> all pages get mapped.
> 
> alloc_low_page will use page from BRK at first, after that buff is used up,
> will use memblock to find and reserve page for page table usage.
> 
> At last we could get rid of calculation and find early pgt related code.
> 
> -v2: update to after fix_xen change,
>      also use MACRO for initial pgt_buf size and add comments with it.
> -v3: skip big reserved range in memblock.reserved near end.
> -v4: don't need fix_xen change now.
> 
> Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>

The series is starting to get in good shape!
I tested it on a 2G and an 8G VM and it seems to be working fine.


The most important thing to do now is testing it on different machines
(with and without xen) and writing better commit messages.
We can help you with the testing but you really need to write better
docs.

In particular you should state in clear letters that alloc_low_page is
always going to return a page that is already mapped. You should write
it both in the commit message and in the code as a comment.
This is particularly important because it is going to become part of the
interface between the common x86 code and the other x86 subsystems like
Xen.

Also, it wouldn't hurt to explain the overall design at the beginning of
the series: I shouldn't have to read your code to understand what you
are doing. I should read the description of the patch, understand what
you are doing, then go and check if the code actually does what you say
it does.



>  arch/x86/include/asm/page_types.h |    1 +
>  arch/x86/include/asm/pgtable.h    |    1 +
>  arch/x86/kernel/setup.c           |    3 +
>  arch/x86/mm/init.c                |  207 ++++++++++--------------------------
>  arch/x86/mm/init_32.c             |   17 +++-
>  arch/x86/mm/init_64.c             |   17 +++-
>  6 files changed, 91 insertions(+), 155 deletions(-)
> 
> diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
> index 54c9787..9f6f3e6 100644
> --- a/arch/x86/include/asm/page_types.h
> +++ b/arch/x86/include/asm/page_types.h
> @@ -45,6 +45,7 @@ extern int devmem_is_allowed(unsigned long pagenr);
> 
>  extern unsigned long max_low_pfn_mapped;
>  extern unsigned long max_pfn_mapped;
> +extern unsigned long min_pfn_mapped;
> 
>  static inline phys_addr_t get_max_mapped(void)
>  {
> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> index dd1a888..6991a3e 100644
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -603,6 +603,7 @@ static inline int pgd_none(pgd_t pgd)
> 
>  extern int direct_gbpages;
>  void init_mem_mapping(void);
> +void early_alloc_pgt_buf(void);
> 
>  /* local pte updates need not use xchg for locking */
>  static inline pte_t native_local_ptep_get_and_clear(pte_t *ptep)
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index e72e4c6..73cb7ba 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -124,6 +124,7 @@
>   */
>  unsigned long max_low_pfn_mapped;
>  unsigned long max_pfn_mapped;
> +unsigned long min_pfn_mapped;
> 
>  #ifdef CONFIG_DMI
>  RESERVE_BRK(dmi_alloc, 65536);
> @@ -897,6 +898,8 @@ void __init setup_arch(char **cmdline_p)
> 
>         reserve_ibft_region();
> 
> +       early_alloc_pgt_buf();
> +
>         /*
>          * Need to conclude brk, before memblock_x86_fill()
>          *  it could use memblock_find_in_range, could overlap with
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index dbb2916..9ff29c1 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -21,6 +21,21 @@ unsigned long __initdata pgt_buf_start;
>  unsigned long __meminitdata pgt_buf_end;
>  unsigned long __meminitdata pgt_buf_top;
> 
> +/* need 4 4k for initial PMD_SIZE, 4k for 0-ISA_END_ADDRESS */
> +#define INIT_PGT_BUF_SIZE      (5 * PAGE_SIZE)
> +RESERVE_BRK(early_pgt_alloc, INIT_PGT_BUF_SIZE);
> +void  __init early_alloc_pgt_buf(void)
> +{
> +       unsigned long tables = INIT_PGT_BUF_SIZE;
> +       phys_addr_t base;
> +
> +       base = __pa(extend_brk(tables, PAGE_SIZE));
> +
> +       pgt_buf_start = base >> PAGE_SHIFT;
> +       pgt_buf_end = pgt_buf_start;
> +       pgt_buf_top = pgt_buf_start + (tables >> PAGE_SHIFT);
> +}
> +
>  int after_bootmem;
> 
>  int direct_gbpages
> @@ -228,105 +243,6 @@ static int __meminit split_mem_range(struct map_range *mr, int nr_range,
>         return nr_range;
>  }
> 
> -static unsigned long __init calculate_table_space_size(unsigned long start,
> -                                         unsigned long end)
> -{
> -       unsigned long puds = 0, pmds = 0, ptes = 0, tables;
> -       struct map_range mr[NR_RANGE_MR];
> -       int nr_range, i;
> -
> -       pr_info("calculate_table_space_size: [mem %#010lx-%#010lx]\n",
> -              start, end - 1);
> -
> -       memset(mr, 0, sizeof(mr));
> -       nr_range = 0;
> -       nr_range = split_mem_range(mr, nr_range, start, end);
> -
> -       for (i = 0; i < nr_range; i++) {
> -               unsigned long range, extra;
> -
> -               range = mr[i].end - mr[i].start;
> -               puds += (range + PUD_SIZE - 1) >> PUD_SHIFT;
> -
> -               if (mr[i].page_size_mask & (1 << PG_LEVEL_1G)) {
> -                       extra = range - ((range >> PUD_SHIFT) << PUD_SHIFT);
> -                       pmds += (extra + PMD_SIZE - 1) >> PMD_SHIFT;
> -               } else
> -                       pmds += (range + PMD_SIZE - 1) >> PMD_SHIFT;
> -
> -               if (mr[i].page_size_mask & (1 << PG_LEVEL_2M)) {
> -                       extra = range - ((range >> PMD_SHIFT) << PMD_SHIFT);
> -#ifdef CONFIG_X86_32
> -                       extra += PMD_SIZE;
> -#endif
> -                       /* The first 2/4M doesn't use large pages. */
> -                       if (mr[i].start < PMD_SIZE)
> -                               extra += PMD_SIZE - mr[i].start;
> -
> -                       ptes += (extra + PAGE_SIZE - 1) >> PAGE_SHIFT;
> -               } else
> -                       ptes += (range + PAGE_SIZE - 1) >> PAGE_SHIFT;
> -       }
> -
> -       tables = roundup(puds * sizeof(pud_t), PAGE_SIZE);
> -       tables += roundup(pmds * sizeof(pmd_t), PAGE_SIZE);
> -       tables += roundup(ptes * sizeof(pte_t), PAGE_SIZE);
> -
> -#ifdef CONFIG_X86_32
> -       /* for fixmap */
> -       tables += roundup(__end_of_fixed_addresses * sizeof(pte_t), PAGE_SIZE);
> -#endif
> -
> -       return tables;
> -}
> -
> -static unsigned long __init calculate_all_table_space_size(void)
> -{
> -       unsigned long start_pfn, end_pfn;
> -       unsigned long tables;
> -       int i;
> -
> -       /* the ISA range is always mapped regardless of memory holes */
> -       tables = calculate_table_space_size(0, ISA_END_ADDRESS);
> -
> -       for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, NULL) {
> -               u64 start = start_pfn << PAGE_SHIFT;
> -               u64 end = end_pfn << PAGE_SHIFT;
> -
> -               if (end <= ISA_END_ADDRESS)
> -                       continue;
> -
> -               if (start < ISA_END_ADDRESS)
> -                       start = ISA_END_ADDRESS;
> -#ifdef CONFIG_X86_32
> -               /* on 32 bit, we only map up to max_low_pfn */
> -               if ((start >> PAGE_SHIFT) >= max_low_pfn)
> -                       continue;
> -
> -               if ((end >> PAGE_SHIFT) > max_low_pfn)
> -                       end = max_low_pfn << PAGE_SHIFT;
> -#endif
> -               tables += calculate_table_space_size(start, end);
> -       }
> -
> -       return tables;
> -}
> -
> -static void __init find_early_table_space(unsigned long start,
> -                                         unsigned long good_end,
> -                                         unsigned long tables)
> -{
> -       phys_addr_t base;
> -
> -       base = memblock_find_in_range(start, good_end, tables, PAGE_SIZE);
> -       if (!base)
> -               panic("Cannot find space for the kernel page tables");
> -
> -       pgt_buf_start = base >> PAGE_SHIFT;
> -       pgt_buf_end = pgt_buf_start;
> -       pgt_buf_top = pgt_buf_start + (tables >> PAGE_SHIFT);
> -}
> -
>  static struct range pfn_mapped[E820_X_MAX];
>  static int nr_pfn_mapped;
> 
> @@ -391,17 +307,14 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,
>  }
> 
>  /*
> - * Iterate through E820 memory map and create direct mappings for only E820_RAM
> - * regions. We cannot simply create direct mappings for all pfns from
> - * [0 to max_low_pfn) and [4GB to max_pfn) because of possible memory holes in
> - * high addresses that cannot be marked as UC by fixed/variable range MTRRs.
> - * Depending on the alignment of E820 ranges, this may possibly result in using
> - * smaller size (i.e. 4K instead of 2M or 1G) page tables.
> + * this one could take range with hole in it
>   */
> -static void __init init_range_memory_mapping(unsigned long range_start,
> +static unsigned long __init init_range_memory_mapping(
> +                                          unsigned long range_start,
>                                            unsigned long range_end)
>  {
>         unsigned long start_pfn, end_pfn;
> +       unsigned long mapped_ram_size = 0;
>         int i;
> 
>         for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, NULL) {
> @@ -421,71 +334,67 @@ static void __init init_range_memory_mapping(unsigned long range_start,
>                         end = range_end;
> 
>                 init_memory_mapping(start, end);
> +
> +               mapped_ram_size += end - start;
>         }
> +
> +       return mapped_ram_size;
>  }
> 
>  void __init init_mem_mapping(void)
>  {
> -       unsigned long tables, good_end, end;
> +       unsigned long end, real_end, start, last_start;
> +       unsigned long step_size;
> +       unsigned long addr;
> +       unsigned long mapped_ram_size = 0;
> +       unsigned long new_mapped_ram_size;
> 
>         probe_page_size_mask();
> 
> -       /*
> -        * Find space for the kernel direct mapping tables.
> -        *
> -        * Later we should allocate these tables in the local node of the
> -        * memory mapped. Unfortunately this is done currently before the
> -        * nodes are discovered.
> -        */
>  #ifdef CONFIG_X86_64
>         end = max_pfn << PAGE_SHIFT;
> -       good_end = end;
>  #else
>         end = max_low_pfn << PAGE_SHIFT;
> -       good_end = max_pfn_mapped << PAGE_SHIFT;
>  #endif
> -       tables = calculate_all_table_space_size();
> -       find_early_table_space(0, good_end, tables);
> -       printk(KERN_DEBUG "kernel direct mapping tables up to %#lx @ [mem %#010lx-%#010lx] prealloc\n",
> -               end - 1, pgt_buf_start << PAGE_SHIFT,
> -               (pgt_buf_top << PAGE_SHIFT) - 1);
> 
> -       max_pfn_mapped = 0; /* will get exact value next */
>         /* the ISA range is always mapped regardless of memory holes */
>         init_memory_mapping(0, ISA_END_ADDRESS);
> -       init_range_memory_mapping(ISA_END_ADDRESS, end);
> +
> +       /* xen has big range in reserved near end of ram, skip it at first */
> +       addr = memblock_find_in_range(ISA_END_ADDRESS, end, PMD_SIZE,
> +                        PAGE_SIZE);
> +       real_end = addr + PMD_SIZE;
> +
> +       /* step_size need to be small so pgt_buf from BRK could cover it */
> +       step_size = PMD_SIZE;
> +       max_pfn_mapped = 0; /* will get exact value next */
> +       min_pfn_mapped = real_end >> PAGE_SHIFT;
> +       last_start = start = real_end;
> +       while (last_start > ISA_END_ADDRESS) {
> +               if (last_start > step_size) {
> +                       start = round_down(last_start - 1, step_size);
> +                       if (start < ISA_END_ADDRESS)
> +                               start = ISA_END_ADDRESS;
> +               } else
> +                       start = ISA_END_ADDRESS;
> +               new_mapped_ram_size = init_range_memory_mapping(start,
> +                                                       last_start);
> +               last_start = start;
> +               min_pfn_mapped = last_start >> PAGE_SHIFT;
> +               if (new_mapped_ram_size > mapped_ram_size)
> +                       step_size <<= 5;
> +               mapped_ram_size += new_mapped_ram_size;
> +       }
> +
> +       if (real_end < end)
> +               init_range_memory_mapping(real_end, end);
> +
>  #ifdef CONFIG_X86_64
>         if (max_pfn > max_low_pfn) {
>                 /* can we preseve max_low_pfn ?*/
>                 max_low_pfn = max_pfn;
>         }
>  #endif
> -       /*
> -        * Reserve the kernel pagetable pages we used (pgt_buf_start -
> -        * pgt_buf_end) and free the other ones (pgt_buf_end - pgt_buf_top)
> -        * so that they can be reused for other purposes.
> -        *
> -        * On native it just means calling memblock_reserve, on Xen it also
> -        * means marking RW the pagetable pages that we allocated before
> -        * but that haven't been used.
> -        *
> -        * In fact on xen we mark RO the whole range pgt_buf_start -
> -        * pgt_buf_top, because we have to make sure that when
> -        * init_memory_mapping reaches the pagetable pages area, it maps
> -        * RO all the pagetable pages, including the ones that are beyond
> -        * pgt_buf_end at that time.
> -        */
> -       if (pgt_buf_end > pgt_buf_start) {
> -               printk(KERN_DEBUG "kernel direct mapping tables up to %#lx @ [mem %#010lx-%#010lx] final\n",
> -                       end - 1, pgt_buf_start << PAGE_SHIFT,
> -                       (pgt_buf_end << PAGE_SHIFT) - 1);
> -               x86_init.mapping.pagetable_reserve(PFN_PHYS(pgt_buf_start),
> -                               PFN_PHYS(pgt_buf_end));
> -       }
> -
> -       /* stop the wrong using */
> -       pgt_buf_top = 0;
> -
>         early_memtest(0, max_pfn_mapped << PAGE_SHIFT);
>  }
> 
> diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
> index 27f7fc6..7bb1106 100644
> --- a/arch/x86/mm/init_32.c
> +++ b/arch/x86/mm/init_32.c
> @@ -61,11 +61,22 @@ bool __read_mostly __vmalloc_start_set = false;
> 
>  static __init void *alloc_low_page(void)
>  {
> -       unsigned long pfn = pgt_buf_end++;
> +       unsigned long pfn;
>         void *adr;
> 
> -       if (pfn >= pgt_buf_top)
> -               panic("alloc_low_page: ran out of memory");
> +       if ((pgt_buf_end + 1) >= pgt_buf_top) {
> +               unsigned long ret;
> +               if (min_pfn_mapped >= max_pfn_mapped)
> +                       panic("alloc_low_page: ran out of memory");
> +               ret = memblock_find_in_range(min_pfn_mapped << PAGE_SHIFT,
> +                                       max_pfn_mapped << PAGE_SHIFT,
> +                                       PAGE_SIZE, PAGE_SIZE);
> +               if (!ret)
> +                       panic("alloc_low_page: can not alloc memory");
> +               memblock_reserve(ret, PAGE_SIZE);
> +               pfn = ret >> PAGE_SHIFT;
> +       } else
> +               pfn = pgt_buf_end++;
> 
>         adr = __va(pfn * PAGE_SIZE);
>         clear_page(adr);
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 4898e80..7dfa69b 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -316,7 +316,7 @@ void __init cleanup_highmap(void)
> 
>  static __ref void *alloc_low_page(unsigned long *phys)
>  {
> -       unsigned long pfn = pgt_buf_end++;
> +       unsigned long pfn;
>         void *adr;
> 
>         if (after_bootmem) {
> @@ -326,8 +326,19 @@ static __ref void *alloc_low_page(unsigned long *phys)
>                 return adr;
>         }
> 
> -       if (pfn >= pgt_buf_top)
> -               panic("alloc_low_page: ran out of memory");
> +       if ((pgt_buf_end + 1) >= pgt_buf_top) {
> +               unsigned long ret;
> +               if (min_pfn_mapped >= max_pfn_mapped)
> +                       panic("alloc_low_page: ran out of memory");
> +               ret = memblock_find_in_range(min_pfn_mapped << PAGE_SHIFT,
> +                                       max_pfn_mapped << PAGE_SHIFT,
> +                                       PAGE_SIZE, PAGE_SIZE);
> +               if (!ret)
> +                       panic("alloc_low_page: can not alloc memory");
> +               memblock_reserve(ret, PAGE_SIZE);
> +               pfn = ret >> PAGE_SHIFT;
> +       } else
> +               pfn = pgt_buf_end++;
> 
>         adr = early_memremap(pfn * PAGE_SIZE, PAGE_SIZE);
>         clear_page(adr);
> --
> 1.7.7
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 06/19] x86, mm: setup page table in top-down
  2012-10-19 16:24   ` Stefano Stabellini
@ 2012-10-19 16:41     ` Yinghai Lu
  2012-10-22 13:19       ` Stefano Stabellini
  2012-10-22 14:14       ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 63+ messages in thread
From: Yinghai Lu @ 2012-10-19 16:41 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin,
	Tejun Heo, linux-kernel

On Fri, Oct 19, 2012 at 9:24 AM, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
> On Thu, 18 Oct 2012, Yinghai Lu wrote:
>> Get pgt_buf early from BRK, and use it to map PMD_SIZE from top at first.
>> then use mapped pages to map more range below, and keep looping until
>> all pages get mapped.
>>
>> alloc_low_page will use page from BRK at first, after that buff is used up,
>> will use memblock to find and reserve page for page table usage.
>>
>> At last we could get rid of calculation and find early pgt related code.
>>
>> -v2: update to after fix_xen change,
>>      also use MACRO for initial pgt_buf size and add comments with it.
>> -v3: skip big reserved range in memblock.reserved near end.
>> -v4: don't need fix_xen change now.
>>
>> Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
>> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>
> The series is starting to get in good shape!
> I tested it on a 2G and an 8G VM and it seems to be working fine.

domU on 32bit and 64bit?

>
>
> The most important thing to do now is testing it on different machines
> (with and without xen) and writing better commit messages.
> We can help you with the testing but you really need to write better
> docs.

I tested natively on 32bit and 64bit.

Also xen dom0, 32bit and 64bit.

The changelog is always a problem now. Looks like everyone is complaining about that.

Actually I already tried my best on that, and really don't know what to do next.

>
> In particular you should state in clear letters that alloc_low_page is
> always going to return a page that is already mapped. You should write
> it both in the commit message and in the code as a comment.
> This is particularly important because it is going to become part of the
> interface between the common x86 code and the other x86 subsystems like
> Xen.

alloc_low_page() is used in arch/x86/mm/init*.c. How come it becomes an
interface to other subsystems?

I'm not sure if we really need that comment in the code for that:
---
the page that alloc_low_page returns is already direct mapped; we can
use its virtual address to access it.
---

>
> Also, it wouldn't hurt to explain the overall design at the beginning of
> the series: I shouldn't have to read your code to understand what you
> are doing. I should read the description of the patch, understand what
> you are doing, then go and check if the code actually does what you say
> it does.

Actually I really don't like reading overly long changelogs in commits.
A changelog should be concise and precise.
The code change is more straightforward to me.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 06/19] x86, mm: setup page table in top-down
  2012-10-19 16:41     ` Yinghai Lu
@ 2012-10-22 13:19       ` Stefano Stabellini
  2012-10-22 18:17         ` Yinghai Lu
  2012-10-22 14:14       ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 63+ messages in thread
From: Stefano Stabellini @ 2012-10-22 13:19 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Stefano Stabellini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Jacob Shin, Tejun Heo, linux-kernel

On Fri, 19 Oct 2012, Yinghai Lu wrote:
> On Fri, Oct 19, 2012 at 9:24 AM, Stefano Stabellini
> <stefano.stabellini@eu.citrix.com> wrote:
> > On Thu, 18 Oct 2012, Yinghai Lu wrote:
> >> Get pgt_buf early from BRK, and use it to map PMD_SIZE from top at first.
> >> then use mapped pages to map more range below, and keep looping until
> >> all pages get mapped.
> >>
> >> alloc_low_page will use page from BRK at first, after that buff is used up,
> >> will use memblock to find and reserve page for page table usage.
> >>
> >> At last we could get rid of calculation and find early pgt related code.
> >>
> >> -v2: update to after fix_xen change,
> >>      also use MACRO for initial pgt_buf size and add comments with it.
> >> -v3: skip big reserved range in memblock.reserved near end.
> >> -v4: don't need fix_xen change now.
> >>
> >> Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
> >> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> >
> > The series is starting to get in good shape!
> > I tested it on a 2G and an 8G VM and it seems to be working fine.
> 
> domU on 32bit and 64bit?

domU 64bit


> > The most important thing to do now is testing it on different machines
> > (with and without xen) and writing better commit messages.
> > We can help you with the testing but you really need to write better
> > docs.
> 
> I tested on native one on 32bit, and 64bit.
> 
> Also xen dom0 32bit and 64bit.

Good, thanks!


> Changelog is always problem now. Looks like everyone is complaining about that.
> 
> Actually I already tried my best on that, really don't know what to do next.
> 
> >
> > In particular you should state in clear letters that alloc_low_page is
> > always going to return a page that is already mapped. You should write
> > it both in the commit message and in the code as a comment.
> > This is particularly important because it is going to become part of the
> > interface between the common x86 code and the other x86 subsystems like
> > Xen.
> 
> alloc_low_page() is used in arch/x86/mm/init*.c. How come it becomes
> interface to
> other subsystem?

I chose the wrong words.

I meant that always allocating pages from areas that are already mapped
will become an assumption for other x86 subsystems like Xen.
One shouldn't just go ahead and change this assumption without changing
the subsystems too.


> I'm not sure if we really need that comment in code for that:
> ---
> the page that alloc_low_page return is directed mapped already, could
> use virtual address
> to access it.
> ---

I just want to make sure that 3 years from now, when somebody comes up
with a great new idea to improve the initial pagetable allocation, he
doesn't forget that changing alloc_low_page might break other subsystems.

So I think that a comment is required here and should explicitly
mention why it is important that alloc_low_page returns a mapped page.
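
Something short along these lines above alloc_low_pages() would be enough
(the wording is only a suggestion):

	/*
	 * Pages returned are already directly mapped.
	 *
	 * Changing this is likely to break Xen, which expects the early
	 * pagetable pages to be reachable through the direct mapping.
	 */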


> > Also, it wouldn't hurt to explain the overall design at the beginning of
> > the series: I shouldn't have to read your code to understand what you
> > are doing. I should read the description of the patch, understand what
> > you are doing, then go and check if the code actually does what you say
> > it does.
> 
> Actually I really don't like to read too long change log in commit.
> Changelog should be concise and precise.
> code change is more straightforward to me.

Many people don't think like that.
Of course you shouldn't document line by line changes in the commit
message but you should include a full explanation of your changes, like
I wrote before.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 06/19] x86, mm: setup page table in top-down
  2012-10-19 16:41     ` Yinghai Lu
  2012-10-22 13:19       ` Stefano Stabellini
@ 2012-10-22 14:14       ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 63+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-10-22 14:14 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Stefano Stabellini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Jacob Shin, Tejun Heo, linux-kernel

> Changelog is always problem now. Looks like everyone is complaining about that.
> 
> Actually I already tried my best on that, really don't know what to do next.

Read on. I've some advice.
> 
> >
> > In particular you should state in clear letters that alloc_low_page is
> > always going to return a page that is already mapped. You should write
> > it both in the commit message and in the code as a comment.
> > This is particularly important because it is going to become part of the
> > interface between the common x86 code and the other x86 subsystems like
> > Xen.
> 
> alloc_low_page() is used in arch/x86/mm/init*.c. How come it becomes
> interface to
> other subsystem?
> 
> I'm not sure if we really need that comment in code for that:
> ---
> the page that alloc_low_page return is directed mapped already, could
> use virtual address
> to access it.
> ---
> 
> >
> > Also, it wouldn't hurt to explain the overall design at the beginning of
> > the series: I shouldn't have to read your code to understand what you
> > are doing. I should read the description of the patch, understand what
> > you are doing, then go and check if the code actually does what you say
> > it does.
> 
> Actually I really don't like to read too long change log in commit.
> Changelog should be concise and precise.

That depends on the patch in question. If it is something simple
(a compile warning fix), sure. But something that is deep in the bowels
of complicated code should have a good explanation that
follows roughly these steps:

 1). Introduce the problem. Explain what is wrong with the existing
     code.
 2). Explain how your patch fixes it.
 3). Explain (if there are any) alternative solutions.

Then re-read it and look at the verb tense. It should be the same tense -
so don't mix them.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 01/19] x86, mm: Align start address to correct big page size
  2012-10-18 20:50 ` [PATCH 01/19] x86, mm: Align start address to correct big page size Yinghai Lu
@ 2012-10-22 14:16   ` Konrad Rzeszutek Wilk
  2012-10-22 16:31     ` Yinghai Lu
  0 siblings, 1 reply; 63+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-10-22 14:16 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin,
	Tejun Heo, Stefano Stabellini, linux-kernel

On Thu, Oct 18, 2012 at 01:50:10PM -0700, Yinghai Lu wrote:


I am pretty sure I gave you some ideas of how to fix up the commit
description in earlier reviews, but it looks like you missed them.

Let me write them here once more.

> We are going to use buffer in BRK to pre-map page table buffer.

What buffer? Is buffer the same thing as page table?
> 
> Page table buffer could be only page aligned, but range around it are

.. ranges
> ram too, we could use bigger page to map it to avoid small pages.
> 
> We will adjust page_size_mask in next patch to use big page size for

Instead of saying "next patch" - include the title of the patch
so that one can search for it.

> small ram range.
> 
> Before that, this patch will make start address to be aligned down

s/will make/made/

> according to bigger page size, otherwise entry in page page will
> not have correct value.


I would structure this git commit description to first introduce
the problem.

Say at the start of the patch:

"Before this patch, the start address was aligned down according
to bigger a page size (1GB, 2MB). This is a problem b/c an
entry in the page table will not have correct value. "

Here can you explain why it does not have the correct value?

> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
>  arch/x86/mm/init_32.c |    1 +
>  arch/x86/mm/init_64.c |    5 +++--
>  2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
> index 11a5800..27f7fc6 100644
> --- a/arch/x86/mm/init_32.c
> +++ b/arch/x86/mm/init_32.c
> @@ -310,6 +310,7 @@ repeat:
>  					__pgprot(PTE_IDENT_ATTR |
>  						 _PAGE_PSE);
>  
> +				pfn &= PMD_MASK >> PAGE_SHIFT;
>  				addr2 = (pfn + PTRS_PER_PTE-1) * PAGE_SIZE +
>  					PAGE_OFFSET + PAGE_SIZE-1;
>  
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index ab558eb..f40f383 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -461,7 +461,7 @@ phys_pmd_init(pmd_t *pmd_page, unsigned long address, unsigned long end,
>  			pages++;
>  			spin_lock(&init_mm.page_table_lock);
>  			set_pte((pte_t *)pmd,
> -				pfn_pte(address >> PAGE_SHIFT,
> +				pfn_pte((address & PMD_MASK) >> PAGE_SHIFT,
>  					__pgprot(pgprot_val(prot) | _PAGE_PSE)));
>  			spin_unlock(&init_mm.page_table_lock);
>  			last_map_addr = next;
> @@ -536,7 +536,8 @@ phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end,
>  			pages++;
>  			spin_lock(&init_mm.page_table_lock);
>  			set_pte((pte_t *)pud,
> -				pfn_pte(addr >> PAGE_SHIFT, PAGE_KERNEL_LARGE));
> +				pfn_pte((addr & PUD_MASK) >> PAGE_SHIFT,
> +					PAGE_KERNEL_LARGE));
>  			spin_unlock(&init_mm.page_table_lock);
>  			last_map_addr = next;
>  			continue;
> -- 
> 1.7.7
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 02/19] x86, mm: Use big page size for small memory range
  2012-10-18 20:50 ` [PATCH 02/19] x86, mm: Use big page size for small memory range Yinghai Lu
@ 2012-10-22 14:21   ` Konrad Rzeszutek Wilk
  2012-10-22 16:33     ` Yinghai Lu
  0 siblings, 1 reply; 63+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-10-22 14:21 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin,
	Tejun Heo, Stefano Stabellini, linux-kernel

On Thu, Oct 18, 2012 at 01:50:12PM -0700, Yinghai Lu wrote:
> We could map small range in the middle of big range at first, so should use
> big page size at first to avoid using small page size to break down page table.
> 
> Only can set big page bit when that range has ram area around it.

The code looks good.

I would alter the description to say:

(Describe the problem)

"We are wasting entries in the page-table b/c are not taking advantage
of the fact that adjoining ranges could be of the same type and
coalescing them together. Instead we end up using the small size type."

(Explain your patch).

"We fix this by iterating over the ranges, detecting whether the
ranges that are next to each other are of the same type - and if so
set them to our type."
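
A rough example of what I mean, with made-up numbers: if one of the
per-range init_memory_mapping() calls covers, say,

	[mem 0x1fe66000-0x1fffffff]

split_mem_range() marks it 4k because it is neither 2M-aligned nor 2M
big. But round_down(start, PMD_SIZE)..round_up(end, PMD_SIZE), i.e.
[mem 0x1fe00000-0x1fffffff], is all RAM, so
adjust_range_page_size_mask() can safely flip it to PG_LEVEL_2M.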

> 
> -v2: fix 32bit boundary checking. We can not count ram above max_low_pfn
> 	for 32 bit.
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
>  arch/x86/mm/init.c |   37 +++++++++++++++++++++++++++++++++++++
>  1 files changed, 37 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index c12dfd5..09ce38f 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -88,6 +88,40 @@ static int __meminit save_mr(struct map_range *mr, int nr_range,
>  	return nr_range;
>  }
>  
> +/*
> + * adjust the page_size_mask for small range to go with
> + *	big page size instead small one if nearby are ram too.
> + */
> +static void __init_refok adjust_range_page_size_mask(struct map_range *mr,
> +							 int nr_range)
> +{
> +	int i;
> +
> +	for (i = 0; i < nr_range; i++) {
> +		if ((page_size_mask & (1<<PG_LEVEL_2M)) &&
> +		    !(mr[i].page_size_mask & (1<<PG_LEVEL_2M))) {
> +			unsigned long start = round_down(mr[i].start, PMD_SIZE);
> +			unsigned long end = round_up(mr[i].end, PMD_SIZE);
> +
> +#ifdef CONFIG_X86_32
> +			if ((end >> PAGE_SHIFT) > max_low_pfn)
> +				continue;
> +#endif
> +
> +			if (memblock_is_region_memory(start, end - start))
> +				mr[i].page_size_mask |= 1<<PG_LEVEL_2M;
> +		}
> +		if ((page_size_mask & (1<<PG_LEVEL_1G)) &&
> +		    !(mr[i].page_size_mask & (1<<PG_LEVEL_1G))) {
> +			unsigned long start = round_down(mr[i].start, PUD_SIZE);
> +			unsigned long end = round_up(mr[i].end, PUD_SIZE);
> +
> +			if (memblock_is_region_memory(start, end - start))
> +				mr[i].page_size_mask |= 1<<PG_LEVEL_1G;
> +		}
> +	}
> +}
> +
>  static int __meminit split_mem_range(struct map_range *mr, int nr_range,
>  				     unsigned long start,
>  				     unsigned long end)
> @@ -182,6 +216,9 @@ static int __meminit split_mem_range(struct map_range *mr, int nr_range,
>  		nr_range--;
>  	}
>  
> +	if (!after_bootmem)
> +		adjust_range_page_size_mask(mr, nr_range);
> +
>  	for (i = 0; i < nr_range; i++)
>  		printk(KERN_DEBUG " [mem %#010lx-%#010lx] page %s\n",
>  				mr[i].start, mr[i].end - 1,
> -- 
> 1.7.7
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 03/19] x86, mm: Don't clear page table if range is ram
  2012-10-18 20:50 ` [PATCH 03/19] x86, mm: Don't clear page table if range is ram Yinghai Lu
@ 2012-10-22 14:28   ` Konrad Rzeszutek Wilk
  2012-10-22 16:56     ` Yinghai Lu
  0 siblings, 1 reply; 63+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-10-22 14:28 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin,
	Tejun Heo, Stefano Stabellini, linux-kernel

On Thu, Oct 18, 2012 at 01:50:14PM -0700, Yinghai Lu wrote:
> After we add code use buffer in BRK to pre-map page table,
                   ^- to

So .. which patch is that? Can you include the title of the
patch here?

> it should be safe to remove early_memmap for page table accessing.
> Instead we get panic with that.
> 
> It turns out we clear the initial page table wrongly for next range that is
              ^- that

> separated by holes.
> And it only happens when we are trying to map range one by one range separately.
                                                     ^-s

> 
> We need to check if the range is ram before clearing page table.

Ok, so that sounds like a bug-fix... but
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
>  arch/x86/mm/init_64.c |   37 ++++++++++++++++---------------------
>  1 files changed, 16 insertions(+), 21 deletions(-)
> 
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index f40f383..61b3c44 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -363,20 +363,19 @@ static unsigned long __meminit
>  phys_pte_init(pte_t *pte_page, unsigned long addr, unsigned long end,
>  	      pgprot_t prot)
>  {
> -	unsigned pages = 0;
> +	unsigned long pages = 0, next;
>  	unsigned long last_map_addr = end;
>  	int i;
>  
>  	pte_t *pte = pte_page + pte_index(addr);
>  
> -	for(i = pte_index(addr); i < PTRS_PER_PTE; i++, addr += PAGE_SIZE, pte++) {
> -
> +	for (i = pte_index(addr); i < PTRS_PER_PTE; i++, addr = next, pte++) {
> +		next = (addr & PAGE_MASK) + PAGE_SIZE;
>  		if (addr >= end) {
> -			if (!after_bootmem) {
> -				for(; i < PTRS_PER_PTE; i++, pte++)
> -					set_pte(pte, __pte(0));
> -			}
> -			break;
> +			if (!after_bootmem &&
> +			    !e820_any_mapped(addr & PAGE_MASK, next, 0))
> +				set_pte(pte, __pte(0));
> +			continue;

.. Interestingly, you also removed the extra loop. How come? Why not
retain the little loop? (which could call e820_any_mapped?) Is that
an improvement and cleanup? If so, I would think you should at least
explain in the git commit:

"And while we are at it, also axe the extra loop and instead depend on
the top loop which we can safely piggyback on."

>  		}
>  
>  		/*
> @@ -418,16 +417,14 @@ phys_pmd_init(pmd_t *pmd_page, unsigned long address, unsigned long end,
>  		pte_t *pte;
>  		pgprot_t new_prot = prot;
>  
> +		next = (address & PMD_MASK) + PMD_SIZE;
>  		if (address >= end) {
> -			if (!after_bootmem) {
> -				for (; i < PTRS_PER_PMD; i++, pmd++)
> -					set_pmd(pmd, __pmd(0));
> -			}
> -			break;
> +			if (!after_bootmem &&
> +			    !e820_any_mapped(address & PMD_MASK, next, 0))
> +				set_pmd(pmd, __pmd(0));
> +			continue;
>  		}
>  
> -		next = (address & PMD_MASK) + PMD_SIZE;
> -
>  		if (pmd_val(*pmd)) {
>  			if (!pmd_large(*pmd)) {
>  				spin_lock(&init_mm.page_table_lock);
> @@ -494,13 +491,11 @@ phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end,
>  		pmd_t *pmd;
>  		pgprot_t prot = PAGE_KERNEL;
>  
> -		if (addr >= end)
> -			break;
> -
>  		next = (addr & PUD_MASK) + PUD_SIZE;
> -
> -		if (!after_bootmem && !e820_any_mapped(addr, next, 0)) {
> -			set_pud(pud, __pud(0));
> +		if (addr >= end) {
> +			if (!after_bootmem &&
> +			    !e820_any_mapped(addr & PUD_MASK, next, 0))
> +				set_pud(pud, __pud(0));
>  			continue;
>  		}
>  
> -- 
> 1.7.7
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 04/19] x86, mm: only keep initial mapping for ram
  2012-10-18 20:50 ` [PATCH 04/19] x86, mm: only keep initial mapping for ram Yinghai Lu
@ 2012-10-22 14:33   ` Konrad Rzeszutek Wilk
  2012-10-22 17:43     ` Yinghai Lu
  0 siblings, 1 reply; 63+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-10-22 14:33 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin,
	Tejun Heo, Stefano Stabellini, linux-kernel

On Thu, Oct 18, 2012 at 01:50:15PM -0700, Yinghai Lu wrote:
> 0 mean any e820 type, for any range is overlapped with any entry in e820,
> kernel will keep it's initial page table mapping.
> 
> What we want is only keeping initial page table for ram range.

Can you squash it into the previous patch then please? It seems like
this is a bug-fix to a bug-fix. And since this patchset is still
outside Linus's tree you have the option of squashing/rebasing.

> 
> Change to E820_RAM and E820_RESERVED_KERN.
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
>  arch/x86/mm/init_64.c |    9 ++++++---
>  1 files changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 61b3c44..4898e80 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -373,7 +373,8 @@ phys_pte_init(pte_t *pte_page, unsigned long addr, unsigned long end,
>  		next = (addr & PAGE_MASK) + PAGE_SIZE;
>  		if (addr >= end) {
>  			if (!after_bootmem &&
> -			    !e820_any_mapped(addr & PAGE_MASK, next, 0))
> +			    !e820_any_mapped(addr & PAGE_MASK, next, E820_RAM) &&
> +			    !e820_any_mapped(addr & PAGE_MASK, next, E820_RESERVED_KERN))
>  				set_pte(pte, __pte(0));
>  			continue;
>  		}
> @@ -420,7 +421,8 @@ phys_pmd_init(pmd_t *pmd_page, unsigned long address, unsigned long end,
>  		next = (address & PMD_MASK) + PMD_SIZE;
>  		if (address >= end) {
>  			if (!after_bootmem &&
> -			    !e820_any_mapped(address & PMD_MASK, next, 0))
> +			    !e820_any_mapped(address & PMD_MASK, next, E820_RAM) &&
> +			    !e820_any_mapped(address & PMD_MASK, next, E820_RESERVED_KERN))
>  				set_pmd(pmd, __pmd(0));
>  			continue;
>  		}
> @@ -494,7 +496,8 @@ phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end,
>  		next = (addr & PUD_MASK) + PUD_SIZE;
>  		if (addr >= end) {
>  			if (!after_bootmem &&
> -			    !e820_any_mapped(addr & PUD_MASK, next, 0))
> +			    !e820_any_mapped(addr & PUD_MASK, next, E820_RAM) &&
> +			    !e820_any_mapped(addr & PUD_MASK, next, E820_RESERVED_KERN))
>  				set_pud(pud, __pud(0));
>  			continue;
>  		}
> -- 
> 1.7.7
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 06/19] x86, mm: setup page table in top-down
  2012-10-18 20:50 ` [PATCH 06/19] x86, mm: setup page table in top-down Yinghai Lu
  2012-10-19 16:24   ` Stefano Stabellini
@ 2012-10-22 15:06   ` Konrad Rzeszutek Wilk
  2012-10-22 18:56     ` Yinghai Lu
  1 sibling, 1 reply; 63+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-10-22 15:06 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin,
	Tejun Heo, Stefano Stabellini, linux-kernel

On Thu, Oct 18, 2012 at 01:50:17PM -0700, Yinghai Lu wrote:
> Get pgt_buf early from BRK, and use it to map PMD_SIZE from top at first.
> then use mapped pages to map more range below, and keep looping until
  ^Then                                 ^-s
> all pages get mapped.
> 
> alloc_low_page will use page from BRK at first, after that buff is used up,
                                                             ^^^^ buffer
> will use memblock to find and reserve page for page table usage.
                                        ^^^^ - pages

You might want to mention how 'memblock' searches for regions.
Presumably it is from top to bottom.


And also explain the granularity (i.e. the size) of what you are mapping
_after_ you are done with the initial PMD_SIZE chunk.
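
Something like (as I understand the current memblock behaviour):

	addr = memblock_find_in_range(start, end, size, align);
	/* hands back the highest free area that fits, i.e. top-down */

which is what lets the freshly mapped chunk at the top provide the
page-table pages for the next, lower, chunk.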


> 
> At last we could get rid of calculation and find early pgt related code.
             ^^^^^ - can

> 
> -v2: update to after fix_xen change,
>      also use MACRO for initial pgt_buf size and add comments with it.
> -v3: skip big reserved range in memblock.reserved near end.
> -v4: don't need fix_xen change now.
> 
> Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
>  arch/x86/include/asm/page_types.h |    1 +
>  arch/x86/include/asm/pgtable.h    |    1 +
>  arch/x86/kernel/setup.c           |    3 +
>  arch/x86/mm/init.c                |  207 ++++++++++--------------------------
>  arch/x86/mm/init_32.c             |   17 +++-
>  arch/x86/mm/init_64.c             |   17 +++-
>  6 files changed, 91 insertions(+), 155 deletions(-)
> 
> diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
> index 54c9787..9f6f3e6 100644
> --- a/arch/x86/include/asm/page_types.h
> +++ b/arch/x86/include/asm/page_types.h
> @@ -45,6 +45,7 @@ extern int devmem_is_allowed(unsigned long pagenr);
>  
>  extern unsigned long max_low_pfn_mapped;
>  extern unsigned long max_pfn_mapped;
> +extern unsigned long min_pfn_mapped;

Why not call it 'last_min_pfn_mapped'? It looks to be keyed
off the last 'start' of memory we mapped and keeps on decreasing.
>  
>  static inline phys_addr_t get_max_mapped(void)
>  {
> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> index dd1a888..6991a3e 100644
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -603,6 +603,7 @@ static inline int pgd_none(pgd_t pgd)
>  
>  extern int direct_gbpages;
>  void init_mem_mapping(void);
> +void early_alloc_pgt_buf(void);
>  
>  /* local pte updates need not use xchg for locking */
>  static inline pte_t native_local_ptep_get_and_clear(pte_t *ptep)
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index e72e4c6..73cb7ba 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -124,6 +124,7 @@
>   */
>  unsigned long max_low_pfn_mapped;
>  unsigned long max_pfn_mapped;
> +unsigned long min_pfn_mapped;
>  
>  #ifdef CONFIG_DMI
>  RESERVE_BRK(dmi_alloc, 65536);
> @@ -897,6 +898,8 @@ void __init setup_arch(char **cmdline_p)
>  
>  	reserve_ibft_region();
>  
> +	early_alloc_pgt_buf();
> +
>  	/*
>  	 * Need to conclude brk, before memblock_x86_fill()
>  	 *  it could use memblock_find_in_range, could overlap with
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index dbb2916..9ff29c1 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -21,6 +21,21 @@ unsigned long __initdata pgt_buf_start;
>  unsigned long __meminitdata pgt_buf_end;
>  unsigned long __meminitdata pgt_buf_top;
>  
> +/* need 4 4k for initial PMD_SIZE, 4k for 0-ISA_END_ADDRESS */
> +#define INIT_PGT_BUF_SIZE	(5 * PAGE_SIZE)
> +RESERVE_BRK(early_pgt_alloc, INIT_PGT_BUF_SIZE);
> +void  __init early_alloc_pgt_buf(void)
> +{
> +	unsigned long tables = INIT_PGT_BUF_SIZE;
> +	phys_addr_t base;
> +
> +	base = __pa(extend_brk(tables, PAGE_SIZE));
> +
> +	pgt_buf_start = base >> PAGE_SHIFT;
> +	pgt_buf_end = pgt_buf_start;
> +	pgt_buf_top = pgt_buf_start + (tables >> PAGE_SHIFT);
> +}
> +
>  int after_bootmem;
>  
>  int direct_gbpages
> @@ -228,105 +243,6 @@ static int __meminit split_mem_range(struct map_range *mr, int nr_range,
>  	return nr_range;
>  }
>  
> -static unsigned long __init calculate_table_space_size(unsigned long start,
> -					  unsigned long end)
> -{
> -	unsigned long puds = 0, pmds = 0, ptes = 0, tables;
> -	struct map_range mr[NR_RANGE_MR];
> -	int nr_range, i;
> -
> -	pr_info("calculate_table_space_size: [mem %#010lx-%#010lx]\n",
> -	       start, end - 1);
> -
> -	memset(mr, 0, sizeof(mr));
> -	nr_range = 0;
> -	nr_range = split_mem_range(mr, nr_range, start, end);
> -
> -	for (i = 0; i < nr_range; i++) {
> -		unsigned long range, extra;
> -
> -		range = mr[i].end - mr[i].start;
> -		puds += (range + PUD_SIZE - 1) >> PUD_SHIFT;
> -
> -		if (mr[i].page_size_mask & (1 << PG_LEVEL_1G)) {
> -			extra = range - ((range >> PUD_SHIFT) << PUD_SHIFT);
> -			pmds += (extra + PMD_SIZE - 1) >> PMD_SHIFT;
> -		} else
> -			pmds += (range + PMD_SIZE - 1) >> PMD_SHIFT;
> -
> -		if (mr[i].page_size_mask & (1 << PG_LEVEL_2M)) {
> -			extra = range - ((range >> PMD_SHIFT) << PMD_SHIFT);
> -#ifdef CONFIG_X86_32
> -			extra += PMD_SIZE;
> -#endif
> -			/* The first 2/4M doesn't use large pages. */
> -			if (mr[i].start < PMD_SIZE)
> -				extra += PMD_SIZE - mr[i].start;
> -
> -			ptes += (extra + PAGE_SIZE - 1) >> PAGE_SHIFT;
> -		} else
> -			ptes += (range + PAGE_SIZE - 1) >> PAGE_SHIFT;
> -	}
> -
> -	tables = roundup(puds * sizeof(pud_t), PAGE_SIZE);
> -	tables += roundup(pmds * sizeof(pmd_t), PAGE_SIZE);
> -	tables += roundup(ptes * sizeof(pte_t), PAGE_SIZE);
> -
> -#ifdef CONFIG_X86_32
> -	/* for fixmap */
> -	tables += roundup(__end_of_fixed_addresses * sizeof(pte_t), PAGE_SIZE);
> -#endif
> -
> -	return tables;
> -}
> -
> -static unsigned long __init calculate_all_table_space_size(void)
> -{
> -	unsigned long start_pfn, end_pfn;
> -	unsigned long tables;
> -	int i;
> -
> -	/* the ISA range is always mapped regardless of memory holes */
> -	tables = calculate_table_space_size(0, ISA_END_ADDRESS);
> -
> -	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, NULL) {
> -		u64 start = start_pfn << PAGE_SHIFT;
> -		u64 end = end_pfn << PAGE_SHIFT;
> -
> -		if (end <= ISA_END_ADDRESS)
> -			continue;
> -
> -		if (start < ISA_END_ADDRESS)
> -			start = ISA_END_ADDRESS;
> -#ifdef CONFIG_X86_32
> -		/* on 32 bit, we only map up to max_low_pfn */
> -		if ((start >> PAGE_SHIFT) >= max_low_pfn)
> -			continue;
> -
> -		if ((end >> PAGE_SHIFT) > max_low_pfn)
> -			end = max_low_pfn << PAGE_SHIFT;
> -#endif
> -		tables += calculate_table_space_size(start, end);
> -	}
> -
> -	return tables;
> -}
> -
> -static void __init find_early_table_space(unsigned long start,
> -					  unsigned long good_end,
> -					  unsigned long tables)
> -{
> -	phys_addr_t base;
> -
> -	base = memblock_find_in_range(start, good_end, tables, PAGE_SIZE);
> -	if (!base)
> -		panic("Cannot find space for the kernel page tables");
> -
> -	pgt_buf_start = base >> PAGE_SHIFT;
> -	pgt_buf_end = pgt_buf_start;
> -	pgt_buf_top = pgt_buf_start + (tables >> PAGE_SHIFT);
> -}
> -
>  static struct range pfn_mapped[E820_X_MAX];
>  static int nr_pfn_mapped;
>  
> @@ -391,17 +307,14 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,
>  }
>  
>  /*
> - * Iterate through E820 memory map and create direct mappings for only E820_RAM
> - * regions. We cannot simply create direct mappings for all pfns from
> - * [0 to max_low_pfn) and [4GB to max_pfn) because of possible memory holes in
> - * high addresses that cannot be marked as UC by fixed/variable range MTRRs.
> - * Depending on the alignment of E820 ranges, this may possibly result in using
> - * smaller size (i.e. 4K instead of 2M or 1G) page tables.
> + * this one could take range with hole in it

You forgot a period at the end.

>   */
> -static void __init init_range_memory_mapping(unsigned long range_start,
> +static unsigned long __init init_range_memory_mapping(
> +					   unsigned long range_start,
>  					   unsigned long range_end)
>  {
>  	unsigned long start_pfn, end_pfn;
> +	unsigned long mapped_ram_size = 0;
>  	int i;
>  
>  	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, NULL) {
> @@ -421,71 +334,67 @@ static void __init init_range_memory_mapping(unsigned long range_start,
>  			end = range_end;
>  
>  		init_memory_mapping(start, end);
> +
> +		mapped_ram_size += end - start;
>  	}
> +
> +	return mapped_ram_size;
>  }
>  
>  void __init init_mem_mapping(void)
>  {
> -	unsigned long tables, good_end, end;
> +	unsigned long end, real_end, start, last_start;
> +	unsigned long step_size;
> +	unsigned long addr;
> +	unsigned long mapped_ram_size = 0;
> +	unsigned long new_mapped_ram_size;
>  
>  	probe_page_size_mask();
>  
> -	/*
> -	 * Find space for the kernel direct mapping tables.
> -	 *
> -	 * Later we should allocate these tables in the local node of the
> -	 * memory mapped. Unfortunately this is done currently before the
> -	 * nodes are discovered.
> -	 */
>  #ifdef CONFIG_X86_64
>  	end = max_pfn << PAGE_SHIFT;
> -	good_end = end;
>  #else
>  	end = max_low_pfn << PAGE_SHIFT;
> -	good_end = max_pfn_mapped << PAGE_SHIFT;
>  #endif
> -	tables = calculate_all_table_space_size();
> -	find_early_table_space(0, good_end, tables);
> -	printk(KERN_DEBUG "kernel direct mapping tables up to %#lx @ [mem %#010lx-%#010lx] prealloc\n",
> -		end - 1, pgt_buf_start << PAGE_SHIFT,
> -		(pgt_buf_top << PAGE_SHIFT) - 1);
>  
> -	max_pfn_mapped = 0; /* will get exact value next */
>  	/* the ISA range is always mapped regardless of memory holes */
>  	init_memory_mapping(0, ISA_END_ADDRESS);
> -	init_range_memory_mapping(ISA_END_ADDRESS, end);
> +
> +	/* xen has big range in reserved near end of ram, skip it at first */

How does feeding ISA_END_ADDRESS into memblock_find_in_range() skip it?
> +	addr = memblock_find_in_range(ISA_END_ADDRESS, end, PMD_SIZE,
> +			 PAGE_SIZE);
> +	real_end = addr + PMD_SIZE;
> +
> +	/* step_size need to be small so pgt_buf from BRK could cover it */
> +	step_size = PMD_SIZE;
> +	max_pfn_mapped = 0; /* will get exact value next */

next in init_range_memory_mapping()? Might want to spell that out.

> +	min_pfn_mapped = real_end >> PAGE_SHIFT;
> +	last_start = start = real_end;

Might want to add a comment here saying:

"We are looping from the top to the bottom."

> +	while (last_start > ISA_END_ADDRESS) {
> +		if (last_start > step_size) {
> +			start = round_down(last_start - 1, step_size);
> +			if (start < ISA_END_ADDRESS)
> +				start = ISA_END_ADDRESS;
> +		} else
> +			start = ISA_END_ADDRESS;
> +		new_mapped_ram_size = init_range_memory_mapping(start,
> +							last_start);
> +		last_start = start;
> +		min_pfn_mapped = last_start >> PAGE_SHIFT;
> +		if (new_mapped_ram_size > mapped_ram_size)
> +			step_size <<= 5;

Should '5' have a #define value?
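
Something along these lines (name made up):

	/* factor by which the mapping window grows each iteration */
	#define STEP_SIZE_SHIFT 5
	...
	step_size <<= STEP_SIZE_SHIFT;

so the magic number is documented in one place.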

> +		mapped_ram_size += new_mapped_ram_size;
> +	}

It looks like the step_size would keep on increasing on every loop.
First it would be 2MB, then 64MB, then 2GB, and so on - until the amount
of memory that has been mapped is greater than what is unmapped.
Is that right?

I am basing that assumption on the fact that "new_mapped_ram_size"
would return the size of the newly mapped region (start, last_start)
in bytes. And the 'mapped_ram_size' is the size of the previously
mapped region plus all the other ones.

The logic being that at the start of execution you start with a 2MB chunk,
compare it to 0, and increase step_size up to 64MB. Then you start
at real_end-2MB-step_size -> real_end-2MB-1. That gets you a 64MB chunk.

Since new_mapped_ram_size (64MB) > mapped_ram_size (2MB)
you increase step_size once more.

If so, you should also explain that in the git commit description and
in the loop logic.
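
A compact way to write out that assumption (rough numbers, assuming
every pass gets a full step):

	pass 1: step_size =  2MB -> maps ~2MB,   2MB > 0   -> step_size <<= 5
	pass 2: step_size = 64MB -> maps ~64MB, 64MB > 2MB  -> step_size <<= 5
	pass 3: step_size =  2GB -> maps ~2GB,  and so on

i.e. each newly mapped region is what provides the page-table pages for
the next, bigger, region below it.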

> +
> +	if (real_end < end)
> +		init_range_memory_mapping(real_end, end);
> +
>  #ifdef CONFIG_X86_64
>  	if (max_pfn > max_low_pfn) {
>  		/* can we preseve max_low_pfn ?*/
>  		max_low_pfn = max_pfn;
>  	}
>  #endif
> -	/*
> -	 * Reserve the kernel pagetable pages we used (pgt_buf_start -
> -	 * pgt_buf_end) and free the other ones (pgt_buf_end - pgt_buf_top)
> -	 * so that they can be reused for other purposes.
> -	 *
> -	 * On native it just means calling memblock_reserve, on Xen it also
> -	 * means marking RW the pagetable pages that we allocated before
> -	 * but that haven't been used.
> -	 *
> -	 * In fact on xen we mark RO the whole range pgt_buf_start -
> -	 * pgt_buf_top, because we have to make sure that when
> -	 * init_memory_mapping reaches the pagetable pages area, it maps
> -	 * RO all the pagetable pages, including the ones that are beyond
> -	 * pgt_buf_end at that time.
> -	 */
> -	if (pgt_buf_end > pgt_buf_start) {
> -		printk(KERN_DEBUG "kernel direct mapping tables up to %#lx @ [mem %#010lx-%#010lx] final\n",
> -			end - 1, pgt_buf_start << PAGE_SHIFT,
> -			(pgt_buf_end << PAGE_SHIFT) - 1);
> -		x86_init.mapping.pagetable_reserve(PFN_PHYS(pgt_buf_start),
> -				PFN_PHYS(pgt_buf_end));
> -	}
> -
> -	/* stop the wrong using */
> -	pgt_buf_top = 0;
> -
>  	early_memtest(0, max_pfn_mapped << PAGE_SHIFT);
>  }
>  
> diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
> index 27f7fc6..7bb1106 100644
> --- a/arch/x86/mm/init_32.c
> +++ b/arch/x86/mm/init_32.c
> @@ -61,11 +61,22 @@ bool __read_mostly __vmalloc_start_set = false;
>  
>  static __init void *alloc_low_page(void)
>  {
> -	unsigned long pfn = pgt_buf_end++;
> +	unsigned long pfn;
>  	void *adr;
>  
> -	if (pfn >= pgt_buf_top)
> -		panic("alloc_low_page: ran out of memory");
> +	if ((pgt_buf_end + 1) >= pgt_buf_top) {
> +		unsigned long ret;
> +		if (min_pfn_mapped >= max_pfn_mapped)
> +			panic("alloc_low_page: ran out of memory");
> +		ret = memblock_find_in_range(min_pfn_mapped << PAGE_SHIFT,
> +					max_pfn_mapped << PAGE_SHIFT,
> +					PAGE_SIZE, PAGE_SIZE);
> +		if (!ret)
> +			panic("alloc_low_page: can not alloc memory");
> +		memblock_reserve(ret, PAGE_SIZE);
> +		pfn = ret >> PAGE_SHIFT;
> +	} else
> +		pfn = pgt_buf_end++;
>  
>  	adr = __va(pfn * PAGE_SIZE);
>  	clear_page(adr);
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 4898e80..7dfa69b 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -316,7 +316,7 @@ void __init cleanup_highmap(void)
>  
>  static __ref void *alloc_low_page(unsigned long *phys)
>  {
> -	unsigned long pfn = pgt_buf_end++;
> +	unsigned long pfn;
>  	void *adr;
>  
>  	if (after_bootmem) {
> @@ -326,8 +326,19 @@ static __ref void *alloc_low_page(unsigned long *phys)
>  		return adr;
>  	}
>  
> -	if (pfn >= pgt_buf_top)
> -		panic("alloc_low_page: ran out of memory");
> +	if ((pgt_buf_end + 1) >= pgt_buf_top) {
> +		unsigned long ret;
> +		if (min_pfn_mapped >= max_pfn_mapped)
> +			panic("alloc_low_page: ran out of memory");
> +		ret = memblock_find_in_range(min_pfn_mapped << PAGE_SHIFT,
> +					max_pfn_mapped << PAGE_SHIFT,
> +					PAGE_SIZE, PAGE_SIZE);
> +		if (!ret)
> +			panic("alloc_low_page: can not alloc memory");
> +		memblock_reserve(ret, PAGE_SIZE);
> +		pfn = ret >> PAGE_SHIFT;
> +	} else
> +		pfn = pgt_buf_end++;
>  
>  	adr = early_memremap(pfn * PAGE_SIZE, PAGE_SIZE);
>  	clear_page(adr);
> -- 
> 1.7.7
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 07/19] x86, mm: Remove early_memremap workaround for page table accessing on 64bit
  2012-10-18 20:50 ` [PATCH 07/19] x86, mm: Remove early_memremap workaround for page table accessing on 64bit Yinghai Lu
@ 2012-10-22 15:07   ` Konrad Rzeszutek Wilk
  2012-10-22 19:08     ` Yinghai Lu
  0 siblings, 1 reply; 63+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-10-22 15:07 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin,
	Tejun Heo, Stefano Stabellini, linux-kernel

On Thu, Oct 18, 2012 at 01:50:18PM -0700, Yinghai Lu wrote:
> We do not need that workaround anymore after patches that pre-map page
> table buf and do not clear initial page table wrongly.


.. and somewhere during the v2 posting we had a discussion about
why this work-around came about. You should include a bit about
that and copy-n-paste some of that here please.


> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
>  arch/x86/mm/init_64.c |   38 ++++----------------------------------
>  1 files changed, 4 insertions(+), 34 deletions(-)
> 
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 7dfa69b..4e6873f 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -340,36 +340,12 @@ static __ref void *alloc_low_page(unsigned long *phys)
>  	} else
>  		pfn = pgt_buf_end++;
>  
> -	adr = early_memremap(pfn * PAGE_SIZE, PAGE_SIZE);
> +	adr = __va(pfn * PAGE_SIZE);
>  	clear_page(adr);
>  	*phys  = pfn * PAGE_SIZE;
>  	return adr;
>  }
>  
> -static __ref void *map_low_page(void *virt)
> -{
> -	void *adr;
> -	unsigned long phys, left;
> -
> -	if (after_bootmem)
> -		return virt;
> -
> -	phys = __pa(virt);
> -	left = phys & (PAGE_SIZE - 1);
> -	adr = early_memremap(phys & PAGE_MASK, PAGE_SIZE);
> -	adr = (void *)(((unsigned long)adr) | left);
> -
> -	return adr;
> -}
> -
> -static __ref void unmap_low_page(void *adr)
> -{
> -	if (after_bootmem)
> -		return;
> -
> -	early_iounmap((void *)((unsigned long)adr & PAGE_MASK), PAGE_SIZE);
> -}
> -
>  static unsigned long __meminit
>  phys_pte_init(pte_t *pte_page, unsigned long addr, unsigned long end,
>  	      pgprot_t prot)
> @@ -441,10 +417,9 @@ phys_pmd_init(pmd_t *pmd_page, unsigned long address, unsigned long end,
>  		if (pmd_val(*pmd)) {
>  			if (!pmd_large(*pmd)) {
>  				spin_lock(&init_mm.page_table_lock);
> -				pte = map_low_page((pte_t *)pmd_page_vaddr(*pmd));
> +				pte = (pte_t *)pmd_page_vaddr(*pmd);
>  				last_map_addr = phys_pte_init(pte, address,
>  								end, prot);
> -				unmap_low_page(pte);
>  				spin_unlock(&init_mm.page_table_lock);
>  				continue;
>  			}
> @@ -480,7 +455,6 @@ phys_pmd_init(pmd_t *pmd_page, unsigned long address, unsigned long end,
>  
>  		pte = alloc_low_page(&pte_phys);
>  		last_map_addr = phys_pte_init(pte, address, end, new_prot);
> -		unmap_low_page(pte);
>  
>  		spin_lock(&init_mm.page_table_lock);
>  		pmd_populate_kernel(&init_mm, pmd, __va(pte_phys));
> @@ -515,10 +489,9 @@ phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end,
>  
>  		if (pud_val(*pud)) {
>  			if (!pud_large(*pud)) {
> -				pmd = map_low_page(pmd_offset(pud, 0));
> +				pmd = pmd_offset(pud, 0);
>  				last_map_addr = phys_pmd_init(pmd, addr, end,
>  							 page_size_mask, prot);
> -				unmap_low_page(pmd);
>  				__flush_tlb_all();
>  				continue;
>  			}
> @@ -555,7 +528,6 @@ phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end,
>  		pmd = alloc_low_page(&pmd_phys);
>  		last_map_addr = phys_pmd_init(pmd, addr, end, page_size_mask,
>  					      prot);
> -		unmap_low_page(pmd);
>  
>  		spin_lock(&init_mm.page_table_lock);
>  		pud_populate(&init_mm, pud, __va(pmd_phys));
> @@ -591,17 +563,15 @@ kernel_physical_mapping_init(unsigned long start,
>  			next = end;
>  
>  		if (pgd_val(*pgd)) {
> -			pud = map_low_page((pud_t *)pgd_page_vaddr(*pgd));
> +			pud = (pud_t *)pgd_page_vaddr(*pgd);
>  			last_map_addr = phys_pud_init(pud, __pa(start),
>  						 __pa(end), page_size_mask);
> -			unmap_low_page(pud);
>  			continue;
>  		}
>  
>  		pud = alloc_low_page(&pud_phys);
>  		last_map_addr = phys_pud_init(pud, __pa(start), __pa(next),
>  						 page_size_mask);
> -		unmap_low_page(pud);
>  
>  		spin_lock(&init_mm.page_table_lock);
>  		pgd_populate(&init_mm, pgd, __va(pud_phys));
> -- 
> 1.7.7
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 08/19] x86, mm: Remove parameter in alloc_low_page for 64bit
  2012-10-18 20:50 ` [PATCH 08/19] x86, mm: Remove parameter in alloc_low_page for 64bit Yinghai Lu
@ 2012-10-22 15:09   ` Konrad Rzeszutek Wilk
  2012-10-22 19:09     ` Yinghai Lu
  0 siblings, 1 reply; 63+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-10-22 15:09 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin,
	Tejun Heo, Stefano Stabellini, linux-kernel

On Thu, Oct 18, 2012 at 01:50:19PM -0700, Yinghai Lu wrote:
> Now all page table buf are pre-mapped, and could use virtual address directly.
                     ^^ buffers              ^^^^ -> can

> So don't need to remember physics address anymore.
  ^^ We

physics? Physical.

> 
> Remove that phys pointer in alloc_low_page(), and that will allow us to merge
> alloc_low_page between 64bit and 32bit.
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
>  arch/x86/mm/init_64.c |   19 +++++++------------
>  1 files changed, 7 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 4e6873f..cbf8dbe 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -314,14 +314,13 @@ void __init cleanup_highmap(void)
>  	}
>  }
>  
> -static __ref void *alloc_low_page(unsigned long *phys)
> +static __ref void *alloc_low_page(void)
>  {
>  	unsigned long pfn;
>  	void *adr;
>  
>  	if (after_bootmem) {
>  		adr = (void *)get_zeroed_page(GFP_ATOMIC | __GFP_NOTRACK);
> -		*phys = __pa(adr);
>  
>  		return adr;
>  	}
> @@ -342,7 +341,6 @@ static __ref void *alloc_low_page(unsigned long *phys)
>  
>  	adr = __va(pfn * PAGE_SIZE);
>  	clear_page(adr);
> -	*phys  = pfn * PAGE_SIZE;
>  	return adr;
>  }
>  
> @@ -400,7 +398,6 @@ phys_pmd_init(pmd_t *pmd_page, unsigned long address, unsigned long end,
>  	int i = pmd_index(address);
>  
>  	for (; i < PTRS_PER_PMD; i++, address = next) {
> -		unsigned long pte_phys;
>  		pmd_t *pmd = pmd_page + pmd_index(address);
>  		pte_t *pte;
>  		pgprot_t new_prot = prot;
> @@ -453,11 +450,11 @@ phys_pmd_init(pmd_t *pmd_page, unsigned long address, unsigned long end,
>  			continue;
>  		}
>  
> -		pte = alloc_low_page(&pte_phys);
> +		pte = alloc_low_page();
>  		last_map_addr = phys_pte_init(pte, address, end, new_prot);
>  
>  		spin_lock(&init_mm.page_table_lock);
> -		pmd_populate_kernel(&init_mm, pmd, __va(pte_phys));
> +		pmd_populate_kernel(&init_mm, pmd, pte);
>  		spin_unlock(&init_mm.page_table_lock);
>  	}
>  	update_page_count(PG_LEVEL_2M, pages);
> @@ -473,7 +470,6 @@ phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end,
>  	int i = pud_index(addr);
>  
>  	for (; i < PTRS_PER_PUD; i++, addr = next) {
> -		unsigned long pmd_phys;
>  		pud_t *pud = pud_page + pud_index(addr);
>  		pmd_t *pmd;
>  		pgprot_t prot = PAGE_KERNEL;
> @@ -525,12 +521,12 @@ phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end,
>  			continue;
>  		}
>  
> -		pmd = alloc_low_page(&pmd_phys);
> +		pmd = alloc_low_page();
>  		last_map_addr = phys_pmd_init(pmd, addr, end, page_size_mask,
>  					      prot);
>  
>  		spin_lock(&init_mm.page_table_lock);
> -		pud_populate(&init_mm, pud, __va(pmd_phys));
> +		pud_populate(&init_mm, pud, pmd);
>  		spin_unlock(&init_mm.page_table_lock);
>  	}
>  	__flush_tlb_all();
> @@ -555,7 +551,6 @@ kernel_physical_mapping_init(unsigned long start,
>  
>  	for (; start < end; start = next) {
>  		pgd_t *pgd = pgd_offset_k(start);
> -		unsigned long pud_phys;
>  		pud_t *pud;
>  
>  		next = (start + PGDIR_SIZE) & PGDIR_MASK;
> @@ -569,12 +564,12 @@ kernel_physical_mapping_init(unsigned long start,
>  			continue;
>  		}
>  
> -		pud = alloc_low_page(&pud_phys);
> +		pud = alloc_low_page();
>  		last_map_addr = phys_pud_init(pud, __pa(start), __pa(next),
>  						 page_size_mask);
>  
>  		spin_lock(&init_mm.page_table_lock);
> -		pgd_populate(&init_mm, pgd, __va(pud_phys));
> +		pgd_populate(&init_mm, pgd, pud);
>  		spin_unlock(&init_mm.page_table_lock);
>  		pgd_changed = true;
>  	}
> -- 
> 1.7.7
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 09/19] x86, mm: Merge alloc_low_page between 64bit and 32bit
  2012-10-18 20:50 ` [PATCH 09/19] x86, mm: Merge alloc_low_page between 64bit and 32bit Yinghai Lu
@ 2012-10-22 15:11   ` Konrad Rzeszutek Wilk
  2012-10-22 19:14     ` Yinghai Lu
  0 siblings, 1 reply; 63+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-10-22 15:11 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin,
	Tejun Heo, Stefano Stabellini, linux-kernel

On Thu, Oct 18, 2012 at 01:50:20PM -0700, Yinghai Lu wrote:
> They are almost same except 64 bit need to handle after_bootmem.
                                                                 ^^ -
case.

> 
> Add mm_internal.h to hide that alloc_low_page out of arch/x86/mm/init*.c

Huh?

I think what you are saying is that you want to expose the alloc_low_page
declaration in a header since the function resides in mm/init_[32|64].c?


> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
>  arch/x86/mm/init.c        |   34 ++++++++++++++++++++++++++++++++++
>  arch/x86/mm/init_32.c     |   26 ++------------------------
>  arch/x86/mm/init_64.c     |   32 ++------------------------------
>  arch/x86/mm/mm_internal.h |    6 ++++++
>  4 files changed, 44 insertions(+), 54 deletions(-)
>  create mode 100644 arch/x86/mm/mm_internal.h
> 
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index 9ff29c1..c398b2c 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -17,10 +17,44 @@
>  #include <asm/proto.h>
>  #include <asm/dma.h>		/* for MAX_DMA_PFN */
>  
> +#include "mm_internal.h"
> +
>  unsigned long __initdata pgt_buf_start;
>  unsigned long __meminitdata pgt_buf_end;
>  unsigned long __meminitdata pgt_buf_top;
>  
> +__ref void *alloc_low_page(void)
> +{
> +	unsigned long pfn;
> +	void *adr;
> +
> +#ifdef CONFIG_X86_64
> +	if (after_bootmem) {
> +		adr = (void *)get_zeroed_page(GFP_ATOMIC | __GFP_NOTRACK);
> +
> +		return adr;
> +	}
> +#endif
> +
> +	if ((pgt_buf_end + 1) >= pgt_buf_top) {
> +		unsigned long ret;
> +		if (min_pfn_mapped >= max_pfn_mapped)
> +			panic("alloc_low_page: ran out of memory");
> +		ret = memblock_find_in_range(min_pfn_mapped << PAGE_SHIFT,
> +					max_pfn_mapped << PAGE_SHIFT,
> +					PAGE_SIZE, PAGE_SIZE);
> +		if (!ret)
> +			panic("alloc_low_page: can not alloc memory");
> +		memblock_reserve(ret, PAGE_SIZE);
> +		pfn = ret >> PAGE_SHIFT;
> +	} else
> +		pfn = pgt_buf_end++;
> +
> +	adr = __va(pfn * PAGE_SIZE);
> +	clear_page(adr);
> +	return adr;
> +}
> +
>  /* need 4 4k for initial PMD_SIZE, 4k for 0-ISA_END_ADDRESS */
>  #define INIT_PGT_BUF_SIZE	(5 * PAGE_SIZE)
>  RESERVE_BRK(early_pgt_alloc, INIT_PGT_BUF_SIZE);
> diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
> index 7bb1106..a7f2df1 100644
> --- a/arch/x86/mm/init_32.c
> +++ b/arch/x86/mm/init_32.c
> @@ -53,36 +53,14 @@
>  #include <asm/page_types.h>
>  #include <asm/init.h>
>  
> +#include "mm_internal.h"
> +
>  unsigned long highstart_pfn, highend_pfn;
>  
>  static noinline int do_test_wp_bit(void);
>  
>  bool __read_mostly __vmalloc_start_set = false;
>  
> -static __init void *alloc_low_page(void)
> -{
> -	unsigned long pfn;
> -	void *adr;
> -
> -	if ((pgt_buf_end + 1) >= pgt_buf_top) {
> -		unsigned long ret;
> -		if (min_pfn_mapped >= max_pfn_mapped)
> -			panic("alloc_low_page: ran out of memory");
> -		ret = memblock_find_in_range(min_pfn_mapped << PAGE_SHIFT,
> -					max_pfn_mapped << PAGE_SHIFT,
> -					PAGE_SIZE, PAGE_SIZE);
> -		if (!ret)
> -			panic("alloc_low_page: can not alloc memory");
> -		memblock_reserve(ret, PAGE_SIZE);
> -		pfn = ret >> PAGE_SHIFT;
> -	} else
> -		pfn = pgt_buf_end++;
> -
> -	adr = __va(pfn * PAGE_SIZE);
> -	clear_page(adr);
> -	return adr;
> -}
> -
>  /*
>   * Creates a middle page table and puts a pointer to it in the
>   * given global directory entry. This only returns the gd entry
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index cbf8dbe..aabe8ff 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -54,6 +54,8 @@
>  #include <asm/uv/uv.h>
>  #include <asm/setup.h>
>  
> +#include "mm_internal.h"
> +
>  static int __init parse_direct_gbpages_off(char *arg)
>  {
>  	direct_gbpages = 0;
> @@ -314,36 +316,6 @@ void __init cleanup_highmap(void)
>  	}
>  }
>  
> -static __ref void *alloc_low_page(void)
> -{
> -	unsigned long pfn;
> -	void *adr;
> -
> -	if (after_bootmem) {
> -		adr = (void *)get_zeroed_page(GFP_ATOMIC | __GFP_NOTRACK);
> -
> -		return adr;
> -	}
> -
> -	if ((pgt_buf_end + 1) >= pgt_buf_top) {
> -		unsigned long ret;
> -		if (min_pfn_mapped >= max_pfn_mapped)
> -			panic("alloc_low_page: ran out of memory");
> -		ret = memblock_find_in_range(min_pfn_mapped << PAGE_SHIFT,
> -					max_pfn_mapped << PAGE_SHIFT,
> -					PAGE_SIZE, PAGE_SIZE);
> -		if (!ret)
> -			panic("alloc_low_page: can not alloc memory");
> -		memblock_reserve(ret, PAGE_SIZE);
> -		pfn = ret >> PAGE_SHIFT;
> -	} else
> -		pfn = pgt_buf_end++;
> -
> -	adr = __va(pfn * PAGE_SIZE);
> -	clear_page(adr);
> -	return adr;
> -}
> -
>  static unsigned long __meminit
>  phys_pte_init(pte_t *pte_page, unsigned long addr, unsigned long end,
>  	      pgprot_t prot)
> diff --git a/arch/x86/mm/mm_internal.h b/arch/x86/mm/mm_internal.h
> new file mode 100644
> index 0000000..b3f993a
> --- /dev/null
> +++ b/arch/x86/mm/mm_internal.h
> @@ -0,0 +1,6 @@
> +#ifndef __X86_MM_INTERNAL_H
> +#define __X86_MM_INTERNAL_H
> +
> +void *alloc_low_page(void);
> +
> +#endif	/* __X86_MM_INTERNAL_H */
> -- 
> 1.7.7
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 11/19] x86, mm, xen: Remove mapping_pagatable_reserve
  2012-10-18 20:50 ` [PATCH 11/19] x86, mm, xen: Remove mapping_pagatable_reserve Yinghai Lu
@ 2012-10-22 15:14   ` Konrad Rzeszutek Wilk
  2012-10-22 19:18     ` Yinghai Lu
  0 siblings, 1 reply; 63+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-10-22 15:14 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin,
	Tejun Heo, Stefano Stabellini, linux-kernel

On Thu, Oct 18, 2012 at 01:50:22PM -0700, Yinghai Lu wrote:
> page table area are pre-mapped now, and mark_page_ro is used to make it RO
  ^ Page     ^^^^->areas                  ^^^^^^^^^^^-> ? I must have
missed that patch. Can you include the title of the patch in this
git commit so one could take a look.

> for xen.
      ^->Xen
> 
> mapping_pagetable_reserve is not used anymore, so remove it.
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
>  arch/x86/include/asm/pgtable_types.h |    1 -
>  arch/x86/include/asm/x86_init.h      |   12 ------------
>  arch/x86/kernel/x86_init.c           |    4 ----
>  arch/x86/mm/init.c                   |    4 ----
>  arch/x86/xen/mmu.c                   |   28 ----------------------------
>  5 files changed, 0 insertions(+), 49 deletions(-)
> 
> diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
> index ec8a1fc..79738f2 100644
> --- a/arch/x86/include/asm/pgtable_types.h
> +++ b/arch/x86/include/asm/pgtable_types.h
> @@ -301,7 +301,6 @@ int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn,
>  /* Install a pte for a particular vaddr in kernel space. */
>  void set_pte_vaddr(unsigned long vaddr, pte_t pte);
>  
> -extern void native_pagetable_reserve(u64 start, u64 end);
>  #ifdef CONFIG_X86_32
>  extern void native_pagetable_init(void);
>  #else
> diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
> index 5769349..3b2ce8f 100644
> --- a/arch/x86/include/asm/x86_init.h
> +++ b/arch/x86/include/asm/x86_init.h
> @@ -69,17 +69,6 @@ struct x86_init_oem {
>  };
>  
>  /**
> - * struct x86_init_mapping - platform specific initial kernel pagetable setup
> - * @pagetable_reserve:	reserve a range of addresses for kernel pagetable usage
> - *
> - * For more details on the purpose of this hook, look in
> - * init_memory_mapping and the commit that added it.
> - */
> -struct x86_init_mapping {
> -	void (*pagetable_reserve)(u64 start, u64 end);
> -};
> -
> -/**
>   * struct x86_init_paging - platform specific paging functions
>   * @pagetable_init:	platform specific paging initialization call to setup
>   *			the kernel pagetables and prepare accessors functions.
> @@ -136,7 +125,6 @@ struct x86_init_ops {
>  	struct x86_init_mpparse		mpparse;
>  	struct x86_init_irqs		irqs;
>  	struct x86_init_oem		oem;
> -	struct x86_init_mapping		mapping;
>  	struct x86_init_paging		paging;
>  	struct x86_init_timers		timers;
>  	struct x86_init_iommu		iommu;
> diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
> index 7a3d075..50cf83e 100644
> --- a/arch/x86/kernel/x86_init.c
> +++ b/arch/x86/kernel/x86_init.c
> @@ -62,10 +62,6 @@ struct x86_init_ops x86_init __initdata = {
>  		.banner			= default_banner,
>  	},
>  
> -	.mapping = {
> -		.pagetable_reserve		= native_pagetable_reserve,
> -	},
> -
>  	.paging = {
>  		.pagetable_init		= native_pagetable_init,
>  	},
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index 2257727..dd09d20 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -112,10 +112,6 @@ static void __init probe_page_size_mask(void)
>  		__supported_pte_mask |= _PAGE_GLOBAL;
>  	}
>  }
> -void __init native_pagetable_reserve(u64 start, u64 end)
> -{
> -	memblock_reserve(start, end - start);
> -}
>  
>  #ifdef CONFIG_X86_32
>  #define NR_RANGE_MR 3
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index 6226c99..efc5260 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -1178,20 +1178,6 @@ static void xen_exit_mmap(struct mm_struct *mm)
>  
>  static void xen_post_allocator_init(void);
>  
> -static __init void xen_mapping_pagetable_reserve(u64 start, u64 end)
> -{
> -	/* reserve the range used */
> -	native_pagetable_reserve(start, end);
> -
> -	/* set as RW the rest */
> -	printk(KERN_DEBUG "xen: setting RW the range %llx - %llx\n", end,
> -			PFN_PHYS(pgt_buf_top));
> -	while (end < PFN_PHYS(pgt_buf_top)) {
> -		make_lowmem_page_readwrite(__va(end));
> -		end += PAGE_SIZE;
> -	}
> -}
> -
>  #ifdef CONFIG_X86_64
>  static void __init xen_cleanhighmap(unsigned long vaddr,
>  				    unsigned long vaddr_end)
> @@ -1484,19 +1470,6 @@ static pte_t __init mask_rw_pte(pte_t *ptep, pte_t pte)
>  #else /* CONFIG_X86_64 */
>  static pte_t __init mask_rw_pte(pte_t *ptep, pte_t pte)
>  {
> -	unsigned long pfn = pte_pfn(pte);
> -
> -	/*
> -	 * If the new pfn is within the range of the newly allocated
> -	 * kernel pagetable, and it isn't being mapped into an
> -	 * early_ioremap fixmap slot as a freshly allocated page, make sure
> -	 * it is RO.
> -	 */
> -	if (((!is_early_ioremap_ptep(ptep) &&
> -			pfn >= pgt_buf_start && pfn < pgt_buf_top)) ||
> -			(is_early_ioremap_ptep(ptep) && pfn != (pgt_buf_end - 1)))
> -		pte = pte_wrprotect(pte);
> -
>  	return pte;
>  }
>  #endif /* CONFIG_X86_64 */
> @@ -2178,7 +2151,6 @@ static const struct pv_mmu_ops xen_mmu_ops __initconst = {
>  
>  void __init xen_init_mmu_ops(void)
>  {
> -	x86_init.mapping.pagetable_reserve = xen_mapping_pagetable_reserve;
>  	x86_init.paging.pagetable_init = xen_pagetable_init;
>  	pv_mmu_ops = xen_mmu_ops;
>  
> -- 
> 1.7.7
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 12/19] x86, mm: Add alloc_low_pages(num)
  2012-10-18 20:50 ` [PATCH 12/19] x86, mm: Add alloc_low_pages(num) Yinghai Lu
@ 2012-10-22 15:17   ` Konrad Rzeszutek Wilk
  2012-10-22 19:24     ` Yinghai Lu
  0 siblings, 1 reply; 63+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-10-22 15:17 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin,
	Tejun Heo, Stefano Stabellini, linux-kernel

On Thu, Oct 18, 2012 at 01:50:23PM -0700, Yinghai Lu wrote:
> 32bit kmap mapping need page table to be used for low to high.
                                    ^-s

> 
> At this point those page table are still from pgt_buf_* from BRK,
                                ^s

> So it is ok now.
> But we want to move early_ioremap_page_table_range_init() out of
> init_memory_mapping() and only call it one time later, that will
> make page_table_range_init/page_table_kmap_check/alloc_low_page to
> use memblock to get page.
> 
> memblock allocation for page table are from high to low.
                                    ^s
> 
> So will get panic from page_table_kmap_check() that has BUG_ON to do
> ordering checking.
> 
> This patch add alloc_low_pages to make it possible to alloc serveral pages
> at first, and hand out pages one by one from low to high.

.. But for right now this patch just defaults it to one page.

Right?
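
If I read it right, the idea is that a later patch in the series will do
something like (a sketch, not the actual caller):

	count = page_table_range_init_count(start, end);
	adr = alloc_low_pages(count);	/* one memblock reservation */
	/* ... then hand the pages out in increasing address order ... */

so the kmap page tables still come out low -> high even though memblock
itself hands out memory top -> down.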

> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
>  arch/x86/mm/init.c        |   33 +++++++++++++++++++++------------
>  arch/x86/mm/mm_internal.h |    6 +++++-
>  2 files changed, 26 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index dd09d20..de71c0d 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -25,36 +25,45 @@ unsigned long __meminitdata pgt_buf_top;
>  
>  static unsigned long min_pfn_mapped;
>  
> -__ref void *alloc_low_page(void)
> +__ref void *alloc_low_pages(unsigned int num)
>  {
>  	unsigned long pfn;
> -	void *adr;
> +	int i;
>  
>  #ifdef CONFIG_X86_64
>  	if (after_bootmem) {
> -		adr = (void *)get_zeroed_page(GFP_ATOMIC | __GFP_NOTRACK);
> +		unsigned int order;
>  
> -		return adr;
> +		order = get_order((unsigned long)num << PAGE_SHIFT);
> +		return (void *)__get_free_pages(GFP_ATOMIC | __GFP_NOTRACK |
> +						__GFP_ZERO, order);
>  	}
>  #endif
>  
> -	if ((pgt_buf_end + 1) >= pgt_buf_top) {
> +	if ((pgt_buf_end + num) >= pgt_buf_top) {
>  		unsigned long ret;
>  		if (min_pfn_mapped >= max_pfn_mapped)
>  			panic("alloc_low_page: ran out of memory");
>  		ret = memblock_find_in_range(min_pfn_mapped << PAGE_SHIFT,
>  					max_pfn_mapped << PAGE_SHIFT,
> -					PAGE_SIZE, PAGE_SIZE);
> +					PAGE_SIZE * num , PAGE_SIZE);
>  		if (!ret)
>  			panic("alloc_low_page: can not alloc memory");
> -		memblock_reserve(ret, PAGE_SIZE);
> +		memblock_reserve(ret, PAGE_SIZE * num);
>  		pfn = ret >> PAGE_SHIFT;
> -	} else
> -		pfn = pgt_buf_end++;
> +	} else {
> +		pfn = pgt_buf_end;
> +		pgt_buf_end += num;
> +	}
> +
> +	for (i = 0; i < num; i++) {
> +		void *adr;
> +
> +		adr = __va((pfn + i) << PAGE_SHIFT);
> +		clear_page(adr);
> +	}
>  
> -	adr = __va(pfn * PAGE_SIZE);
> -	clear_page(adr);
> -	return adr;
> +	return __va(pfn << PAGE_SHIFT);
>  }
>  
>  /* need 4 4k for initial PMD_SIZE, 4k for 0-ISA_END_ADDRESS */
> diff --git a/arch/x86/mm/mm_internal.h b/arch/x86/mm/mm_internal.h
> index b3f993a..7e3b88e 100644
> --- a/arch/x86/mm/mm_internal.h
> +++ b/arch/x86/mm/mm_internal.h
> @@ -1,6 +1,10 @@
>  #ifndef __X86_MM_INTERNAL_H
>  #define __X86_MM_INTERNAL_H
>  
> -void *alloc_low_page(void);
> +void *alloc_low_pages(unsigned int num);
> +static inline void *alloc_low_page(void)
> +{
> +	return alloc_low_pages(1);
> +}
>  
>  #endif	/* __X86_MM_INTERNAL_H */
> -- 
> 1.7.7
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 18/19] x86, mm: Let "memmap=" take more entries one time
  2012-10-18 20:50 ` [PATCH 18/19] x86, mm: Let "memmap=" take more entries one time Yinghai Lu
@ 2012-10-22 15:19   ` Konrad Rzeszutek Wilk
  2012-10-22 19:26     ` Yinghai Lu
  0 siblings, 1 reply; 63+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-10-22 15:19 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin,
	Tejun Heo, Stefano Stabellini, linux-kernel

On Thu, Oct 18, 2012 at 01:50:29PM -0700, Yinghai Lu wrote:
> Current "memmap=" only can take one entry every time.
> when we have more entries, we have to use memmap= for each of them.
> 
> For pxe booting, we have command line length limitation, those extra
> "memmap=" would waste too much space.
> 
> This patch make memmap= could take several entries one time,
> and those entries will be split with ','

Um, not sure what this patch has to do with this patchset?
Should this be sent separately?
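
For what it is worth, the saving is on the command line itself - instead
of, say,

	memmap=100M@2G memmap=100M@4G memmap=100M@8G

one could write (if I read the parsing change right)

	memmap=100M@2G,100M@4G,100M@8G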

> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
>  arch/x86/kernel/e820.c |   16 +++++++++++++++-
>  1 files changed, 15 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> index ed858e9..f281328 100644
> --- a/arch/x86/kernel/e820.c
> +++ b/arch/x86/kernel/e820.c
> @@ -835,7 +835,7 @@ static int __init parse_memopt(char *p)
>  }
>  early_param("mem", parse_memopt);
>  
> -static int __init parse_memmap_opt(char *p)
> +static int __init parse_memmap_one(char *p)
>  {
>  	char *oldp;
>  	u64 start_at, mem_size;
> @@ -877,6 +877,20 @@ static int __init parse_memmap_opt(char *p)
>  
>  	return *p == '\0' ? 0 : -EINVAL;
>  }
> +static int __init parse_memmap_opt(char *str)
> +{
> +	while (str) {
> +		char *k = strchr(str, ',');
> +
> +		if (k)
> +			*k++ = 0;
> +
> +		parse_memmap_one(str);
> +		str = k;
> +	}
> +
> +	return 0;
> +}
>  early_param("memmap", parse_memmap_opt);
>  
>  void __init finish_e820_parsing(void)
> -- 
> 1.7.7
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 13/19] x86, mm: only call early_ioremap_page_table_range_init() once
  2012-10-18 20:50 ` [PATCH 13/19] x86, mm: only call early_ioremap_page_table_range_init() once Yinghai Lu
@ 2012-10-22 15:24   ` Konrad Rzeszutek Wilk
  2012-10-22 19:40     ` Yinghai Lu
  0 siblings, 1 reply; 63+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-10-22 15:24 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin,
	Tejun Heo, Stefano Stabellini, linux-kernel

On Thu, Oct 18, 2012 at 01:50:24PM -0700, Yinghai Lu wrote:
> On 32bit, We should not keep calling that during every init_memory_mapping.

Explain pls why.

> 
> Need to update page_table_range_init() to count the pages for kmap page table
> at first, and use new added alloc_low_pages() to get pages in sequence.
> That will conform requirement that page table need to be in low to high order.
                   ^ to the                    ^-s
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
>  arch/x86/mm/init.c    |   13 +++++--------
>  arch/x86/mm/init_32.c |   47 +++++++++++++++++++++++++++++++++++++++++------
>  2 files changed, 46 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index de71c0d..4eece3c 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -334,14 +334,6 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,
>  		ret = kernel_physical_mapping_init(mr[i].start, mr[i].end,
>  						   mr[i].page_size_mask);
>  
> -#ifdef CONFIG_X86_32
> -	early_ioremap_page_table_range_init();
> -
> -	load_cr3(swapper_pg_dir);
> -#endif
> -
> -	__flush_tlb_all();
> -
>  	add_pfn_range_mapped(start >> PAGE_SHIFT, ret >> PAGE_SHIFT);
>  
>  	return ret >> PAGE_SHIFT;
> @@ -435,7 +427,12 @@ void __init init_mem_mapping(void)
>  		/* can we preseve max_low_pfn ?*/
>  		max_low_pfn = max_pfn;
>  	}
> +#else
> +	early_ioremap_page_table_range_init();
> +	load_cr3(swapper_pg_dir);
> +	__flush_tlb_all();
>  #endif
> +
>  	early_memtest(0, max_pfn_mapped << PAGE_SHIFT);
>  }
>  
> diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
> index a7f2df1..ef7f0dc 100644
> --- a/arch/x86/mm/init_32.c
> +++ b/arch/x86/mm/init_32.c
> @@ -135,8 +135,39 @@ pte_t * __init populate_extra_pte(unsigned long vaddr)
>  	return one_page_table_init(pmd) + pte_idx;
>  }
>  
> +static unsigned long __init
> +page_table_range_init_count(unsigned long start, unsigned long end)
> +{
> +	unsigned long count = 0;
> +#ifdef CONFIG_HIGHMEM
> +	int pmd_idx_kmap_begin = fix_to_virt(FIX_KMAP_END) >> PMD_SHIFT;
> +	int pmd_idx_kmap_end = fix_to_virt(FIX_KMAP_BEGIN) >> PMD_SHIFT;
> +	int pgd_idx, pmd_idx;
> +	unsigned long vaddr;
> +
> +	if (pmd_idx_kmap_begin == pmd_idx_kmap_end)
> +		return count;

Or just 'return 0';


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 01/19] x86, mm: Align start address to correct big page size
  2012-10-22 14:16   ` Konrad Rzeszutek Wilk
@ 2012-10-22 16:31     ` Yinghai Lu
  0 siblings, 0 replies; 63+ messages in thread
From: Yinghai Lu @ 2012-10-22 16:31 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin,
	Tejun Heo, Stefano Stabellini, linux-kernel

On Mon, Oct 22, 2012 at 7:16 AM, Konrad Rzeszutek Wilk
<konrad@kernel.org> wrote:
>>On Thu, Oct 18, 2012 at 01:50:10PM -0700, Yinghai Lu wrote:
>
>
> I am pretty sure I gave you some ideas of how to fix up the commit
> description in earlier reviews, but it looks like you missed them.
>
> Let me write them here once more.
>
>> We are going to use buffer in BRK to pre-map page table buffer.
>
> What buffer? Is buffer the same thing as page table?
>>
>> Page table buffer could be only page aligned, but range around it are
>
> .. ranges
>> ram too, we could use bigger page to map it to avoid small pages.
>>
>> We will adjust page_size_mask in next patch to use big page size for
>
> Instead of saying "next patch" - include the title of the patch
> so that one can search for it.
>
>> small ram range.
>>
>> Before that, this patch will make start address to be aligned down
>
> s/will make/made/
>
>> according to bigger page size, otherwise entry in page page will
>> not have correct value.
>
>
> I would structure this git commit description to first introduce
> the problem.
>
> Say at the start of the patch:
>
> "Before this patch, the start address was aligned down according
> to a bigger page size (1GB, 2MB). This is a problem b/c an
> entry in the page table will not have correct value. "
>
> Here can you explain why it does not have the correct value?
>> +                             pfn_pte((address & PMD_MASK) >> PAGE_SHIFT,
>>                                       __pgprot(pgprot_val(prot) | _PAGE_PSE)));
>>                       spin_unlock(&init_mm.page_table_lock);
>>                       last_map_addr = next;
>> @@ -536,7 +536,8 @@ phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end,
>>                       pages++;
>>                       spin_lock(&init_mm.page_table_lock);
>>                       set_pte((pte_t *)pud,
>> -                             pfn_pte(addr >> PAGE_SHIFT, PAGE_KERNEL_LARGE));
>> +                             pfn_pte((addr & PUD_MASK) >> PAGE_SHIFT,
>> +                                     PAGE_KERNEL_LARGE));
>>                       spin_unlock(&init_mm.page_table_lock);
>>                       last_map_addr = next;
>>                       continue;
>> --

Will update the commit log to:

----
We are going to use a buffer in BRK to map a small range just under the memory
top, and then use that newly mapped ram to map the low ram ranges under it.

The ram range that will be mapped first may be only page aligned, but the
ranges around it are ram too, so we could use a bigger page size to map it and
avoid small pages.

We will adjust page_size_mask in the following patch:
        x86, mm: Use big page size for small memory range
to use big page size for the small ram range.

Before that patch, this patch makes sure the start address is aligned down
according to the bigger page size; otherwise the entry in the page table will
not have the correct value.

---
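
As a minimal sketch of why the align-down matters (illustrative SKETCH_*
constants mirroring the x86_64 values; not part of the patch):

#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PMD_SHIFT	21
#define SKETCH_PMD_SIZE		(1UL << SKETCH_PMD_SHIFT)
#define SKETCH_PMD_MASK		(~(SKETCH_PMD_SIZE - 1))

/* pfn that ends up in a 2MB PSE entry covering 'address' */
static unsigned long pse_entry_pfn(unsigned long address)
{
	/*
	 * Align down to the 2MB boundary before converting to a pfn.
	 * Without the mask, a start address that is only 4KB aligned
	 * would put a misaligned pfn into the large-page entry, and the
	 * entry would map a physical range shifted by that misalignment.
	 */
	return (address & SKETCH_PMD_MASK) >> SKETCH_PAGE_SHIFT;
}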

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 02/19] x86, mm: Use big page size for small memory range
  2012-10-22 14:21   ` Konrad Rzeszutek Wilk
@ 2012-10-22 16:33     ` Yinghai Lu
  0 siblings, 0 replies; 63+ messages in thread
From: Yinghai Lu @ 2012-10-22 16:33 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin,
	Tejun Heo, Stefano Stabellini, linux-kernel

On Mon, Oct 22, 2012 at 7:21 AM, Konrad Rzeszutek Wilk
<konrad@kernel.org> wrote:
> On Thu, Oct 18, 2012 at 01:50:12PM -0700, Yinghai Lu wrote:
>> We could map small range in the middle of big range at first, so should use
>> big page size at first to avoid using small page size to break down page table.
>>
>> Only can set big page bit when that range has ram area around it.
>
> The code looks good.
>
> I would alter the description to say:
>
> (Describe the problem)
>
> "We are wasting entries in the page-table b/c are not taking advantage
> of the fact that adjoining ranges could be of the same type and
> coalescing them together. Instead we end up using the small size type."
>
> (Explain your patch).
>
> "We fix this by iterating over the ranges, detecting whether the
> ranges that are next to each other are of the same type - and if so
> set them to our type."

I think my commit changelog is clear enough, so I will not update it.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 03/19] x86, mm: Don't clear page table if range is ram
  2012-10-22 14:28   ` Konrad Rzeszutek Wilk
@ 2012-10-22 16:56     ` Yinghai Lu
  0 siblings, 0 replies; 63+ messages in thread
From: Yinghai Lu @ 2012-10-22 16:56 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin,
	Tejun Heo, Stefano Stabellini, linux-kernel

On Mon, Oct 22, 2012 at 7:28 AM, Konrad Rzeszutek Wilk
<konrad@kernel.org> wrote:
> On Thu, Oct 18, 2012 at 01:50:14PM -0700, Yinghai Lu wrote:
>> After we add code use buffer in BRK to pre-map page table,
>                    ^- to
>
> So .. which patch is that? Can you include the title of the
> patch here?
>
>> it should be safe to remove early_memmap for page table accessing.
>> Instead we get panic with that.
>>
>> It turns out we clear the initial page table wrongly for next range that is
>               ^- that
>
>> separated by holes.
>> And it only happens when we are trying to map range one by one range separately.
>                                                      ^-s
>
>>
>> We need to check if the range is ram before clearing page table.
>
> Ok, so that sounds like a bug-fix... but
>>
>> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>> ---
>>  arch/x86/mm/init_64.c |   37 ++++++++++++++++---------------------
>>  1 files changed, 16 insertions(+), 21 deletions(-)
>>
>> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
>> index f40f383..61b3c44 100644
>> --- a/arch/x86/mm/init_64.c
>> +++ b/arch/x86/mm/init_64.c
>> @@ -363,20 +363,19 @@ static unsigned long __meminit
>>  phys_pte_init(pte_t *pte_page, unsigned long addr, unsigned long end,
>>             pgprot_t prot)
>>  {
>> -     unsigned pages = 0;
>> +     unsigned long pages = 0, next;
>>       unsigned long last_map_addr = end;
>>       int i;
>>
>>       pte_t *pte = pte_page + pte_index(addr);
>>
>> -     for(i = pte_index(addr); i < PTRS_PER_PTE; i++, addr += PAGE_SIZE, pte++) {
>> -
>> +     for (i = pte_index(addr); i < PTRS_PER_PTE; i++, addr = next, pte++) {
>> +             next = (addr & PAGE_MASK) + PAGE_SIZE;
>>               if (addr >= end) {
>> -                     if (!after_bootmem) {
>> -                             for(; i < PTRS_PER_PTE; i++, pte++)
>> -                                     set_pte(pte, __pte(0));
>> -                     }
>> -                     break;
>> +                     if (!after_bootmem &&
>> +                         !e820_any_mapped(addr & PAGE_MASK, next, 0))
>> +                             set_pte(pte, __pte(0));
>> +                     continue;
>
> .. Interestingly, you also removed the extra loop. How come? Why not
> retain the little loop? (which could call e820_any_mapped?) Is that
> an improvement and cleanup? If so, I would think you should at least
> explain in the git commit:

Merged that loop into the top loop, since we need to use "next" from the top loop.

>
> "And while we are at it, also axe the extra loop and instead depend on
> the top loop which we can safely piggyback on."


update commit change log to:

---
After we add code that uses a buffer in BRK to pre-map pages for the page
table in the following patch:
        x86, mm: setup page table in top-down
it should be safe to remove early_memremap for page table accessing.
Instead we get a panic with that.

It turns out that we clear the initial page table wrongly for the next range
that is separated by holes.
And it only happens when we are trying to map ram ranges one by one.

We need to check if the range is ram before clearing the page table.

We change the loop structure to remove the extra little loop and use one loop
only; in that loop we calculate next first, and check if [addr, next) is
covered by E820_RAM.
---

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 04/19] x86, mm: only keep initial mapping for ram
  2012-10-22 14:33   ` Konrad Rzeszutek Wilk
@ 2012-10-22 17:43     ` Yinghai Lu
  0 siblings, 0 replies; 63+ messages in thread
From: Yinghai Lu @ 2012-10-22 17:43 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin,
	Tejun Heo, Stefano Stabellini, linux-kernel

On Mon, Oct 22, 2012 at 7:33 AM, Konrad Rzeszutek Wilk
<konrad@kernel.org> wrote:
> On Thu, Oct 18, 2012 at 01:50:15PM -0700, Yinghai Lu wrote:
>> 0 means any e820 type: if a range overlaps with any entry in e820,
>> the kernel will keep its initial page table mapping.
>>
>> What we want is to only keep the initial page table mapping for ram ranges.
>
> Can you squash it in the previous patch then pls? It seems like
> this is a bug-fix to a bug-fix. And since this patchset is still
> out-side the Linus's tree you have the option of squashing/rebasing.

ok.
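
For readers following along, a condensed restatement of the reworked pte loop
after the squash, with the type argument already changed to E820_RAM as agreed
above (kernel context assumed; this is a sketch, not a standalone program):

	for (i = pte_index(addr); i < PTRS_PER_PTE; i++, addr = next, pte++) {
		next = (addr & PAGE_MASK) + PAGE_SIZE;
		if (addr >= end) {
			/*
			 * Past the range being mapped: clear the slot only
			 * when [addr, next) is not RAM per e820, so a hole
			 * does not wipe entries of a neighbouring range.
			 */
			if (!after_bootmem &&
			    !e820_any_mapped(addr & PAGE_MASK, next, E820_RAM))
				set_pte(pte, __pte(0));
			continue;
		}
		/* ... set up the mapping for [addr, next) as before ... */
	}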

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 06/19] x86, mm: setup page table in top-down
  2012-10-22 13:19       ` Stefano Stabellini
@ 2012-10-22 18:17         ` Yinghai Lu
  2012-10-22 18:22           ` Yinghai Lu
  2012-10-23 12:22           ` Stefano Stabellini
  0 siblings, 2 replies; 63+ messages in thread
From: Yinghai Lu @ 2012-10-22 18:17 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin,
	Tejun Heo, linux-kernel

On Mon, Oct 22, 2012 at 6:19 AM, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:

>> > The series is starting to get in good shape!
>> > I tested it on a 2G and an 8G VM and it seems to be working fine.
>>
>> domU on 32bit and 64bit?
> domU 64bit

Can you test domU 32bit too?
I did not test that, and looks like Jacob only test 64 bit domU too.

>> alloc_low_page() is used in arch/x86/mm/init*.c. How come it becomes
>> interface to
>> other subsystem?
>
> I chose the wrong words.
>
> I meant that always allocating pages from areas that are already mapped,
> will become an assumption for other x86 subsystems like Xen.
> One shouldn't just go ahead and change this assumption without changing
> the subsystems too.

That looks like Xen's problem; Xen should let us know what kind of
assumptions it makes there.
We cannot dig deep into Xen to find those ourselves.

> I just want to make sure that 3 years from now, when somebody comes up
> with a new great idea to improve the initial pagetable allocation, he
> doesn't forget that changing alloc_low_page might break other subsystems.
>
> So I think that a comment is required here and should explicitly
> mention why it is important that alloc_low_page returns a mapped page.

How about put sth:
---
Xen mmu requires pages from this function should be directly mapped already.
---

Or you could introduce some special doc tag so that we can point out those
assumptions easily?

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 06/19] x86, mm: setup page table in top-down
  2012-10-22 18:17         ` Yinghai Lu
@ 2012-10-22 18:22           ` Yinghai Lu
  2012-10-23 12:16             ` Stefano Stabellini
  2012-10-23 12:22           ` Stefano Stabellini
  1 sibling, 1 reply; 63+ messages in thread
From: Yinghai Lu @ 2012-10-22 18:22 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin,
	Tejun Heo, linux-kernel

On Mon, Oct 22, 2012 at 11:17 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Mon, Oct 22, 2012 at 6:19 AM, Stefano Stabellini

> How about put sth:
> ---
> Xen mmu requires pages from this function should be directly mapped already.
> ---
>
> or you can introduce some doc tag specially that we can out those
> assumption easily?

I add

/* Xen requires pages from this function should be directly mapped already */

in   [PATCH] x86, mm: Add alloc_low_pages(num)

hope you are happy with that.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 06/19] x86, mm: setup page table in top-down
  2012-10-22 15:06   ` Konrad Rzeszutek Wilk
@ 2012-10-22 18:56     ` Yinghai Lu
  0 siblings, 0 replies; 63+ messages in thread
From: Yinghai Lu @ 2012-10-22 18:56 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin,
	Tejun Heo, Stefano Stabellini, linux-kernel

On Mon, Oct 22, 2012 at 8:06 AM, Konrad Rzeszutek Wilk
<konrad@kernel.org> wrote:
> On Thu, Oct 18, 2012 at 01:50:17PM -0700, Yinghai Lu wrote:
>
> You might want to mention how 'memblock' searches for regions.
> Presumably it is from top to bottom.

That is not related. Will add an explanation about min_pfn_mapped.

>
>
> And also explain the granularity of what the size you are mapping
> _after_ you are done with the PMD_SIZE.

Will add the reason for step_size.
>> +     /* step_size need to be small so pgt_buf from BRK could cover it */
>> +     step_size = PMD_SIZE;
>> +     max_pfn_mapped = 0; /* will get exact value next */
>
> next in the init_range_memory_mapping? Might want to spell that out.

the loop and
   init_range_memory_mapping==>init_memory_mapping ==> add_pfn_range_mapped

>
>> +     min_pfn_mapped = real_end >> PAGE_SHIFT;
>> +     last_start = start = real_end;
>
> Might want to add a comment here saying:
>
> "We are looping from the top to the bottom."
>
>> +     while (last_start > ISA_END_ADDRESS) {
>> +             if (last_start > step_size) {
>> +                     start = round_down(last_start - 1, step_size);
>> +                     if (start < ISA_END_ADDRESS)
>> +                             start = ISA_END_ADDRESS;
>> +             } else
>> +                     start = ISA_END_ADDRESS;
>> +             new_mapped_ram_size = init_range_memory_mapping(start,
>> +                                                     last_start);
>> +             last_start = start;
>> +             min_pfn_mapped = last_start >> PAGE_SHIFT;
>> +             if (new_mapped_ram_size > mapped_ram_size)
>> +                     step_size <<= 5;
>
> Should '5' have a #define value?

yes.

>
>> +             mapped_ram_size += new_mapped_ram_size;
>> +     }
>
> It looks like the step_size would keep on increasing on every loop.
> First it would be 2MB, 64MB, then 2GB, and so on - until the amount
> of memory that has been mapped is greater than what is unmapped.
> Is that right?
>
> I am basing that assumption on that the "new_mapped_ram_size"
> would return the size of the newly mapped region (start, last_start)
> in bytes. And the 'mapped_ram_size' is the size of the previously
> mapped region plus all the other ones.
>
> The logic being that  at the start of execution you start with a 2MB,
> compare it to 0, and increase step_size up to 64MB. Then start
> at real_end-2MB-step_size -> real_end-2MB-1. That gets you a 64MB chunk.
>
> Since new_mapped_ram_size (64MB) > mapped_ram_size (2MB)
> you increase step_size once more.
>
> If so, you should also explain that in the git commit description and
> in the loop logic..

Is that logic hard to understand from the code?

updated changelog:

---
Get pgt_buf early from BRK, and use it to map a PMD_SIZE range at the top of
ram first.
Then use the mapped pages to map more ranges below, and keep looping until
all pages get mapped.

alloc_low_page will use pages from BRK at first; after that buffer is used
up, it will use memblock to find and reserve pages for page table usage.

Introduce min_pfn_mapped to make sure we find new pages only from already
mapped ranges; it will be updated when lower pages get mapped.

Also add step_size to make sure that we don't try to map too big a range with
the limited mapped pages initially, and increase the step_size when we have
more mapped pages on hand.

At last we can get rid of the calculate/find early page table related code.

---
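
A condensed, annotated restatement of the loop described above (same logic as
the quoted hunk; STEP_SIZE_SHIFT is just an illustrative name for the '5' that
was asked to be #defined):

	#define STEP_SIZE_SHIFT 5		/* 2MB -> 64MB -> 2GB -> ... */

	unsigned long step_size = PMD_SIZE;	/* small enough for the BRK pgt_buf */
	unsigned long last_start = real_end, start;
	unsigned long mapped_ram_size = 0, new_mapped_ram_size;

	min_pfn_mapped = real_end >> PAGE_SHIFT;
	while (last_start > ISA_END_ADDRESS) {
		if (last_start > step_size) {
			start = round_down(last_start - 1, step_size);
			if (start < ISA_END_ADDRESS)
				start = ISA_END_ADDRESS;
		} else
			start = ISA_END_ADDRESS;

		/* map [start, last_start) using pages already mapped above it */
		new_mapped_ram_size = init_range_memory_mapping(start, last_start);
		last_start = start;
		min_pfn_mapped = last_start >> PAGE_SHIFT;

		/* once a step maps more than everything before it, take bigger steps */
		if (new_mapped_ram_size > mapped_ram_size)
			step_size <<= STEP_SIZE_SHIFT;
		mapped_ram_size += new_mapped_ram_size;
	}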

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 07/19] x86, mm: Remove early_memremap workaround for page table accessing on 64bit
  2012-10-22 15:07   ` Konrad Rzeszutek Wilk
@ 2012-10-22 19:08     ` Yinghai Lu
  0 siblings, 0 replies; 63+ messages in thread
From: Yinghai Lu @ 2012-10-22 19:08 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin,
	Tejun Heo, Stefano Stabellini, linux-kernel

On Mon, Oct 22, 2012 at 8:07 AM, Konrad Rzeszutek Wilk
<konrad@kernel.org> wrote:
> On Thu, Oct 18, 2012 at 01:50:18PM -0700, Yinghai Lu wrote:
>> We do not need that workaround anymore after patches that pre-map page
>> table buf and do not clear initial page table wrongly.
>
>
> .. and somewhere during the v2 posting we had a discussion about
> why this work-around came about. You should include a bit about
> that and copy-n-paste some of that here please.

Not sure which one you are referring to.

Anyway, I updated the changelog to:

---
We try to put the page table high to make room for kdump, but at that point
those ranges are not mapped yet, so we have to use ioremap to access them.

Now, after the patch that pre-maps the page table top-down:
        x86, mm: setup page table in top-down
we do not need that workaround anymore.

Just use __va to return the direct mapping address.
---
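
Roughly, the before/after being described (simplified; not the exact hunks):

	/* before: the page-table page could sit above max_pfn_mapped,
	 * so a temporary early mapping was needed to touch it */
	adr = early_memremap(phys, PAGE_SIZE);
	/* ... fill in the page-table page ... */
	early_iounmap(adr, PAGE_SIZE);

	/* after: alloc_low_page() only hands out pages that are already
	 * in the direct mapping, so plain __va() is enough */
	adr = __va(phys);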

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 08/19] x86, mm: Remove parameter in alloc_low_page for 64bit
  2012-10-22 15:09   ` Konrad Rzeszutek Wilk
@ 2012-10-22 19:09     ` Yinghai Lu
  0 siblings, 0 replies; 63+ messages in thread
From: Yinghai Lu @ 2012-10-22 19:09 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin,
	Tejun Heo, Stefano Stabellini, linux-kernel

On Mon, Oct 22, 2012 at 8:09 AM, Konrad Rzeszutek Wilk
<konrad@kernel.org> wrote:
> On Thu, Oct 18, 2012 at 01:50:19PM -0700, Yinghai Lu wrote:
>> Now all page table buf are pre-mapped, and could use virtual address directly.
>                      ^^ buffers              ^^^^ -> can
>
>> So don't need to remember physics address anymore.
>   ^^ We
>
> physics? Physical.

ok.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 09/19] x86, mm: Merge alloc_low_page between 64bit and 32bit
  2012-10-22 15:11   ` Konrad Rzeszutek Wilk
@ 2012-10-22 19:14     ` Yinghai Lu
  0 siblings, 0 replies; 63+ messages in thread
From: Yinghai Lu @ 2012-10-22 19:14 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin,
	Tejun Heo, Stefano Stabellini, linux-kernel

On Mon, Oct 22, 2012 at 8:11 AM, Konrad Rzeszutek Wilk
<konrad@kernel.org> wrote:
> On Thu, Oct 18, 2012 at 01:50:20PM -0700, Yinghai Lu wrote:
>> They are almost same except 64 bit need to handle after_bootmem.
>                                                                  ^^ -
> case.
>
>>
>> Add mm_internal.h to hide that alloc_low_page out of arch/x86/mm/init*.c
>
> Huh?
>
> I think what you are saying is that you want to expose alloc_low_page
> decleration in a header since the function resides in mm/init_[32|64].c ?


---
They are almost the same, except that 64-bit needs to handle the after_bootmem case.

Add mm_internal.h to make alloc_low_page() only accessible
from arch/x86/mm/init*.c.
---
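
A sketch of the idea behind mm_internal.h (illustrative, not necessarily the
exact file contents): declarations shared only by the files under
arch/x86/mm/, instead of exposing them in a public header under asm/.

	/* arch/x86/mm/mm_internal.h */
	#ifndef __X86_MM_INTERNAL_H
	#define __X86_MM_INTERNAL_H

	void *alloc_low_page(void);

	#endif	/* __X86_MM_INTERNAL_H */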

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 11/19] x86, mm, xen: Remove mapping_pagatable_reserve
  2012-10-22 15:14   ` Konrad Rzeszutek Wilk
@ 2012-10-22 19:18     ` Yinghai Lu
  0 siblings, 0 replies; 63+ messages in thread
From: Yinghai Lu @ 2012-10-22 19:18 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin,
	Tejun Heo, Stefano Stabellini, linux-kernel

On Mon, Oct 22, 2012 at 8:14 AM, Konrad Rzeszutek Wilk
<konrad@kernel.org> wrote:
> On Thu, Oct 18, 2012 at 01:50:22PM -0700, Yinghai Lu wrote:
>> page table area are pre-mapped now, and mark_page_ro is used to make it RO
>   ^ Page     ^^^^->areas                  ^^^^^^^^^^^-> ? I must have
> missed that patch. Can you include the title of the patch in this
> git commit so one could take a look.

Sorry, I forgot to remove those words.

new version:
---
Page table areas are pre-mapped now after
        x86, mm: setup page table in top-down
        x86, mm: Remove early_memremap workaround for page table
accessing on 64bit

mapping_pagetable_reserve is not used anymore, so remove it.
---

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 12/19] x86, mm: Add alloc_low_pages(num)
  2012-10-22 15:17   ` Konrad Rzeszutek Wilk
@ 2012-10-22 19:24     ` Yinghai Lu
  0 siblings, 0 replies; 63+ messages in thread
From: Yinghai Lu @ 2012-10-22 19:24 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin,
	Tejun Heo, Stefano Stabellini, linux-kernel

On Mon, Oct 22, 2012 at 8:17 AM, Konrad Rzeszutek Wilk
<konrad@kernel.org> wrote:
> On Thu, Oct 18, 2012 at 01:50:23PM -0700, Yinghai Lu wrote:
>> 32bit kmap mapping need page table to be used for low to high.
>                                     ^-s
>
>>
>> At this point those page table are still from pgt_buf_* from BRK,
>                                 ^s
>
>> So it is ok now.
>> But we want to move early_ioremap_page_table_range_init() out of
>> init_memory_mapping() and only call it one time later, that will
>> make page_table_range_init/page_table_kmap_check/alloc_low_page to
>> use memblock to get page.
>>
>> memblock allocation for page table are from high to low.
>                                     ^s
>>
>> So will get panic from page_table_kmap_check() that has BUG_ON to do
>> ordering checking.
>>
>> This patch add alloc_low_pages to make it possible to alloc serveral pages
>> at first, and hand out pages one by one from low to high.
>
> .. But for right now this patch just makes it by default one.

?

---
32bit kmap mapping needs pages to be used in low to high order.

At this point those pages are still from pgt_buf_* in BRK, so it is
ok now.
But we want to move early_ioremap_page_table_range_init() out of
init_memory_mapping() and only call it one time later; that will
make page_table_range_init/page_table_kmap_check/alloc_low_page
use memblock to get pages.

memblock allocation for pages is from high to low.

So we would get a panic from page_table_kmap_check(), which has a BUG_ON
to do the ordering check.

This patch adds alloc_low_pages to make it possible to allocate several pages
at first, and hand out pages one by one from low to high.

-v2: add a one-line comment about xen requirements.
---
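
A conceptual sketch of the counting plus batch-allocation approach
(hypothetical helper names, not the actual patch): count how many kmap page
tables are needed, get them from alloc_low_pages() in one batch so they come
back low to high, then hand them out one at a time.

	static void *kmap_pgt_batch;
	static unsigned int kmap_pgt_left;

	static pte_t *one_kmap_page_table(unsigned int total_needed)
	{
		pte_t *pt;

		if (!kmap_pgt_left) {
			/* one batch; pages come back in ascending order */
			kmap_pgt_batch = alloc_low_pages(total_needed);
			kmap_pgt_left = total_needed;
		}
		pt = kmap_pgt_batch;
		kmap_pgt_batch += PAGE_SIZE;	/* void pointer arithmetic, as the kernel allows */
		kmap_pgt_left--;
		return pt;
	}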

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 18/19] x86, mm: Let "memmap=" take more entries one time
  2012-10-22 15:19   ` Konrad Rzeszutek Wilk
@ 2012-10-22 19:26     ` Yinghai Lu
  0 siblings, 0 replies; 63+ messages in thread
From: Yinghai Lu @ 2012-10-22 19:26 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin,
	Tejun Heo, Stefano Stabellini, linux-kernel

On Mon, Oct 22, 2012 at 8:19 AM, Konrad Rzeszutek Wilk
<konrad@kernel.org> wrote:
> On Thu, Oct 18, 2012 at 01:50:29PM -0700, Yinghai Lu wrote:
>> Currently "memmap=" can only take one entry each time.
>> When we have more entries, we have to use a separate memmap= for each of them.
>>
>> For pxe booting, we have a command line length limitation, so those extra
>> "memmap=" options would waste too much space.
>>
>> This patch makes memmap= able to take several entries at one time;
>> the entries are separated with ','.
>
> Um, not sure what this patch has to do with this patchset?
> Should this be sent separately?

While debugging those patches, I needed to punch holes in the memmap and found
this problem.

I thought other guys could have the same problem while testing this patch set.

Will move it to be the last one.
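
As an aside for anyone testing this: assuming the usual nn[KMG]$ss[KMG]
"reserve a range" syntax, a single option such as

	memmap=512M$8G,512M$12G

would, with this patch, punch two holes from one parameter instead of needing
two separate memmap= options on the PXE command line.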

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 13/19] x86, mm: only call early_ioremap_page_table_range_init() once
  2012-10-22 15:24   ` Konrad Rzeszutek Wilk
@ 2012-10-22 19:40     ` Yinghai Lu
  2012-10-23  0:01       ` Yinghai Lu
  0 siblings, 1 reply; 63+ messages in thread
From: Yinghai Lu @ 2012-10-22 19:40 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin,
	Tejun Heo, Stefano Stabellini, linux-kernel

On Mon, Oct 22, 2012 at 8:24 AM, Konrad Rzeszutek Wilk
<konrad@kernel.org> wrote:
> On Thu, Oct 18, 2012 at 01:50:24PM -0700, Yinghai Lu wrote:
>> On 32bit, We should not keep calling that during every init_memory_mapping.
>
> Explain pls why.
>

Is this clearer?

---
On 32bit, before the patchset that only sets up page tables for ram, we only
called that one time.

Now we are calling it during every init_memory_mapping if we have holes
under max_low_pfn.

We should only call it one time, after all ranges under max_low_pfn get
mapped, just like we did before.

That also avoids the risk of running out of pgt_buf in BRK.

Need to update page_table_range_init() to count the pages for the kmap page
tables first, and use the newly added alloc_low_pages() to get pages in
sequence.
That will conform to the requirement that pages need to be in low to high order.
---

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 13/19] x86, mm: only call early_ioremap_page_table_range_init() once
  2012-10-22 19:40     ` Yinghai Lu
@ 2012-10-23  0:01       ` Yinghai Lu
  0 siblings, 0 replies; 63+ messages in thread
From: Yinghai Lu @ 2012-10-23  0:01 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Stefano Stabellini
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin,
	Tejun Heo, linux-kernel

I updated the for-x86-mm branch with the updated changelogs and several
split_mem_range cleanups.

Please check them.
	git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
for-x86-mm

At the same time, please do check 32-bit domU; that looks like the only
path that is not tested yet.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 06/19] x86, mm: setup page table in top-down
  2012-10-22 18:22           ` Yinghai Lu
@ 2012-10-23 12:16             ` Stefano Stabellini
  2012-10-23 18:47               ` Yinghai Lu
  0 siblings, 1 reply; 63+ messages in thread
From: Stefano Stabellini @ 2012-10-23 12:16 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Stefano Stabellini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Jacob Shin, Tejun Heo, linux-kernel

On Mon, 22 Oct 2012, Yinghai Lu wrote:
> On Mon, Oct 22, 2012 at 11:17 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> > On Mon, Oct 22, 2012 at 6:19 AM, Stefano Stabellini
> 
> > How about put sth:
> > ---
> > Xen mmu requires pages from this function should be directly mapped already.
> > ---
> >
> > or you can introduce some doc tag specially that we can out those
> > assumption easily?
> 
> I add
> 
> /* Xen requires pages from this function should be directly mapped already */
> 
> in   [PATCH] x86, mm: Add alloc_low_pages(num)
> 
> hope you are happy with that.
> 

It is not bad, but let's just fix the English a bit and give more
context:

/* Pages returned by this function are already directly mapped.
 *
 * Changing that is likely to break Xen, see commit
 * 279b706bf800b5967037f492dbe4fc5081ad5d0f for more information on the
 * subject.
 */

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 06/19] x86, mm: setup page table in top-down
  2012-10-22 18:17         ` Yinghai Lu
  2012-10-22 18:22           ` Yinghai Lu
@ 2012-10-23 12:22           ` Stefano Stabellini
  2012-10-23 18:37             ` Yinghai Lu
  1 sibling, 1 reply; 63+ messages in thread
From: Stefano Stabellini @ 2012-10-23 12:22 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Stefano Stabellini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Jacob Shin, Tejun Heo, linux-kernel

On Mon, 22 Oct 2012, Yinghai Lu wrote:
> On Mon, Oct 22, 2012 at 6:19 AM, Stefano Stabellini
> <stefano.stabellini@eu.citrix.com> wrote:
> 
> >> > The series is starting to get in good shape!
> >> > I tested it on a 2G and an 8G VM and it seems to be working fine.
> >>
> >> domU on 32bit and 64bit?
> > domU 64bit
> 
> Can you test domU 32bit too?
> I did not test that, and looks like Jacob only test 64 bit domU too.

Sure. It works fine.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 06/19] x86, mm: setup page table in top-down
  2012-10-23 12:22           ` Stefano Stabellini
@ 2012-10-23 18:37             ` Yinghai Lu
  2012-10-24 10:55               ` Stefano Stabellini
  0 siblings, 1 reply; 63+ messages in thread
From: Yinghai Lu @ 2012-10-23 18:37 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin,
	Tejun Heo, linux-kernel

On Tue, Oct 23, 2012 at 5:22 AM, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
> On Mon, 22 Oct 2012, Yinghai Lu wrote:
>> Can you test domU 32bit too?
>> I did not test that, and looks like Jacob only test 64 bit domU too.
>
> Sure. It works fine.

great.

Do you know any simple way to test xen domU after pxe booting xen/dom0 via pxe?

From kvm, it is simple with pxe:
1. build kernel and iso at the same time; initrd is converted from
opensuse rescue initramfs.
2. copy kernel and iso into boot server
3. network boot the kernel
4. mount nfs dir so could access iso, and qemu-kvm.
5. qemu-kvm to load the iso.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 06/19] x86, mm: setup page table in top-down
  2012-10-23 12:16             ` Stefano Stabellini
@ 2012-10-23 18:47               ` Yinghai Lu
  0 siblings, 0 replies; 63+ messages in thread
From: Yinghai Lu @ 2012-10-23 18:47 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin,
	Tejun Heo, linux-kernel

On Tue, Oct 23, 2012 at 5:16 AM, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
> On Mon, 22 Oct 2012, Yinghai Lu wrote:
>> On Mon, Oct 22, 2012 at 11:17 AM, Yinghai Lu <yinghai@kernel.org> wrote:
>> > On Mon, Oct 22, 2012 at 6:19 AM, Stefano Stabellini
>>
>> > How about put sth:
>> > ---
>> > Xen mmu requires pages from this function should be directly mapped already.
>> > ---
>> >
>> > or you can introduce some doc tag specially that we can out those
>> > assumption easily?
>>
>> I add
>>
>> /* Xen requires pages from this function should be directly mapped already */
>>
>> in   [PATCH] x86, mm: Add alloc_low_pages(num)
>>
>> hope you are happy with that.
>>
>
> It is not bad, but let's just fix the English a bit and give more
> context:
>
> /* Pages returned by this function are already directly mapped.
>  *
>  * Changing that is likely to break Xen, see commit
>  * 279b706bf800b5967037f492dbe4fc5081ad5d0f for more information on the
>  * subject.
>  */


I put your change in a separate patch.

http://git.kernel.org/?p=linux/kernel/git/yinghai/linux-yinghai.git;a=patch;h=bc6e8a77f049f3ecaad291329238367af044aa57

From bc6e8a77f049f3ecaad291329238367af044aa57 Mon Sep 17 00:00:00 2001
From: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Date: Tue, 23 Oct 2012 11:24:07 -0700
Subject: [PATCH] x86, mm: Add pointer about Xen mmu requirement for
 alloc_low_pages

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/mm/init.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index e98a4b8..88f90f7 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -25,6 +25,13 @@ unsigned long __meminitdata pgt_buf_top;

 static unsigned long min_pfn_mapped;

+/*
+ * Pages returned are already directly mapped.
+ *
+ * Changing that is likely to break Xen, see commit
+ * 279b706bf800b5967037f492dbe4fc5081ad5d0f for more information on the
+ * subject.
+ */
 __ref void *alloc_low_pages(unsigned int num)
 {
 	unsigned long pfn;
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [PATCH 06/19] x86, mm: setup page table in top-down
  2012-10-23 18:37             ` Yinghai Lu
@ 2012-10-24 10:55               ` Stefano Stabellini
  0 siblings, 0 replies; 63+ messages in thread
From: Stefano Stabellini @ 2012-10-24 10:55 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Stefano Stabellini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Jacob Shin, Tejun Heo, linux-kernel

On Tue, 23 Oct 2012, Yinghai Lu wrote:
> On Tue, Oct 23, 2012 at 5:22 AM, Stefano Stabellini
> <stefano.stabellini@eu.citrix.com> wrote:
> > On Mon, 22 Oct 2012, Yinghai Lu wrote:
> >> Can you test domU 32bit too?
> >> I did not test that, and looks like Jacob only test 64 bit domU too.
> >
> > Sure. It works fine.
> 
> great.
> 
> Do you know any simple way to test xen domU after pxe booting xen/dom0 via pxe?
> 
> From kvm, it is simple with pxe:
> 1. build kernel and iso at the same time; initrd is converted from
> opensuse rescue initramfs.
> 2. copy kernel and iso into boot server
> 3. network boot the kernel
> 4. mount nfs dir so could access iso, and qemu-kvm.
> 5. qemu-kvm to load the iso.

After building and installing the Xen tools in Dom0 (basically
everything that comes out of a xen-unstable build), you can use "xl" to
create VMs:

1) build and install xen and tools in dom0 (make; make install)
If you are running dom0 out of its initramfs you might have to add them to it.
But I guess that you can also make them available to your dom0 via an nfs
share.

2) copy your domU kernel and initrd to dom0

3) Write a simple config file in dom0 with the path to the kernel and initrd

name = "linux"
kernel = "/path/to/vmlinuz"
ramdisk = "/path/to/initrd"
memory = 1024
vcpus = 4
vif = [ 'bridge=xenbr0' ]
disk = [ '/path/to/debian_squeeze_amd64_standard.raw,raw,xvda,w' ]
extra = 'textmode=1 xencons=xvc0'
root = "/dev/xvda1"

4) create the VM
xl create -c /path/to/config_file


As an alternative you can get an account on Amazon EC2 for a few cents per
hour and test your kernel over there ;)

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 2/3] PCI: correctly detect ACPI PCI host bridge objects
  2012-10-18 20:50 ` [PATCH 2/3] PCI: correctly detect ACPI PCI host bridge objects Yinghai Lu
@ 2012-10-26  9:10   ` Bjorn Helgaas
  2012-10-26 18:13     ` Yinghai Lu
  0 siblings, 1 reply; 63+ messages in thread
From: Bjorn Helgaas @ 2012-10-26  9:10 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin,
	Tejun Heo, Stefano Stabellini, linux-kernel, Jiang Liu,
	Len Brown, linux-acpi

On Thu, Oct 18, 2012 at 2:50 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> From: Jiang Liu <jiang.liu@huawei.com>
>
> The code in pci_root_hp.c depends on function acpi_is_root_bridge()
> to check whether an ACPI object is a PCI host bridge or not.
> If an ACPI device hasn't been created for the ACPI object yet,
> function acpi_is_root_bridge() will return false even if the object
> is a PCI host bridge object. That behavior will cause two issues:
> 1) No ACPI notification handler installed for PCI host bridges absent
>    at startup, so hotplug events for those bridges won't be handled.
> 2) rescan_root_bridge() can't reenumerate offlined PCI host bridges
>    because the ACPI devices have been already destroyed.
>
> So use acpi_match_object_info_ids() to correctly detect PCI host bridges.
>
> -v2: update to use acpi_match_object_info_ids() from Tang Chen  - Yinghai
>
> Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: Len Brown <lenb@kernel.org>
> Cc: linux-acpi@vger.kernel.org
> ---
>  drivers/acpi/pci_root_hp.c |   25 ++++++++++++++++++++++++-
>  1 files changed, 24 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/acpi/pci_root_hp.c b/drivers/acpi/pci_root_hp.c
> index 2aebf78..3edec7f 100644
> --- a/drivers/acpi/pci_root_hp.c
> +++ b/drivers/acpi/pci_root_hp.c
> @@ -19,6 +19,12 @@ struct acpi_root_bridge {
>         u32 flags;
>  };
>
> +static const struct acpi_device_id root_device_ids[] = {
> +       {"PNP0A03", 0},
> +       {"PNP0A08", 0},

Why do we need PNP0A08 here?

Per the PCI Firmware Spec, sec 4.1.5, PNP0A08 identifies a PCI Express
or a PCI-X Mode 2 host bridge, and such a device is required to
include PNP0A03 in _CID.  Therefore, it should be sufficient to look
only for PNP0A03.

That raises the question of why the spec defined PNP0A08 in the first
place.  It looks like PNP0A08 is intended specifically to indicate
support for extended config space (offsets 256-4095), per sec 4.1.5
(Note 1) and sec 4.3.1.

I don't think Linux currently does anything differently with PNP0A08,
so if the spec authors thought it was important to add PNP0A08, maybe
Linux is missing something.  For example, maybe we should be limiting
config space size to 256 unless we're below a PNP0A08 device.  I don't
know what that could *fix*, unless it would allow us to get rid of
some quirks or something, but usually spec authors don't add things
unless they think there's a reason it's needed.

> +       {"", 0},
> +};
> +
>  /* bridge flags */
>  #define ROOT_BRIDGE_HAS_EJ0    (0x00000002)
>  #define ROOT_BRIDGE_HAS_PS3    (0x00000080)
> @@ -256,6 +262,23 @@ static void handle_hotplug_event_root(acpi_handle handle, u32 type,
>                                 _handle_hotplug_event_root);
>  }
>
> +static bool acpi_is_root_bridge_object(acpi_handle handle)
> +{
> +       struct acpi_device_info *info = NULL;
> +       acpi_status status;
> +       bool ret;
> +
> +       status = acpi_get_object_info(handle, &info);
> +       if (ACPI_FAILURE(status))
> +               return false;
> +
> +       ret = !acpi_match_object_info_ids(info, root_device_ids);
> +
> +       kfree(info);
> +
> +       return ret;
> +}
> +
>  static acpi_status __init
>  find_root_bridges(acpi_handle handle, u32 lvl, void *context, void **rv)
>  {
> @@ -264,7 +287,7 @@ find_root_bridges(acpi_handle handle, u32 lvl, void *context, void **rv)
>                                       .pointer = objname };
>         int *count = (int *)context;
>
> -       if (!acpi_is_root_bridge(handle))
> +       if (!acpi_is_root_bridge_object(handle))
>                 return AE_OK;
>
>         (*count)++;
> --
> 1.7.7
>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 2/3] PCI: correctly detect ACPI PCI host bridge objects
  2012-10-26  9:10   ` Bjorn Helgaas
@ 2012-10-26 18:13     ` Yinghai Lu
  0 siblings, 0 replies; 63+ messages in thread
From: Yinghai Lu @ 2012-10-26 18:13 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin,
	Tejun Heo, Stefano Stabellini, linux-kernel, Jiang Liu,
	Len Brown, linux-acpi

On Fri, Oct 26, 2012 at 2:10 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Thu, Oct 18, 2012 at 2:50 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>>
>> +static const struct acpi_device_id root_device_ids[] = {
>> +       {"PNP0A03", 0},
>> +       {"PNP0A08", 0},
>
> Why do we need PNP0A08 here?
>
> Per the PCI Firmware Spec, sec 4.1.5, PNP0A08 identifies a PCI Express
> or a PCI-X Mode 2 host bridge, and such a device is required to
> include PNP0A03 in _CID.  Therefore, it should be sufficient to look
> only for PNP0A03.

Good, I removed that line and updated the for-pci-split-pci-root-hp-2 branch.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 1/3] ACPI: Introduce a new acpi handle to determine HID match.
  2012-10-18 20:50 ` [PATCH 1/3] ACPI: Introduce a new acpi handle to determine HID match Yinghai Lu
@ 2012-11-02 12:23   ` Rafael J. Wysocki
  2012-11-02 15:03     ` Yinghai Lu
  0 siblings, 1 reply; 63+ messages in thread
From: Rafael J. Wysocki @ 2012-11-02 12:23 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin,
	Tejun Heo, Stefano Stabellini, linux-kernel, Tang Chen,
	Len Brown, linux-acpi

On Thursday, October 18, 2012 01:50:09 PM Yinghai Lu wrote:
> From: Tang Chen <tangchen@cn.fujitsu.com>
> 
> We need to find out if a handle is for a root bridge, and install a notify
> handler for it to handle pci root bus hot add.
> At that time, the root bridge acpi device is not created yet.
> 
> So acpi_match_device_ids() will not work.
> 
> This patch adds a function to check if a new acpi handle's HID matches a list
> of IDs.  The new API uses acpi_device_info instead of acpi_device.
> 
> -v2: updated changelog, also check length for string info...
>      change checking sequence by moving string comparing close to the for loop.
> 					- Yinghai
> 
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: Len Brown <lenb@kernel.org>
> Cc: linux-acpi@vger.kernel.org
> ---
>  drivers/acpi/scan.c     |   33 +++++++++++++++++++++++++++++++++
>  include/acpi/acpi_bus.h |    2 ++
>  2 files changed, 35 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> index 5dfec09..33ca993 100644
> --- a/drivers/acpi/scan.c
> +++ b/drivers/acpi/scan.c
> @@ -312,6 +312,39 @@ int acpi_match_device_ids(struct acpi_device *device,
>  }
>  EXPORT_SYMBOL(acpi_match_device_ids);
>  
> +int acpi_match_object_info_ids(struct acpi_device_info *info,
> +			       const struct acpi_device_id *ids)
> +{
> +	const struct acpi_device_id *id;
> +	char *str;
> +	u32 len;
> +	int i;
> +
> +	len = info->hardware_id.length;
> +	if (len) {
> +		str = info->hardware_id.string;
> +		if (str)
> +			for (id = ids; id->id[0]; id++)
> +				if (!strcmp((char *)id->id, str))
> +					return 0;
> +	}
> +
> +	for (i = 0; i < info->compatible_id_list.count; i++) {
> +		len = info->compatible_id_list.ids[i].length;
> +		if (!len)
> +			continue;
> +		str = info->compatible_id_list.ids[i].string;
> +		if (!str)
> +			continue;
> +		for (id = ids; id->id[0]; id++)
> +			if (!strcmp((char *)id->id, str))
> +				return 0;
> +	}
> +
> +	return -ENOENT;
> +}
> +EXPORT_SYMBOL(acpi_match_object_info_ids);

EXPORT_SYMBOL_GPL, please?

> +
>  static void acpi_free_ids(struct acpi_device *device)
>  {
>  	struct acpi_hardware_id *id, *tmp;
> diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
> index 608f92f..6ac415c 100644
> --- a/include/acpi/acpi_bus.h
> +++ b/include/acpi/acpi_bus.h
> @@ -374,6 +374,8 @@ int acpi_bus_start(struct acpi_device *device);
>  acpi_status acpi_bus_get_ejd(acpi_handle handle, acpi_handle * ejd);
>  int acpi_match_device_ids(struct acpi_device *device,
>  			  const struct acpi_device_id *ids);
> +int acpi_match_object_info_ids(struct acpi_device_info *info,
> +			       const struct acpi_device_id *ids);
>  int acpi_create_dir(struct acpi_device *);
>  void acpi_remove_dir(struct acpi_device *);

I wonder which code path(s) is(are) going to use the new routine?

Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 1/3] ACPI: Introduce a new acpi handle to determine HID match.
  2012-11-02 12:23   ` Rafael J. Wysocki
@ 2012-11-02 15:03     ` Yinghai Lu
  0 siblings, 0 replies; 63+ messages in thread
From: Yinghai Lu @ 2012-11-02 15:03 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jacob Shin,
	Tejun Heo, Stefano Stabellini, linux-kernel, Tang Chen,
	Len Brown, linux-acpi

On Fri, Nov 2, 2012 at 5:23 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Thursday, October 18, 2012 01:50:09 PM Yinghai Lu wrote:
>> From: Tang Chen <tangchen@cn.fujitsu.com>
>>
>> We need to find out if a handle is for a root bridge, and install a notify
>> handler for it to handle pci root bus hot add.
>> At that time, the root bridge acpi device is not created yet.
>>
>> So acpi_match_device_ids() will not work.
>>
>> This patch adds a function to check if a new acpi handle's HID matches a list
>> of IDs.  The new API uses acpi_device_info instead of acpi_device.
>>
>> -v2: updated changelog, also check length for string info...
>>      change checking sequence by moving string comparing close to the for loop.
>>                                       - Yinghai
>>
>> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
>> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>> Cc: Len Brown <lenb@kernel.org>
>> Cc: linux-acpi@vger.kernel.org
>> ---
>>  drivers/acpi/scan.c     |   33 +++++++++++++++++++++++++++++++++
>>  include/acpi/acpi_bus.h |    2 ++
>>  2 files changed, 35 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
>> index 5dfec09..33ca993 100644
>> --- a/drivers/acpi/scan.c
>> +++ b/drivers/acpi/scan.c
>> @@ -312,6 +312,39 @@ int acpi_match_device_ids(struct acpi_device *device,
>>  }
>>  EXPORT_SYMBOL(acpi_match_device_ids);
>>
>> +int acpi_match_object_info_ids(struct acpi_device_info *info,
>> +                            const struct acpi_device_id *ids)
>> +{
>> +     const struct acpi_device_id *id;
>> +     char *str;
>> +     u32 len;
>> +     int i;
>> +
>> +     len = info->hardware_id.length;
>> +     if (len) {
>> +             str = info->hardware_id.string;
>> +             if (str)
>> +                     for (id = ids; id->id[0]; id++)
>> +                             if (!strcmp((char *)id->id, str))
>> +                                     return 0;
>> +     }
>> +
>> +     for (i = 0; i < info->compatible_id_list.count; i++) {
>> +             len = info->compatible_id_list.ids[i].length;
>> +             if (!len)
>> +                     continue;
>> +             str = info->compatible_id_list.ids[i].string;
>> +             if (!str)
>> +                     continue;
>> +             for (id = ids; id->id[0]; id++)
>> +                     if (!strcmp((char *)id->id, str))
>> +                             return 0;
>> +     }
>> +
>> +     return -ENOENT;
>> +}
>> +EXPORT_SYMBOL(acpi_match_object_info_ids);
>
> EXPORT_SYMBOL_GPL, please?

Yes, will change that when sending the next version.

>
>> +
>>  static void acpi_free_ids(struct acpi_device *device)
>>  {
>>       struct acpi_hardware_id *id, *tmp;
>> diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
>> index 608f92f..6ac415c 100644
>> --- a/include/acpi/acpi_bus.h
>> +++ b/include/acpi/acpi_bus.h
>> @@ -374,6 +374,8 @@ int acpi_bus_start(struct acpi_device *device);
>>  acpi_status acpi_bus_get_ejd(acpi_handle handle, acpi_handle * ejd);
>>  int acpi_match_device_ids(struct acpi_device *device,
>>                         const struct acpi_device_id *ids);
>> +int acpi_match_object_info_ids(struct acpi_device_info *info,
>> +                            const struct acpi_device_id *ids);
>>  int acpi_create_dir(struct acpi_device *);
>>  void acpi_remove_dir(struct acpi_device *);
>
> I wonder which code path(s) is(are) going to use the new routine?

That is for installing the handler for pci root bus removal. Will resend
them in batch 3.

http://git.kernel.org/?p=linux/kernel/git/yinghai/linux-yinghai.git;a=shortlog;h=refs/heads/for-pci-split-pci-root-hp-2

http://git.kernel.org/?p=linux/kernel/git/yinghai/linux-yinghai.git;a=commitdiff;h=b3752f4571a3db1bbbaf204a6cb85aadbd40b19d
   PCI, acpiphp: Separate out hot-add support of pci host bridge

http://git.kernel.org/?p=linux/kernel/git/yinghai/linux-yinghai.git;a=commitdiff;h=bda84c28ae8e00315fcc7dffceb301e082369c3e
   PCI: correctly detect ACPI PCI host bridge objects

^ permalink raw reply	[flat|nested] 63+ messages in thread

end of thread, other threads:[~2012-11-02 15:03 UTC | newest]

Thread overview: 63+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-10-18 20:50 [PATCH -v5 00/19] x86: Use BRK to pre mapping page table to make xen happy Yinghai Lu
2012-10-18 20:50 ` [PATCH 1/3] ACPI: Introduce a new acpi handle to determine HID match Yinghai Lu
2012-11-02 12:23   ` Rafael J. Wysocki
2012-11-02 15:03     ` Yinghai Lu
2012-10-18 20:50 ` [PATCH 01/19] x86, mm: Align start address to correct big page size Yinghai Lu
2012-10-22 14:16   ` Konrad Rzeszutek Wilk
2012-10-22 16:31     ` Yinghai Lu
2012-10-18 20:50 ` [PATCH 2/3] PCI: correctly detect ACPI PCI host bridge objects Yinghai Lu
2012-10-26  9:10   ` Bjorn Helgaas
2012-10-26 18:13     ` Yinghai Lu
2012-10-18 20:50 ` [PATCH 02/19] x86, mm: Use big page size for small memory range Yinghai Lu
2012-10-22 14:21   ` Konrad Rzeszutek Wilk
2012-10-22 16:33     ` Yinghai Lu
2012-10-18 20:50 ` [PATCH 3/3] PCI, ACPI: debug print for installation of acpi root bridge's notifier Yinghai Lu
2012-10-18 20:50 ` [PATCH 03/19] x86, mm: Don't clear page table if range is ram Yinghai Lu
2012-10-22 14:28   ` Konrad Rzeszutek Wilk
2012-10-22 16:56     ` Yinghai Lu
2012-10-18 20:50 ` [PATCH 04/19] x86, mm: only keep initial mapping for ram Yinghai Lu
2012-10-22 14:33   ` Konrad Rzeszutek Wilk
2012-10-22 17:43     ` Yinghai Lu
2012-10-18 20:50 ` [PATCH 05/19] x86, mm: Break down init_all_memory_mapping Yinghai Lu
2012-10-18 20:50 ` [PATCH 06/19] x86, mm: setup page table in top-down Yinghai Lu
2012-10-19 16:24   ` Stefano Stabellini
2012-10-19 16:41     ` Yinghai Lu
2012-10-22 13:19       ` Stefano Stabellini
2012-10-22 18:17         ` Yinghai Lu
2012-10-22 18:22           ` Yinghai Lu
2012-10-23 12:16             ` Stefano Stabellini
2012-10-23 18:47               ` Yinghai Lu
2012-10-23 12:22           ` Stefano Stabellini
2012-10-23 18:37             ` Yinghai Lu
2012-10-24 10:55               ` Stefano Stabellini
2012-10-22 14:14       ` Konrad Rzeszutek Wilk
2012-10-22 15:06   ` Konrad Rzeszutek Wilk
2012-10-22 18:56     ` Yinghai Lu
2012-10-18 20:50 ` [PATCH 07/19] x86, mm: Remove early_memremap workaround for page table accessing on 64bit Yinghai Lu
2012-10-22 15:07   ` Konrad Rzeszutek Wilk
2012-10-22 19:08     ` Yinghai Lu
2012-10-18 20:50 ` [PATCH 08/19] x86, mm: Remove parameter in alloc_low_page for 64bit Yinghai Lu
2012-10-22 15:09   ` Konrad Rzeszutek Wilk
2012-10-22 19:09     ` Yinghai Lu
2012-10-18 20:50 ` [PATCH 09/19] x86, mm: Merge alloc_low_page between 64bit and 32bit Yinghai Lu
2012-10-22 15:11   ` Konrad Rzeszutek Wilk
2012-10-22 19:14     ` Yinghai Lu
2012-10-18 20:50 ` [PATCH 10/19] x86, mm: Move min_pfn_mapped back to mm/init.c Yinghai Lu
2012-10-18 20:50 ` [PATCH 11/19] x86, mm, xen: Remove mapping_pagatable_reserve Yinghai Lu
2012-10-22 15:14   ` Konrad Rzeszutek Wilk
2012-10-22 19:18     ` Yinghai Lu
2012-10-18 20:50 ` [PATCH 12/19] x86, mm: Add alloc_low_pages(num) Yinghai Lu
2012-10-22 15:17   ` Konrad Rzeszutek Wilk
2012-10-22 19:24     ` Yinghai Lu
2012-10-18 20:50 ` [PATCH 13/19] x86, mm: only call early_ioremap_page_table_range_init() once Yinghai Lu
2012-10-22 15:24   ` Konrad Rzeszutek Wilk
2012-10-22 19:40     ` Yinghai Lu
2012-10-23  0:01       ` Yinghai Lu
2012-10-18 20:50 ` [PATCH 14/19] x86, mm: Move back pgt_buf_* to mm/init.c Yinghai Lu
2012-10-18 20:50 ` [PATCH 15/19] x86, mm: Move init_gbpages() out of setup.c Yinghai Lu
2012-10-18 20:50 ` [PATCH 16/19] x86, mm: change low/hignmem_pfn_init to static on 32bit Yinghai Lu
2012-10-18 20:50 ` [PATCH 17/19] x86, mm: Move function declaration into mm_internal.h Yinghai Lu
2012-10-18 20:50 ` [PATCH 18/19] x86, mm: Let "memmap=" take more entries one time Yinghai Lu
2012-10-22 15:19   ` Konrad Rzeszutek Wilk
2012-10-22 19:26     ` Yinghai Lu
2012-10-18 20:50 ` [PATCH 19/19] x86, mm: Add check before clear pte above max_low_pfn on 32bit Yinghai Lu
