All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] mm, x86: Fix ioremap RAM check interfaces
@ 2015-06-19 21:44 ` Toshi Kani
  0 siblings, 0 replies; 22+ messages in thread
From: Toshi Kani @ 2015-06-19 21:44 UTC (permalink / raw)
  To: tglx, mingo, hpa, akpm
  Cc: travis, roland, dan.j.williams, x86, linux-nvdimm, linux-kernel

ioremap() checks if a target range is RAM and fails the request
if true.  There are multiple issues in the iormap RAM check
interfaces.

 1. region_is_ram() does not work at all.
 2. The RAM checks, region_is_ram() and __ioremap_caller() via
    walk_system_ram_range(), are redundant.
 3. walk_system_ram_range() requires the RAM ranges page-aligned
    in the resource table.  This restriction has allowed multiple
    ioremap calls to setup_data, which is not page-aligned.

This patchset solves issue 1 and 2.  Issue 3 is not addressed in
this patchset, but is taken into the account that ioremap continues
to allow such callers until it is addressed.

---
Toshi Kani (3):
  1/3 mm, x86: Fix warning in ioremap RAM check
  2/3 mm, x86: Remove region_is_ram() call from ioremap
  3/3 mm: Fix bugs in region_is_ram()

---
 arch/x86/mm/ioremap.c | 23 ++++++-----------------
 kernel/resource.c     |  6 +++---
 2 files changed, 9 insertions(+), 20 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 0/3] mm, x86: Fix ioremap RAM check interfaces
@ 2015-06-19 21:44 ` Toshi Kani
  0 siblings, 0 replies; 22+ messages in thread
From: Toshi Kani @ 2015-06-19 21:44 UTC (permalink / raw)
  To: tglx, mingo, hpa, akpm
  Cc: travis, roland, dan.j.williams, x86, linux-nvdimm, linux-kernel

ioremap() checks if a target range is RAM and fails the request
if true.  There are multiple issues in the iormap RAM check
interfaces.

 1. region_is_ram() does not work at all.
 2. The RAM checks, region_is_ram() and __ioremap_caller() via
    walk_system_ram_range(), are redundant.
 3. walk_system_ram_range() requires the RAM ranges page-aligned
    in the resource table.  This restriction has allowed multiple
    ioremap calls to setup_data, which is not page-aligned.

This patchset solves issue 1 and 2.  Issue 3 is not addressed in
this patchset, but is taken into the account that ioremap continues
to allow such callers until it is addressed.

---
Toshi Kani (3):
  1/3 mm, x86: Fix warning in ioremap RAM check
  2/3 mm, x86: Remove region_is_ram() call from ioremap
  3/3 mm: Fix bugs in region_is_ram()

---
 arch/x86/mm/ioremap.c | 23 ++++++-----------------
 kernel/resource.c     |  6 +++---
 2 files changed, 9 insertions(+), 20 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/3] mm, x86: Fix warning in ioremap RAM check
  2015-06-19 21:44 ` Toshi Kani
@ 2015-06-19 21:44   ` Toshi Kani
  -1 siblings, 0 replies; 22+ messages in thread
From: Toshi Kani @ 2015-06-19 21:44 UTC (permalink / raw)
  To: tglx, mingo, hpa, akpm
  Cc: travis, roland, dan.j.williams, x86, linux-nvdimm, linux-kernel,
	Toshi Kani

__ioremap_caller() calls __ioremap_check_ram() through
walk_system_ram_range() to check if a target range is RAM.
__ioremap_check_ram() has WARN_ONCE() in a wrong place that
it warns when the given range is not RAM.  This misplaced
warning is not exposed since walk_system_ram_range() only
calls __ioremap_check_ram() for RAM ranges.

Move the WARN_ONCE() to the error path of __ioremap_caller(),
and update the message to include the address range.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 arch/x86/mm/ioremap.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 70e7444..56f8af7 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -60,8 +60,6 @@ static int __ioremap_check_ram(unsigned long start_pfn, unsigned long nr_pages,
 		    !PageReserved(pfn_to_page(start_pfn + i)))
 			return 1;
 
-	WARN_ONCE(1, "ioremap on RAM pfn 0x%lx\n", start_pfn);
-
 	return 0;
 }
 
@@ -128,8 +126,11 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
 		pfn      = phys_addr >> PAGE_SHIFT;
 		last_pfn = last_addr >> PAGE_SHIFT;
 		if (walk_system_ram_range(pfn, last_pfn - pfn + 1, NULL,
-					  __ioremap_check_ram) == 1)
+					  __ioremap_check_ram) == 1) {
+			WARN_ONCE(1, "ioremap on RAM at 0x%llx - 0x%llx\n",
+					phys_addr, last_addr);
 			return NULL;
+		}
 	}
 	/*
 	 * Mappings have to be page-aligned
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 1/3] mm, x86: Fix warning in ioremap RAM check
@ 2015-06-19 21:44   ` Toshi Kani
  0 siblings, 0 replies; 22+ messages in thread
From: Toshi Kani @ 2015-06-19 21:44 UTC (permalink / raw)
  To: tglx, mingo, hpa, akpm
  Cc: travis, roland, dan.j.williams, x86, linux-nvdimm, linux-kernel,
	Toshi Kani

__ioremap_caller() calls __ioremap_check_ram() through
walk_system_ram_range() to check if a target range is RAM.
__ioremap_check_ram() has WARN_ONCE() in a wrong place that
it warns when the given range is not RAM.  This misplaced
warning is not exposed since walk_system_ram_range() only
calls __ioremap_check_ram() for RAM ranges.

Move the WARN_ONCE() to the error path of __ioremap_caller(),
and update the message to include the address range.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 arch/x86/mm/ioremap.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 70e7444..56f8af7 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -60,8 +60,6 @@ static int __ioremap_check_ram(unsigned long start_pfn, unsigned long nr_pages,
 		    !PageReserved(pfn_to_page(start_pfn + i)))
 			return 1;
 
-	WARN_ONCE(1, "ioremap on RAM pfn 0x%lx\n", start_pfn);
-
 	return 0;
 }
 
@@ -128,8 +126,11 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
 		pfn      = phys_addr >> PAGE_SHIFT;
 		last_pfn = last_addr >> PAGE_SHIFT;
 		if (walk_system_ram_range(pfn, last_pfn - pfn + 1, NULL,
-					  __ioremap_check_ram) == 1)
+					  __ioremap_check_ram) == 1) {
+			WARN_ONCE(1, "ioremap on RAM at 0x%llx - 0x%llx\n",
+					phys_addr, last_addr);
 			return NULL;
+		}
 	}
 	/*
 	 * Mappings have to be page-aligned
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/3] mm, x86: Remove region_is_ram() call from ioremap
  2015-06-19 21:44 ` Toshi Kani
@ 2015-06-19 21:44   ` Toshi Kani
  -1 siblings, 0 replies; 22+ messages in thread
From: Toshi Kani @ 2015-06-19 21:44 UTC (permalink / raw)
  To: tglx, mingo, hpa, akpm
  Cc: travis, roland, dan.j.williams, x86, linux-nvdimm, linux-kernel,
	Toshi Kani

__ioremap_caller() calls region_is_ram() to look up the resource
to check if a target range is RAM, which was added as an additinal
check to improve the lookup performance over page_is_ram() (commit
906e36c5c717 "x86: use optimized ioresource lookup in ioremap
function").

__ioremap_caller() then calls walk_system_ram_range(), which had
replaced page_is_ram() to improve the lookup performance (commit
c81c8a1eeede "x86, ioremap: Speed up check for RAM pages").

Since both functions walk through the resource table, there is
no need to call the two functions.  Furthermore, region_is_ram()
has bugs and always returns with -1.  This makes
walk_system_ram_range() as the only check being used.

Hence, remove the call to region_is_ram() from __ioremap_caller().

Note, removing the call to region_is_ram() is also necessary
to fix the bugs in region_is_ram().  walk_system_ram_range()
requires RAM ranges aligned by the page size in the resource
table.  e820_reserve_setup_data() updates the e820 table by
allocating a separate entry to each data region in setup_data,
which is not page-aligned.  Therefore, walk_system_ram_range()
is unable to detect the RAM ranges in setup_data.  This
restriction has allowed multiple uses of ioremap() to map
setup_data.  Using fixed region_is_ram() will cause these callers
to start failing.  After all ioremap to setup_data are converted,
__ioremap_caller() may call region_is_ram() instead.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 arch/x86/mm/ioremap.c |   24 ++++++------------------
 1 file changed, 6 insertions(+), 18 deletions(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 56f8af7..928867e 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -89,7 +89,6 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
 	pgprot_t prot;
 	int retval;
 	void __iomem *ret_addr;
-	int ram_region;
 
 	/* Don't allow wraparound or zero size */
 	last_addr = phys_addr + size - 1;
@@ -112,26 +111,15 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
 	/*
 	 * Don't allow anybody to remap normal RAM that we're using..
 	 */
-	/* First check if whole region can be identified as RAM or not */
-	ram_region = region_is_ram(phys_addr, size);
-	if (ram_region > 0) {
-		WARN_ONCE(1, "ioremap on RAM at 0x%lx - 0x%lx\n",
-				(unsigned long int)phys_addr,
-				(unsigned long int)last_addr);
-		return NULL;
-	}
-
-	/* If could not be identified(-1), check page by page */
-	if (ram_region < 0) {
-		pfn      = phys_addr >> PAGE_SHIFT;
-		last_pfn = last_addr >> PAGE_SHIFT;
-		if (walk_system_ram_range(pfn, last_pfn - pfn + 1, NULL,
+	pfn      = phys_addr >> PAGE_SHIFT;
+	last_pfn = last_addr >> PAGE_SHIFT;
+	if (walk_system_ram_range(pfn, last_pfn - pfn + 1, NULL,
 					  __ioremap_check_ram) == 1) {
-			WARN_ONCE(1, "ioremap on RAM at 0x%llx - 0x%llx\n",
+		WARN_ONCE(1, "ioremap on RAM at 0x%llx - 0x%llx\n",
 					phys_addr, last_addr);
-			return NULL;
-		}
+		return NULL;
 	}
+
 	/*
 	 * Mappings have to be page-aligned
 	 */
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/3] mm, x86: Remove region_is_ram() call from ioremap
@ 2015-06-19 21:44   ` Toshi Kani
  0 siblings, 0 replies; 22+ messages in thread
From: Toshi Kani @ 2015-06-19 21:44 UTC (permalink / raw)
  To: tglx, mingo, hpa, akpm
  Cc: travis, roland, dan.j.williams, x86, linux-nvdimm, linux-kernel,
	Toshi Kani

__ioremap_caller() calls region_is_ram() to look up the resource
to check if a target range is RAM, which was added as an additinal
check to improve the lookup performance over page_is_ram() (commit
906e36c5c717 "x86: use optimized ioresource lookup in ioremap
function").

__ioremap_caller() then calls walk_system_ram_range(), which had
replaced page_is_ram() to improve the lookup performance (commit
c81c8a1eeede "x86, ioremap: Speed up check for RAM pages").

Since both functions walk through the resource table, there is
no need to call the two functions.  Furthermore, region_is_ram()
has bugs and always returns with -1.  This makes
walk_system_ram_range() as the only check being used.

Hence, remove the call to region_is_ram() from __ioremap_caller().

Note, removing the call to region_is_ram() is also necessary
to fix the bugs in region_is_ram().  walk_system_ram_range()
requires RAM ranges aligned by the page size in the resource
table.  e820_reserve_setup_data() updates the e820 table by
allocating a separate entry to each data region in setup_data,
which is not page-aligned.  Therefore, walk_system_ram_range()
is unable to detect the RAM ranges in setup_data.  This
restriction has allowed multiple uses of ioremap() to map
setup_data.  Using fixed region_is_ram() will cause these callers
to start failing.  After all ioremap to setup_data are converted,
__ioremap_caller() may call region_is_ram() instead.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 arch/x86/mm/ioremap.c |   24 ++++++------------------
 1 file changed, 6 insertions(+), 18 deletions(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 56f8af7..928867e 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -89,7 +89,6 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
 	pgprot_t prot;
 	int retval;
 	void __iomem *ret_addr;
-	int ram_region;
 
 	/* Don't allow wraparound or zero size */
 	last_addr = phys_addr + size - 1;
@@ -112,26 +111,15 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
 	/*
 	 * Don't allow anybody to remap normal RAM that we're using..
 	 */
-	/* First check if whole region can be identified as RAM or not */
-	ram_region = region_is_ram(phys_addr, size);
-	if (ram_region > 0) {
-		WARN_ONCE(1, "ioremap on RAM at 0x%lx - 0x%lx\n",
-				(unsigned long int)phys_addr,
-				(unsigned long int)last_addr);
-		return NULL;
-	}
-
-	/* If could not be identified(-1), check page by page */
-	if (ram_region < 0) {
-		pfn      = phys_addr >> PAGE_SHIFT;
-		last_pfn = last_addr >> PAGE_SHIFT;
-		if (walk_system_ram_range(pfn, last_pfn - pfn + 1, NULL,
+	pfn      = phys_addr >> PAGE_SHIFT;
+	last_pfn = last_addr >> PAGE_SHIFT;
+	if (walk_system_ram_range(pfn, last_pfn - pfn + 1, NULL,
 					  __ioremap_check_ram) == 1) {
-			WARN_ONCE(1, "ioremap on RAM at 0x%llx - 0x%llx\n",
+		WARN_ONCE(1, "ioremap on RAM at 0x%llx - 0x%llx\n",
 					phys_addr, last_addr);
-			return NULL;
-		}
+		return NULL;
 	}
+
 	/*
 	 * Mappings have to be page-aligned
 	 */
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 3/3] mm: Fix bugs in region_is_ram()
  2015-06-19 21:44 ` Toshi Kani
@ 2015-06-19 21:44   ` Toshi Kani
  -1 siblings, 0 replies; 22+ messages in thread
From: Toshi Kani @ 2015-06-19 21:44 UTC (permalink / raw)
  To: tglx, mingo, hpa, akpm
  Cc: travis, roland, dan.j.williams, x86, linux-nvdimm, linux-kernel,
	Toshi Kani

region_is_ram(), which looks up the resource to check if
a target range is RAM, always returns -1 due to a bug in
the range check.  It always breaks the loop at the first
entry of the resouce table.

The function compares p->flags and flags, which also has
a bug.  The flags is declared as int, which makes it as
a negative value with IORESOURCE_BUSY (0x80000000) set
while p->flags is unsigned long.

Fix the range check and flags so that region_is_ram()
works as advertised.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 kernel/resource.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 90552aa..fed052a 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -504,13 +504,13 @@ int region_is_ram(resource_size_t start, unsigned long size)
 {
 	struct resource *p;
 	resource_size_t end = start + size - 1;
-	int flags = IORESOURCE_MEM | IORESOURCE_BUSY;
+	unsigned long flags = IORESOURCE_MEM | IORESOURCE_BUSY;
 	const char *name = "System RAM";
 	int ret = -1;
 
 	read_lock(&resource_lock);
 	for (p = iomem_resource.child; p ; p = p->sibling) {
-		if (end < p->start)
+		if (p->end < start)
 			continue;
 
 		if (p->start <= start && end <= p->end) {
@@ -521,7 +521,7 @@ int region_is_ram(resource_size_t start, unsigned long size)
 				ret = 1;
 			break;
 		}
-		if (p->end < start)
+		if (end < p->start)
 			break;	/* not found */
 	}
 	read_unlock(&resource_lock);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 3/3] mm: Fix bugs in region_is_ram()
@ 2015-06-19 21:44   ` Toshi Kani
  0 siblings, 0 replies; 22+ messages in thread
From: Toshi Kani @ 2015-06-19 21:44 UTC (permalink / raw)
  To: tglx, mingo, hpa, akpm
  Cc: travis, roland, dan.j.williams, x86, linux-nvdimm, linux-kernel,
	Toshi Kani

region_is_ram(), which looks up the resource to check if
a target range is RAM, always returns -1 due to a bug in
the range check.  It always breaks the loop at the first
entry of the resouce table.

The function compares p->flags and flags, which also has
a bug.  The flags is declared as int, which makes it as
a negative value with IORESOURCE_BUSY (0x80000000) set
while p->flags is unsigned long.

Fix the range check and flags so that region_is_ram()
works as advertised.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 kernel/resource.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 90552aa..fed052a 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -504,13 +504,13 @@ int region_is_ram(resource_size_t start, unsigned long size)
 {
 	struct resource *p;
 	resource_size_t end = start + size - 1;
-	int flags = IORESOURCE_MEM | IORESOURCE_BUSY;
+	unsigned long flags = IORESOURCE_MEM | IORESOURCE_BUSY;
 	const char *name = "System RAM";
 	int ret = -1;
 
 	read_lock(&resource_lock);
 	for (p = iomem_resource.child; p ; p = p->sibling) {
-		if (end < p->start)
+		if (p->end < start)
 			continue;
 
 		if (p->start <= start && end <= p->end) {
@@ -521,7 +521,7 @@ int region_is_ram(resource_size_t start, unsigned long size)
 				ret = 1;
 			break;
 		}
-		if (p->end < start)
+		if (end < p->start)
 			break;	/* not found */
 	}
 	read_unlock(&resource_lock);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] mm, x86: Remove region_is_ram() call from ioremap
  2015-06-19 21:44   ` Toshi Kani
@ 2015-06-22 16:21     ` Mike Travis
  -1 siblings, 0 replies; 22+ messages in thread
From: Mike Travis @ 2015-06-22 16:21 UTC (permalink / raw)
  To: Toshi Kani, tglx, mingo, hpa, akpm
  Cc: roland, dan.j.williams, x86, linux-nvdimm, linux-kernel,
	Clive Harding, Russ Anderson



On 6/19/2015 2:44 PM, Toshi Kani wrote:
> __ioremap_caller() calls region_is_ram() to look up the resource
> to check if a target range is RAM, which was added as an additinal
> check to improve the lookup performance over page_is_ram() (commit
> 906e36c5c717 "x86: use optimized ioresource lookup in ioremap
> function").
> 
> __ioremap_caller() then calls walk_system_ram_range(), which had
> replaced page_is_ram() to improve the lookup performance (commit
> c81c8a1eeede "x86, ioremap: Speed up check for RAM pages").
> 
> Since both functions walk through the resource table, there is
> no need to call the two functions.  Furthermore, region_is_ram()
> has bugs and always returns with -1.  This makes
> walk_system_ram_range() as the only check being used.

Do you have an example of a failing case?  Also, I didn't know that
IOREMAP'd addresses were allowed to be on non-page boundaries?

Here's the comment and reason for the patches from Patch 0:

<<<
We have a large university system in the UK that is experiencing
very long delays modprobing the driver for a specific I/O device.
The delay is from 8-10 minutes per device and there are 31 devices
in the system.  This 4 to 5 hour delay in starting up those I/O
devices is very much a burden on the customer.
...
The problem was tracked down to a very slow IOREMAP operation and
the excessively long ioresource lookup to insure that the user is
not attempting to ioremap RAM.  These patches provide a speed up
to that function.
>>>

The speed up was pretty dramatic, I think to about 15-20 minutes
(the test was done by our local CS person in the UK).  I think this
would prove the function was working since it would have fallen
back to the previous page_is_ram function and the 4 to 5 hour
startup.

If there is a failure, it would be better for all to fix the specific
bug and not re-introduce the original problem.  Perhaps drop to
page is ram if the address is not page aligned?

> Hence, remove the call to region_is_ram() from __ioremap_caller().
> 
> Note, removing the call to region_is_ram() is also necessary
> to fix the bugs in region_is_ram().  walk_system_ram_range()
> requires RAM ranges aligned by the page size in the resource
> table.  e820_reserve_setup_data() updates the e820 table by
> allocating a separate entry to each data region in setup_data,
> which is not page-aligned.  Therefore, walk_system_ram_range()
> is unable to detect the RAM ranges in setup_data.  This
> restriction has allowed multiple uses of ioremap() to map
> setup_data.  Using fixed region_is_ram() will cause these callers
> to start failing.  After all ioremap to setup_data are converted,
> __ioremap_caller() may call region_is_ram() instead.
> 
> Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> ---
>  arch/x86/mm/ioremap.c |   24 ++++++------------------
>  1 file changed, 6 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
> index 56f8af7..928867e 100644
> --- a/arch/x86/mm/ioremap.c
> +++ b/arch/x86/mm/ioremap.c
> @@ -89,7 +89,6 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
>  	pgprot_t prot;
>  	int retval;
>  	void __iomem *ret_addr;
> -	int ram_region;
>  
>  	/* Don't allow wraparound or zero size */
>  	last_addr = phys_addr + size - 1;
> @@ -112,26 +111,15 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
>  	/*
>  	 * Don't allow anybody to remap normal RAM that we're using..
>  	 */
> -	/* First check if whole region can be identified as RAM or not */
> -	ram_region = region_is_ram(phys_addr, size);
> -	if (ram_region > 0) {
> -		WARN_ONCE(1, "ioremap on RAM at 0x%lx - 0x%lx\n",
> -				(unsigned long int)phys_addr,
> -				(unsigned long int)last_addr);
> -		return NULL;
> -	}
> -
> -	/* If could not be identified(-1), check page by page */
> -	if (ram_region < 0) {
> -		pfn      = phys_addr >> PAGE_SHIFT;
> -		last_pfn = last_addr >> PAGE_SHIFT;
> -		if (walk_system_ram_range(pfn, last_pfn - pfn + 1, NULL,
> +	pfn      = phys_addr >> PAGE_SHIFT;
> +	last_pfn = last_addr >> PAGE_SHIFT;
> +	if (walk_system_ram_range(pfn, last_pfn - pfn + 1, NULL,
>  					  __ioremap_check_ram) == 1) {
> -			WARN_ONCE(1, "ioremap on RAM at 0x%llx - 0x%llx\n",
> +		WARN_ONCE(1, "ioremap on RAM at 0x%llx - 0x%llx\n",
>  					phys_addr, last_addr);
> -			return NULL;
> -		}
> +		return NULL;
>  	}
> +
>  	/*
>  	 * Mappings have to be page-aligned
>  	 */
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] mm, x86: Remove region_is_ram() call from ioremap
@ 2015-06-22 16:21     ` Mike Travis
  0 siblings, 0 replies; 22+ messages in thread
From: Mike Travis @ 2015-06-22 16:21 UTC (permalink / raw)
  To: Toshi Kani, tglx, mingo, hpa, akpm
  Cc: roland, dan.j.williams, x86, linux-nvdimm, linux-kernel,
	Clive Harding, Russ Anderson



On 6/19/2015 2:44 PM, Toshi Kani wrote:
> __ioremap_caller() calls region_is_ram() to look up the resource
> to check if a target range is RAM, which was added as an additinal
> check to improve the lookup performance over page_is_ram() (commit
> 906e36c5c717 "x86: use optimized ioresource lookup in ioremap
> function").
> 
> __ioremap_caller() then calls walk_system_ram_range(), which had
> replaced page_is_ram() to improve the lookup performance (commit
> c81c8a1eeede "x86, ioremap: Speed up check for RAM pages").
> 
> Since both functions walk through the resource table, there is
> no need to call the two functions.  Furthermore, region_is_ram()
> has bugs and always returns with -1.  This makes
> walk_system_ram_range() as the only check being used.

Do you have an example of a failing case?  Also, I didn't know that
IOREMAP'd addresses were allowed to be on non-page boundaries?

Here's the comment and reason for the patches from Patch 0:

<<<
We have a large university system in the UK that is experiencing
very long delays modprobing the driver for a specific I/O device.
The delay is from 8-10 minutes per device and there are 31 devices
in the system.  This 4 to 5 hour delay in starting up those I/O
devices is very much a burden on the customer.
...
The problem was tracked down to a very slow IOREMAP operation and
the excessively long ioresource lookup to insure that the user is
not attempting to ioremap RAM.  These patches provide a speed up
to that function.
>>>

The speed up was pretty dramatic, I think to about 15-20 minutes
(the test was done by our local CS person in the UK).  I think this
would prove the function was working since it would have fallen
back to the previous page_is_ram function and the 4 to 5 hour
startup.

If there is a failure, it would be better for all to fix the specific
bug and not re-introduce the original problem.  Perhaps drop to
page is ram if the address is not page aligned?

> Hence, remove the call to region_is_ram() from __ioremap_caller().
> 
> Note, removing the call to region_is_ram() is also necessary
> to fix the bugs in region_is_ram().  walk_system_ram_range()
> requires RAM ranges aligned by the page size in the resource
> table.  e820_reserve_setup_data() updates the e820 table by
> allocating a separate entry to each data region in setup_data,
> which is not page-aligned.  Therefore, walk_system_ram_range()
> is unable to detect the RAM ranges in setup_data.  This
> restriction has allowed multiple uses of ioremap() to map
> setup_data.  Using fixed region_is_ram() will cause these callers
> to start failing.  After all ioremap to setup_data are converted,
> __ioremap_caller() may call region_is_ram() instead.
> 
> Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> ---
>  arch/x86/mm/ioremap.c |   24 ++++++------------------
>  1 file changed, 6 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
> index 56f8af7..928867e 100644
> --- a/arch/x86/mm/ioremap.c
> +++ b/arch/x86/mm/ioremap.c
> @@ -89,7 +89,6 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
>  	pgprot_t prot;
>  	int retval;
>  	void __iomem *ret_addr;
> -	int ram_region;
>  
>  	/* Don't allow wraparound or zero size */
>  	last_addr = phys_addr + size - 1;
> @@ -112,26 +111,15 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
>  	/*
>  	 * Don't allow anybody to remap normal RAM that we're using..
>  	 */
> -	/* First check if whole region can be identified as RAM or not */
> -	ram_region = region_is_ram(phys_addr, size);
> -	if (ram_region > 0) {
> -		WARN_ONCE(1, "ioremap on RAM at 0x%lx - 0x%lx\n",
> -				(unsigned long int)phys_addr,
> -				(unsigned long int)last_addr);
> -		return NULL;
> -	}
> -
> -	/* If could not be identified(-1), check page by page */
> -	if (ram_region < 0) {
> -		pfn      = phys_addr >> PAGE_SHIFT;
> -		last_pfn = last_addr >> PAGE_SHIFT;
> -		if (walk_system_ram_range(pfn, last_pfn - pfn + 1, NULL,
> +	pfn      = phys_addr >> PAGE_SHIFT;
> +	last_pfn = last_addr >> PAGE_SHIFT;
> +	if (walk_system_ram_range(pfn, last_pfn - pfn + 1, NULL,
>  					  __ioremap_check_ram) == 1) {
> -			WARN_ONCE(1, "ioremap on RAM at 0x%llx - 0x%llx\n",
> +		WARN_ONCE(1, "ioremap on RAM at 0x%llx - 0x%llx\n",
>  					phys_addr, last_addr);
> -			return NULL;
> -		}
> +		return NULL;
>  	}
> +
>  	/*
>  	 * Mappings have to be page-aligned
>  	 */
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] mm, x86: Remove region_is_ram() call from ioremap
  2015-06-22 16:21     ` Mike Travis
@ 2015-06-22 17:23       ` Toshi Kani
  -1 siblings, 0 replies; 22+ messages in thread
From: Toshi Kani @ 2015-06-22 17:23 UTC (permalink / raw)
  To: Mike Travis
  Cc: tglx, mingo, hpa, akpm, roland, dan.j.williams, x86,
	linux-nvdimm, linux-kernel, Clive Harding, Russ Anderson

On Mon, 2015-06-22 at 09:21 -0700, Mike Travis wrote:
> 
> On 6/19/2015 2:44 PM, Toshi Kani wrote:
> > __ioremap_caller() calls region_is_ram() to look up the resource
> > to check if a target range is RAM, which was added as an additinal
> > check to improve the lookup performance over page_is_ram() (commit
> > 906e36c5c717 "x86: use optimized ioresource lookup in ioremap
> > function").
> > 
> > __ioremap_caller() then calls walk_system_ram_range(), which had
> > replaced page_is_ram() to improve the lookup performance (commit
> > c81c8a1eeede "x86, ioremap: Speed up check for RAM pages").
> > 
> > Since both functions walk through the resource table, there is
> > no need to call the two functions.  Furthermore, region_is_ram()
> > has bugs and always returns with -1.  This makes
> > walk_system_ram_range() as the only check being used.
> 
> Do you have an example of a failing case?  

Well, region_is_ram() fails with -1 for every case... This is because it
breaks the for-loop at 'if (p->end < start)' in the first entry (i.e.
the lowest address range) of the resource table.  For example, the first
entry of 'p->end' is 0xfff on my system, which is certainly smaller than
'start'.

  # cat /proc/iomem
  00000000-00000fff : reserved
  00001000-00092fff : System RAM
  00093000-00093fff : reserved
	:

> Also, I didn't know that
> IOREMAP'd addresses were allowed to be on non-page boundaries?

Yes, that is allowed.  Here is a comment in __ioremap_caller(). 

 * NOTE! We need to allow non-page-aligned mappings too: we will
obviously
 * have to convert them into an offset in a page-aligned mapping, but
the
 * caller shouldn't need to know that small detail.

> Here's the comment and reason for the patches from Patch 0:
> 
> <<<
> We have a large university system in the UK that is experiencing
> very long delays modprobing the driver for a specific I/O device.
> The delay is from 8-10 minutes per device and there are 31 devices
> in the system.  This 4 to 5 hour delay in starting up those I/O
> devices is very much a burden on the customer.
> ...
> The problem was tracked down to a very slow IOREMAP operation and
> the excessively long ioresource lookup to insure that the user is
> not attempting to ioremap RAM.  These patches provide a speed up
> to that function.
> >>>
> 
> The speed up was pretty dramatic, I think to about 15-20 minutes
> (the test was done by our local CS person in the UK).  I think this
> would prove the function was working since it would have fallen
> back to the previous page_is_ram function and the 4 to 5 hour
> startup.

I wonder how this test was conducted.  When the region_is_ram() change
got in, the kernel already had the other speed up change (c81c8a1eeede),
which had replaced page_is_ram().

> If there is a failure, it would be better for all to fix the specific
> bug and not re-introduce the original problem.  Perhaps drop to
> page is ram if the address is not page aligned?

walk_system_ram_range() is the one that has the issue with
page-unaligned address.  region_is_ram() after fixed by patch 3/3 does
not have this issue.  ioremap() does not call page_is_ram() anymore.  

pcibios_add_device(), ksysfs.c, kdebugfs.c (and possibly more) call
ioremap for setup_data, which is page-unaligned.  If we are going to
disallow such callers, they all need to be converted with a different
mapping interface, such as vmap().  But vmap() takes an array of page
pointers as an argument, and is not convenient for them to use.  Since
setup_data is a special range, if they need to be changed may be
arguable.  I think issue 3 described in patch 0/3 needs further
discussion.  For now, I'd like to fix issue 1 & 2.

Thanks,
-Toshi


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] mm, x86: Remove region_is_ram() call from ioremap
@ 2015-06-22 17:23       ` Toshi Kani
  0 siblings, 0 replies; 22+ messages in thread
From: Toshi Kani @ 2015-06-22 17:23 UTC (permalink / raw)
  To: Mike Travis
  Cc: tglx, mingo, hpa, akpm, roland, dan.j.williams, x86,
	linux-nvdimm, linux-kernel, Clive Harding, Russ Anderson

On Mon, 2015-06-22 at 09:21 -0700, Mike Travis wrote:
> 
> On 6/19/2015 2:44 PM, Toshi Kani wrote:
> > __ioremap_caller() calls region_is_ram() to look up the resource
> > to check if a target range is RAM, which was added as an additinal
> > check to improve the lookup performance over page_is_ram() (commit
> > 906e36c5c717 "x86: use optimized ioresource lookup in ioremap
> > function").
> > 
> > __ioremap_caller() then calls walk_system_ram_range(), which had
> > replaced page_is_ram() to improve the lookup performance (commit
> > c81c8a1eeede "x86, ioremap: Speed up check for RAM pages").
> > 
> > Since both functions walk through the resource table, there is
> > no need to call the two functions.  Furthermore, region_is_ram()
> > has bugs and always returns with -1.  This makes
> > walk_system_ram_range() as the only check being used.
> 
> Do you have an example of a failing case?  

Well, region_is_ram() fails with -1 for every case... This is because it
breaks the for-loop at 'if (p->end < start)' in the first entry (i.e.
the lowest address range) of the resource table.  For example, the first
entry of 'p->end' is 0xfff on my system, which is certainly smaller than
'start'.

  # cat /proc/iomem
  00000000-00000fff : reserved
  00001000-00092fff : System RAM
  00093000-00093fff : reserved
	:

> Also, I didn't know that
> IOREMAP'd addresses were allowed to be on non-page boundaries?

Yes, that is allowed.  Here is a comment in __ioremap_caller(). 

 * NOTE! We need to allow non-page-aligned mappings too: we will
obviously
 * have to convert them into an offset in a page-aligned mapping, but
the
 * caller shouldn't need to know that small detail.

> Here's the comment and reason for the patches from Patch 0:
> 
> <<<
> We have a large university system in the UK that is experiencing
> very long delays modprobing the driver for a specific I/O device.
> The delay is from 8-10 minutes per device and there are 31 devices
> in the system.  This 4 to 5 hour delay in starting up those I/O
> devices is very much a burden on the customer.
> ...
> The problem was tracked down to a very slow IOREMAP operation and
> the excessively long ioresource lookup to insure that the user is
> not attempting to ioremap RAM.  These patches provide a speed up
> to that function.
> >>>
> 
> The speed up was pretty dramatic, I think to about 15-20 minutes
> (the test was done by our local CS person in the UK).  I think this
> would prove the function was working since it would have fallen
> back to the previous page_is_ram function and the 4 to 5 hour
> startup.

I wonder how this test was conducted.  When the region_is_ram() change
got in, the kernel already had the other speed up change (c81c8a1eeede),
which had replaced page_is_ram().

> If there is a failure, it would be better for all to fix the specific
> bug and not re-introduce the original problem.  Perhaps drop to
> page is ram if the address is not page aligned?

walk_system_ram_range() is the one that has the issue with
page-unaligned address.  region_is_ram() after fixed by patch 3/3 does
not have this issue.  ioremap() does not call page_is_ram() anymore.  

pcibios_add_device(), ksysfs.c, kdebugfs.c (and possibly more) call
ioremap for setup_data, which is page-unaligned.  If we are going to
disallow such callers, they all need to be converted with a different
mapping interface, such as vmap().  But vmap() takes an array of page
pointers as an argument, and is not convenient for them to use.  Since
setup_data is a special range, if they need to be changed may be
arguable.  I think issue 3 described in patch 0/3 needs further
discussion.  For now, I'd like to fix issue 1 & 2.

Thanks,
-Toshi


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] mm, x86: Remove region_is_ram() call from ioremap
  2015-06-22 17:23       ` Toshi Kani
@ 2015-06-22 18:22         ` Mike Travis
  -1 siblings, 0 replies; 22+ messages in thread
From: Mike Travis @ 2015-06-22 18:22 UTC (permalink / raw)
  To: Toshi Kani
  Cc: tglx, mingo, hpa, akpm, roland, dan.j.williams, x86,
	linux-nvdimm, linux-kernel, Clive Harding, Russ Anderson



On 6/22/2015 10:23 AM, Toshi Kani wrote:
> On Mon, 2015-06-22 at 09:21 -0700, Mike Travis wrote:
>>
>> On 6/19/2015 2:44 PM, Toshi Kani wrote:
>>> __ioremap_caller() calls region_is_ram() to look up the resource
>>> to check if a target range is RAM, which was added as an additinal
>>> check to improve the lookup performance over page_is_ram() (commit
>>> 906e36c5c717 "x86: use optimized ioresource lookup in ioremap
>>> function").
>>>
>>> __ioremap_caller() then calls walk_system_ram_range(), which had
>>> replaced page_is_ram() to improve the lookup performance (commit
>>> c81c8a1eeede "x86, ioremap: Speed up check for RAM pages").
>>>
>>> Since both functions walk through the resource table, there is
>>> no need to call the two functions.  Furthermore, region_is_ram()
>>> has bugs and always returns with -1.  This makes
>>> walk_system_ram_range() as the only check being used.
>>
>> Do you have an example of a failing case?  
> 
> Well, region_is_ram() fails with -1 for every case... This is because it
> breaks the for-loop at 'if (p->end < start)' in the first entry (i.e.
> the lowest address range) of the resource table.  For example, the first
> entry of 'p->end' is 0xfff on my system, which is certainly smaller than
> 'start'.
> 
>   # cat /proc/iomem
>   00000000-00000fff : reserved
>   00001000-00092fff : System RAM
>   00093000-00093fff : reserved
> 	:

That is a small entry.  But I don't understand that when it
returned the -1, the page_is_ram function was not used instead?
Or am I missing something?

-	/* If could not be identified(-1), check page by page */
-	if (ram_region < 0) {
-		pfn      = phys_addr >> PAGE_SHIFT;
-		last_pfn = last_addr >> PAGE_SHIFT;
-		if (walk_system_ram_range(pfn, last_pfn - pfn + 1, NULL,

>> Also, I didn't know that
>> IOREMAP'd addresses were allowed to be on non-page boundaries?
> 
> Yes, that is allowed.  Here is a comment in __ioremap_caller(). 
> 
>  * NOTE! We need to allow non-page-aligned mappings too: we will
> obviously
>  * have to convert them into an offset in a page-aligned mapping, but
> the
>  * caller shouldn't need to know that small detail.

You're right, I forgot about that realignment.  The original
intent was to try to optimize and if that failed, fall back
to the original method.  I think this case would have been 
caught as well?

> 
>> Here's the comment and reason for the patches from Patch 0:
>>
>> <<<
>> We have a large university system in the UK that is experiencing
>> very long delays modprobing the driver for a specific I/O device.
>> The delay is from 8-10 minutes per device and there are 31 devices
>> in the system.  This 4 to 5 hour delay in starting up those I/O
>> devices is very much a burden on the customer.
>> ...
>> The problem was tracked down to a very slow IOREMAP operation and
>> the excessively long ioresource lookup to insure that the user is
>> not attempting to ioremap RAM.  These patches provide a speed up
>> to that function.
>>>>>
>>
>> The speed up was pretty dramatic, I think to about 15-20 minutes
>> (the test was done by our local CS person in the UK).  I think this
>> would prove the function was working since it would have fallen
>> back to the previous page_is_ram function and the 4 to 5 hour
>> startup.
> 
> I wonder how this test was conducted.  When the region_is_ram() change
> got in, the kernel already had the other speed up change (c81c8a1eeede),
> which had replaced page_is_ram().

The onsite system was not running the latest kernel (these large
system customers are very reluctant to disrupt their running
environments.)

> 
>> If there is a failure, it would be better for all to fix the specific
>> bug and not re-introduce the original problem.  Perhaps drop to
>> page is ram if the address is not page aligned?
> 
> walk_system_ram_range() is the one that has the issue with
> page-unaligned address.  region_is_ram() after fixed by patch 3/3 does
> not have this issue.  ioremap() does not call page_is_ram() anymore.

It appears ioremap does not call region_is_ram either.

-       /* First check if whole region can be identified as RAM or not */
-       ram_region = region_is_ram(phys_addr, size);
-       if (ram_region > 0) {
...
+       pfn      = phys_addr >> PAGE_SHIFT;
+       last_pfn = last_addr >> PAGE_SHIFT;
+       if (walk_system_ram_range(pfn, last_pfn - pfn + 1, NULL,

It appears that the walk_system_ram_range() patch does supersede
the region_is_ram patch.  Perhaps we can add a caveat to this
patch that you should not use this patch without also using
the patch from c81c8a1eeede?  Otherwise the excessive slowdown
in ioremap will be reintroduced?

(I'm thinking about back ports to distro kernels that customers use.)
 
> 
> pcibios_add_device(), ksysfs.c, kdebugfs.c (and possibly more) call
> ioremap for setup_data, which is page-unaligned.  If we are going to
> disallow such callers, they all need to be converted with a different
> mapping interface, such as vmap().  But vmap() takes an array of page
> pointers as an argument, and is not convenient for them to use.  Since
> setup_data is a special range, if they need to be changed may be
> arguable.  I think issue 3 described in patch 0/3 needs further
> discussion.  For now, I'd like to fix issue 1 & 2.

Since __ioremap_caller() was the only caller of region_is_ram, then
the function can be removed instead of being fixed.

Thanks,
Mike
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] mm, x86: Remove region_is_ram() call from ioremap
@ 2015-06-22 18:22         ` Mike Travis
  0 siblings, 0 replies; 22+ messages in thread
From: Mike Travis @ 2015-06-22 18:22 UTC (permalink / raw)
  To: Toshi Kani
  Cc: tglx, mingo, hpa, akpm, roland, dan.j.williams, x86,
	linux-nvdimm, linux-kernel, Clive Harding, Russ Anderson



On 6/22/2015 10:23 AM, Toshi Kani wrote:
> On Mon, 2015-06-22 at 09:21 -0700, Mike Travis wrote:
>>
>> On 6/19/2015 2:44 PM, Toshi Kani wrote:
>>> __ioremap_caller() calls region_is_ram() to look up the resource
>>> to check if a target range is RAM, which was added as an additinal
>>> check to improve the lookup performance over page_is_ram() (commit
>>> 906e36c5c717 "x86: use optimized ioresource lookup in ioremap
>>> function").
>>>
>>> __ioremap_caller() then calls walk_system_ram_range(), which had
>>> replaced page_is_ram() to improve the lookup performance (commit
>>> c81c8a1eeede "x86, ioremap: Speed up check for RAM pages").
>>>
>>> Since both functions walk through the resource table, there is
>>> no need to call the two functions.  Furthermore, region_is_ram()
>>> has bugs and always returns with -1.  This makes
>>> walk_system_ram_range() as the only check being used.
>>
>> Do you have an example of a failing case?  
> 
> Well, region_is_ram() fails with -1 for every case... This is because it
> breaks the for-loop at 'if (p->end < start)' in the first entry (i.e.
> the lowest address range) of the resource table.  For example, the first
> entry of 'p->end' is 0xfff on my system, which is certainly smaller than
> 'start'.
> 
>   # cat /proc/iomem
>   00000000-00000fff : reserved
>   00001000-00092fff : System RAM
>   00093000-00093fff : reserved
> 	:

That is a small entry.  But I don't understand that when it
returned the -1, the page_is_ram function was not used instead?
Or am I missing something?

-	/* If could not be identified(-1), check page by page */
-	if (ram_region < 0) {
-		pfn      = phys_addr >> PAGE_SHIFT;
-		last_pfn = last_addr >> PAGE_SHIFT;
-		if (walk_system_ram_range(pfn, last_pfn - pfn + 1, NULL,

>> Also, I didn't know that
>> IOREMAP'd addresses were allowed to be on non-page boundaries?
> 
> Yes, that is allowed.  Here is a comment in __ioremap_caller(). 
> 
>  * NOTE! We need to allow non-page-aligned mappings too: we will
> obviously
>  * have to convert them into an offset in a page-aligned mapping, but
> the
>  * caller shouldn't need to know that small detail.

You're right, I forgot about that realignment.  The original
intent was to try to optimize and if that failed, fall back
to the original method.  I think this case would have been 
caught as well?

> 
>> Here's the comment and reason for the patches from Patch 0:
>>
>> <<<
>> We have a large university system in the UK that is experiencing
>> very long delays modprobing the driver for a specific I/O device.
>> The delay is from 8-10 minutes per device and there are 31 devices
>> in the system.  This 4 to 5 hour delay in starting up those I/O
>> devices is very much a burden on the customer.
>> ...
>> The problem was tracked down to a very slow IOREMAP operation and
>> the excessively long ioresource lookup to insure that the user is
>> not attempting to ioremap RAM.  These patches provide a speed up
>> to that function.
>>>>>
>>
>> The speed up was pretty dramatic, I think to about 15-20 minutes
>> (the test was done by our local CS person in the UK).  I think this
>> would prove the function was working since it would have fallen
>> back to the previous page_is_ram function and the 4 to 5 hour
>> startup.
> 
> I wonder how this test was conducted.  When the region_is_ram() change
> got in, the kernel already had the other speed up change (c81c8a1eeede),
> which had replaced page_is_ram().

The onsite system was not running the latest kernel (these large
system customers are very reluctant to disrupt their running
environments.)

> 
>> If there is a failure, it would be better for all to fix the specific
>> bug and not re-introduce the original problem.  Perhaps drop to
>> page is ram if the address is not page aligned?
> 
> walk_system_ram_range() is the one that has the issue with
> page-unaligned address.  region_is_ram() after fixed by patch 3/3 does
> not have this issue.  ioremap() does not call page_is_ram() anymore.

It appears ioremap does not call region_is_ram either.

-       /* First check if whole region can be identified as RAM or not */
-       ram_region = region_is_ram(phys_addr, size);
-       if (ram_region > 0) {
...
+       pfn      = phys_addr >> PAGE_SHIFT;
+       last_pfn = last_addr >> PAGE_SHIFT;
+       if (walk_system_ram_range(pfn, last_pfn - pfn + 1, NULL,

It appears that the walk_system_ram_range() patch does supersede
the region_is_ram patch.  Perhaps we can add a caveat to this
patch that you should not use this patch without also using
the patch from c81c8a1eeede?  Otherwise the excessive slowdown
in ioremap will be reintroduced?

(I'm thinking about back ports to distro kernels that customers use.)
 
> 
> pcibios_add_device(), ksysfs.c, kdebugfs.c (and possibly more) call
> ioremap for setup_data, which is page-unaligned.  If we are going to
> disallow such callers, they all need to be converted with a different
> mapping interface, such as vmap().  But vmap() takes an array of page
> pointers as an argument, and is not convenient for them to use.  Since
> setup_data is a special range, if they need to be changed may be
> arguable.  I think issue 3 described in patch 0/3 needs further
> discussion.  For now, I'd like to fix issue 1 & 2.

Since __ioremap_caller() was the only caller of region_is_ram, then
the function can be removed instead of being fixed.

Thanks,
Mike
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] mm, x86: Remove region_is_ram() call from ioremap
  2015-06-22 18:22         ` Mike Travis
@ 2015-06-22 19:06           ` Toshi Kani
  -1 siblings, 0 replies; 22+ messages in thread
From: Toshi Kani @ 2015-06-22 19:06 UTC (permalink / raw)
  To: Mike Travis
  Cc: tglx, mingo, hpa, akpm, roland, dan.j.williams, x86,
	linux-nvdimm, linux-kernel, Clive Harding, Russ Anderson

On Mon, 2015-06-22 at 11:22 -0700, Mike Travis wrote:
> 
> On 6/22/2015 10:23 AM, Toshi Kani wrote:
> > On Mon, 2015-06-22 at 09:21 -0700, Mike Travis wrote:
> >>
> >> On 6/19/2015 2:44 PM, Toshi Kani wrote:
> >>> __ioremap_caller() calls region_is_ram() to look up the resource
> >>> to check if a target range is RAM, which was added as an additinal
> >>> check to improve the lookup performance over page_is_ram() (commit
> >>> 906e36c5c717 "x86: use optimized ioresource lookup in ioremap
> >>> function").
> >>>
> >>> __ioremap_caller() then calls walk_system_ram_range(), which had
> >>> replaced page_is_ram() to improve the lookup performance (commit
> >>> c81c8a1eeede "x86, ioremap: Speed up check for RAM pages").
> >>>
> >>> Since both functions walk through the resource table, there is
> >>> no need to call the two functions.  Furthermore, region_is_ram()
> >>> has bugs and always returns with -1.  This makes
> >>> walk_system_ram_range() as the only check being used.
> >>
> >> Do you have an example of a failing case?  
> > 
> > Well, region_is_ram() fails with -1 for every case... This is because it
> > breaks the for-loop at 'if (p->end < start)' in the first entry (i.e.
> > the lowest address range) of the resource table.  For example, the first
> > entry of 'p->end' is 0xfff on my system, which is certainly smaller than
> > 'start'.
> > 
> >   # cat /proc/iomem
> >   00000000-00000fff : reserved
> >   00001000-00092fff : System RAM
> >   00093000-00093fff : reserved
> > 	:
> 
> That is a small entry.  But I don't understand that when it
> returned the -1, the page_is_ram function was not used instead?
> Or am I missing something?
> 
> -	/* If could not be identified(-1), check page by page */
> -	if (ram_region < 0) {
> -		pfn      = phys_addr >> PAGE_SHIFT;
> -		last_pfn = last_addr >> PAGE_SHIFT;
> -		if (walk_system_ram_range(pfn, last_pfn - pfn + 1, NULL,

I am afraid that you are missing something...  After region_is_ram()
failed, __ioremap_call() calls walk_system_ram_range(), not
page_is_ram().  This patch 2/3 removes the call to region_is_ram(), and
calls walk_system_ram_range() unconditionally without checking
'ram_region', which is set by region_is_ram().  Please note that ioremap
does not call page_is_ram() any more.  It had been removed by
c81c8a1eeede.

> >> Also, I didn't know that
> >> IOREMAP'd addresses were allowed to be on non-page boundaries?
> > 
> > Yes, that is allowed.  Here is a comment in __ioremap_caller(). 
> > 
> >  * NOTE! We need to allow non-page-aligned mappings too: we will
> > obviously
> >  * have to convert them into an offset in a page-aligned mapping, but
> > the
> >  * caller shouldn't need to know that small detail.
> 
> You're right, I forgot about that realignment.  The original
> intent was to try to optimize and if that failed, fall back
> to the original method.  I think this case would have been 
> caught as well?

Yes, this case would have been caught if region_is_ram() would have
worked.
 
> >> Here's the comment and reason for the patches from Patch 0:
> >>
> >> <<<
> >> We have a large university system in the UK that is experiencing
> >> very long delays modprobing the driver for a specific I/O device.
> >> The delay is from 8-10 minutes per device and there are 31 devices
> >> in the system.  This 4 to 5 hour delay in starting up those I/O
> >> devices is very much a burden on the customer.
> >> ...
> >> The problem was tracked down to a very slow IOREMAP operation and
> >> the excessively long ioresource lookup to insure that the user is
> >> not attempting to ioremap RAM.  These patches provide a speed up
> >> to that function.
> >>>>>
> >>
> >> The speed up was pretty dramatic, I think to about 15-20 minutes
> >> (the test was done by our local CS person in the UK).  I think this
> >> would prove the function was working since it would have fallen
> >> back to the previous page_is_ram function and the 4 to 5 hour
> >> startup.
> > 
> > I wonder how this test was conducted.  When the region_is_ram() change
> > got in, the kernel already had the other speed up change (c81c8a1eeede),
> > which had replaced page_is_ram().
> 
> The onsite system was not running the latest kernel (these large
> system customers are very reluctant to disrupt their running
> environments.)

Then you had probably replaced page_is_ram() with region_is_ram() and
ignored the error in this test...
 
> >> If there is a failure, it would be better for all to fix the specific
> >> bug and not re-introduce the original problem.  Perhaps drop to
> >> page is ram if the address is not page aligned?
> > 
> > walk_system_ram_range() is the one that has the issue with
> > page-unaligned address.  region_is_ram() after fixed by patch 3/3 does
> > not have this issue.  ioremap() does not call page_is_ram() anymore.
> 
> It appears ioremap does not call region_is_ram either.
> 
> -       /* First check if whole region can be identified as RAM or not */
> -       ram_region = region_is_ram(phys_addr, size);
> -       if (ram_region > 0) {
> ...
> +       pfn      = phys_addr >> PAGE_SHIFT;
> +       last_pfn = last_addr >> PAGE_SHIFT;
> +       if (walk_system_ram_range(pfn, last_pfn - pfn + 1, NULL,

That's the case after this patch 2/3 is applied.  ioremap calls
region_is_ram() today.

> It appears that the walk_system_ram_range() patch does supersede
> the region_is_ram patch.  Perhaps we can add a caveat to this
> patch that you should not use this patch without also using
> the patch from c81c8a1eeede?  Otherwise the excessive slowdown
> in ioremap will be reintroduced?
> 
> (I'm thinking about back ports to distro kernels that customers use.)

This patch changes ioremap() to call walk_system_ram_range() only.
Since walk_system_ram_range() walks through the resource table, there is
no such slowdown issue.  In other words, there is no behavioral change
in this patch, except it'd be slightly faster.
 
> > pcibios_add_device(), ksysfs.c, kdebugfs.c (and possibly more) call
> > ioremap for setup_data, which is page-unaligned.  If we are going to
> > disallow such callers, they all need to be converted with a different
> > mapping interface, such as vmap().  But vmap() takes an array of page
> > pointers as an argument, and is not convenient for them to use.  Since
> > setup_data is a special range, if they need to be changed may be
> > arguable.  I think issue 3 described in patch 0/3 needs further
> > discussion.  For now, I'd like to fix issue 1 & 2.
> 
> Since __ioremap_caller() was the only caller of region_is_ram, then
> the function can be removed instead of being fixed.

Well, there is a new caller, memremap_valid(), in Dan's patchset.
https://lkml.org/lkml/2015/6/22/100

Thanks,
-Toshi


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] mm, x86: Remove region_is_ram() call from ioremap
@ 2015-06-22 19:06           ` Toshi Kani
  0 siblings, 0 replies; 22+ messages in thread
From: Toshi Kani @ 2015-06-22 19:06 UTC (permalink / raw)
  To: Mike Travis
  Cc: tglx, mingo, hpa, akpm, roland, dan.j.williams, x86,
	linux-nvdimm, linux-kernel, Clive Harding, Russ Anderson

On Mon, 2015-06-22 at 11:22 -0700, Mike Travis wrote:
> 
> On 6/22/2015 10:23 AM, Toshi Kani wrote:
> > On Mon, 2015-06-22 at 09:21 -0700, Mike Travis wrote:
> >>
> >> On 6/19/2015 2:44 PM, Toshi Kani wrote:
> >>> __ioremap_caller() calls region_is_ram() to look up the resource
> >>> to check if a target range is RAM, which was added as an additinal
> >>> check to improve the lookup performance over page_is_ram() (commit
> >>> 906e36c5c717 "x86: use optimized ioresource lookup in ioremap
> >>> function").
> >>>
> >>> __ioremap_caller() then calls walk_system_ram_range(), which had
> >>> replaced page_is_ram() to improve the lookup performance (commit
> >>> c81c8a1eeede "x86, ioremap: Speed up check for RAM pages").
> >>>
> >>> Since both functions walk through the resource table, there is
> >>> no need to call the two functions.  Furthermore, region_is_ram()
> >>> has bugs and always returns with -1.  This makes
> >>> walk_system_ram_range() as the only check being used.
> >>
> >> Do you have an example of a failing case?  
> > 
> > Well, region_is_ram() fails with -1 for every case... This is because it
> > breaks the for-loop at 'if (p->end < start)' in the first entry (i.e.
> > the lowest address range) of the resource table.  For example, the first
> > entry of 'p->end' is 0xfff on my system, which is certainly smaller than
> > 'start'.
> > 
> >   # cat /proc/iomem
> >   00000000-00000fff : reserved
> >   00001000-00092fff : System RAM
> >   00093000-00093fff : reserved
> > 	:
> 
> That is a small entry.  But I don't understand that when it
> returned the -1, the page_is_ram function was not used instead?
> Or am I missing something?
> 
> -	/* If could not be identified(-1), check page by page */
> -	if (ram_region < 0) {
> -		pfn      = phys_addr >> PAGE_SHIFT;
> -		last_pfn = last_addr >> PAGE_SHIFT;
> -		if (walk_system_ram_range(pfn, last_pfn - pfn + 1, NULL,

I am afraid that you are missing something...  After region_is_ram()
failed, __ioremap_call() calls walk_system_ram_range(), not
page_is_ram().  This patch 2/3 removes the call to region_is_ram(), and
calls walk_system_ram_range() unconditionally without checking
'ram_region', which is set by region_is_ram().  Please note that ioremap
does not call page_is_ram() any more.  It had been removed by
c81c8a1eeede.

> >> Also, I didn't know that
> >> IOREMAP'd addresses were allowed to be on non-page boundaries?
> > 
> > Yes, that is allowed.  Here is a comment in __ioremap_caller(). 
> > 
> >  * NOTE! We need to allow non-page-aligned mappings too: we will
> > obviously
> >  * have to convert them into an offset in a page-aligned mapping, but
> > the
> >  * caller shouldn't need to know that small detail.
> 
> You're right, I forgot about that realignment.  The original
> intent was to try to optimize and if that failed, fall back
> to the original method.  I think this case would have been 
> caught as well?

Yes, this case would have been caught if region_is_ram() would have
worked.
 
> >> Here's the comment and reason for the patches from Patch 0:
> >>
> >> <<<
> >> We have a large university system in the UK that is experiencing
> >> very long delays modprobing the driver for a specific I/O device.
> >> The delay is from 8-10 minutes per device and there are 31 devices
> >> in the system.  This 4 to 5 hour delay in starting up those I/O
> >> devices is very much a burden on the customer.
> >> ...
> >> The problem was tracked down to a very slow IOREMAP operation and
> >> the excessively long ioresource lookup to insure that the user is
> >> not attempting to ioremap RAM.  These patches provide a speed up
> >> to that function.
> >>>>>
> >>
> >> The speed up was pretty dramatic, I think to about 15-20 minutes
> >> (the test was done by our local CS person in the UK).  I think this
> >> would prove the function was working since it would have fallen
> >> back to the previous page_is_ram function and the 4 to 5 hour
> >> startup.
> > 
> > I wonder how this test was conducted.  When the region_is_ram() change
> > got in, the kernel already had the other speed up change (c81c8a1eeede),
> > which had replaced page_is_ram().
> 
> The onsite system was not running the latest kernel (these large
> system customers are very reluctant to disrupt their running
> environments.)

Then you had probably replaced page_is_ram() with region_is_ram() and
ignored the error in this test...
 
> >> If there is a failure, it would be better for all to fix the specific
> >> bug and not re-introduce the original problem.  Perhaps drop to
> >> page is ram if the address is not page aligned?
> > 
> > walk_system_ram_range() is the one that has the issue with
> > page-unaligned address.  region_is_ram() after fixed by patch 3/3 does
> > not have this issue.  ioremap() does not call page_is_ram() anymore.
> 
> It appears ioremap does not call region_is_ram either.
> 
> -       /* First check if whole region can be identified as RAM or not */
> -       ram_region = region_is_ram(phys_addr, size);
> -       if (ram_region > 0) {
> ...
> +       pfn      = phys_addr >> PAGE_SHIFT;
> +       last_pfn = last_addr >> PAGE_SHIFT;
> +       if (walk_system_ram_range(pfn, last_pfn - pfn + 1, NULL,

That's the case after this patch 2/3 is applied.  ioremap calls
region_is_ram() today.

> It appears that the walk_system_ram_range() patch does supersede
> the region_is_ram patch.  Perhaps we can add a caveat to this
> patch that you should not use this patch without also using
> the patch from c81c8a1eeede?  Otherwise the excessive slowdown
> in ioremap will be reintroduced?
> 
> (I'm thinking about back ports to distro kernels that customers use.)

This patch changes ioremap() to call walk_system_ram_range() only.
Since walk_system_ram_range() walks through the resource table, there is
no such slowdown issue.  In other words, there is no behavioral change
in this patch, except it'd be slightly faster.
 
> > pcibios_add_device(), ksysfs.c, kdebugfs.c (and possibly more) call
> > ioremap for setup_data, which is page-unaligned.  If we are going to
> > disallow such callers, they all need to be converted with a different
> > mapping interface, such as vmap().  But vmap() takes an array of page
> > pointers as an argument, and is not convenient for them to use.  Since
> > setup_data is a special range, if they need to be changed may be
> > arguable.  I think issue 3 described in patch 0/3 needs further
> > discussion.  For now, I'd like to fix issue 1 & 2.
> 
> Since __ioremap_caller() was the only caller of region_is_ram, then
> the function can be removed instead of being fixed.

Well, there is a new caller, memremap_valid(), in Dan's patchset.
https://lkml.org/lkml/2015/6/22/100

Thanks,
-Toshi


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] mm, x86: Remove region_is_ram() call from ioremap
  2015-06-22 16:21     ` Mike Travis
@ 2015-06-23  9:01       ` Ingo Molnar
  -1 siblings, 0 replies; 22+ messages in thread
From: Ingo Molnar @ 2015-06-23  9:01 UTC (permalink / raw)
  To: Mike Travis
  Cc: Toshi Kani, tglx, mingo, hpa, akpm, roland, dan.j.williams, x86,
	linux-nvdimm, linux-kernel, Clive Harding, Russ Anderson


* Mike Travis <travis@sgi.com> wrote:

> <<<
> We have a large university system in the UK that is experiencing
> very long delays modprobing the driver for a specific I/O device.
> The delay is from 8-10 minutes per device and there are 31 devices
> in the system.  This 4 to 5 hour delay in starting up those I/O
> devices is very much a burden on the customer.
> ...
> The problem was tracked down to a very slow IOREMAP operation and
> the excessively long ioresource lookup to insure that the user is
> not attempting to ioremap RAM.  These patches provide a speed up
> to that function.
> >>>
> 
> The speed up was pretty dramatic, I think to about 15-20 minutes
> (the test was done by our local CS person in the UK).  I think this
> would prove the function was working since it would have fallen
> back to the previous page_is_ram function and the 4 to 5 hour
> startup.

Btw., I think even 15-20 minutes is still in the 'ridiculously slow' category.
Any chance to fix all of this properly, not just hack by hack?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] mm, x86: Remove region_is_ram() call from ioremap
@ 2015-06-23  9:01       ` Ingo Molnar
  0 siblings, 0 replies; 22+ messages in thread
From: Ingo Molnar @ 2015-06-23  9:01 UTC (permalink / raw)
  To: Mike Travis
  Cc: Toshi Kani, tglx, mingo, hpa, akpm, roland, dan.j.williams, x86,
	linux-nvdimm, linux-kernel, Clive Harding, Russ Anderson


* Mike Travis <travis@sgi.com> wrote:

> <<<
> We have a large university system in the UK that is experiencing
> very long delays modprobing the driver for a specific I/O device.
> The delay is from 8-10 minutes per device and there are 31 devices
> in the system.  This 4 to 5 hour delay in starting up those I/O
> devices is very much a burden on the customer.
> ...
> The problem was tracked down to a very slow IOREMAP operation and
> the excessively long ioresource lookup to insure that the user is
> not attempting to ioremap RAM.  These patches provide a speed up
> to that function.
> >>>
> 
> The speed up was pretty dramatic, I think to about 15-20 minutes
> (the test was done by our local CS person in the UK).  I think this
> would prove the function was working since it would have fallen
> back to the previous page_is_ram function and the 4 to 5 hour
> startup.

Btw., I think even 15-20 minutes is still in the 'ridiculously slow' category.
Any chance to fix all of this properly, not just hack by hack?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] mm, x86: Remove region_is_ram() call from ioremap
  2015-06-23  9:01       ` Ingo Molnar
@ 2015-06-23 15:19         ` Toshi Kani
  -1 siblings, 0 replies; 22+ messages in thread
From: Toshi Kani @ 2015-06-23 15:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Mike Travis, tglx, mingo, hpa, akpm, roland, dan.j.williams, x86,
	linux-nvdimm, linux-kernel, Clive Harding, Russ Anderson

On Tue, 2015-06-23 at 11:01 +0200, Ingo Molnar wrote:
> * Mike Travis <travis@sgi.com> wrote:
> 
> > <<<
> > We have a large university system in the UK that is experiencing
> > very long delays modprobing the driver for a specific I/O device.
> > The delay is from 8-10 minutes per device and there are 31 devices
> > in the system.  This 4 to 5 hour delay in starting up those I/O
> > devices is very much a burden on the customer.
> > ...
> > The problem was tracked down to a very slow IOREMAP operation and
> > the excessively long ioresource lookup to insure that the user is
> > not attempting to ioremap RAM.  These patches provide a speed up
> > to that function.
> > >>>
> > 
> > The speed up was pretty dramatic, I think to about 15-20 minutes
> > (the test was done by our local CS person in the UK).  I think this
> > would prove the function was working since it would have fallen
> > back to the previous page_is_ram function and the 4 to 5 hour
> > startup.
> 
> Btw., I think even 15-20 minutes is still in the 'ridiculously slow' category.
> Any chance to fix all of this properly, not just hack by hack?

I agree that 15-20 minutes is till slow, but this slowness did not come
from this ioremap RAM check because region_is_ram() used in this test
had bugs, which led it return immediately.  The slowness came from other
places, such as page table initialization.

I do not think the number of the resource table entries increase
significantly on large systems.  So, walk_system_ram_range() and fixed
region_is_ram() should still perform fine on large systems.

Thanks,
-Toshi



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] mm, x86: Remove region_is_ram() call from ioremap
@ 2015-06-23 15:19         ` Toshi Kani
  0 siblings, 0 replies; 22+ messages in thread
From: Toshi Kani @ 2015-06-23 15:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Mike Travis, tglx, mingo, hpa, akpm, roland, dan.j.williams, x86,
	linux-nvdimm, linux-kernel, Clive Harding, Russ Anderson

On Tue, 2015-06-23 at 11:01 +0200, Ingo Molnar wrote:
> * Mike Travis <travis@sgi.com> wrote:
> 
> > <<<
> > We have a large university system in the UK that is experiencing
> > very long delays modprobing the driver for a specific I/O device.
> > The delay is from 8-10 minutes per device and there are 31 devices
> > in the system.  This 4 to 5 hour delay in starting up those I/O
> > devices is very much a burden on the customer.
> > ...
> > The problem was tracked down to a very slow IOREMAP operation and
> > the excessively long ioresource lookup to insure that the user is
> > not attempting to ioremap RAM.  These patches provide a speed up
> > to that function.
> > >>>
> > 
> > The speed up was pretty dramatic, I think to about 15-20 minutes
> > (the test was done by our local CS person in the UK).  I think this
> > would prove the function was working since it would have fallen
> > back to the previous page_is_ram function and the 4 to 5 hour
> > startup.
> 
> Btw., I think even 15-20 minutes is still in the 'ridiculously slow' category.
> Any chance to fix all of this properly, not just hack by hack?

I agree that 15-20 minutes is till slow, but this slowness did not come
from this ioremap RAM check because region_is_ram() used in this test
had bugs, which led it return immediately.  The slowness came from other
places, such as page table initialization.

I do not think the number of the resource table entries increase
significantly on large systems.  So, walk_system_ram_range() and fixed
region_is_ram() should still perform fine on large systems.

Thanks,
-Toshi



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] mm, x86: Remove region_is_ram() call from ioremap
  2015-06-23  9:01       ` Ingo Molnar
@ 2015-06-23 18:57         ` Mike Travis
  -1 siblings, 0 replies; 22+ messages in thread
From: Mike Travis @ 2015-06-23 18:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Toshi Kani, tglx, mingo, hpa, akpm, roland, dan.j.williams, x86,
	linux-nvdimm, linux-kernel, Clive Harding, Russ Anderson,
	Mel Gorman



On 6/23/2015 2:01 AM, Ingo Molnar wrote:
> 
> * Mike Travis <travis@sgi.com> wrote:
> 
>> <<<
>> We have a large university system in the UK that is experiencing
>> very long delays modprobing the driver for a specific I/O device.
>> The delay is from 8-10 minutes per device and there are 31 devices
>> in the system.  This 4 to 5 hour delay in starting up those I/O
>> devices is very much a burden on the customer.
>> ...
>> The problem was tracked down to a very slow IOREMAP operation and
>> the excessively long ioresource lookup to insure that the user is
>> not attempting to ioremap RAM.  These patches provide a speed up
>> to that function.
>>>>>
>>
>> The speed up was pretty dramatic, I think to about 15-20 minutes
>> (the test was done by our local CS person in the UK).  I think this
>> would prove the function was working since it would have fallen
>> back to the previous page_is_ram function and the 4 to 5 hour
>> startup.
> 
> Btw., I think even 15-20 minutes is still in the 'ridiculously slow' category.
> Any chance to fix all of this properly, not just hack by hack?
> 
> Thanks,
> 
> 	Ingo
> 


The current primary cause of the slow start up now lies within
the loading of the kernel and other software to 31 Co-processors
in a serial fashion.  We have suggested to the vendor that they
look at booting and starting these in parallel.

The problem is there are not a whole lot of systems that can
handle more than 4 of them let alone 32.  So it's mostly the
interaction between the customers and the vendor directing
these optimizations.

Any speed up of the kernel startup helps here as well.

[off topic]
Btw, this ~20 minutes time is just for the start up of the
co-processors.  The entire system takes much longer as this is
a huge UV system.  Most of the time is still due to memory
initialization.  Mel's "defer page init" patches help here
tremendously, though it's not clear they will trickle back
down to SLES11 which the customer is running.

Thanks,
Mike

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] mm, x86: Remove region_is_ram() call from ioremap
@ 2015-06-23 18:57         ` Mike Travis
  0 siblings, 0 replies; 22+ messages in thread
From: Mike Travis @ 2015-06-23 18:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Toshi Kani, tglx, mingo, hpa, akpm, roland, dan.j.williams, x86,
	linux-nvdimm, linux-kernel, Clive Harding, Russ Anderson,
	Mel Gorman



On 6/23/2015 2:01 AM, Ingo Molnar wrote:
> 
> * Mike Travis <travis@sgi.com> wrote:
> 
>> <<<
>> We have a large university system in the UK that is experiencing
>> very long delays modprobing the driver for a specific I/O device.
>> The delay is from 8-10 minutes per device and there are 31 devices
>> in the system.  This 4 to 5 hour delay in starting up those I/O
>> devices is very much a burden on the customer.
>> ...
>> The problem was tracked down to a very slow IOREMAP operation and
>> the excessively long ioresource lookup to insure that the user is
>> not attempting to ioremap RAM.  These patches provide a speed up
>> to that function.
>>>>>
>>
>> The speed up was pretty dramatic, I think to about 15-20 minutes
>> (the test was done by our local CS person in the UK).  I think this
>> would prove the function was working since it would have fallen
>> back to the previous page_is_ram function and the 4 to 5 hour
>> startup.
> 
> Btw., I think even 15-20 minutes is still in the 'ridiculously slow' category.
> Any chance to fix all of this properly, not just hack by hack?
> 
> Thanks,
> 
> 	Ingo
> 


The current primary cause of the slow start up now lies within
the loading of the kernel and other software to 31 Co-processors
in a serial fashion.  We have suggested to the vendor that they
look at booting and starting these in parallel.

The problem is there are not a whole lot of systems that can
handle more than 4 of them let alone 32.  So it's mostly the
interaction between the customers and the vendor directing
these optimizations.

Any speed up of the kernel startup helps here as well.

[off topic]
Btw, this ~20 minutes time is just for the start up of the
co-processors.  The entire system takes much longer as this is
a huge UV system.  Most of the time is still due to memory
initialization.  Mel's "defer page init" patches help here
tremendously, though it's not clear they will trickle back
down to SLES11 which the customer is running.

Thanks,
Mike

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2015-06-23 19:06 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-19 21:44 [PATCH 0/3] mm, x86: Fix ioremap RAM check interfaces Toshi Kani
2015-06-19 21:44 ` Toshi Kani
2015-06-19 21:44 ` [PATCH 1/3] mm, x86: Fix warning in ioremap RAM check Toshi Kani
2015-06-19 21:44   ` Toshi Kani
2015-06-19 21:44 ` [PATCH 2/3] mm, x86: Remove region_is_ram() call from ioremap Toshi Kani
2015-06-19 21:44   ` Toshi Kani
2015-06-22 16:21   ` Mike Travis
2015-06-22 16:21     ` Mike Travis
2015-06-22 17:23     ` Toshi Kani
2015-06-22 17:23       ` Toshi Kani
2015-06-22 18:22       ` Mike Travis
2015-06-22 18:22         ` Mike Travis
2015-06-22 19:06         ` Toshi Kani
2015-06-22 19:06           ` Toshi Kani
2015-06-23  9:01     ` Ingo Molnar
2015-06-23  9:01       ` Ingo Molnar
2015-06-23 15:19       ` Toshi Kani
2015-06-23 15:19         ` Toshi Kani
2015-06-23 18:57       ` Mike Travis
2015-06-23 18:57         ` Mike Travis
2015-06-19 21:44 ` [PATCH 3/3] mm: Fix bugs in region_is_ram() Toshi Kani
2015-06-19 21:44   ` Toshi Kani

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.