* [PATCH 0/3] Bugfix for kdump on arm
@ 2014-01-22 11:25 Wang Nan
  2014-01-22 11:25 ` [PATCH 1/3] ARM: Premit ioremap() to map reserved pages Wang Nan
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Wang Nan @ 2014-01-22 11:25 UTC (permalink / raw)
  To: kexec
  Cc: Eric Biederman, Russell King, Andrew Morton, Geng Hui,
	linux-arm-kernel, linux-kernel, linux-mm, Wang Nan

This patch series introduces 3 bugfixes for kdump (and kexec) on the ARM platform.

kdump on ARM is in fact broken (at least on omap4460). After a month of hard
work and with the help of a JTAG debugger, we finally made kdump work
reliably.


The patches follow. The first 2 patches form a group: they allow
ioremap_nocache() to be used on reserved pages on the ARM platform (which is
prohibited by 309caa9cc) and then use ioremap_nocache() to copy the code that
kexec requires. The last one is for the crash dump kernel. It allows the kernel
to be loaded in the middle of the physical memory the kernel is aware of.
Without it, the crash dump kernel must be carefully configured to boot.

Wang Nan (3):
  ARM: Premit ioremap() to map reserved pages
  ARM: kexec: copying code to ioremapped area
  ARM: allow kernel to be loaded in middle of phymem

 arch/arm/kernel/machine_kexec.c | 18 ++++++++++++++++--
 arch/arm/mm/init.c              | 21 ++++++++++++++++++++-
 arch/arm/mm/ioremap.c           |  2 +-
 arch/arm/mm/mmu.c               | 13 +++++++++++++
 kernel/kexec.c                  | 40 +++++++++++++++++++++++++++++++++++-----
 mm/page_alloc.c                 |  7 +++++--
 6 files changed, 90 insertions(+), 11 deletions(-)


Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Geng Hui <hui.geng@huawei.com>

-- 
1.8.4



* [PATCH 1/3] ARM: Premit ioremap() to map reserved pages
  2014-01-22 11:25 [PATCH 0/3] Bugfix for kdump on arm Wang Nan
@ 2014-01-22 11:25 ` Wang Nan
  2014-01-22 11:38   ` Will Deacon
  2014-01-22 11:42   ` Russell King - ARM Linux
  2014-01-22 11:25 ` [PATCH 2/3] ARM: kexec: copying code to ioremapped area Wang Nan
  2014-01-22 11:25 ` [PATCH 3/3] ARM: allow kernel to be loaded in middle of phymem Wang Nan
  2 siblings, 2 replies; 13+ messages in thread
From: Wang Nan @ 2014-01-22 11:25 UTC (permalink / raw)
  To: kexec
  Cc: Eric Biederman, Russell King, Andrew Morton, Geng Hui,
	linux-arm-kernel, linux-kernel, linux-mm, Wang Nan, stable

This patch relaxes the restriction set by commit 309caa9cc, which
prohibits ioremap() on all kernel managed pages.

Other architectures, such as x86 and (some specific platforms of) powerpc,
allow such mapping.

Mapping pages with ioremap() is an efficient way to avoid ARM's mysterious
cache control. This feature will be used for ARM kexec support to ensure that
copied data goes into RAM even without cache flushing, because we found that
flush_cache_xxx can't reliably flush code to memory.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: <stable@vger.kernel.org> # 3.4+
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Geng Hui <hui.geng@huawei.com>
---
 arch/arm/mm/ioremap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/mm/ioremap.c b/arch/arm/mm/ioremap.c
index f123d6e..98b1c10 100644
--- a/arch/arm/mm/ioremap.c
+++ b/arch/arm/mm/ioremap.c
@@ -298,7 +298,7 @@ void __iomem * __arm_ioremap_pfn_caller(unsigned long pfn,
 	/*
 	 * Don't allow RAM to be mapped - this causes problems with ARMv6+
 	 */
-	if (WARN_ON(pfn_valid(pfn)))
+	if (WARN_ON(pfn_valid(pfn) && !PageReserved(pfn_to_page(pfn))))
 		return NULL;
 
 	area = get_vm_area_caller(size, VM_IOREMAP, caller);
-- 
1.8.4
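
For readers following the thread, the effect of the one-line change above can
be sketched as a pair of predicates. This is a user-space illustration only:
the function names and the boolean parameters are hypothetical stand-ins for
the results of pfn_valid() and PageReserved(), not kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Before the patch: any pfn backed by kernel-managed RAM is refused. */
static bool ioremap_refused_before(bool pfn_is_valid, bool page_reserved)
{
	(void)page_reserved;	/* the old check ignores the reserved flag */
	return pfn_is_valid;
}

/* After the patch: valid pfns are refused only if the page is not reserved. */
static bool ioremap_refused_after(bool pfn_is_valid, bool page_reserved)
{
	return pfn_is_valid && !page_reserved;
}
```

The only case whose outcome changes is "valid and reserved". That is the case
the replies in this thread focus on, since reserved RAM may still be covered
by the kernel's cacheable mapping.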



* [PATCH 2/3] ARM: kexec: copying code to ioremapped area
  2014-01-22 11:25 [PATCH 0/3] Bugfix for kdump on arm Wang Nan
  2014-01-22 11:25 ` [PATCH 1/3] ARM: Premit ioremap() to map reserved pages Wang Nan
@ 2014-01-22 11:25 ` Wang Nan
       [not found]   ` <CANacCWz2DdLvns9htszpwWnASrYGXQt+tHMsw4aBbjoyw-DmeQ@mail.gmail.com>
  2014-01-22 13:27   ` Russell King - ARM Linux
  2014-01-22 11:25 ` [PATCH 3/3] ARM: allow kernel to be loaded in middle of phymem Wang Nan
  2 siblings, 2 replies; 13+ messages in thread
From: Wang Nan @ 2014-01-22 11:25 UTC (permalink / raw)
  To: kexec
  Cc: Eric Biederman, Russell King, Andrew Morton, Geng Hui,
	linux-arm-kernel, linux-kernel, linux-mm, Wang Nan, stable

ARM's kdump is actually broken (at least on omap4460), mainly because of a
cache problem: flush_icache_range() can't reliably ensure that the copied data
actually reaches RAM. After the MMU is turned off and we jump to the
trampoline, kexec always fails due to random undefined instructions.

This patch uses ioremap to make sure the destination of every memcpy() is
uncacheable memory, including the copies of the target kernel and trampoline.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: <stable@vger.kernel.org> # 3.4+
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Geng Hui <hui.geng@huawei.com>
---
 arch/arm/kernel/machine_kexec.c | 18 ++++++++++++++++--
 kernel/kexec.c                  | 40 +++++++++++++++++++++++++++++++++++-----
 2 files changed, 51 insertions(+), 7 deletions(-)

diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
index f0d180d..ba0a5a8 100644
--- a/arch/arm/kernel/machine_kexec.c
+++ b/arch/arm/kernel/machine_kexec.c
@@ -144,6 +144,7 @@ void machine_kexec(struct kimage *image)
 	unsigned long page_list;
 	unsigned long reboot_code_buffer_phys;
 	unsigned long reboot_entry = (unsigned long)relocate_new_kernel;
+	void __iomem *reboot_entry_remap;
 	unsigned long reboot_entry_phys;
 	void *reboot_code_buffer;
 
@@ -171,9 +172,22 @@ void machine_kexec(struct kimage *image)
 
 
 	/* copy our kernel relocation code to the control code page */
-	reboot_entry = fncpy(reboot_code_buffer,
-			     reboot_entry,
+	reboot_entry_remap = ioremap_nocache(reboot_code_buffer_phys,
+					     relocate_new_kernel_size);
+	if (reboot_entry_remap == NULL) {
+		pr_warn("startup code may not be reliably flushed\n");
+		reboot_entry_remap = (void __iomem *)reboot_code_buffer;
+	}
+
+	reboot_entry = fncpy(reboot_entry_remap, reboot_entry,
 			     relocate_new_kernel_size);
+	reboot_entry = (unsigned long)reboot_code_buffer +
+			(reboot_entry -
+			 (unsigned long)reboot_entry_remap);
+
+	if (reboot_entry_remap != reboot_code_buffer)
+		iounmap(reboot_entry_remap);
+
 	reboot_entry_phys = (unsigned long)reboot_entry +
 		(reboot_code_buffer_phys - (unsigned long)reboot_code_buffer);
 
diff --git a/kernel/kexec.c b/kernel/kexec.c
index 9c97016..3e92999 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -806,6 +806,7 @@ static int kimage_load_normal_segment(struct kimage *image,
 	while (mbytes) {
 		struct page *page;
 		char *ptr;
+		void __iomem *ioptr;
 		size_t uchunk, mchunk;
 
 		page = kimage_alloc_page(image, GFP_HIGHUSER, maddr);
@@ -818,7 +819,17 @@ static int kimage_load_normal_segment(struct kimage *image,
 		if (result < 0)
 			goto out;
 
-		ptr = kmap(page);
+		/*
+		 * Try ioremap to make sure the copied data goes into RAM
+		 * reliably. If that fails (some archs don't allow ioremap
+		 * of RAM), use kmap instead.
+		 */
+		ioptr = ioremap(page_to_pfn(page) << PAGE_SHIFT,
+				PAGE_SIZE);
+		if (ioptr != NULL)
+			ptr = ioptr;
+		else
+			ptr = kmap(page);
 		/* Start with a clear page */
 		clear_page(ptr);
 		ptr += maddr & ~PAGE_MASK;
@@ -827,7 +838,10 @@ static int kimage_load_normal_segment(struct kimage *image,
 		uchunk = min(ubytes, mchunk);
 
 		result = copy_from_user(ptr, buf, uchunk);
-		kunmap(page);
+		if (ioptr != NULL)
+			iounmap(ioptr);
+		else
+			kunmap(page);
 		if (result) {
 			result = -EFAULT;
 			goto out;
@@ -846,7 +860,7 @@ static int kimage_load_crash_segment(struct kimage *image,
 {
 	/* For crash dumps kernels we simply copy the data from
 	 * user space to it's destination.
-	 * We do things a page at a time for the sake of kmap.
+	 * We do things a page at a time for the sake of ioremap/kmap.
 	 */
 	unsigned long maddr;
 	size_t ubytes, mbytes;
@@ -861,6 +875,7 @@ static int kimage_load_crash_segment(struct kimage *image,
 	while (mbytes) {
 		struct page *page;
 		char *ptr;
+		void __iomem *ioptr;
 		size_t uchunk, mchunk;
 
 		page = pfn_to_page(maddr >> PAGE_SHIFT);
@@ -868,7 +883,18 @@ static int kimage_load_crash_segment(struct kimage *image,
 			result  = -ENOMEM;
 			goto out;
 		}
-		ptr = kmap(page);
+		/*
+		 * Try ioremap to make sure the copied data goes into RAM
+		 * reliably. If that fails (some archs don't allow ioremap
+		 * of RAM), use kmap instead.
+		 */
+		ioptr = ioremap_nocache(page_to_pfn(page) << PAGE_SHIFT,
+				        PAGE_SIZE);
+		if (ioptr != NULL)
+			ptr = ioptr;
+		else
+			ptr = kmap(page);
+
 		ptr += maddr & ~PAGE_MASK;
 		mchunk = min_t(size_t, mbytes,
 				PAGE_SIZE - (maddr & ~PAGE_MASK));
@@ -879,7 +905,11 @@ static int kimage_load_crash_segment(struct kimage *image,
 		}
 		result = copy_from_user(ptr, buf, uchunk);
 		kexec_flush_icache_page(page);
-		kunmap(page);
+		if (ioptr != NULL)
+			iounmap(ioptr);
+		else
+			kunmap(page);
+
 		if (result) {
 			result = -EFAULT;
 			goto out;
-- 
1.8.4
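
The machine_kexec() hunk above copies the trampoline through one mapping and
then expresses the resulting entry point relative to two other bases (the
linear-map buffer, then the physical address). A minimal user-space sketch of
that rebasing arithmetic follows. The helper name rebase() and the example
addresses in the note below are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Re-express an address that was computed relative to from_base so that
 * it has the same offset relative to to_base.
 */
static uintptr_t rebase(uintptr_t addr, uintptr_t from_base, uintptr_t to_base)
{
	return to_base + (addr - from_base);
}
```

With an assumed ioremap base of 0xf0000000, a buffer at virtual 0xc1234000 and
physical 0x81234000, an entry at offset 0x20 in the remapped copy becomes
0xc1234020 after the first rebase and 0x81234020 after the second.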



* [PATCH 3/3] ARM: allow kernel to be loaded in middle of phymem
  2014-01-22 11:25 [PATCH 0/3] Bugfix for kdump on arm Wang Nan
  2014-01-22 11:25 ` [PATCH 1/3] ARM: Premit ioremap() to map reserved pages Wang Nan
  2014-01-22 11:25 ` [PATCH 2/3] ARM: kexec: copying code to ioremapped area Wang Nan
@ 2014-01-22 11:25 ` Wang Nan
  2014-01-23 19:15   ` Nicolas Pitre
  2 siblings, 1 reply; 13+ messages in thread
From: Wang Nan @ 2014-01-22 11:25 UTC (permalink / raw)
  To: kexec
  Cc: Eric Biederman, Russell King, Andrew Morton, Geng Hui,
	linux-arm-kernel, linux-kernel, linux-mm, Wang Nan, stable

This patch allows the kernel to be loaded in the middle of the physical memory
the kernel is aware of. Before this patch, users had to use mem= or the device
tree to mislead the kernel about the start address of physical memory.

This feature is useful in some special cases, for example, building a crash
dump kernel. Without it, the kernel command line, ATAGs and device tree must be
adjusted carefully, which is sometimes impossible.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: <stable@vger.kernel.org> # 3.4+
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Geng Hui <hui.geng@huawei.com>
---
 arch/arm/mm/init.c | 21 ++++++++++++++++++++-
 arch/arm/mm/mmu.c  | 13 +++++++++++++
 mm/page_alloc.c    |  7 +++++--
 3 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 3e8f106..4952726 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -334,9 +334,28 @@ void __init arm_memblock_init(struct meminfo *mi,
 {
 	int i;
 
-	for (i = 0; i < mi->nr_banks; i++)
+	for (i = 0; i < mi->nr_banks; i++) {
 		memblock_add(mi->bank[i].start, mi->bank[i].size);
 
+		/*
+		 * In some special cases, for example, building a crashdump
+		 * kernel, we want the kernel to be loaded in the middle of
+		 * physical memory. In such cases, the physical memory before
+		 * PHYS_OFFSET is awkward: it can't be directly mapped
+		 * (because its address would be smaller than PAGE_OFFSET,
+		 * disturbing user address space) and also can't be mapped as
+		 * HighMem. We reserve such pages here. The only way to access
+		 * those pages is ioremap.
+		 */
+		if (mi->bank[i].start < PHYS_OFFSET) {
+			unsigned long reserv_size = PHYS_OFFSET -
+						    mi->bank[i].start;
+			if (reserv_size > mi->bank[i].size)
+				reserv_size = mi->bank[i].size;
+			memblock_reserve(mi->bank[i].start, reserv_size);
+		}
+	}
+
 	/* Register the kernel text, kernel data and initrd with memblock. */
 #ifdef CONFIG_XIP_KERNEL
 	memblock_reserve(__pa(_sdata), _end - _sdata);
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 580ef2d..2a17c24 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -1308,6 +1308,19 @@ static void __init map_lowmem(void)
 		if (start >= end)
 			break;
 
+		/*
+		 * If this memblock contains memory before PAGE_OFFSET, memory
+		 * before PAGE_OFFSET shouldn't get directly mapped, see code
+		 * in create_mapping(). However, memory after PAGE_OFFSET is
+		 * occupied by the kernel and still needs to be mapped.
+		 */
+		if (__phys_to_virt(start) < PAGE_OFFSET) {
+			if (__phys_to_virt(end) > PAGE_OFFSET)
+				start = __virt_to_phys(PAGE_OFFSET);
+			else
+				break;
+		}
+
 		map.pfn = __phys_to_pfn(start);
 		map.virtual = __phys_to_virt(start);
 		map.length = end - start;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5248fe0..d2959e3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4840,10 +4840,13 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
 	 */
 	if (pgdat == NODE_DATA(0)) {
 		mem_map = NODE_DATA(0)->node_mem_map;
-#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
+		/*
+		 * In case of CONFIG_HAVE_MEMBLOCK_NODE_MAP or when the kernel
+		 * is loaded in the middle of physical memory, mem_map should
+		 * be adjusted.
+		 */
 		if (page_to_pfn(mem_map) != pgdat->node_start_pfn)
 			mem_map -= (pgdat->node_start_pfn - ARCH_PFN_OFFSET);
-#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 	}
 #endif
 #endif /* CONFIG_FLAT_NODE_MEM_MAP */
-- 
1.8.4
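
The reservation logic in the arm_memblock_init() hunk above amounts to
clamping: for each bank, reserve the part that lies below PHYS_OFFSET, but
never more than the bank itself. A small user-space sketch of that arithmetic
follows. The function name reserve_below() and the example bank layout in the
note are hypothetical; only the clamp mirrors the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of the bank [start, start + size) that lie below phys_offset. */
static uint32_t reserve_below(uint32_t start, uint32_t size,
			      uint32_t phys_offset)
{
	uint32_t reserv_size;

	if (start >= phys_offset)
		return 0;	/* bank starts at or above PHYS_OFFSET */

	reserv_size = phys_offset - start;
	if (reserv_size > size)
		reserv_size = size;	/* whole bank is below PHYS_OFFSET */
	return reserv_size;
}
```

For an assumed 1GB bank at 0x80000000 and a PHYS_OFFSET of 0x8a000000, the
first 0x0a000000 bytes (160MB) would be reserved and the rest stays usable.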



* Re: [PATCH 1/3] ARM: Premit ioremap() to map reserved pages
  2014-01-22 11:25 ` [PATCH 1/3] ARM: Premit ioremap() to map reserved pages Wang Nan
@ 2014-01-22 11:38   ` Will Deacon
  2014-01-22 11:42   ` Russell King - ARM Linux
  1 sibling, 0 replies; 13+ messages in thread
From: Will Deacon @ 2014-01-22 11:38 UTC (permalink / raw)
  To: Wang Nan
  Cc: kexec, stable, linux-kernel, Geng Hui, linux-mm, Eric Biederman,
	Russell King, Andrew Morton, linux-arm-kernel

On Wed, Jan 22, 2014 at 11:25:14AM +0000, Wang Nan wrote:
> This patch relaxes the restriction set by commit 309caa9cc, which
> prohibit ioremap() on all kernel managed pages.
> 
> Other architectures, such as x86 and (some specific platforms of) powerpc,
> allow such mapping.
> 
> ioremap() pages is an efficient way to avoid arm's mysterious cache control.
> This feature will be used for arm kexec support to ensure copied data goes into
> RAM even without cache flushing, because we found that flush_cache_xxx can't
> reliably flush code to memory.
> 
> Signed-off-by: Wang Nan <wangnan0@huawei.com>
> Cc: <stable@vger.kernel.org> # 3.4+
> Cc: Eric Biederman <ebiederm@xmission.com>
> Cc: Russell King <rmk+kernel@arm.linux.org.uk>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Geng Hui <hui.geng@huawei.com>
> ---
>  arch/arm/mm/ioremap.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm/mm/ioremap.c b/arch/arm/mm/ioremap.c
> index f123d6e..98b1c10 100644
> --- a/arch/arm/mm/ioremap.c
> +++ b/arch/arm/mm/ioremap.c
> @@ -298,7 +298,7 @@ void __iomem * __arm_ioremap_pfn_caller(unsigned long pfn,
>  	/*
>  	 * Don't allow RAM to be mapped - this causes problems with ARMv6+
>  	 */
> -	if (WARN_ON(pfn_valid(pfn)))
> +	if (WARN_ON(pfn_valid(pfn) && !PageReserved(pfn_to_page(pfn))))

Since reserved pages can still be mapped, how does this avoid the cacheable
alias issue fixed by 309caa9cc6ff ("ARM: Prohibit ioremap() on kernel
managed RAM")?

Will


* Re: [PATCH 1/3] ARM: Premit ioremap() to map reserved pages
  2014-01-22 11:25 ` [PATCH 1/3] ARM: Premit ioremap() to map reserved pages Wang Nan
  2014-01-22 11:38   ` Will Deacon
@ 2014-01-22 11:42   ` Russell King - ARM Linux
  2014-01-22 11:55     ` Wang Nan
  1 sibling, 1 reply; 13+ messages in thread
From: Russell King - ARM Linux @ 2014-01-22 11:42 UTC (permalink / raw)
  To: Wang Nan
  Cc: kexec, stable, linux-kernel, Geng Hui, linux-mm, Eric Biederman,
	Andrew Morton, linux-arm-kernel

On Wed, Jan 22, 2014 at 07:25:14PM +0800, Wang Nan wrote:
> This patch relaxes the restriction set by commit 309caa9cc, which
> prohibit ioremap() on all kernel managed pages.
> 
> Other architectures, such as x86 and (some specific platforms of) powerpc,
> allow such mapping.
> 
> ioremap() pages is an efficient way to avoid arm's mysterious cache control.
> This feature will be used for arm kexec support to ensure copied data goes into
> RAM even without cache flushing, because we found that flush_cache_xxx can't
> reliably flush code to memory.

Yes, let's bypass the check and allow this in violation of the
architecture specification by allowing mapping the same memory with
different types, which leads to unpredictable behaviour.  Yes, that's
a very good idea, because what we want to do is far more important than
following the requirements of the architecture.

So... NAK.

Yes, flush_cache_xxx() doesn't flush back to physical RAM, that's not
what it's defined to do - it's defined that it flushes enough of the
cache to ensure that page table updates are safe (such as when tearing
down a page mapping.)  So it's hardly surprising that doesn't work.

If you want to be able to have DMA access to memory, then you need to
use an API which has been designed for that purpose, and if there isn't
one, then you need to discuss your requirements, rather than trying to
hack around the problem.

The issue here will be that the APIs we currently have for DMA become
extremely expensive when you want to deal with (eg) all system RAM.
Or, there's flush_cache_all() which should flush all levels of cache
in the system, and thus push all data back to RAM.

Now, why are you copying your patches to the stable people?  That makes
no sense - they haven't been reviewed and they haven't been integrated
into an existing kernel.  So, they don't meet the basic requirements
for stable tree submission...

-- 
FTTC broadband for 0.8mile line: 5.8Mbps down 500kbps up.  Estimation
in database were 13.1 to 19Mbit for a good line, about 7.5+ for a bad.
Estimate before purchase was "up to 13.2Mbit".


* Re: [PATCH 1/3] ARM: Premit ioremap() to map reserved pages
  2014-01-22 11:42   ` Russell King - ARM Linux
@ 2014-01-22 11:55     ` Wang Nan
  0 siblings, 0 replies; 13+ messages in thread
From: Wang Nan @ 2014-01-22 11:55 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: kexec, linux-kernel, Geng Hui, linux-mm, Eric Biederman,
	Andrew Morton, linux-arm-kernel

On 2014/1/22 19:42, Russell King - ARM Linux wrote:
> On Wed, Jan 22, 2014 at 07:25:14PM +0800, Wang Nan wrote:
>> This patch relaxes the restriction set by commit 309caa9cc, which
>> prohibit ioremap() on all kernel managed pages.
>>
>> Other architectures, such as x86 and (some specific platforms of) powerpc,
>> allow such mapping.
>>
>> ioremap() pages is an efficient way to avoid arm's mysterious cache control.
>> This feature will be used for arm kexec support to ensure copied data goes into
>> RAM even without cache flushing, because we found that flush_cache_xxx can't
>> reliably flush code to memory.
> 
> Yes, let's bypass the check and allow this in violation of the
> architecture specification by allowing mapping the same memory with
> different types, which leads to unpredictable behaviour.  Yes, that's
> a very good idea, because what we want to do is far more important than
> following the requirements of the architecture.
> 
> So... NAK.
> 
> Yes, flush_cache_xxx() doesn't flush back to physical RAM, that's not
> what it's defined to do - it's defined that it flushes enough of the
> cache to ensure that page table updates are safe (such as when tearing
> down a page mapping.)  So it's hardly surprising that doesn't work.
> 
> If you want to be able to have DMA access to memory, then you need to
> use an API which has been designed for that purpose, and if there isn't
> one, then you need to discuss your requirements, rather than trying to
> hack around the problem.

So what is the correct API designed for this purpose?

> 
> The issue here will be that the APIs we currently have for DMA become
> extremely expensive when you want to deal with (eg) all system RAM.
> Or, there's flush_cache_all() which should flush all levels of cache
> in the system, and thus push all data back to RAM.
> 
> Now, why are you copying your patches to the stable people?  That makes
> no sense - they haven't been reviewed and they haven't been integrated
> into an existing kernel.  So, they don't meet the basic requirements
> for stable tree submission...
> 




* Re: [PATCH 2/3] ARM: kexec: copying code to ioremapped area
       [not found]   ` <CANacCWz2DdLvns9htszpwWnASrYGXQt+tHMsw4aBbjoyw-DmeQ@mail.gmail.com>
@ 2014-01-22 13:03     ` Wang Nan
  0 siblings, 0 replies; 13+ messages in thread
From: Wang Nan @ 2014-01-22 13:03 UTC (permalink / raw)
  To: Vaibhav Bedia
  Cc: kexec, stable, linux-kernel, Geng Hui, linux-mm, Eric Biederman,
	Russell King, Andrew Morton, Linux ARM Kernel List

On 2014/1/22 20:56, Vaibhav Bedia wrote:
> On Wed, Jan 22, 2014 at 6:25 AM, Wang Nan <wangnan0@huawei.com> wrote:
> 
>     ARM's kdump is actually corrupted (at least for omap4460), mainly because of
>     cache problem: flush_icache_range can't reliably ensure the copied data
>     correctly goes into RAM. After mmu turned off and jump to the trampoline, kexec
>     always failed due to random undef instructions.
> 
>     This patch use ioremap to make sure the destnation of all memcpy() is
>     uncachable memory, including copying of target kernel and trampoline.
> 
> 
> AFAIK ioremap on RAM in forbidden in ARM and device memory that ioremap()
> ends up creating is not meant for executable code.
> 
> Doesn't this trigger the WARN_ON() in _arm_ioremap_pfn_caller)?

This patch depends on the previous one:

ARM: Premit ioremap() to map reserved pages

However, Russell is opposed to it.




* Re: [PATCH 2/3] ARM: kexec: copying code to ioremapped area
  2014-01-22 11:25 ` [PATCH 2/3] ARM: kexec: copying code to ioremapped area Wang Nan
       [not found]   ` <CANacCWz2DdLvns9htszpwWnASrYGXQt+tHMsw4aBbjoyw-DmeQ@mail.gmail.com>
@ 2014-01-22 13:27   ` Russell King - ARM Linux
  2014-01-23  2:16     ` Wang Nan
  1 sibling, 1 reply; 13+ messages in thread
From: Russell King - ARM Linux @ 2014-01-22 13:27 UTC (permalink / raw)
  To: Wang Nan
  Cc: kexec, Eric Biederman, Andrew Morton, Geng Hui, linux-arm-kernel,
	linux-kernel, linux-mm, stable

On Wed, Jan 22, 2014 at 07:25:15PM +0800, Wang Nan wrote:
> ARM's kdump is actually corrupted (at least for omap4460), mainly because of
> cache problem: flush_icache_range can't reliably ensure the copied data
> correctly goes into RAM.

Quite right too.  Your mistake here is thinking that flush_icache_range()
should push it to RAM.  That's incorrect.

flush_icache_range() is there to deal with such things as loadable modules
and self modifying code, where the MMU is not being turned off.  Hence, it
only flushes to the point of coherency between the I and D caches, and
any further levels of cache between that point and memory are not touched.
Why should it touch any more levels - it's not the function's purpose.
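
To make the distinction concrete, here is a toy user-space model of the
situation described above. It is not kernel code and does not model real ARM
cache behaviour in any detail: the struct and all toy_* names are hypothetical,
and the model simply assumes that the I/D point of coherency sits at the outer
cache, with RAM one level further out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: one word as seen by L1, by the outer cache, and by RAM. */
struct toy_mem {
	uint32_t l1, outer, ram;
	bool l1_dirty, outer_dirty;
};

/* A cached CPU store only updates L1. */
static void toy_write(struct toy_mem *m, uint32_t val)
{
	m->l1 = val;
	m->l1_dirty = true;
}

/* Stands in for a flush to the I/D point of coherency: L1 to outer only. */
static void toy_flush_to_pou(struct toy_mem *m)
{
	if (m->l1_dirty) {
		m->outer = m->l1;
		m->outer_dirty = true;
		m->l1_dirty = false;
	}
}

/* Stands in for an outer cache flush: outer to RAM. */
static void toy_flush_outer(struct toy_mem *m)
{
	if (m->outer_dirty) {
		m->ram = m->outer;
		m->outer_dirty = false;
	}
}

/* What is fetched once the MMU and caches are off: RAM contents only. */
static uint32_t toy_read_uncached(const struct toy_mem *m)
{
	return m->ram;
}
```

In this model, a write followed only by toy_flush_to_pou() leaves RAM holding
the old word, which matches the symptom of executing stale instructions after
the MMU is turned off. Only the additional outer flush makes the new word
visible to an uncached fetch.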

> After mmu turned off and jump to the trampoline, kexec always failed due
> to random undef instructions.

We already have code in the kernel which deals with shutting the MMU off.
An instance of how this can be done is illustrated in the soft_restart()
code path, and kexec already uses this.

One of the first things soft_restart() does is turn off the outer cache -
which OMAP4 does have, but this can only be done if there is a single CPU
running.  If there's multiple CPUs running, then the outer cache can't be
disabled, and that's the most likely cause of the problem you're seeing.



* Re: [PATCH 2/3] ARM: kexec: copying code to ioremapped area
  2014-01-22 13:27   ` Russell King - ARM Linux
@ 2014-01-23  2:16     ` Wang Nan
  0 siblings, 0 replies; 13+ messages in thread
From: Wang Nan @ 2014-01-23  2:16 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: kexec, Eric Biederman, Andrew Morton, Geng Hui, linux-arm-kernel,
	linux-kernel, linux-mm

On 2014/1/22 21:27, Russell King - ARM Linux wrote:
> On Wed, Jan 22, 2014 at 07:25:15PM +0800, Wang Nan wrote:
>> ARM's kdump is actually corrupted (at least for omap4460), mainly because of
>> cache problem: flush_icache_range can't reliably ensure the copied data
>> correctly goes into RAM.
> 
> Quite right too.  You're mistake here is thinking that flush_icache_range()
> should push it to RAM.  That's incorrect.
> 
> flush_icache_range() is there to deal with such things as loadable modules
> and self modifying code, where the MMU is not being turned off.  Hence, it
> only flushes to the point of coherency between the I and D caches, and
> any further levels of cache between that point and memory are not touched.
> Why should it touch any more levels - it's not the function's purpose.
> 
>> After mmu turned off and jump to the trampoline, kexec always failed due
>> to random undef instructions.
> 
> We already have code in the kernel which deals with shutting the MMU off.
> An instance of how this can be done is illustrated in the soft_restart()
> code path, and kexec already uses this.
> 
> One of the first things soft_restart() does is turn off the outer cache -
> which OMAP4 does have, but this can only be done if there is a single CPU
> running.  If there's multiple CPUs running, then the outer cache can't be
> disabled, and that's the most likely cause of the problem you're seeing.
> 

You are right, commit b25f3e1c (OMAP4/highbank: Flush L2 cache before disabling)
solves my problem: it flushes the outer cache before disabling it. I have tested
it in UP and SMP situations and it works (actually, omap4 is not yet ready to
support kexec in the SMP case; I inserted an empty cpu_kill() to make it work),
so the first 2 patches are unneeded.

What about the 3rd one (ARM: allow kernel to be loaded in middle of phymem)?




* Re: [PATCH 3/3] ARM: allow kernel to be loaded in middle of phymem
  2014-01-22 11:25 ` [PATCH 3/3] ARM: allow kernel to be loaded in middle of phymem Wang Nan
@ 2014-01-23 19:15   ` Nicolas Pitre
  2014-01-23 19:31     ` Russell King - ARM Linux
  0 siblings, 1 reply; 13+ messages in thread
From: Nicolas Pitre @ 2014-01-23 19:15 UTC (permalink / raw)
  To: Wang Nan
  Cc: kexec, Eric Biederman, Russell King, Andrew Morton, Geng Hui,
	linux-arm-kernel, linux-kernel, linux-mm, stable

On Wed, 22 Jan 2014, Wang Nan wrote:

> This patch allows the kernel to be loaded at the middle of kernel awared
> physical memory. Before this patch, users must use mem= or device tree to cheat
> kernel about the start address of physical memory.
> 
> This feature is useful in some special cases, for example, building a crash
> dump kernel. Without it, kernel command line, atag and devicetree must be
> adjusted carefully, sometimes is impossible.

With CONFIG_PATCH_PHYS_VIRT the value for PHYS_OFFSET is determined 
dynamically by rounding down the kernel image start address to the 
previous 16MB boundary.  In the case of a crash kernel, it might be 
cleaner to simply readjust __pv_phys_offset during early boot and call 
fixup_pv_table(), and then reserve away the memory from the previous 
kernel.  That will let you access that memory directly (with gdb for 
example) and no pointer address translation will be required.
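
As a worked illustration of the rounding described above, the following
user-space sketch masks a load address down to the previous 16MB boundary. The
helper name guess_phys_offset() and the example addresses in the note are
hypothetical; the kernel derives the value in its early boot code rather than
through a C helper like this.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_16M 0x01000000u

/* Round a physical load address down to the previous 16MB boundary. */
static uint32_t guess_phys_offset(uint32_t image_start)
{
	return image_start & ~(SZ_16M - 1);
}
```

Under this rule, an image loaded at an assumed address of 0x8a008000 would
yield a PHYS_OFFSET of 0x8a000000, so memory from 0x80000000 up to that point
falls below PHYS_OFFSET.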


> Signed-off-by: Wang Nan <wangnan0@huawei.com>
> Cc: <stable@vger.kernel.org> # 3.4+
> Cc: Eric Biederman <ebiederm@xmission.com>
> Cc: Russell King <rmk+kernel@arm.linux.org.uk>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Geng Hui <hui.geng@huawei.com>
> ---
>  arch/arm/mm/init.c | 21 ++++++++++++++++++++-
>  arch/arm/mm/mmu.c  | 13 +++++++++++++
>  mm/page_alloc.c    |  7 +++++--
>  3 files changed, 38 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
> index 3e8f106..4952726 100644
> --- a/arch/arm/mm/init.c
> +++ b/arch/arm/mm/init.c
> @@ -334,9 +334,28 @@ void __init arm_memblock_init(struct meminfo *mi,
>  {
>  	int i;
>  
> -	for (i = 0; i < mi->nr_banks; i++)
> +	for (i = 0; i < mi->nr_banks; i++) {
>  		memblock_add(mi->bank[i].start, mi->bank[i].size);
>  
> +		/*
> +		 * In some special case, for example, building a crushdump
> +		 * kernel, we want the kernel to be loaded in the middle of
> +		 * physical memory. In such case, the physical memory before
> +		 * PHYS_OFFSET is awkward: it can't get directly mapped
> +		 * (because its address will be smaller than PAGE_OFFSET,
> +		 * disturbs user address space) also can't be mapped as
> +		 * HighMem. We reserve such pages here. The only way to access
> +		 * those pages is ioremap.
> +		 */
> +		if (mi->bank[i].start < PHYS_OFFSET) {
> +			unsigned long reserv_size = PHYS_OFFSET -
> +						    mi->bank[i].start;
> +			if (reserv_size > mi->bank[i].size)
> +				reserv_size = mi->bank[i].size;
> +			memblock_reserve(mi->bank[i].start, reserv_size);
> +		}
> +	}
> +
>  	/* Register the kernel text, kernel data and initrd with memblock. */
>  #ifdef CONFIG_XIP_KERNEL
>  	memblock_reserve(__pa(_sdata), _end - _sdata);
> diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
> index 580ef2d..2a17c24 100644
> --- a/arch/arm/mm/mmu.c
> +++ b/arch/arm/mm/mmu.c
> @@ -1308,6 +1308,19 @@ static void __init map_lowmem(void)
>  		if (start >= end)
>  			break;
>  
> +		/*
> +		 * If this memblock contain memory before PAGE_OFFSET, memory
> +		 * before PAGE_OFFSET should't get directly mapped, see code
> +		 * in create_mapping(). However, memory after PAGE_OFFSET is
> +		 * occupyed by kernel and still need to be mapped.
> +		 */
> +		if (__phys_to_virt(start) < PAGE_OFFSET) {
> +			if (__phys_to_virt(end) > PAGE_OFFSET)
> +				start = __virt_to_phys(PAGE_OFFSET);
> +			else
> +				break;
> +		}
> +
>  		map.pfn = __phys_to_pfn(start);
>  		map.virtual = __phys_to_virt(start);
>  		map.length = end - start;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5248fe0..d2959e3 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4840,10 +4840,13 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
>  	 */
>  	if (pgdat == NODE_DATA(0)) {
>  		mem_map = NODE_DATA(0)->node_mem_map;
> -#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
> +		/*
> +		 * In case of CONFIG_HAVE_MEMBLOCK_NODE_MAP or when kernel
> +		 * loaded at the middle of physical memory, mem_map should
> +		 * be adjusted.
> +		 */
>  		if (page_to_pfn(mem_map) != pgdat->node_start_pfn)
>  			mem_map -= (pgdat->node_start_pfn - ARCH_PFN_OFFSET);
> -#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
>  	}
>  #endif
>  #endif /* CONFIG_FLAT_NODE_MEM_MAP */
> -- 
> 1.8.4
> 
> 


* Re: [PATCH 3/3] ARM: allow kernel to be loaded in middle of phymem
  2014-01-23 19:15   ` Nicolas Pitre
@ 2014-01-23 19:31     ` Russell King - ARM Linux
  2014-01-23 20:01       ` Nicolas Pitre
  0 siblings, 1 reply; 13+ messages in thread
From: Russell King - ARM Linux @ 2014-01-23 19:31 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Wang Nan, kexec, Eric Biederman, Andrew Morton, Geng Hui,
	linux-arm-kernel, linux-kernel, linux-mm, stable

On Thu, Jan 23, 2014 at 02:15:07PM -0500, Nicolas Pitre wrote:
> On Wed, 22 Jan 2014, Wang Nan wrote:
> 
> > This patch allows the kernel to be loaded in the middle of the physical
> > memory the kernel is aware of. Before this patch, users must use mem= or the
> > device tree to mislead the kernel about the start address of physical memory.
> > 
> > This feature is useful in some special cases, for example, building a crash
> > dump kernel. Without it, the kernel command line, ATAGs and device tree must
> > be adjusted carefully, which is sometimes impossible.
> 
> With CONFIG_PATCH_PHYS_VIRT the value for PHYS_OFFSET is determined 
> dynamically by rounding down the kernel image start address to the 
> previous 16MB boundary.  In the case of a crash kernel, it might be 
> cleaner to simply readjust __pv_phys_offset during early boot and call 
> fixup_pv_table(), and then reserve away the memory from the previous 
> kernel.  That will let you access that memory directly (with gdb for 
> example) and no pointer address translation will be required.

We already have support in the kernel to ignore memory below the calculated
PHYS_OFFSET.  See 571b14375019c3a66ef70d4d4a7083f4238aca30.

-- 
FTTC broadband for 0.8mile line: 5.8Mbps down 500kbps up.  Estimation
in database were 13.1 to 19Mbit for a good line, about 7.5+ for a bad.
Estimate before purchase was "up to 13.2Mbit".

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] ARM: allow kernel to be loaded in middle of phymem
  2014-01-23 19:31     ` Russell King - ARM Linux
@ 2014-01-23 20:01       ` Nicolas Pitre
  0 siblings, 0 replies; 13+ messages in thread
From: Nicolas Pitre @ 2014-01-23 20:01 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Wang Nan, kexec, Eric Biederman, Andrew Morton, Geng Hui,
	linux-arm-kernel, linux-kernel, linux-mm, stable

On Thu, 23 Jan 2014, Russell King - ARM Linux wrote:

> On Thu, Jan 23, 2014 at 02:15:07PM -0500, Nicolas Pitre wrote:
> > On Wed, 22 Jan 2014, Wang Nan wrote:
> > 
> > > This patch allows the kernel to be loaded in the middle of the physical
> > > memory the kernel is aware of. Before this patch, users must use mem= or the
> > > device tree to mislead the kernel about the start address of physical memory.
> > > 
> > > This feature is useful in some special cases, for example, building a crash
> > > dump kernel. Without it, the kernel command line, ATAGs and device tree must
> > > be adjusted carefully, which is sometimes impossible.
> > 
> > With CONFIG_PATCH_PHYS_VIRT the value for PHYS_OFFSET is determined 
> > dynamically by rounding down the kernel image start address to the 
> > > previous 16MB boundary.  In the case of a crash kernel, it might be 
> > cleaner to simply readjust __pv_phys_offset during early boot and call 
> > fixup_pv_table(), and then reserve away the memory from the previous 
> > kernel.  That will let you access that memory directly (with gdb for 
> > example) and no pointer address translation will be required.
> 
> We already have support in the kernel to ignore memory below the calculated
> PHYS_OFFSET.  See 571b14375019c3a66ef70d4d4a7083f4238aca30.

Sure.  Anyway, what I'm suggesting above would require the crash kernel
to be linked at a different virtual address.  That's probably more
trouble than simply mapping the otherwise still unmapped memory from the
crashed kernel.


Nicolas

^ permalink raw reply	[flat|nested] 13+ messages in thread
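The 16MB rounding mentioned in the exchange above (PHYS_OFFSET derived from
the kernel image start address under CONFIG_PATCH_PHYS_VIRT) is plain mask
arithmetic. The sketch below only illustrates that arithmetic in user-space
C; the function name guess_phys_offset() and the sample load address are
hypothetical and are not taken from the kernel sources.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_16M_MODEL 0x01000000u  /* 16MB, the boundary named in the thread */

/*
 * Round a physical kernel start address down to the previous 16MB
 * boundary. Works because 16MB is a power of two.
 */
uint32_t guess_phys_offset(uint32_t kernel_start_phys)
{
	return kernel_start_phys & ~(SZ_16M_MODEL - 1u);
}

/* Example usage with a made-up crash-kernel load address. */
void guess_phys_offset_example(void)
{
	/* An image loaded 32KB into a 16MB-aligned block. */
	assert(guess_phys_offset(0x88008000u) == 0x88000000u);
}
```

With such a derived offset, any memory below the result lies under the
computed PHYS_OFFSET, which is the memory the replies above discuss ignoring
or mapping separately.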

end of thread, other threads:[~2014-01-23 20:01 UTC | newest]

Thread overview: 13+ messages
2014-01-22 11:25 [PATCH 0/3] Bugfix for kdump on arm Wang Nan
2014-01-22 11:25 ` [PATCH 1/3] ARM: Premit ioremap() to map reserved pages Wang Nan
2014-01-22 11:38   ` Will Deacon
2014-01-22 11:42   ` Russell King - ARM Linux
2014-01-22 11:55     ` Wang Nan
2014-01-22 11:25 ` [PATCH 2/3] ARM: kexec: copying code to ioremapped area Wang Nan
     [not found]   ` <CANacCWz2DdLvns9htszpwWnASrYGXQt+tHMsw4aBbjoyw-DmeQ@mail.gmail.com>
2014-01-22 13:03     ` Wang Nan
2014-01-22 13:27   ` Russell King - ARM Linux
2014-01-23  2:16     ` Wang Nan
2014-01-22 11:25 ` [PATCH 3/3] ARM: allow kernel to be loaded in middle of phymem Wang Nan
2014-01-23 19:15   ` Nicolas Pitre
2014-01-23 19:31     ` Russell King - ARM Linux
2014-01-23 20:01       ` Nicolas Pitre
