[RFC 0/4] Create infrastructure for running C code from SRAM.

All of lore.kernel.org
 help / color / mirror / Atom feed

* [RFC 0/4] Create infrastructure for running C code from SRAM.
@ 2013-09-03 16:44 ` Russ Dill
  0 siblings, 0 replies; 38+ messages in thread
From: Russ Dill @ 2013-09-03 16:44 UTC (permalink / raw)
  To: linux-arm-kernel; +Cc: linux-omap, Philipp Zabel, Shawn Guo, Grant Likely, mans

This RFC patchset explores an idea for loading C code into SRAM.
Currently, all the code I'm aware of that needs to run from SRAM is written
in assembler. The most common reason for code needing to run from SRAM is
that the memory controller is being disabled/ enabled or is already
disabled. arch/arm has by far the most examples, but code also exists in
powerpc and sh.

The code is written in asm for two primary reasons. First so that markers
can be put in indicating the size of the code they it can be copied. Second
so that data can be placed along with text and accessed in a position
independant manner.

SRAM handling code is in the process of being moved from arch directories
into drivers/misc/sram.c using device tree and genalloc [1] [2]. This RFC
patchset builds on that, including the limitation that the SRAM address is
not known at compile time. Because the SRAM address is not known at compile
time, the code that runs from SRAM must be compiled with -fPIC. Even if
the code were loaded to a fixed virtual address, portions of the code must
often be run with the MMU disabled.

The general idea is that for each SRAM user (such as an SoC specific
suspend/resume mechanism) to create a group of sections. The section group
is created with a single macro for each user, but end up looking like this:

.sram.am33xx : AT(ADDR(.sram.am33xx) - 0) {
  __sram_am33xx_start = .;
  *(.sram.am33xx.*)
  __sram_am33xx_end = .;
}

Any data or functions that should be copied to SRAM for this use should be
maked with an appropriate __section() attribute. A helper is then added for
translating between the original kernel symbol, and the address of that
function or variable once it has been copied into SRAM. Once control is
passed to a function within the SRAM section grouping, it can access any
variables or functions within that same SRAM section grouping without
translation.

[1] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4984c6
[2] http://www.spinics.net/lists/linux-omap/msg96504.html

Russ Dill (4):
  Misc: SRAM: Create helpers for loading C code into SRAM
  ARM: SRAM: Add macro for generating SRAM resume trampoline
  Misc: SRAM: Hack for allowing executable code in SRAM.
  ARM: AM33XX: Move suspend/resume assembly to C

 arch/arm/include/asm/suspend.h    |  14 ++
 arch/arm/kernel/vmlinux.lds.S     |   2 +
 arch/arm/mach-omap2/Makefile      |   2 +-
 arch/arm/mach-omap2/pm33xx.c      |  50 ++---
 arch/arm/mach-omap2/pm33xx.h      |  23 +--
 arch/arm/mach-omap2/sleep33xx.S   | 394 --------------------------------------
 arch/arm/mach-omap2/sleep33xx.c   | 309 ++++++++++++++++++++++++++++++
 arch/arm/mach-omap2/sram.c        |  15 --
 drivers/misc/sram.c               | 106 +++++++++-
 include/asm-generic/vmlinux.lds.h |   7 +
 include/linux/sram.h              |  44 +++++
 11 files changed, 509 insertions(+), 457 deletions(-)
 delete mode 100644 arch/arm/mach-omap2/sleep33xx.S
 create mode 100644 arch/arm/mach-omap2/sleep33xx.c
 create mode 100644 include/linux/sram.h

-- 
1.8.3.2

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [RFC 0/4] Create infrastructure for running C code from SRAM.
@ 2013-09-03 16:44 ` Russ Dill
  0 siblings, 0 replies; 38+ messages in thread
From: Russ Dill @ 2013-09-03 16:44 UTC (permalink / raw)
  To: linux-arm-kernel

This RFC patchset explores an idea for loading C code into SRAM.
Currently, all the code I'm aware of that needs to run from SRAM is written
in assembler. The most common reason for code needing to run from SRAM is
that the memory controller is being disabled/ enabled or is already
disabled. arch/arm has by far the most examples, but code also exists in
powerpc and sh.

The code is written in asm for two primary reasons. First so that markers
can be put in indicating the size of the code they it can be copied. Second
so that data can be placed along with text and accessed in a position
independant manner.

SRAM handling code is in the process of being moved from arch directories
into drivers/misc/sram.c using device tree and genalloc [1] [2]. This RFC
patchset builds on that, including the limitation that the SRAM address is
not known at compile time. Because the SRAM address is not known at compile
time, the code that runs from SRAM must be compiled with -fPIC. Even if
the code were loaded to a fixed virtual address, portions of the code must
often be run with the MMU disabled.

The general idea is that for each SRAM user (such as an SoC specific
suspend/resume mechanism) to create a group of sections. The section group
is created with a single macro for each user, but end up looking like this:

.sram.am33xx : AT(ADDR(.sram.am33xx) - 0) {
  __sram_am33xx_start = .;
  *(.sram.am33xx.*)
  __sram_am33xx_end = .;
}

Any data or functions that should be copied to SRAM for this use should be
maked with an appropriate __section() attribute. A helper is then added for
translating between the original kernel symbol, and the address of that
function or variable once it has been copied into SRAM. Once control is
passed to a function within the SRAM section grouping, it can access any
variables or functions within that same SRAM section grouping without
translation.

[1] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4984c6
[2] http://www.spinics.net/lists/linux-omap/msg96504.html

Russ Dill (4):
  Misc: SRAM: Create helpers for loading C code into SRAM
  ARM: SRAM: Add macro for generating SRAM resume trampoline
  Misc: SRAM: Hack for allowing executable code in SRAM.
  ARM: AM33XX: Move suspend/resume assembly to C

 arch/arm/include/asm/suspend.h    |  14 ++
 arch/arm/kernel/vmlinux.lds.S     |   2 +
 arch/arm/mach-omap2/Makefile      |   2 +-
 arch/arm/mach-omap2/pm33xx.c      |  50 ++---
 arch/arm/mach-omap2/pm33xx.h      |  23 +--
 arch/arm/mach-omap2/sleep33xx.S   | 394 --------------------------------------
 arch/arm/mach-omap2/sleep33xx.c   | 309 ++++++++++++++++++++++++++++++
 arch/arm/mach-omap2/sram.c        |  15 --
 drivers/misc/sram.c               | 106 +++++++++-
 include/asm-generic/vmlinux.lds.h |   7 +
 include/linux/sram.h              |  44 +++++
 11 files changed, 509 insertions(+), 457 deletions(-)
 delete mode 100644 arch/arm/mach-omap2/sleep33xx.S
 create mode 100644 arch/arm/mach-omap2/sleep33xx.c
 create mode 100644 include/linux/sram.h

-- 
1.8.3.2

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [RFC 1/4] Misc: SRAM: Create helpers for loading C code into SRAM
  2013-09-03 16:44 ` Russ Dill
  (?)
@ 2013-09-03 16:44 ` Russ Dill
  -1 siblings, 0 replies; 38+ messages in thread
From: Russ Dill @ 2013-09-03 16:44 UTC (permalink / raw)
  To: linux-arm-kernel; +Cc: linux-omap, Philipp Zabel, Shawn Guo, Grant Likely, mans

This commit adds helpers for loading, tracking, and running code from
SRAM. Some platforms need to run code from SRAM, such as during
suspend/resume when the access to SDRAM is disabled.

Currently, this is done in assembly. By using the assembler, labels can
be added in order to mark the size of the code, and areas can be added
for position independent access to data for when the code is copied
into SRAM. As more and more such code is added, the maintenance
burden grows.

This commit adds mechanisms to allow C code to be run from SRAM.
The first step is to put code and data that will be needed when running
from SRAM into special sections. The sections should follow the naming
convention of .sram.<sram_user>.<section_name>. For example:

#define __sram_am33xx           __section(.sram.am33xx.text)
#define __sram_am33xxdata   __section(.sram.am33xx.data)

Functions can then be marked with __sram_am33xx and data with
__sram_am33xxdata. These sections and markers to them can be added
to the kernel image by inserting a "SRAM_SECTIONS(<sram_user>)" into
your architecture's vmlinux.lds.S file.

A call to sram_load_sections("<sram_dt_path>", <sram_user>) will copy
the sections into SRAM:

sram_load_sections("/ocp/ocmcram@40300000", am33xx);

Because the code is now relocated to SRAM, special accessors must
be used when accessing functions or variables.

 - kern_to_sram(ptr): Translate a pointer to a function or variable that
is contained in a .sram.<sram_user>.* section to the address it has
been loaded to in SRAM. For instance, when calling a function in
sram, one would use: "ret = kern_to_sram(&am33xx_suspend)(flags);"

- sram_to_phys(addr): Translate an pointer into SRAM to a physical
address. This is useful for obtaining a physical pointer to a resume
function contained in SRAM.

Any compilation unit that will be copied to SRAM that accesses other
functions or variables that are also copied to SRAM should be
compiled with -fPIC as calling the kern_to_sram accessors
requires reads from SDRAM.

Functions within a SRAM section group can access functions and
data also within the same SRAM section group without using
kern_to_sram accessors so long as the unit is compiled with -fPIC.
Care should be taken to use a method for obtaining an absolute
address when accessing functions or data outside of the SRAM
section group.

Signed-off-by: Russ Dill <Russ.Dill@ti.com>
---
 drivers/misc/sram.c               | 103 ++++++++++++++++++++++++++++++++++++++
 include/asm-generic/vmlinux.lds.h |   7 +++
 include/linux/sram.h              |  44 ++++++++++++++++
 3 files changed, 154 insertions(+)
 create mode 100644 include/linux/sram.h

diff --git a/drivers/misc/sram.c b/drivers/misc/sram.c
index d87cc91..08baaab 100644
--- a/drivers/misc/sram.c
+++ b/drivers/misc/sram.c
@@ -25,17 +25,110 @@
 #include <linux/io.h>
 #include <linux/of.h>
 #include <linux/platform_device.h>
+#include <linux/of_platform.h>
 #include <linux/slab.h>
 #include <linux/spinlock.h>
 #include <linux/genalloc.h>
+#include <linux/sram.h>
+#include <asm-generic/cacheflush.h>
 
 #define SRAM_GRANULARITY	32
 
+static DEFINE_SPINLOCK(sram_lock);
+static LIST_HEAD(chunk_list);
+
 struct sram_dev {
 	struct gen_pool *pool;
 	struct clk *clk;
 };
 
+struct sram_chunk {
+	struct gen_pool *pool;
+	struct list_head node;
+	struct rcu_head head;
+	void *kern;
+	unsigned long addr;
+	size_t sz;
+};
+
+int sram_load_data(const char *of_path, void *data, size_t sz)
+{
+	struct device_node *np;
+	struct platform_device *pdev;
+	struct sram_chunk *chunk;
+	struct sram_dev *sram;
+
+	np = of_find_node_by_path(of_path);
+	if (!np)
+		return -ENODEV;
+
+	pdev = of_find_device_by_node(np);
+	if (!pdev)
+		return -ENODEV;
+
+	sram = platform_get_drvdata(pdev);
+
+	chunk = kzalloc(sizeof(*chunk), GFP_KERNEL);
+	if (!chunk)
+		return -ENOMEM;
+
+	chunk->pool = sram->pool;
+	chunk->kern = data;
+	chunk->sz = sz;
+
+	chunk->addr = gen_pool_alloc(sram->pool, sz);
+	if (!chunk->addr) {
+		kfree(chunk);
+		return -ENOMEM;
+	}
+
+	memcpy((void *) chunk->addr, data, sz);
+	flush_icache_range(chunk->addr, chunk->addr + sz);
+
+	spin_lock(&sram_lock);
+	list_add_rcu(&chunk->node, &chunk_list);
+	spin_unlock(&sram_lock);
+
+	return 0;
+
+}
+EXPORT_SYMBOL_GPL(sram_load_data);
+
+phys_addr_t sram_to_phys(unsigned long addr)
+{
+	struct sram_chunk *chunk;
+	phys_addr_t ret = -1;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(chunk, &chunk_list, node) {
+		ret = gen_pool_virt_to_phys(chunk->pool, addr);
+		if (ret != -1)
+			break;
+	}
+	rcu_read_unlock();
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(sram_to_phys);
+
+void __iomem *__kern_to_sram(void *ptr)
+{
+	struct sram_chunk *chunk;
+	void __iomem *ret = NULL;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(chunk, &chunk_list, node) {
+		if (ptr >= chunk->kern && ptr < chunk->kern + chunk->sz) {
+			ret = (void *) (ptr - chunk->kern + chunk->addr);
+			break;
+		}
+	}
+	rcu_read_unlock();
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(__kern_to_sram);
+
 static int sram_probe(struct platform_device *pdev)
 {
 	void __iomem *virt_base;
@@ -82,6 +175,16 @@ static int sram_probe(struct platform_device *pdev)
 static int sram_remove(struct platform_device *pdev)
 {
 	struct sram_dev *sram = platform_get_drvdata(pdev);
+	struct sram_chunk *chunk;
+
+	list_for_each_entry(chunk, &chunk_list, node) {
+		if (chunk->pool == sram->pool) {
+			spin_lock(&sram_lock);
+			list_del_rcu(&chunk->node);
+			spin_unlock(&sram_lock);
+			kfree_rcu(chunk, head);
+		}
+	}
 
 	if (gen_pool_avail(sram->pool) < gen_pool_size(sram->pool))
 		dev_dbg(&pdev->dev, "removed while SRAM allocated\n");
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 69732d2..4817732 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -797,3 +797,10 @@
 	BSS(bss_align)							\
 	. = ALIGN(stop_align);						\
 	VMLINUX_SYMBOL(__bss_stop) = .;
+
+#define SRAM_SECTIONS(name)						\
+	.sram.##name : AT(ADDR(.sram.##name) - LOAD_OFFSET) {		\
+		VMLINUX_SYMBOL(__sram_##name##_start) = .;		\
+		*(.sram.##name##.*)					\
+		VMLINUX_SYMBOL(__sram_##name##_end) = .;		\
+	}
diff --git a/include/linux/sram.h b/include/linux/sram.h
new file mode 100644
index 0000000..8ebc0bb
--- /dev/null
+++ b/include/linux/sram.h
@@ -0,0 +1,44 @@
+/*
+ * Generic on-chip SRAM allocation driver
+ *
+ * Copyright (C) 2012 Philipp Zabel, Pengutronix
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
+ * MA 02110-1301, USA.
+ */
+
+#ifndef _LINUX_SRAM_H
+#define _LINUX_SRAM_H
+
+#include <linux/kernel.h>
+
+extern int sram_load_data(const char *of_path, void *data, size_t sz);
+extern phys_addr_t sram_to_phys(unsigned long addr);
+extern void __iomem *__kern_to_sram(void *ptr);
+
+#define sram_load_sections(of_path, name)				\
+	({								\
+		extern unsigned char __sram_##name##_start[];		\
+		extern unsigned char __sram_##name##_end[];		\
+		sram_load_data(of_path, __sram_##name##_start,		\
+			__sram_##name##_end - __sram_##name##_start);	\
+	})
+
+#define kern_to_sram(p)							\
+	({ 								\
+		typeof(p) r = __kern_to_sram((void *) (p));		\
+		r;							\
+	})
+
+#endif
-- 
1.8.3.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC 2/4] ARM: SRAM: Add macro for generating SRAM resume trampoline
  2013-09-03 16:44 ` Russ Dill
  (?)
  (?)
@ 2013-09-03 16:44 ` Russ Dill
  -1 siblings, 0 replies; 38+ messages in thread
From: Russ Dill @ 2013-09-03 16:44 UTC (permalink / raw)
  To: linux-arm-kernel; +Cc: linux-omap, Philipp Zabel, Shawn Guo, Grant Likely, mans

Add a helper that generates a short snippet of code that loads
the stack pointer and calls a c (or asm) function. The code gets
placed into a SRAM section for loading into SRAM.

Signed-off-by: Russ Dill <Russ.Dill@ti.com>
---
 arch/arm/include/asm/suspend.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/arch/arm/include/asm/suspend.h b/arch/arm/include/asm/suspend.h
index cd20029..d23d004 100644
--- a/arch/arm/include/asm/suspend.h
+++ b/arch/arm/include/asm/suspend.h
@@ -9,4 +9,18 @@ struct sleep_save_sp {
 extern void cpu_resume(void);
 extern int cpu_suspend(unsigned long, int (*)(unsigned long));
 
+/*
+ * Generate a SRAM trampoline for resume.
+ * @proc: SoC, should match argument used with SRAM_SECTIONS().
+ * @func: C or asm function to call at resume.
+ * @stack: Stack to use before calling func.
+ */
+#define ARM_SRAM_RESUME(proc, func, stack)				\
+void __naked __noreturn __sram_##proc proc##_resume_trampoline(void)	\
+{									\
+	__asm__ __volatile__("mov sp, %0" : : "r"((stack)) : "sp");	\
+	func();								\
+}
+
+
 #endif
-- 
1.8.3.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC 3/4] Misc: SRAM: Hack for allowing executable code in SRAM.
  2013-09-03 16:44 ` Russ Dill
                   ` (2 preceding siblings ...)
  (?)
@ 2013-09-03 16:44 ` Russ Dill
  2013-09-04 18:06     ` Tony Lindgren
  -1 siblings, 1 reply; 38+ messages in thread
From: Russ Dill @ 2013-09-03 16:44 UTC (permalink / raw)
  To: linux-arm-kernel; +Cc: linux-omap, Philipp Zabel, Shawn Guo, Grant Likely, mans

The generic SRAM mechanism does not ioremap memory in a
manner that allows code to be executed from SRAM. There is
currently no generic way to request ioremap to return a
memory area with execution allowed.

Insert a temporary hack for proof of concept on ARM.

Signed-off-by: Russ Dill <Russ.Dill@ti.com>
---
 drivers/misc/sram.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/sram.c b/drivers/misc/sram.c
index 08baaab..e059a23 100644
--- a/drivers/misc/sram.c
+++ b/drivers/misc/sram.c
@@ -31,6 +31,7 @@
 #include <linux/genalloc.h>
 #include <linux/sram.h>
 #include <asm-generic/cacheflush.h>
+#include <asm/io.h>
 
 #define SRAM_GRANULARITY	32
 
@@ -138,7 +139,7 @@ static int sram_probe(struct platform_device *pdev)
 	int ret;
 
 	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-	virt_base = devm_ioremap_resource(&pdev->dev, res);
+	virt_base = __arm_ioremap_exec(res->start, resource_size(res), false);
 	if (IS_ERR(virt_base))
 		return PTR_ERR(virt_base);
 
-- 
1.8.3.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC 4/4] ARM: AM33XX: Move suspend/resume assembly to C
  2013-09-03 16:44 ` Russ Dill
                   ` (3 preceding siblings ...)
  (?)
@ 2013-09-03 16:44 ` Russ Dill
  -1 siblings, 0 replies; 38+ messages in thread
From: Russ Dill @ 2013-09-03 16:44 UTC (permalink / raw)
  To: linux-arm-kernel; +Cc: linux-omap, Philipp Zabel, Shawn Guo, Grant Likely, mans

This provides a proof of concept of the SRAM helpers. The
SRAM_SECTIONS macro is added to vmlinux.lds.S, the code
in sleep33xx.S is translated to C and moved to sleep33xx.c.
In the process, a struct used for passing arguments is
removed and instead the fixed arguments are only passed
once at init time, only an argument with some flags
remains.

Signed-off-by: Russ Dill <Russ.Dill@ti.com>
---
 arch/arm/kernel/vmlinux.lds.S   |   2 +
 arch/arm/mach-omap2/Makefile    |   2 +-
 arch/arm/mach-omap2/pm33xx.c    |  50 ++---
 arch/arm/mach-omap2/pm33xx.h    |  23 +--
 arch/arm/mach-omap2/sleep33xx.S | 394 ----------------------------------------
 arch/arm/mach-omap2/sleep33xx.c | 309 +++++++++++++++++++++++++++++++
 arch/arm/mach-omap2/sram.c      |  15 --
 7 files changed, 339 insertions(+), 456 deletions(-)
 delete mode 100644 arch/arm/mach-omap2/sleep33xx.S
 create mode 100644 arch/arm/mach-omap2/sleep33xx.c

diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index 7bcee5c..21c3b64 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -112,6 +112,8 @@ SECTIONS
 			ARM_CPU_KEEP(PROC_INFO)
 	}
 
+	SRAM_SECTIONS(am33xx)
+
 	RO_DATA(PAGE_SIZE)
 
 	. = ALIGN(4);
diff --git a/arch/arm/mach-omap2/Makefile b/arch/arm/mach-omap2/Makefile
index dcf5d89..195485c 100644
--- a/arch/arm/mach-omap2/Makefile
+++ b/arch/arm/mach-omap2/Makefile
@@ -95,7 +95,7 @@ obj-$(CONFIG_POWER_AVS_OMAP_CLASS3)    += smartreflex-class3.o
 
 AFLAGS_sleep24xx.o			:=-Wa,-march=armv6
 AFLAGS_sleep34xx.o			:=-Wa,-march=armv7-a$(plus_sec)
-AFLAGS_sleep33xx.o			:=-Wa,-march=armv7-a$(plus_sec)
+CFLAGS_sleep33xx.o			+= -fPIC -march=armv7-a
 
 endif
 
diff --git a/arch/arm/mach-omap2/pm33xx.c b/arch/arm/mach-omap2/pm33xx.c
index ea36415..0cd3e76 100644
--- a/arch/arm/mach-omap2/pm33xx.c
+++ b/arch/arm/mach-omap2/pm33xx.c
@@ -27,6 +27,7 @@
 #include <linux/module.h>
 #include <linux/interrupt.h>
 #include <linux/ti_emif.h>
+#include <linux/sram.h>
 #include <linux/omap-mailbox.h>
 
 #include <asm/suspend.h>
@@ -45,8 +46,8 @@
 #include "omap_hwmod.h"
 #include "omap_device.h"
 #include "soc.h"
-#include "sram.h"
 
+static unsigned long am33xx_wfi_flags;
 static void __iomem *am33xx_emif_base;
 static struct powerdomain *cefuse_pwrdm, *gfx_pwrdm, *per_pwrdm, *mpu_pwrdm;
 static struct clockdomain *gfx_l4ls_clkdm;
@@ -61,9 +62,6 @@ struct forced_standby_module am33xx_mod[] = {
 	{.oh_name = "cpgmac0"},
 };
 
-static void (*am33xx_do_wfi_sram)(struct am33xx_suspend_params *);
-static struct am33xx_suspend_params susp_params;
-
 #ifdef CONFIG_SUSPEND
 
 struct wakeup_src wakeups[] = {
@@ -87,11 +85,6 @@ static DECLARE_COMPLETION(am33xx_pm_sync);
 
 #endif
 
-static int am33xx_do_sram_idle(long unsigned int arg)
-{
-	am33xx_do_wfi_sram((struct am33xx_suspend_params *) arg);
-	return 0;
-}
 
 int am33xx_do_sram_cpuidle(u32 wfi_flags, u32 m3_flags)
 {
@@ -158,8 +151,7 @@ static int am33xx_pm_suspend(void)
 
 	/* Try to put GFX to sleep */
 	omap_set_pwrdm_state(gfx_pwrdm, PWRDM_POWER_OFF);
-	ret = cpu_suspend((long unsigned int) &susp_params,
-							am33xx_do_sram_idle);
+	ret = cpu_suspend(am33xx_wfi_flags, am33xx_suspend);
 
 	status = pwrdm_read_prev_pwrst(gfx_pwrdm);
 	if (status != PWRDM_POWER_OFF)
@@ -366,6 +358,7 @@ static void am33xx_pm_firmware_cb(const struct firmware *fw, void *context)
 {
 	struct am33xx_pm_context *am33xx_pm = context;
 	int ret = 0;
+	unsigned long sram_trampoline;
 
 	/* no firmware found */
 	if (!fw) {
@@ -395,8 +388,8 @@ static void am33xx_pm_firmware_cb(const struct firmware *fw, void *context)
 	}
 
 	/* Physical resume address to be used by ROM code */
-	am33xx_pm->ipc.resume_addr = (AM33XX_OCMC_END -
-		am33xx_do_wfi_sz + am33xx_resume_offset + 0x4);
+	sram_trampoline = (long) kern_to_sram(&am33xx_resume_trampoline);
+	am33xx_pm->ipc.resume_addr = sram_to_phys(sram_trampoline);
 
 	am33xx_pm->mbox = omap_mbox_get("wkup_m3", &wkup_mbox_notifier);
 
@@ -413,15 +406,6 @@ static void am33xx_pm_firmware_cb(const struct firmware *fw, void *context)
 
 #endif /* CONFIG_SUSPEND */
 
-/*
- * Push the minimal suspend-resume code to SRAM
- */
-void am33xx_push_sram_idle(void)
-{
-	am33xx_do_wfi_sram = (void *)omap_sram_push
-					(am33xx_do_wfi, am33xx_do_wfi_sz);
-}
-
 static int __init am33xx_map_emif(void)
 {
 	am33xx_emif_base = ioremap(AM33XX_EMIF_BASE, SZ_32K);
@@ -480,21 +464,27 @@ int __init am33xx_pm_init(void)
 	temp = readl(am33xx_emif_base + EMIF_SDRAM_CONFIG);
 	temp = (temp & SDRAM_TYPE_MASK) >> SDRAM_TYPE_SHIFT;
 	/* Parameters to pass to assembly code */
-	susp_params.wfi_flags = 0;
-	susp_params.emif_addr_virt = am33xx_emif_base;
-	susp_params.dram_sync = am33xx_dram_sync;
+	am33xx_wfi_flags = 0;
 	am33xx_pm->ipc.param3 = temp;
 	switch (temp) {
 	case MEM_TYPE_DDR2:
-		susp_params.wfi_flags |= WFI_MEM_TYPE_DDR2;
+		am33xx_wfi_flags |= WFI_MEM_TYPE_DDR2;
 		break;
 	case MEM_TYPE_DDR3:
-		susp_params.wfi_flags |= WFI_MEM_TYPE_DDR3;
+		am33xx_wfi_flags |= WFI_MEM_TYPE_DDR3;
 		break;
 	}
-	susp_params.wfi_flags |= WFI_SELF_REFRESH;
-	susp_params.wfi_flags |= WFI_SAVE_EMIF;
-	susp_params.wfi_flags |= WFI_WAKE_M3;
+	am33xx_wfi_flags |= WFI_SELF_REFRESH;
+	am33xx_wfi_flags |= WFI_SAVE_EMIF;
+	am33xx_wfi_flags |= WFI_WAKE_M3;
+
+	ret = sram_load_sections("/ocp/ocmcram@40300000", am33xx);
+	if (ret < 0) {
+		pr_err("%s: Could not load SRAM\n", __func__);
+		goto err;
+	}
+
+	am33xx_sram_init(am33xx_emif_base, am33xx_dram_sync);
 
 	np = of_find_compatible_node(NULL, NULL, "ti,am3353-wkup-m3");
 	if (np) {
diff --git a/arch/arm/mach-omap2/pm33xx.h b/arch/arm/mach-omap2/pm33xx.h
index b9c3a3f..aaa5f71 100644
--- a/arch/arm/mach-omap2/pm33xx.h
+++ b/arch/arm/mach-omap2/pm33xx.h
@@ -16,10 +16,9 @@
 #ifndef __ARCH_ARM_MACH_OMAP2_PM33XX_H
 #define __ARCH_ARM_MACH_OMAP2_PM33XX_H
 
+#include <linux/kernel.h>
 #include "control.h"
 
-#ifndef __ASSEMBLER__
-
 struct am33xx_pm_context {
 	struct am33xx_ipc_data	ipc;
 	struct firmware		*firmware;
@@ -28,19 +27,6 @@ struct am33xx_pm_context {
 	u32			ver;
 };
 
-/*
- * Params passed to suspend routine
- *
- * Since these are used to load into registers by suspend code,
- * entries here must always be in sync with the suspend code
- * in arm/mach-omap2/sleep33xx.S
- */
-struct am33xx_suspend_params {
-	void __iomem *emif_addr_virt;
-	u32 wfi_flags;
-	void __iomem *dram_sync;
-};
-
 struct wakeup_src {
 	int irq_nr;
 	char src[10];
@@ -55,8 +41,13 @@ int wkup_m3_copy_code(const u8 *data, size_t size);
 int wkup_m3_prepare(void);
 void wkup_m3_register_txev_handler(void (*txev_handler)(void));
 int am33xx_do_sram_cpuidle(u32, u32);
+int am33xx_suspend(long unsigned int flags);
+void am33xx_resume_trampoline(void);
+void am33xx_sram_init(void __iomem *emif_base, void __iomem *dram_sync);
 
-#endif
+#define __sram_am33xx		__section(.sram.am33xx.text)
+#define __sram_am33xxdata	__section(.sram.am33xx.data)
+#define __sram_am33xxconst	__section(.sram.am33xx.rodata)
 
 #define	IPC_CMD_DS0			0x4
 #define	IPC_CMD_IDLE			0xd
diff --git a/arch/arm/mach-omap2/sleep33xx.S b/arch/arm/mach-omap2/sleep33xx.S
deleted file mode 100644
index 317fb77..0000000
--- a/arch/arm/mach-omap2/sleep33xx.S
+++ /dev/null
@@ -1,394 +0,0 @@
-/*
- * Low level suspend code for AM33XX SoCs
- *
- * Copyright (C) 2012 Texas Instruments Incorporated - http://www.ti.com/
- * Vaibhav Bedia <vaibhav.bedia@ti.com>
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License as
- * published by the Free Software Foundation version 2.
- *
- * This program is distributed "as is" WITHOUT ANY WARRANTY of any
- * kind, whether express or implied; without even the implied warranty
- * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- */
-
-#include <linux/linkage.h>
-#include <linux/ti_emif.h>
-#include <asm/memory.h>
-#include <asm/assembler.h>
-
-#include "cm33xx.h"
-#include "pm33xx.h"
-#include "prm33xx.h"
-
-	.text
-	.align 3
-
-/*
- * This routine is executed from internal RAM and expects some
- * parameters to be passed in r0 _strictly_ in following order:
- * 1) emif_addr_virt - ioremapped EMIF address
- * 2) wfi_flags - actions to perform
- * 3) dram_sync_word - uncached word in SDRAM
- *
- * The code loads these values taking r0 value as reference to
- * the array in registers starting from r0, i.e emif_addr_virt
- * goes to r1, wfi_flags goes to r2 and and so on. These are
- * then saved into memory locations before proceeding with the
- * sleep sequence and hence registers r0, r1 etc can still be
- * used in the rest of the sleep code.
- */
-
-ENTRY(am33xx_do_wfi)
-	stmfd	sp!, {r4 - r11, lr}	@ save registers on stack
-
-	ldm	r0, {r1-r3}		@ gather values passed
-
-	/* Save the values passed */
-	str	r1, emif_addr_virt
-	str	r2, wfi_flags
-	str	r3, dram_sync_word
-
-	tst	r2, #WFI_SELF_REFRESH
-	beq	skip_sr
-
-	/*
-	 * Flush all data from the L1 data cache before disabling
-	 * SCTLR.C bit.
-	 */
-	ldr	r1, kernel_flush
-	blx	r1
-
-	/*
-	 * Clear the SCTLR.C bit to prevent further data cache
-	 * allocation. Clearing SCTLR.C would make all the data accesses
-	 * strongly ordered and would not hit the cache.
-	 */
-	mrc	p15, 0, r0, c1, c0, 0
-	bic	r0, r0, #(1 << 2)	@ Disable the C bit
-	mcr	p15, 0, r0, c1, c0, 0
-	isb
-
-	/*
-	 * Invalidate L1 data cache. Even though only invalidate is
-	 * necessary exported flush API is used here. Doing clean
-	 * on already clean cache would be almost NOP.
-	 */
-	ldr	r1, kernel_flush
-	blx	r1
-
-	ldr	r0, emif_addr_virt
-
-	/* Save config register */
-	ldr	r1, [r0, #EMIF_SDRAM_CONFIG]
-	str	r1, emif_sdcfg_val
-
-	/* Only necessary if PER is losing context */
-	ldr	r2, wfi_flags
-	tst	r2, #WFI_SAVE_EMIF
-	beq	skip_save_emif
-
-	/* Save EMIF configuration */
-	ldr	r1, [r0, #EMIF_SDRAM_REFRESH_CONTROL]
-	str	r1, emif_ref_ctrl_val
-	ldr	r1, [r0, #EMIF_SDRAM_TIMING_1]
-	str	r1, emif_timing1_val
-	ldr	r1, [r0, #EMIF_SDRAM_TIMING_2]
-	str	r1, emif_timing2_val
-	ldr	r1, [r0, #EMIF_SDRAM_TIMING_3]
-	str	r1, emif_timing3_val
-	ldr	r1, [r0, #EMIF_POWER_MANAGEMENT_CONTROL]
-	str	r1, emif_pmcr_val
-	ldr	r1, [r0, #EMIF_POWER_MANAGEMENT_CTRL_SHDW]
-	str	r1, emif_pmcr_shdw_val
-	ldr	r1, [r0, #EMIF_SDRAM_OUTPUT_IMPEDANCE_CALIBRATION_CONFIG]
-	str	r1, emif_zqcfg_val
-	ldr	r1, [r0, #EMIF_DDR_PHY_CTRL_1]
-	str	r1, emif_rd_lat_val
-
-skip_save_emif:
-	/* Put SDRAM in self-refresh */
-	ldr	r1, [r0, #EMIF_POWER_MANAGEMENT_CONTROL]
-	orr	r1, r1, #0xa0
-	str	r1, [r0, #EMIF_POWER_MANAGEMENT_CTRL_SHDW]
-	str	r1, [r0, #4]
-
-	ldr	r1, dram_sync_word	@ a dummy access to DDR as per spec
-	ldr	r2, [r1, #0]
-	ldr	r1, [r0, #EMIF_POWER_MANAGEMENT_CONTROL]
-	orr	r1, r1, #0x200
-	str	r1, [r0, #EMIF_POWER_MANAGEMENT_CONTROL]
-
-	mov	r1, #0x1000		@ Wait for system to enter SR
-wait_sr:
-	subs	r1, r1, #1
-	bne	wait_sr
-
-	/* Disable EMIF */
-	ldr	r1, virt_emif_clkctrl
-	ldr	r2, [r1]
-	bic	r2, r2, #0x03
-	str	r2, [r1]
-
-	ldr	r1, virt_emif_clkctrl
-wait_emif_disable:
-	ldr	r2, [r1]
-	ldr	r3, module_disabled_val
-	cmp	r2, r3
-	bne	wait_emif_disable
-
-skip_sr:
-	ldr	r2, wfi_flags
-	tst	r2, #WFI_WAKE_M3
-	beq	skip_m3
-
-	/*
-	 * For the MPU WFI to be registered as an interrupt
-	 * to WKUP_M3, MPU_CLKCTRL.MODULEMODE needs to be set
-	 * to DISABLED
-	 */
-	ldr	r1, virt_mpu_clkctrl
-	ldr	r2, [r1]
-	bic	r2, r2, #0x03
-	str	r2, [r1]
-
-skip_m3:
-	/*
-	 * Execute an ISB instruction to ensure that all of the
-	 * CP15 register changes have been committed.
-	 */
-	isb
-
-	/*
-	 * Execute a barrier instruction to ensure that all cache,
-	 * TLB and branch predictor maintenance operations issued
-	 * have completed.
-	 */
-	dsb
-	dmb
-
-	/*
-	 * Execute a WFI instruction and wait until the
-	 * STANDBYWFI output is asserted to indicate that the
-	 * CPU is in idle and low power state. CPU can specualatively
-	 * prefetch the instructions so add NOPs after WFI. Thirteen
-	 * NOPs as per Cortex-A8 pipeline.
-	 */
-	wfi
-
-	nop
-	nop
-	nop
-	nop
-	nop
-	nop
-	nop
-	nop
-	nop
-	nop
-	nop
-	nop
-	nop
-
-	/* We come here in case of an abort due to a late interrupt */
-	ldr	r2, wfi_flags
-	tst	r2, #WFI_WAKE_M3
-
-	/* Set MPU_CLKCTRL.MODULEMODE back to ENABLE */
-	ldrne	r1, virt_mpu_clkctrl
-	movne	r2, #0x02
-	strne	r2, [r1]
-
-	ldr	r2, wfi_flags
-	tst	r2, #WFI_SELF_REFRESH
-	beq	skip_reenable_emif
-
-	/* Re-enable EMIF */
-	ldr	r1, virt_emif_clkctrl
-	mov	r2, #0x02
-	str	r2, [r1]
-wait_emif_enable:
-	ldr	r3, [r1]
-	cmp	r2, r3
-	bne	wait_emif_enable
-
-	/* Disable EMIF self-refresh */
-	ldr	r0, emif_addr_virt
-	ldr	r1, [r0, #EMIF_POWER_MANAGEMENT_CONTROL]
-	bic	r1, r1, #LP_MODE_MASK
-	str	r1, [r0, #EMIF_POWER_MANAGEMENT_CONTROL]
-	str	r1, [r0, #EMIF_POWER_MANAGEMENT_CTRL_SHDW]
-
-	/*
-	 * A write to SDRAM CONFIG register triggers
-	 * an init sequence and hence it must be done
-	 * at the end for DDR2
-	 */
-	ldr r0, emif_addr_virt
-	add r0, r0, #EMIF_SDRAM_CONFIG
-	ldr r4, emif_sdcfg_val
-	str r4, [r0]
-
-	/*
-	 * Set SCTLR.C bit to allow data cache allocation
-	 */
-	mrc	p15, 0, r0, c1, c0, 0
-	orr	r0, r0, #(1 << 2)	@ Enable the C bit
-	mcr	p15, 0, r0, c1, c0, 0
-	isb
-
-	/* Kill some time for sanity to settle in */
-	mov r0, #0x1000
-wait_abt:
-	subs   r0, r0, #1
-	bne wait_abt
-
-	/* Let the suspend code know about the abort */
-skip_reenable_emif:
-	mov	r0, #1
-	ldmfd	sp!, {r4 - r11, pc}	@ restore regs and return
-ENDPROC(am33xx_do_wfi)
-
-	.align
-ENTRY(am33xx_resume_offset)
-	.word . - am33xx_do_wfi
-
-ENTRY(am33xx_resume_from_deep_sleep)
-	ldr	r2, wfi_flags
-	tst	r2, #WFI_SELF_REFRESH
-	beq	skip_reenable_emif1
-
-	/* Re-enable EMIF */
-	ldr	r0, phys_emif_clkctrl
-	mov	r1, #0x02
-	str	r1, [r0]
-wait_emif_enable1:
-	ldr	r2, [r0]
-	cmp	r1, r2
-	bne	wait_emif_enable1
-
-	ldr	r0, emif_phys_addr
-
-	ldr	r2, wfi_flags
-	tst	r2, #WFI_SAVE_EMIF
-	beq	skip_restore_emif
-
-	/* Config EMIF Timings */
-	ldr	r1, emif_rd_lat_val
-	str	r1, [r0, #EMIF_DDR_PHY_CTRL_1]
-	str	r1, [r0, #EMIF_DDR_PHY_CTRL_1_SHDW]
-	ldr	r1, emif_timing1_val
-	str	r1, [r0, #EMIF_SDRAM_TIMING_1]
-	str	r1, [r0, #EMIF_SDRAM_TIMING_1_SHDW]
-	ldr	r1, emif_timing2_val
-	str	r1, [r0, #EMIF_SDRAM_TIMING_2]
-	str	r1, [r0, #EMIF_SDRAM_TIMING_2_SHDW]
-	ldr	r1, emif_timing3_val
-	str	r1, [r0, #EMIF_SDRAM_TIMING_3]
-	str	r1, [r0, #EMIF_SDRAM_TIMING_3_SHDW]
-	ldr	r1, emif_ref_ctrl_val
-	str	r1, [r0, #EMIF_SDRAM_REFRESH_CONTROL]
-	str	r1, [r0, #EMIF_SDRAM_REFRESH_CTRL_SHDW]
-	ldr	r1, emif_pmcr_val
-	str	r1, [r0, #EMIF_POWER_MANAGEMENT_CONTROL]
-	ldr	r1, emif_pmcr_shdw_val
-	str	r1, [r0, #EMIF_POWER_MANAGEMENT_CTRL_SHDW]
-
-	/*
-	 * Output impedence calib needed only for DDR3
-	 * but since the initial state of this will be
-	 * disabled for DDR2 no harm in restoring the
-	 * old configuration
-	 */
-	ldr	r1, emif_zqcfg_val
-	str	r1, [r0, #EMIF_SDRAM_OUTPUT_IMPEDANCE_CALIBRATION_CONFIG]
-
-	/* Write to SDRAM_CONFIG only for DDR2 */
-	ldr	r2, wfi_flags
-	tst	r2, #WFI_MEM_TYPE_DDR2
-	bne	resume_to_ddr2
-	b	resume_to_ddr3
-
-skip_restore_emif:
-	/* Disable EMIF self-refresh */
-	ldr	r1, [r0, #EMIF_POWER_MANAGEMENT_CONTROL]
-	bic	r1, r1, #LP_MODE_MASK
-	str	r1, [r0, #EMIF_POWER_MANAGEMENT_CONTROL]
-	str	r1, [r0, #EMIF_POWER_MANAGEMENT_CTRL_SHDW]
-
-resume_to_ddr2:
-	/*
-	 * A write to SDRAM CONFIG register triggers
-	 * an init sequence and hence it must be done
-	 * at the end for DDR2
-	 */
-	ldr	r1, emif_sdcfg_val
-	str	r1, [r0, #EMIF_SDRAM_CONFIG]
-
-resume_to_ddr3:
-	/* Back from la-la-land. Kill some time for sanity to settle in */
-	mov	r0, #0x1000
-wait_resume:
-	subs	r0, r0, #1
-	bne	wait_resume
-
-skip_reenable_emif1:
-	/* We are back. Branch to the common CPU resume routine */
-	mov	r0, #0
-	ldr	pc, resume_addr
-ENDPROC(am33xx_resume_from_deep_sleep)
-
-
-/*
- * Local variables
- */
-	.align
-resume_addr:
-	.word	cpu_resume - PAGE_OFFSET + 0x80000000
-kernel_flush:
-	.word   v7_flush_dcache_all
-ddr_start:
-	.word	PAGE_OFFSET
-emif_phys_addr:
-	.word	AM33XX_EMIF_BASE
-virt_mpu_clkctrl:
-	.word	AM33XX_CM_MPU_MPU_CLKCTRL
-virt_emif_clkctrl:
-	.word	AM33XX_CM_PER_EMIF_CLKCTRL
-phys_emif_clkctrl:
-	.word	(AM33XX_CM_BASE + AM33XX_CM_PER_MOD + \
-		AM33XX_CM_PER_EMIF_CLKCTRL_OFFSET)
-module_disabled_val:
-	.word	0x30000
-
-/* DDR related defines */
-dram_sync_word:
-	.word	0xDEADBEEF
-wfi_flags:
-	.word	0xDEADBEEF
-emif_addr_virt:
-	.word	0xDEADBEEF
-emif_rd_lat_val:
-	.word	0xDEADBEEF
-emif_timing1_val:
-	.word	0xDEADBEEF
-emif_timing2_val:
-	.word	0xDEADBEEF
-emif_timing3_val:
-	.word	0xDEADBEEF
-emif_sdcfg_val:
-	.word	0xDEADBEEF
-emif_ref_ctrl_val:
-	.word	0xDEADBEEF
-emif_zqcfg_val:
-	.word	0xDEADBEEF
-emif_pmcr_val:
-	.word	0xDEADBEEF
-emif_pmcr_shdw_val:
-	.word	0xDEADBEEF
-
-	.align 3
-ENTRY(am33xx_do_wfi_sz)
-	.word	. - am33xx_do_wfi
diff --git a/arch/arm/mach-omap2/sleep33xx.c b/arch/arm/mach-omap2/sleep33xx.c
new file mode 100644
index 0000000..a2b8608
--- /dev/null
+++ b/arch/arm/mach-omap2/sleep33xx.c
@@ -0,0 +1,309 @@
+/*
+ * AM33XX Power Management Routines
+ *
+ * Copyright (C) 2012 Texas Instruments Incorporated - http://www.ti.com/
+ * Vaibhav Bedia <vaibhav.bedia@ti.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation version 2.
+ *
+ * This program is distributed "as is" WITHOUT ANY WARRANTY of any
+ * kind, whether express or implied; without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+#include <linux/kernel.h>
+#include <linux/io.h>
+#include <linux/ti_emif.h>
+#include <linux/platform_data/emif_plat.h>
+#include <linux/sram.h>
+
+#include <asm/suspend.h>
+#include <asm/cp15.h>
+
+#include "pm33xx.h"
+#include "cm33xx.h"
+#include "cm-regbits-33xx.h"
+#include "omap_hwmod.h"
+
+#define CLKCTRL_IDLEST_FUNCTIONAL	0x0
+#define CLKCTRL_IDLEST_DISABLED		0x3
+
+struct emif_regs {
+	u32 sdcfg;
+	u32 ref_ctrl;
+	u32 timing1;
+	u32 timing2;
+	u32 timing3;
+	u32 pmcr;
+	u32 pmcr_shdw;
+	u32 zqcfg;
+	u32 rd_lat;
+};
+
+extern int call_with_stack(int (*fn)(void *), void *arg, void *sp);
+extern void v7_flush_dcache_all(void);
+
+static void (*__abs_v7_flush_dcache_all)(void) __sram_am33xxdata;
+static char sram_stack[1024] __sram_am33xxdata;
+static void __noreturn (*__cpu_resume_phys)(void) __sram_am33xxdata;
+static struct emif_regs emif_regs __sram_am33xxdata;
+static void __iomem *emif_virt_base __sram_am33xxdata;
+static void __iomem *emif_base __sram_am33xxdata;
+static void __iomem *dram_sync_addr __sram_am33xxdata;
+static u32 wfi_flags __sram_am33xxdata;
+static u32 cm_offset __sram_am33xxdata;
+
+static inline void flush_dcache_all(void)
+{
+	__asm__ __volatile__("" : : : "r0", "r1", "r2", "r3", "r4", "r5",
+				"r5", "r6", "r7", "r9", "r10", "r11");
+	__abs_v7_flush_dcache_all();
+}
+
+static u32 __sram_am33xx emif_read(u16 idx)
+{
+	return __raw_readl(emif_base + idx);
+}
+
+static void __sram_am33xx emif_write(u32 val, u16 idx)
+{
+	__raw_writel(val, emif_base + idx);
+}
+
+static inline void am33xx_cm_write(u32 val, void __iomem *reg)
+{
+	__raw_writel(val, reg + cm_offset);
+}
+
+static inline u32 am33xx_cm_read(void __iomem *reg)
+{
+	return __raw_readl(reg + cm_offset);
+}
+
+static void __sram_am33xx am33xx_module_set(u16 mode, void __iomem *reg)
+{
+	u32 val = am33xx_cm_read(reg) & ~AM33XX_MODULEMODE_MASK;
+	am33xx_cm_write(val | mode, reg);
+}
+
+static void __sram_am33xx am33xx_module_disable(void __iomem *reg)
+{
+	am33xx_module_set(MODULEMODE_SWCTRL, reg);
+}
+
+static void __sram_am33xx am33xx_module_disable_wait(void __iomem *reg)
+{
+	u32 val;
+	am33xx_module_disable(reg);
+	do {
+		val = am33xx_cm_read(reg) & AM33XX_IDLEST_MASK;
+		val >>= AM33XX_IDLEST_SHIFT;
+	} while (val != CLKCTRL_IDLEST_DISABLED);
+}
+
+static void __sram_am33xx am33xx_module_enable(void __iomem *reg)
+{
+	am33xx_module_set(0, reg);
+}
+
+static void __sram_am33xx am33xx_module_enable_wait(void __iomem *reg)
+{
+	u32 val;
+	am33xx_module_enable(reg);
+	do {
+		val = am33xx_cm_read(reg) & AM33XX_IDLEST_MASK;
+		val >>= AM33XX_IDLEST_SHIFT;
+	} while (val != CLKCTRL_IDLEST_FUNCTIONAL);
+}
+
+static void __sram_am33xx noinline am33xx_enable_sr(void)
+{
+	u32 val;
+
+	emif_regs.sdcfg = emif_read(EMIF_SDRAM_CONFIG);
+	val = emif_read(EMIF_POWER_MANAGEMENT_CONTROL);
+	val &= ~SR_TIM_MASK;
+	val |= 0xa << SR_TIM_SHIFT;
+	emif_write(val, EMIF_POWER_MANAGEMENT_CONTROL);
+	emif_write(val, EMIF_POWER_MANAGEMENT_CTRL_SHDW);
+
+	__raw_readl(dram_sync_addr);
+	val &= ~LP_MODE_MASK;
+	val |= EMIF_LP_MODE_SELF_REFRESH << LP_MODE_SHIFT;
+	emif_write(val, EMIF_POWER_MANAGEMENT_CONTROL);
+}
+
+static void __sram_am33xx noinline am33xx_disable_sr(void)
+{
+	u32 val;
+
+	val = emif_read(EMIF_POWER_MANAGEMENT_CONTROL);
+	val &= ~LP_MODE_MASK;
+	val |= EMIF_LP_MODE_DISABLE << LP_MODE_SHIFT;
+	emif_write(val, EMIF_POWER_MANAGEMENT_CONTROL);
+	emif_write(val, EMIF_POWER_MANAGEMENT_CTRL_SHDW);
+
+	/*
+	 * A write to SDRAM CONFIG register triggers
+	 * an init sequence and hence it must be done
+	 * at the end for DDR2
+	 */
+	emif_write(emif_regs.sdcfg, EMIF_SDRAM_CONFIG);
+}
+
+static void __sram_am33xx noinline am33xx_emif_save(void)
+{
+	emif_regs.ref_ctrl = emif_read(EMIF_SDRAM_REFRESH_CONTROL);
+	emif_regs.timing1 = emif_read(EMIF_SDRAM_TIMING_1);
+	emif_regs.timing2 = emif_read(EMIF_SDRAM_TIMING_2);
+	emif_regs.timing3 = emif_read(EMIF_SDRAM_TIMING_3);
+	emif_regs.pmcr = emif_read(EMIF_POWER_MANAGEMENT_CONTROL);
+	emif_regs.pmcr_shdw = emif_read(EMIF_POWER_MANAGEMENT_CTRL_SHDW);
+	emif_regs.zqcfg = emif_read(EMIF_SDRAM_OUTPUT_IMPEDANCE_CALIBRATION_CONFIG);
+	emif_regs.rd_lat = emif_read(EMIF_DDR_PHY_CTRL_1);
+}
+
+static void __sram_am33xx noinline am33xx_emif_restore(void)
+{
+	emif_write(emif_regs.rd_lat, EMIF_DDR_PHY_CTRL_1);
+	emif_write(emif_regs.rd_lat, EMIF_DDR_PHY_CTRL_1_SHDW);
+	emif_write(emif_regs.timing1, EMIF_SDRAM_TIMING_1);
+	emif_write(emif_regs.timing1, EMIF_SDRAM_TIMING_1_SHDW);
+	emif_write(emif_regs.timing2, EMIF_SDRAM_TIMING_2);
+	emif_write(emif_regs.timing2, EMIF_SDRAM_TIMING_2_SHDW);
+	emif_write(emif_regs.timing3, EMIF_SDRAM_TIMING_3);
+	emif_write(emif_regs.timing3, EMIF_SDRAM_TIMING_3_SHDW);
+	emif_write(emif_regs.ref_ctrl, EMIF_SDRAM_REFRESH_CONTROL);
+	emif_write(emif_regs.ref_ctrl, EMIF_SDRAM_REFRESH_CTRL_SHDW);
+	emif_write(emif_regs.pmcr, EMIF_POWER_MANAGEMENT_CONTROL);
+	emif_write(emif_regs.pmcr_shdw, EMIF_POWER_MANAGEMENT_CTRL_SHDW);
+	/*
+	 * Output impedence calib needed only for DDR3
+	 * but since the initial state of this will be
+	 * disabled for DDR2 no harm in restoring the
+	 * old configuration
+	 */
+	emif_write(emif_regs.zqcfg, EMIF_SDRAM_OUTPUT_IMPEDANCE_CALIBRATION_CONFIG);
+	/* Write to SDRAM_CONFIG only for DDR2 */
+	if (wfi_flags & WFI_MEM_TYPE_DDR2)
+		emif_write(emif_regs.sdcfg, EMIF_SDRAM_CONFIG);
+}
+
+static int __sram_am33xx am33xx_wfi_sram(void *data)
+{
+	wfi_flags = (unsigned long) data;
+	emif_base = emif_virt_base;
+	cm_offset = 0;
+
+	if (wfi_flags & WFI_SELF_REFRESH) {
+		/*
+		 * Flush all data from the L1 data cache before disabling
+		 * SCTLR.C bit.
+		 */
+		flush_dcache_all();
+		/*
+		 * Clear the SCTLR.C bit to prevent further data cache
+		 * allocation. Clearing SCTLR.C would make all the data
+		 * accesses strongly ordered and would not hit the cache.
+		 */
+		set_cr(get_cr() & ~CR_C);
+		/*
+		 * Invalidate L1 data cache. Even though only invalidate is
+		 * necessary exported flush API is used here. Doing clean
+		 * on already clean cache would be almost NOP.
+		 */
+		flush_dcache_all();
+
+		/* Only necessary if PER is losing context */
+		if (wfi_flags & WFI_SAVE_EMIF)
+			am33xx_emif_save();
+
+		am33xx_enable_sr();
+		am33xx_module_disable_wait(AM33XX_CM_PER_EMIF_CLKCTRL);
+	}
+
+
+	/*
+	 * For the MPU WFI to be registered as an interrupt
+	 * to WKUP_M3, MPU_CLKCTRL.MODULEMODE needs to be set
+	 * to DISABLED
+	 */
+	if (wfi_flags & WFI_WAKE_M3)
+		am33xx_module_disable(AM33XX_CM_MPU_MPU_CLKCTRL);
+
+	__asm__ __volatile__ (
+		/*
+		 * Execute an ISB instruction to ensure that all of the
+		 * CP15 register changes have been committed.
+		 */
+		"isb\n\t"
+		/*
+		 * Execute a barrier instruction to ensure that all cache,
+		 * TLB and branch predictor maintenance operations issued
+		 * have completed.
+		 */
+		"dsb\n\t"
+		"dmb\n\t"
+		/*
+		 * Execute a WFI instruction and wait until the
+		 * STANDBYWFI output is asserted to indicate that the
+		 * CPU is in idle and low power state. CPU can specualatively
+		 * prefetch the instructions so add NOPs after WFI. Thirteen
+		 * NOPs as per Cortex-A8 pipeline.
+		 */
+		"wfi\n\t"
+		".rept 13\n\t"
+		"nop\n\t"
+		".endr" : : : "memory");
+
+	/* We come here in case of an abort due to a late interrupt */
+
+	if (wfi_flags & WFI_WAKE_M3)
+		am33xx_module_enable(AM33XX_CM_MPU_MPU_CLKCTRL);
+
+	if (wfi_flags & WFI_SELF_REFRESH) {
+		am33xx_module_enable_wait(AM33XX_CM_PER_EMIF_CLKCTRL);
+		am33xx_disable_sr();
+		/* Set SCTLR.C bit to allow data cache allocation */
+		set_cr(get_cr() | CR_C);
+	}
+
+	/* Let the suspend code know about the abort */
+	return 1;
+}
+
+int am33xx_suspend(long unsigned int flags)
+{
+	return call_with_stack(kern_to_sram(&am33xx_wfi_sram), (void *) flags,
+		kern_to_sram((char *) sram_stack) + ARRAY_SIZE(sram_stack));
+}
+
+static void __sram_am33xx __noreturn noinline am33xx_resume(void)
+{
+	emif_base = (void *) AM33XX_EMIF_BASE;
+	/* Undo the offset built into the register defines */
+	cm_offset = -AM33XX_L4_WK_IO_OFFSET;
+
+	if (wfi_flags & WFI_SELF_REFRESH) {
+		am33xx_module_enable_wait(AM33XX_CM_PER_EMIF_CLKCTRL);
+		if (wfi_flags & WFI_SAVE_EMIF)
+			am33xx_emif_restore();
+		else
+			am33xx_disable_sr();
+	}
+
+	/* We are back. Branch to the common CPU resume routine */
+	__cpu_resume_phys();
+}
+
+ARM_SRAM_RESUME(am33xx, am33xx_resume, sram_stack + ARRAY_SIZE(sram_stack));
+
+void am33xx_sram_init(void __iomem *emif_base, void __iomem *dram_sync)
+{
+	*kern_to_sram(&__abs_v7_flush_dcache_all) = v7_flush_dcache_all;
+	*kern_to_sram(&__cpu_resume_phys) = (void *) virt_to_phys(cpu_resume);
+	*kern_to_sram(&emif_virt_base) = emif_base;
+	*kern_to_sram(&dram_sync_addr) = dram_sync;
+}
diff --git a/arch/arm/mach-omap2/sram.c b/arch/arm/mach-omap2/sram.c
index 303c562..97b7f77 100644
--- a/arch/arm/mach-omap2/sram.c
+++ b/arch/arm/mach-omap2/sram.c
@@ -285,19 +285,6 @@ static inline int omap34xx_sram_init(void)
 }
 #endif /* CONFIG_ARCH_OMAP3 */
 
-#ifdef CONFIG_SOC_AM33XX
-static inline int am33xx_sram_init(void)
-{
-	am33xx_push_sram_idle();
-	return 0;
-}
-#else
-static inline int am33xx_sram_init(void)
-{
-	return 0;
-}
-#endif
-
 int __init omap_sram_init(void)
 {
 	omap_detect_sram();
@@ -307,8 +294,6 @@ int __init omap_sram_init(void)
 		omap242x_sram_init();
 	else if (cpu_is_omap2430())
 		omap243x_sram_init();
-	else if (soc_is_am33xx())
-		am33xx_sram_init();
 	else if (cpu_is_omap34xx())
 		omap34xx_sram_init();
 
-- 
1.8.3.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [RFC 3/4] Misc: SRAM: Hack for allowing executable code in SRAM.
  2013-09-03 16:44 ` [RFC 3/4] Misc: SRAM: Hack for allowing executable code in SRAM Russ Dill
@ 2013-09-04 18:06     ` Tony Lindgren
  0 siblings, 0 replies; 38+ messages in thread
From: Tony Lindgren @ 2013-09-04 18:06 UTC (permalink / raw)
  To: Russ Dill
  Cc: linux-arm-kernel, linux-omap, Philipp Zabel, Shawn Guo,
	Grant Likely, mans

* Russ Dill <Russ.Dill@ti.com> [130903 09:52]:
> The generic SRAM mechanism does not ioremap memory in a
> manner that allows code to be executed from SRAM. There is
> currently no generic way to request ioremap to return a
> memory area with execution allowed.
> 
> Insert a temporary hack for proof of concept on ARM.
> 
> Signed-off-by: Russ Dill <Russ.Dill@ti.com>
> ---
>  drivers/misc/sram.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/misc/sram.c b/drivers/misc/sram.c
> index 08baaab..e059a23 100644
> --- a/drivers/misc/sram.c
> +++ b/drivers/misc/sram.c
> @@ -31,6 +31,7 @@
>  #include <linux/genalloc.h>
>  #include <linux/sram.h>
>  #include <asm-generic/cacheflush.h>
> +#include <asm/io.h>
>  
>  #define SRAM_GRANULARITY	32
>  
> @@ -138,7 +139,7 @@ static int sram_probe(struct platform_device *pdev)
>  	int ret;
>  
>  	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> -	virt_base = devm_ioremap_resource(&pdev->dev, res);
> +	virt_base = __arm_ioremap_exec(res->start, resource_size(res), false);
>  	if (IS_ERR(virt_base))
>  		return PTR_ERR(virt_base);

You can get rid of this hack by defining ioremap_exec in
include/asm-generic/io.h the same way as ioremap_nocache
is done:

#ifndef ioremap_exec
#define ioremap_exec ioremap
#endif

Then the arch that need ioremap_exec can define and
implement it. Needs to be reviewed on LKML naturally :)

Regards,

Tony

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [RFC 3/4] Misc: SRAM: Hack for allowing executable code in SRAM.
@ 2013-09-04 18:06     ` Tony Lindgren
  0 siblings, 0 replies; 38+ messages in thread
From: Tony Lindgren @ 2013-09-04 18:06 UTC (permalink / raw)
  To: linux-arm-kernel

* Russ Dill <Russ.Dill@ti.com> [130903 09:52]:
> The generic SRAM mechanism does not ioremap memory in a
> manner that allows code to be executed from SRAM. There is
> currently no generic way to request ioremap to return a
> memory area with execution allowed.
> 
> Insert a temporary hack for proof of concept on ARM.
> 
> Signed-off-by: Russ Dill <Russ.Dill@ti.com>
> ---
>  drivers/misc/sram.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/misc/sram.c b/drivers/misc/sram.c
> index 08baaab..e059a23 100644
> --- a/drivers/misc/sram.c
> +++ b/drivers/misc/sram.c
> @@ -31,6 +31,7 @@
>  #include <linux/genalloc.h>
>  #include <linux/sram.h>
>  #include <asm-generic/cacheflush.h>
> +#include <asm/io.h>
>  
>  #define SRAM_GRANULARITY	32
>  
> @@ -138,7 +139,7 @@ static int sram_probe(struct platform_device *pdev)
>  	int ret;
>  
>  	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> -	virt_base = devm_ioremap_resource(&pdev->dev, res);
> +	virt_base = __arm_ioremap_exec(res->start, resource_size(res), false);
>  	if (IS_ERR(virt_base))
>  		return PTR_ERR(virt_base);

You can get rid of this hack by defining ioremap_exec in
include/asm-generic/io.h the same way as ioremap_nocache
is done:

#ifndef ioremap_exec
#define ioremap_exec ioremap
#endif

Then the arch that need ioremap_exec can define and
implement it. Needs to be reviewed on LKML naturally :)

Regards,

Tony

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 0/4] Create infrastructure for running C code from SRAM.
  2013-09-03 16:44 ` Russ Dill
@ 2013-09-04 19:52   ` Emilio López
  -1 siblings, 0 replies; 38+ messages in thread
From: Emilio López @ 2013-09-04 19:52 UTC (permalink / raw)
  To: Russ Dill
  Cc: linux-arm-kernel, Grant Likely, linux-omap, mans, Philipp Zabel,
	Shawn Guo

Hi,

El 03/09/13 13:44, Russ Dill escribió:
> This RFC patchset explores an idea for loading C code into SRAM.
> Currently, all the code I'm aware of that needs to run from SRAM is written
> in assembler. The most common reason for code needing to run from SRAM is
> that the memory controller is being disabled/ enabled or is already
> disabled. arch/arm has by far the most examples, but code also exists in
> powerpc and sh.
>
> The code is written in asm for two primary reasons. First so that markers
> can be put in indicating the size of the code they it can be copied. Second
> so that data can be placed along with text and accessed in a position
> independant manner.
>
> SRAM handling code is in the process of being moved from arch directories
> into drivers/misc/sram.c using device tree and genalloc [1] [2]. This RFC
> patchset builds on that, including the limitation that the SRAM address is
> not known at compile time. Because the SRAM address is not known at compile
> time, the code that runs from SRAM must be compiled with -fPIC. Even if
> the code were loaded to a fixed virtual address, portions of the code must
> often be run with the MMU disabled.
>
> The general idea is that for each SRAM user (such as an SoC specific
> suspend/resume mechanism) to create a group of sections. The section group
> is created with a single macro for each user, but end up looking like this:
>
> .sram.am33xx : AT(ADDR(.sram.am33xx) - 0) {
>    __sram_am33xx_start = .;
>    *(.sram.am33xx.*)
>    __sram_am33xx_end = .;
> }
>
> Any data or functions that should be copied to SRAM for this use should be
> maked with an appropriate __section() attribute. A helper is then added for
> translating between the original kernel symbol, and the address of that
> function or variable once it has been copied into SRAM. Once control is
> passed to a function within the SRAM section grouping, it can access any
> variables or functions within that same SRAM section grouping without
> translation.
>
> [1] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4984c6
> [2] http://www.spinics.net/lists/linux-omap/msg96504.html
>
> Russ Dill (4):
>    Misc: SRAM: Create helpers for loading C code into SRAM
>    ARM: SRAM: Add macro for generating SRAM resume trampoline
>    Misc: SRAM: Hack for allowing executable code in SRAM.
>    ARM: AM33XX: Move suspend/resume assembly to C
>
>   arch/arm/include/asm/suspend.h    |  14 ++
>   arch/arm/kernel/vmlinux.lds.S     |   2 +
>   arch/arm/mach-omap2/Makefile      |   2 +-
>   arch/arm/mach-omap2/pm33xx.c      |  50 ++---
>   arch/arm/mach-omap2/pm33xx.h      |  23 +--
>   arch/arm/mach-omap2/sleep33xx.S   | 394 --------------------------------------
>   arch/arm/mach-omap2/sleep33xx.c   | 309 ++++++++++++++++++++++++++++++
>   arch/arm/mach-omap2/sram.c        |  15 --
>   drivers/misc/sram.c               | 106 +++++++++-
>   include/asm-generic/vmlinux.lds.h |   7 +
>   include/linux/sram.h              |  44 +++++
>   11 files changed, 509 insertions(+), 457 deletions(-)
>   delete mode 100644 arch/arm/mach-omap2/sleep33xx.S
>   create mode 100644 arch/arm/mach-omap2/sleep33xx.c
>   create mode 100644 include/linux/sram.h
>

I'm interested in this, as I'll need something like it for 
suspend/resume on sunxi. Unfortunately, I only got the cover letter on 
my email, and the web lakml archives don't seem to have the rest either. 
After a bit of searching on Google I found a copy on linux-omap[1], but 
it'd be great if I didn't have to hunt for the patches :)

I only have one comment, from a quick look at the code

+       memcpy((void *) chunk->addr, data, sz);
+       flush_icache_range(chunk->addr, chunk->addr + sz);

How would that behave on Thumb-2 mode? I believe that's the reason why 
fncpy() got introduced[2] some time ago.

Thanks for working on this!

Emilio

[1] http://www.mail-archive.com/linux-omap@vger.kernel.org/msg94995.html
[2] http://www.spinics.net/lists/arm-kernel/msg110706.html
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [RFC 0/4] Create infrastructure for running C code from SRAM.
@ 2013-09-04 19:52   ` Emilio López
  0 siblings, 0 replies; 38+ messages in thread
From: Emilio López @ 2013-09-04 19:52 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

El 03/09/13 13:44, Russ Dill escribi?:
> This RFC patchset explores an idea for loading C code into SRAM.
> Currently, all the code I'm aware of that needs to run from SRAM is written
> in assembler. The most common reason for code needing to run from SRAM is
> that the memory controller is being disabled/ enabled or is already
> disabled. arch/arm has by far the most examples, but code also exists in
> powerpc and sh.
>
> The code is written in asm for two primary reasons. First so that markers
> can be put in indicating the size of the code they it can be copied. Second
> so that data can be placed along with text and accessed in a position
> independant manner.
>
> SRAM handling code is in the process of being moved from arch directories
> into drivers/misc/sram.c using device tree and genalloc [1] [2]. This RFC
> patchset builds on that, including the limitation that the SRAM address is
> not known at compile time. Because the SRAM address is not known at compile
> time, the code that runs from SRAM must be compiled with -fPIC. Even if
> the code were loaded to a fixed virtual address, portions of the code must
> often be run with the MMU disabled.
>
> The general idea is that for each SRAM user (such as an SoC specific
> suspend/resume mechanism) to create a group of sections. The section group
> is created with a single macro for each user, but end up looking like this:
>
> .sram.am33xx : AT(ADDR(.sram.am33xx) - 0) {
>    __sram_am33xx_start = .;
>    *(.sram.am33xx.*)
>    __sram_am33xx_end = .;
> }
>
> Any data or functions that should be copied to SRAM for this use should be
> maked with an appropriate __section() attribute. A helper is then added for
> translating between the original kernel symbol, and the address of that
> function or variable once it has been copied into SRAM. Once control is
> passed to a function within the SRAM section grouping, it can access any
> variables or functions within that same SRAM section grouping without
> translation.
>
> [1] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4984c6
> [2] http://www.spinics.net/lists/linux-omap/msg96504.html
>
> Russ Dill (4):
>    Misc: SRAM: Create helpers for loading C code into SRAM
>    ARM: SRAM: Add macro for generating SRAM resume trampoline
>    Misc: SRAM: Hack for allowing executable code in SRAM.
>    ARM: AM33XX: Move suspend/resume assembly to C
>
>   arch/arm/include/asm/suspend.h    |  14 ++
>   arch/arm/kernel/vmlinux.lds.S     |   2 +
>   arch/arm/mach-omap2/Makefile      |   2 +-
>   arch/arm/mach-omap2/pm33xx.c      |  50 ++---
>   arch/arm/mach-omap2/pm33xx.h      |  23 +--
>   arch/arm/mach-omap2/sleep33xx.S   | 394 --------------------------------------
>   arch/arm/mach-omap2/sleep33xx.c   | 309 ++++++++++++++++++++++++++++++
>   arch/arm/mach-omap2/sram.c        |  15 --
>   drivers/misc/sram.c               | 106 +++++++++-
>   include/asm-generic/vmlinux.lds.h |   7 +
>   include/linux/sram.h              |  44 +++++
>   11 files changed, 509 insertions(+), 457 deletions(-)
>   delete mode 100644 arch/arm/mach-omap2/sleep33xx.S
>   create mode 100644 arch/arm/mach-omap2/sleep33xx.c
>   create mode 100644 include/linux/sram.h
>

I'm interested in this, as I'll need something like it for 
suspend/resume on sunxi. Unfortunately, I only got the cover letter on 
my email, and the web lakml archives don't seem to have the rest either. 
After a bit of searching on Google I found a copy on linux-omap[1], but 
it'd be great if I didn't have to hunt for the patches :)

I only have one comment, from a quick look at the code

+       memcpy((void *) chunk->addr, data, sz);
+       flush_icache_range(chunk->addr, chunk->addr + sz);

How would that behave on Thumb-2 mode? I believe that's the reason why 
fncpy() got introduced[2] some time ago.

Thanks for working on this!

Emilio

[1] http://www.mail-archive.com/linux-omap at vger.kernel.org/msg94995.html
[2] http://www.spinics.net/lists/arm-kernel/msg110706.html

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 0/4] Create infrastructure for running C code from SRAM.
  2013-09-04 19:52   ` Emilio López
@ 2013-09-04 21:47     ` Russ Dill
  -1 siblings, 0 replies; 38+ messages in thread
From: Russ Dill @ 2013-09-04 21:47 UTC (permalink / raw)
  To: Emilio López
  Cc: mans, Shawn Guo, Grant Likely, Philipp Zabel, linux-omap,
	Linux ARM Kernel List

On Wed, Sep 4, 2013 at 12:52 PM, Emilio López <emilio@elopez.com.ar> wrote:
> Hi,
>
> El 03/09/13 13:44, Russ Dill escribió:
>
>> This RFC patchset explores an idea for loading C code into SRAM.
>> Currently, all the code I'm aware of that needs to run from SRAM is
>> written
>> in assembler. The most common reason for code needing to run from SRAM is
>> that the memory controller is being disabled/ enabled or is already
>> disabled. arch/arm has by far the most examples, but code also exists in
>> powerpc and sh.
>>
>> The code is written in asm for two primary reasons. First so that markers
>> can be put in indicating the size of the code they it can be copied.
>> Second
>> so that data can be placed along with text and accessed in a position
>> independant manner.
>>
>> SRAM handling code is in the process of being moved from arch directories
>> into drivers/misc/sram.c using device tree and genalloc [1] [2]. This RFC
>> patchset builds on that, including the limitation that the SRAM address is
>> not known at compile time. Because the SRAM address is not known at
>> compile
>> time, the code that runs from SRAM must be compiled with -fPIC. Even if
>> the code were loaded to a fixed virtual address, portions of the code must
>> often be run with the MMU disabled.
>>
>> The general idea is that for each SRAM user (such as an SoC specific
>> suspend/resume mechanism) to create a group of sections. The section group
>> is created with a single macro for each user, but end up looking like
>> this:
>>
>> .sram.am33xx : AT(ADDR(.sram.am33xx) - 0) {
>>    __sram_am33xx_start = .;
>>    *(.sram.am33xx.*)
>>    __sram_am33xx_end = .;
>> }
>>
>> Any data or functions that should be copied to SRAM for this use should be
>> maked with an appropriate __section() attribute. A helper is then added
>> for
>> translating between the original kernel symbol, and the address of that
>> function or variable once it has been copied into SRAM. Once control is
>> passed to a function within the SRAM section grouping, it can access any
>> variables or functions within that same SRAM section grouping without
>> translation.
>>
>> [1]
>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4984c6
>> [2] http://www.spinics.net/lists/linux-omap/msg96504.html
>>
>> Russ Dill (4):
>>    Misc: SRAM: Create helpers for loading C code into SRAM
>>    ARM: SRAM: Add macro for generating SRAM resume trampoline
>>    Misc: SRAM: Hack for allowing executable code in SRAM.
>>    ARM: AM33XX: Move suspend/resume assembly to C
>>
>>   arch/arm/include/asm/suspend.h    |  14 ++
>>   arch/arm/kernel/vmlinux.lds.S     |   2 +
>>   arch/arm/mach-omap2/Makefile      |   2 +-
>>   arch/arm/mach-omap2/pm33xx.c      |  50 ++---
>>   arch/arm/mach-omap2/pm33xx.h      |  23 +--
>>   arch/arm/mach-omap2/sleep33xx.S   | 394
>> --------------------------------------
>>   arch/arm/mach-omap2/sleep33xx.c   | 309 ++++++++++++++++++++++++++++++
>>   arch/arm/mach-omap2/sram.c        |  15 --
>>   drivers/misc/sram.c               | 106 +++++++++-
>>   include/asm-generic/vmlinux.lds.h |   7 +
>>   include/linux/sram.h              |  44 +++++
>>   11 files changed, 509 insertions(+), 457 deletions(-)
>>   delete mode 100644 arch/arm/mach-omap2/sleep33xx.S
>>   create mode 100644 arch/arm/mach-omap2/sleep33xx.c
>>   create mode 100644 include/linux/sram.h
>>
>
> I'm interested in this, as I'll need something like it for suspend/resume on
> sunxi. Unfortunately, I only got the cover letter on my email, and the web
> lakml archives don't seem to have the rest either. After a bit of searching
> on Google I found a copy on linux-omap[1], but it'd be great if I didn't
> have to hunt for the patches :)

The mails to arm-kernel are "awaiting moderation".

> I only have one comment, from a quick look at the code
>
> +       memcpy((void *) chunk->addr, data, sz);
> +       flush_icache_range(chunk->addr, chunk->addr + sz);
>
> How would that behave on Thumb-2 mode? I believe that's the reason why
> fncpy() got introduced[2] some time ago.
>
> Thanks for working on this!

I think this is already taken care of by the way sram.c is using
genalloc. The allocation returned should be aligned to 32 bytes. The
thumb bit shouldn't be an issue as code is copied based on the start
and end makers made by the linker. I may need to add .align statements
in the linker so that the start and end markers for the copied code
are aligned to at least 8 bytes.

Thanks!

> Emilio
>
> [1] http://www.mail-archive.com/linux-omap@vger.kernel.org/msg94995.html
> [2] http://www.spinics.net/lists/arm-kernel/msg110706.html
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [RFC 0/4] Create infrastructure for running C code from SRAM.
@ 2013-09-04 21:47     ` Russ Dill
  0 siblings, 0 replies; 38+ messages in thread
From: Russ Dill @ 2013-09-04 21:47 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Sep 4, 2013 at 12:52 PM, Emilio L?pez <emilio@elopez.com.ar> wrote:
> Hi,
>
> El 03/09/13 13:44, Russ Dill escribi?:
>
>> This RFC patchset explores an idea for loading C code into SRAM.
>> Currently, all the code I'm aware of that needs to run from SRAM is
>> written
>> in assembler. The most common reason for code needing to run from SRAM is
>> that the memory controller is being disabled/ enabled or is already
>> disabled. arch/arm has by far the most examples, but code also exists in
>> powerpc and sh.
>>
>> The code is written in asm for two primary reasons. First so that markers
>> can be put in indicating the size of the code they it can be copied.
>> Second
>> so that data can be placed along with text and accessed in a position
>> independant manner.
>>
>> SRAM handling code is in the process of being moved from arch directories
>> into drivers/misc/sram.c using device tree and genalloc [1] [2]. This RFC
>> patchset builds on that, including the limitation that the SRAM address is
>> not known at compile time. Because the SRAM address is not known at
>> compile
>> time, the code that runs from SRAM must be compiled with -fPIC. Even if
>> the code were loaded to a fixed virtual address, portions of the code must
>> often be run with the MMU disabled.
>>
>> The general idea is that for each SRAM user (such as an SoC specific
>> suspend/resume mechanism) to create a group of sections. The section group
>> is created with a single macro for each user, but end up looking like
>> this:
>>
>> .sram.am33xx : AT(ADDR(.sram.am33xx) - 0) {
>>    __sram_am33xx_start = .;
>>    *(.sram.am33xx.*)
>>    __sram_am33xx_end = .;
>> }
>>
>> Any data or functions that should be copied to SRAM for this use should be
>> maked with an appropriate __section() attribute. A helper is then added
>> for
>> translating between the original kernel symbol, and the address of that
>> function or variable once it has been copied into SRAM. Once control is
>> passed to a function within the SRAM section grouping, it can access any
>> variables or functions within that same SRAM section grouping without
>> translation.
>>
>> [1]
>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4984c6
>> [2] http://www.spinics.net/lists/linux-omap/msg96504.html
>>
>> Russ Dill (4):
>>    Misc: SRAM: Create helpers for loading C code into SRAM
>>    ARM: SRAM: Add macro for generating SRAM resume trampoline
>>    Misc: SRAM: Hack for allowing executable code in SRAM.
>>    ARM: AM33XX: Move suspend/resume assembly to C
>>
>>   arch/arm/include/asm/suspend.h    |  14 ++
>>   arch/arm/kernel/vmlinux.lds.S     |   2 +
>>   arch/arm/mach-omap2/Makefile      |   2 +-
>>   arch/arm/mach-omap2/pm33xx.c      |  50 ++---
>>   arch/arm/mach-omap2/pm33xx.h      |  23 +--
>>   arch/arm/mach-omap2/sleep33xx.S   | 394
>> --------------------------------------
>>   arch/arm/mach-omap2/sleep33xx.c   | 309 ++++++++++++++++++++++++++++++
>>   arch/arm/mach-omap2/sram.c        |  15 --
>>   drivers/misc/sram.c               | 106 +++++++++-
>>   include/asm-generic/vmlinux.lds.h |   7 +
>>   include/linux/sram.h              |  44 +++++
>>   11 files changed, 509 insertions(+), 457 deletions(-)
>>   delete mode 100644 arch/arm/mach-omap2/sleep33xx.S
>>   create mode 100644 arch/arm/mach-omap2/sleep33xx.c
>>   create mode 100644 include/linux/sram.h
>>
>
> I'm interested in this, as I'll need something like it for suspend/resume on
> sunxi. Unfortunately, I only got the cover letter on my email, and the web
> lakml archives don't seem to have the rest either. After a bit of searching
> on Google I found a copy on linux-omap[1], but it'd be great if I didn't
> have to hunt for the patches :)

The mails to arm-kernel are "awaiting moderation".

> I only have one comment, from a quick look at the code
>
> +       memcpy((void *) chunk->addr, data, sz);
> +       flush_icache_range(chunk->addr, chunk->addr + sz);
>
> How would that behave on Thumb-2 mode? I believe that's the reason why
> fncpy() got introduced[2] some time ago.
>
> Thanks for working on this!

I think this is already taken care of by the way sram.c is using
genalloc. The allocation returned should be aligned to 32 bytes. The
thumb bit shouldn't be an issue as code is copied based on the start
and end makers made by the linker. I may need to add .align statements
in the linker so that the start and end markers for the copied code
are aligned to at least 8 bytes.

Thanks!

> Emilio
>
> [1] http://www.mail-archive.com/linux-omap at vger.kernel.org/msg94995.html
> [2] http://www.spinics.net/lists/arm-kernel/msg110706.html
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 0/4] Create infrastructure for running C code from SRAM.
  2013-09-04 21:47     ` Russ Dill
@ 2013-09-06 11:02       ` Sekhar Nori
  -1 siblings, 0 replies; 38+ messages in thread
From: Sekhar Nori @ 2013-09-06 11:02 UTC (permalink / raw)
  To: Russ Dill
  Cc: mans, Emilio López, Shawn Guo, Grant Likely, Philipp Zabel,
	linux-omap, Linux ARM Kernel List

On Thursday 05 September 2013 03:17 AM, Russ Dill wrote:
> On Wed, Sep 4, 2013 at 12:52 PM, Emilio López <emilio@elopez.com.ar> wrote:

>> I'm interested in this, as I'll need something like it for suspend/resume on
>> sunxi. Unfortunately, I only got the cover letter on my email, and the web
>> lakml archives don't seem to have the rest either. After a bit of searching
>> on Google I found a copy on linux-omap[1], but it'd be great if I didn't
>> have to hunt for the patches :)
> 
> The mails to arm-kernel are "awaiting moderation".

That is because you have RFC in the subject line instead of PATCH RFC

Thanks,
Sekhar

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [RFC 0/4] Create infrastructure for running C code from SRAM.
@ 2013-09-06 11:02       ` Sekhar Nori
  0 siblings, 0 replies; 38+ messages in thread
From: Sekhar Nori @ 2013-09-06 11:02 UTC (permalink / raw)
  To: linux-arm-kernel

On Thursday 05 September 2013 03:17 AM, Russ Dill wrote:
> On Wed, Sep 4, 2013 at 12:52 PM, Emilio L?pez <emilio@elopez.com.ar> wrote:

>> I'm interested in this, as I'll need something like it for suspend/resume on
>> sunxi. Unfortunately, I only got the cover letter on my email, and the web
>> lakml archives don't seem to have the rest either. After a bit of searching
>> on Google I found a copy on linux-omap[1], but it'd be great if I didn't
>> have to hunt for the patches :)
> 
> The mails to arm-kernel are "awaiting moderation".

That is because you have RFC in the subject line instead of PATCH RFC

Thanks,
Sekhar

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 0/4] Create infrastructure for running C code from SRAM.
  2013-09-03 16:44 ` Russ Dill
@ 2013-09-06 11:12   ` Russell King - ARM Linux
  -1 siblings, 0 replies; 38+ messages in thread
From: Russell King - ARM Linux @ 2013-09-06 11:12 UTC (permalink / raw)
  To: Russ Dill
  Cc: linux-arm-kernel, Grant Likely, linux-omap, mans, Philipp Zabel,
	Shawn Guo

On Tue, Sep 03, 2013 at 09:44:21AM -0700, Russ Dill wrote:
> SRAM handling code is in the process of being moved from arch directories
> into drivers/misc/sram.c using device tree and genalloc [1] [2]. This RFC
> patchset builds on that, including the limitation that the SRAM address is
> not known at compile time. Because the SRAM address is not known at compile
> time, the code that runs from SRAM must be compiled with -fPIC. Even if
> the code were loaded to a fixed virtual address, portions of the code must
> often be run with the MMU disabled.

What are you doing about the various gcc utility functions that may be
implicitly called from C code such as memcpy and memset?

> The general idea is that for each SRAM user (such as an SoC specific
> suspend/resume mechanism) to create a group of sections. The section group
> is created with a single macro for each user, but end up looking like this:
> 
> .sram.am33xx : AT(ADDR(.sram.am33xx) - 0) {
>   __sram_am33xx_start = .;
>   *(.sram.am33xx.*)
>   __sram_am33xx_end = .;
> }
> 
> Any data or functions that should be copied to SRAM for this use should be
> maked with an appropriate __section() attribute. A helper is then added for
> translating between the original kernel symbol, and the address of that
> function or variable once it has been copied into SRAM. Once control is
> passed to a function within the SRAM section grouping, it can access any
> variables or functions within that same SRAM section grouping without
> translation.

What about the relocations which will need to be fixed up - eg, addresses
in the literal pool, the GOT table contents, etc?  You say nothing about
this.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [RFC 0/4] Create infrastructure for running C code from SRAM.
@ 2013-09-06 11:12   ` Russell King - ARM Linux
  0 siblings, 0 replies; 38+ messages in thread
From: Russell King - ARM Linux @ 2013-09-06 11:12 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Sep 03, 2013 at 09:44:21AM -0700, Russ Dill wrote:
> SRAM handling code is in the process of being moved from arch directories
> into drivers/misc/sram.c using device tree and genalloc [1] [2]. This RFC
> patchset builds on that, including the limitation that the SRAM address is
> not known at compile time. Because the SRAM address is not known at compile
> time, the code that runs from SRAM must be compiled with -fPIC. Even if
> the code were loaded to a fixed virtual address, portions of the code must
> often be run with the MMU disabled.

What are you doing about the various gcc utility functions that may be
implicitly called from C code such as memcpy and memset?

> The general idea is that for each SRAM user (such as an SoC specific
> suspend/resume mechanism) to create a group of sections. The section group
> is created with a single macro for each user, but end up looking like this:
> 
> .sram.am33xx : AT(ADDR(.sram.am33xx) - 0) {
>   __sram_am33xx_start = .;
>   *(.sram.am33xx.*)
>   __sram_am33xx_end = .;
> }
> 
> Any data or functions that should be copied to SRAM for this use should be
> maked with an appropriate __section() attribute. A helper is then added for
> translating between the original kernel symbol, and the address of that
> function or variable once it has been copied into SRAM. Once control is
> passed to a function within the SRAM section grouping, it can access any
> variables or functions within that same SRAM section grouping without
> translation.

What about the relocations which will need to be fixed up - eg, addresses
in the literal pool, the GOT table contents, etc?  You say nothing about
this.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 0/4] Create infrastructure for running C code from SRAM.
  2013-09-04 21:47     ` Russ Dill
@ 2013-09-06 11:14       ` Russell King - ARM Linux
  -1 siblings, 0 replies; 38+ messages in thread
From: Russell King - ARM Linux @ 2013-09-06 11:14 UTC (permalink / raw)
  To: Russ Dill
  Cc: Emilio López, mans, Shawn Guo, Grant Likely, Philipp Zabel,
	linux-omap, Linux ARM Kernel List

On Wed, Sep 04, 2013 at 02:47:51PM -0700, Russ Dill wrote:
> I think this is already taken care of by the way sram.c is using
> genalloc. The allocation returned should be aligned to 32 bytes. The
> thumb bit shouldn't be an issue as code is copied based on the start
> and end makers made by the linker. I may need to add .align statements
> in the linker so that the start and end markers for the copied code
> are aligned to at least 8 bytes.

I think you need to read up on what fncpy does... there's more to it
than just merely copying code at an appropriate alignment.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [RFC 0/4] Create infrastructure for running C code from SRAM.
@ 2013-09-06 11:14       ` Russell King - ARM Linux
  0 siblings, 0 replies; 38+ messages in thread
From: Russell King - ARM Linux @ 2013-09-06 11:14 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Sep 04, 2013 at 02:47:51PM -0700, Russ Dill wrote:
> I think this is already taken care of by the way sram.c is using
> genalloc. The allocation returned should be aligned to 32 bytes. The
> thumb bit shouldn't be an issue as code is copied based on the start
> and end makers made by the linker. I may need to add .align statements
> in the linker so that the start and end markers for the copied code
> are aligned to at least 8 bytes.

I think you need to read up on what fncpy does... there's more to it
than just merely copying code at an appropriate alignment.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 0/4] Create infrastructure for running C code from SRAM.
  2013-09-06 11:12   ` Russell King - ARM Linux
@ 2013-09-06 16:19     ` Dave Martin
  -1 siblings, 0 replies; 38+ messages in thread
From: Dave Martin @ 2013-09-06 16:19 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: mans, Shawn Guo, Grant Likely, Russ Dill, Philipp Zabel,
	linux-omap, linux-arm-kernel

On Fri, Sep 06, 2013 at 12:12:21PM +0100, Russell King - ARM Linux wrote:
> On Tue, Sep 03, 2013 at 09:44:21AM -0700, Russ Dill wrote:
> > SRAM handling code is in the process of being moved from arch directories
> > into drivers/misc/sram.c using device tree and genalloc [1] [2]. This RFC
> > patchset builds on that, including the limitation that the SRAM address is
> > not known at compile time. Because the SRAM address is not known at compile
> > time, the code that runs from SRAM must be compiled with -fPIC. Even if
> > the code were loaded to a fixed virtual address, portions of the code must
> > often be run with the MMU disabled.
> 
> What are you doing about the various gcc utility functions that may be
> implicitly called from C code such as memcpy and memset?
> 
> > The general idea is that for each SRAM user (such as an SoC specific
> > suspend/resume mechanism) to create a group of sections. The section group
> > is created with a single macro for each user, but end up looking like this:
> > 
> > .sram.am33xx : AT(ADDR(.sram.am33xx) - 0) {
> >   __sram_am33xx_start = .;
> >   *(.sram.am33xx.*)
> >   __sram_am33xx_end = .;
> > }
> > 
> > Any data or functions that should be copied to SRAM for this use should be
> > maked with an appropriate __section() attribute. A helper is then added for
> > translating between the original kernel symbol, and the address of that
> > function or variable once it has been copied into SRAM. Once control is
> > passed to a function within the SRAM section grouping, it can access any
> > variables or functions within that same SRAM section grouping without
> > translation.
> 
> What about the relocations which will need to be fixed up - eg, addresses
> in the literal pool, the GOT table contents, etc?  You say nothing about
> this.

I was also thinking about this, and there are more problems.

As well as what has already been mentioned:

 * Calls from inside the SRAM code to vmlinux (including lib1funcs etc.)
   will typically break, except on architectures where function calls are
   (absolute by default not ARM).

 * The compiler/linker won't detect unsafe constructs or code generation,
   because it assumes that anything built with -fPIC is going to be patched
   up later by ld.so or equivalent.

 * The GOT is generated by the linker, and is a single table.  Yet each
   SRAM blob needs to be able to refer to its own GOT entries position-
   independently.  Moving the blobs independently won't work.

In other words: -fPIC does not generate position-independent code.

It generates position-dependent code that is easier to move around than
non-fPIC code, but you still need a dynamic linker (or equivalent) to
make it all work.

There are various "correct" ways to handle this, the simplest of which
is probably to build each SRAM blob as a kernel module, embed the result
in the kernel somehow, and then use the module loader infrastructure
to handle fixing the module up to the right address.

But this is still likely to be overkill, given the small scale of the
SRAM code.

Restricting such code to carefully-written assembler (as now) may be
the more practical approrach, unless there's a good example of somewhere
that C code would provide a big benefit.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [RFC 0/4] Create infrastructure for running C code from SRAM.
@ 2013-09-06 16:19     ` Dave Martin
  0 siblings, 0 replies; 38+ messages in thread
From: Dave Martin @ 2013-09-06 16:19 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Sep 06, 2013 at 12:12:21PM +0100, Russell King - ARM Linux wrote:
> On Tue, Sep 03, 2013 at 09:44:21AM -0700, Russ Dill wrote:
> > SRAM handling code is in the process of being moved from arch directories
> > into drivers/misc/sram.c using device tree and genalloc [1] [2]. This RFC
> > patchset builds on that, including the limitation that the SRAM address is
> > not known at compile time. Because the SRAM address is not known at compile
> > time, the code that runs from SRAM must be compiled with -fPIC. Even if
> > the code were loaded to a fixed virtual address, portions of the code must
> > often be run with the MMU disabled.
> 
> What are you doing about the various gcc utility functions that may be
> implicitly called from C code such as memcpy and memset?
> 
> > The general idea is that for each SRAM user (such as an SoC specific
> > suspend/resume mechanism) to create a group of sections. The section group
> > is created with a single macro for each user, but end up looking like this:
> > 
> > .sram.am33xx : AT(ADDR(.sram.am33xx) - 0) {
> >   __sram_am33xx_start = .;
> >   *(.sram.am33xx.*)
> >   __sram_am33xx_end = .;
> > }
> > 
> > Any data or functions that should be copied to SRAM for this use should be
> > maked with an appropriate __section() attribute. A helper is then added for
> > translating between the original kernel symbol, and the address of that
> > function or variable once it has been copied into SRAM. Once control is
> > passed to a function within the SRAM section grouping, it can access any
> > variables or functions within that same SRAM section grouping without
> > translation.
> 
> What about the relocations which will need to be fixed up - eg, addresses
> in the literal pool, the GOT table contents, etc?  You say nothing about
> this.

I was also thinking about this, and there are more problems.

As well as what has already been mentioned:

 * Calls from inside the SRAM code to vmlinux (including lib1funcs etc.)
   will typically break, except on architectures where function calls are
   (absolute by default not ARM).

 * The compiler/linker won't detect unsafe constructs or code generation,
   because it assumes that anything built with -fPIC is going to be patched
   up later by ld.so or equivalent.

 * The GOT is generated by the linker, and is a single table.  Yet each
   SRAM blob needs to be able to refer to its own GOT entries position-
   independently.  Moving the blobs independently won't work.

In other words: -fPIC does not generate position-independent code.

It generates position-dependent code that is easier to move around than
non-fPIC code, but you still need a dynamic linker (or equivalent) to
make it all work.

There are various "correct" ways to handle this, the simplest of which
is probably to build each SRAM blob as a kernel module, embed the result
in the kernel somehow, and then use the module loader infrastructure
to handle fixing the module up to the right address.

But this is still likely to be overkill, given the small scale of the
SRAM code.

Restricting such code to carefully-written assembler (as now) may be
the more practical approrach, unless there's a good example of somewhere
that C code would provide a big benefit.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 0/4] Create infrastructure for running C code from SRAM.
  2013-09-06 11:14       ` Russell King - ARM Linux
@ 2013-09-06 16:40         ` Dave Martin
  -1 siblings, 0 replies; 38+ messages in thread
From: Dave Martin @ 2013-09-06 16:40 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: mans, Emilio López, Grant Likely, linux-omap, Russ Dill,
	Philipp Zabel, Shawn Guo, Linux ARM Kernel List

On Fri, Sep 06, 2013 at 12:14:08PM +0100, Russell King - ARM Linux wrote:
> On Wed, Sep 04, 2013 at 02:47:51PM -0700, Russ Dill wrote:
> > I think this is already taken care of by the way sram.c is using
> > genalloc. The allocation returned should be aligned to 32 bytes. The
> > thumb bit shouldn't be an issue as code is copied based on the start
> > and end makers made by the linker. I may need to add .align statements
> > in the linker so that the start and end markers for the copied code
> > are aligned to at least 8 bytes.
> 
> I think you need to read up on what fncpy does... there's more to it
> than just merely copying code at an appropriate alignment.

The technique of putting each loadable blob in a specific vmlinux
section, and then adjusting entry-point symbols by adding/subtracting
the appropriate offset, probably does work.

This relies on the functions' code alignment requirement being
honoured by both the vmlinux link map, and the allocator used to find
SRAM space to copy the functions to.

Searching the entire list of known blobs every time we want to convert
a symbol seems unnecessary though.  Surely the caller could know the
blob<->symbol mapping anyway?

One thing fncpy() doesn't provide is a way to copy groups of functions
that call each other, if vmlinux needs to know about any symbol other
than the one at the start.  We might need a better mechanism if that is
needed.

I actually wonder whether fncpy() contains a buglet, whereby
flush_icache_range() is used instead of coherent_kern_range().
The SRAM is probably not mapped cached, but at least a DSB would be
needed before flushing the relevant lines from the I-cache.

However, flush_icache_range() seems to be implemented by a call to
coherent_kern_range() anyway, so perhaps that's not a problem.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [RFC 0/4] Create infrastructure for running C code from SRAM.
@ 2013-09-06 16:40         ` Dave Martin
  0 siblings, 0 replies; 38+ messages in thread
From: Dave Martin @ 2013-09-06 16:40 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Sep 06, 2013 at 12:14:08PM +0100, Russell King - ARM Linux wrote:
> On Wed, Sep 04, 2013 at 02:47:51PM -0700, Russ Dill wrote:
> > I think this is already taken care of by the way sram.c is using
> > genalloc. The allocation returned should be aligned to 32 bytes. The
> > thumb bit shouldn't be an issue as code is copied based on the start
> > and end makers made by the linker. I may need to add .align statements
> > in the linker so that the start and end markers for the copied code
> > are aligned to at least 8 bytes.
> 
> I think you need to read up on what fncpy does... there's more to it
> than just merely copying code at an appropriate alignment.

The technique of putting each loadable blob in a specific vmlinux
section, and then adjusting entry-point symbols by adding/subtracting
the appropriate offset, probably does work.

This relies on the functions' code alignment requirement being
honoured by both the vmlinux link map, and the allocator used to find
SRAM space to copy the functions to.

Searching the entire list of known blobs every time we want to convert
a symbol seems unnecessary though.  Surely the caller could know the
blob<->symbol mapping anyway?

One thing fncpy() doesn't provide is a way to copy groups of functions
that call each other, if vmlinux needs to know about any symbol other
than the one at the start.  We might need a better mechanism if that is
needed.

I actually wonder whether fncpy() contains a buglet, whereby
flush_icache_range() is used instead of coherent_kern_range().
The SRAM is probably not mapped cached, but at least a DSB would be
needed before flushing the relevant lines from the I-cache.

However, flush_icache_range() seems to be implemented by a call to
coherent_kern_range() anyway, so perhaps that's not a problem.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 0/4] Create infrastructure for running C code from SRAM.
  2013-09-06 11:14       ` Russell King - ARM Linux
@ 2013-09-06 18:40         ` Russ Dill
  -1 siblings, 0 replies; 38+ messages in thread
From: Russ Dill @ 2013-09-06 18:40 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: mans, Emilio López, Shawn Guo, Grant Likely, Philipp Zabel,
	linux-omap, Linux ARM Kernel List

On Fri, Sep 6, 2013 at 4:14 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Wed, Sep 04, 2013 at 02:47:51PM -0700, Russ Dill wrote:
>> I think this is already taken care of by the way sram.c is using
>> genalloc. The allocation returned should be aligned to 32 bytes. The
>> thumb bit shouldn't be an issue as code is copied based on the start
>> and end makers made by the linker. I may need to add .align statements
>> in the linker so that the start and end markers for the copied code
>> are aligned to at least 8 bytes.
>
> I think you need to read up on what fncpy does... there's more to it
> than just merely copying code at an appropriate alignment.

Yes, I need to add a pair of inlines that do the asm trickery to/from
function addresses. Thanks.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [RFC 0/4] Create infrastructure for running C code from SRAM.
@ 2013-09-06 18:40         ` Russ Dill
  0 siblings, 0 replies; 38+ messages in thread
From: Russ Dill @ 2013-09-06 18:40 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Sep 6, 2013 at 4:14 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Wed, Sep 04, 2013 at 02:47:51PM -0700, Russ Dill wrote:
>> I think this is already taken care of by the way sram.c is using
>> genalloc. The allocation returned should be aligned to 32 bytes. The
>> thumb bit shouldn't be an issue as code is copied based on the start
>> and end makers made by the linker. I may need to add .align statements
>> in the linker so that the start and end markers for the copied code
>> are aligned to at least 8 bytes.
>
> I think you need to read up on what fncpy does... there's more to it
> than just merely copying code at an appropriate alignment.

Yes, I need to add a pair of inlines that do the asm trickery to/from
function addresses. Thanks.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 0/4] Create infrastructure for running C code from SRAM.
  2013-09-06 16:40         ` Dave Martin
@ 2013-09-06 18:50           ` Russ Dill
  -1 siblings, 0 replies; 38+ messages in thread
From: Russ Dill @ 2013-09-06 18:50 UTC (permalink / raw)
  To: Dave Martin
  Cc: Russell King - ARM Linux, mans, Emilio López, Grant Likely,
	linux-omap, Philipp Zabel, Shawn Guo, Linux ARM Kernel List

On Fri, Sep 6, 2013 at 9:40 AM, Dave Martin <Dave.Martin@arm.com> wrote:
> On Fri, Sep 06, 2013 at 12:14:08PM +0100, Russell King - ARM Linux wrote:
>> On Wed, Sep 04, 2013 at 02:47:51PM -0700, Russ Dill wrote:
>> > I think this is already taken care of by the way sram.c is using
>> > genalloc. The allocation returned should be aligned to 32 bytes. The
>> > thumb bit shouldn't be an issue as code is copied based on the start
>> > and end makers made by the linker. I may need to add .align statements
>> > in the linker so that the start and end markers for the copied code
>> > are aligned to at least 8 bytes.
>>
>> I think you need to read up on what fncpy does... there's more to it
>> than just merely copying code at an appropriate alignment.
>
> The technique of putting each loadable blob in a specific vmlinux
> section, and then adjusting entry-point symbols by adding/subtracting
> the appropriate offset, probably does work.
>
> This relies on the functions' code alignment requirement being
> honoured by both the vmlinux link map, and the allocator used to find
> SRAM space to copy the functions to.
>
> Searching the entire list of known blobs every time we want to convert
> a symbol seems unnecessary though.  Surely the caller could know the
> blob<->symbol mapping anyway?

It doesn't search the list of known blobs, only loaded blobs. On all
the platforms I'm aware of, only one SRAM section is loaded with code.

> One thing fncpy() doesn't provide is a way to copy groups of functions
> that call each other, if vmlinux needs to know about any symbol other
> than the one at the start.  We might need a better mechanism if that is
> needed.
>
>
> I actually wonder whether fncpy() contains a buglet, whereby
> flush_icache_range() is used instead of coherent_kern_range().
> The SRAM is probably not mapped cached, but at least a DSB would be
> needed before flushing the relevant lines from the I-cache.

It is mapped cached on most platforms.

> However, flush_icache_range() seems to be implemented by a call to
> coherent_kern_range() anyway, so perhaps that's not a problem.
>
> Cheers
> ---Dave
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [RFC 0/4] Create infrastructure for running C code from SRAM.
@ 2013-09-06 18:50           ` Russ Dill
  0 siblings, 0 replies; 38+ messages in thread
From: Russ Dill @ 2013-09-06 18:50 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Sep 6, 2013 at 9:40 AM, Dave Martin <Dave.Martin@arm.com> wrote:
> On Fri, Sep 06, 2013 at 12:14:08PM +0100, Russell King - ARM Linux wrote:
>> On Wed, Sep 04, 2013 at 02:47:51PM -0700, Russ Dill wrote:
>> > I think this is already taken care of by the way sram.c is using
>> > genalloc. The allocation returned should be aligned to 32 bytes. The
>> > thumb bit shouldn't be an issue as code is copied based on the start
>> > and end makers made by the linker. I may need to add .align statements
>> > in the linker so that the start and end markers for the copied code
>> > are aligned to at least 8 bytes.
>>
>> I think you need to read up on what fncpy does... there's more to it
>> than just merely copying code at an appropriate alignment.
>
> The technique of putting each loadable blob in a specific vmlinux
> section, and then adjusting entry-point symbols by adding/subtracting
> the appropriate offset, probably does work.
>
> This relies on the functions' code alignment requirement being
> honoured by both the vmlinux link map, and the allocator used to find
> SRAM space to copy the functions to.
>
> Searching the entire list of known blobs every time we want to convert
> a symbol seems unnecessary though.  Surely the caller could know the
> blob<->symbol mapping anyway?

It doesn't search the list of known blobs, only loaded blobs. On all
the platforms I'm aware of, only one SRAM section is loaded with code.

> One thing fncpy() doesn't provide is a way to copy groups of functions
> that call each other, if vmlinux needs to know about any symbol other
> than the one at the start.  We might need a better mechanism if that is
> needed.
>
>
> I actually wonder whether fncpy() contains a buglet, whereby
> flush_icache_range() is used instead of coherent_kern_range().
> The SRAM is probably not mapped cached, but at least a DSB would be
> needed before flushing the relevant lines from the I-cache.

It is mapped cached on most platforms.

> However, flush_icache_range() seems to be implemented by a call to
> coherent_kern_range() anyway, so perhaps that's not a problem.
>
> Cheers
> ---Dave
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 0/4] Create infrastructure for running C code from SRAM.
  2013-09-06 11:12   ` Russell King - ARM Linux
@ 2013-09-06 19:32     ` Russ Dill
  -1 siblings, 0 replies; 38+ messages in thread
From: Russ Dill @ 2013-09-06 19:32 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: mans, Shawn Guo, Grant Likely, Philipp Zabel, linux-omap,
	Linux ARM Kernel List

On Fri, Sep 6, 2013 at 4:12 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Tue, Sep 03, 2013 at 09:44:21AM -0700, Russ Dill wrote:
>> SRAM handling code is in the process of being moved from arch directories
>> into drivers/misc/sram.c using device tree and genalloc [1] [2]. This RFC
>> patchset builds on that, including the limitation that the SRAM address is
>> not known at compile time. Because the SRAM address is not known at compile
>> time, the code that runs from SRAM must be compiled with -fPIC. Even if
>> the code were loaded to a fixed virtual address, portions of the code must
>> often be run with the MMU disabled.
>
> What are you doing about the various gcc utility functions that may be
> implicitly called from C code such as memcpy and memset?

That would create a problem. Would '-ffreestanding' be the correct
flag to add? As far as the family of __aeabi_*, I need to add
documentation stating that on ARM, you can't divide, perform modulo,
and can't do 64 bit multiplications. I can then add a make rule that
will grep the symbol lists of .sram sections for ^__aeabi_. Is this
enough?

>> The general idea is that for each SRAM user (such as an SoC specific
>> suspend/resume mechanism) to create a group of sections. The section group
>> is created with a single macro for each user, but end up looking like this:
>>
>> .sram.am33xx : AT(ADDR(.sram.am33xx) - 0) {
>>   __sram_am33xx_start = .;
>>   *(.sram.am33xx.*)
>>   __sram_am33xx_end = .;
>> }
>>
>> Any data or functions that should be copied to SRAM for this use should be
>> maked with an appropriate __section() attribute. A helper is then added for
>> translating between the original kernel symbol, and the address of that
>> function or variable once it has been copied into SRAM. Once control is
>> passed to a function within the SRAM section grouping, it can access any
>> variables or functions within that same SRAM section grouping without
>> translation.
>
> What about the relocations which will need to be fixed up - eg, addresses
> in the literal pool, the GOT table contents, etc?  You say nothing about
> this.

The C code would need to be written so that such accesses do not
occur. From functions that are in the sram text section, only accesses
to other sram sections in their group would be allowed. And above, a
compilation step could be added to make the compilation fail when such
things happen.

The direction you are going though is good, if this is fragile, it
doesn't put us in a better place then having fragile asm that at least
complies reliably. As I'm looking towards the future and platforms
that are similar to am335x in that they are placing more and more of
the PM state machine burden on the CPU, I'd really like to try to make
this happen.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [RFC 0/4] Create infrastructure for running C code from SRAM.
@ 2013-09-06 19:32     ` Russ Dill
  0 siblings, 0 replies; 38+ messages in thread
From: Russ Dill @ 2013-09-06 19:32 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Sep 6, 2013 at 4:12 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Tue, Sep 03, 2013 at 09:44:21AM -0700, Russ Dill wrote:
>> SRAM handling code is in the process of being moved from arch directories
>> into drivers/misc/sram.c using device tree and genalloc [1] [2]. This RFC
>> patchset builds on that, including the limitation that the SRAM address is
>> not known at compile time. Because the SRAM address is not known at compile
>> time, the code that runs from SRAM must be compiled with -fPIC. Even if
>> the code were loaded to a fixed virtual address, portions of the code must
>> often be run with the MMU disabled.
>
> What are you doing about the various gcc utility functions that may be
> implicitly called from C code such as memcpy and memset?

That would create a problem. Would '-ffreestanding' be the correct
flag to add? As far as the family of __aeabi_*, I need to add
documentation stating that on ARM, you can't divide, perform modulo,
and can't do 64 bit multiplications. I can then add a make rule that
will grep the symbol lists of .sram sections for ^__aeabi_. Is this
enough?

>> The general idea is that for each SRAM user (such as an SoC specific
>> suspend/resume mechanism) to create a group of sections. The section group
>> is created with a single macro for each user, but end up looking like this:
>>
>> .sram.am33xx : AT(ADDR(.sram.am33xx) - 0) {
>>   __sram_am33xx_start = .;
>>   *(.sram.am33xx.*)
>>   __sram_am33xx_end = .;
>> }
>>
>> Any data or functions that should be copied to SRAM for this use should be
>> maked with an appropriate __section() attribute. A helper is then added for
>> translating between the original kernel symbol, and the address of that
>> function or variable once it has been copied into SRAM. Once control is
>> passed to a function within the SRAM section grouping, it can access any
>> variables or functions within that same SRAM section grouping without
>> translation.
>
> What about the relocations which will need to be fixed up - eg, addresses
> in the literal pool, the GOT table contents, etc?  You say nothing about
> this.

The C code would need to be written so that such accesses do not
occur. From functions that are in the sram text section, only accesses
to other sram sections in their group would be allowed. And above, a
compilation step could be added to make the compilation fail when such
things happen.

The direction you are going though is good, if this is fragile, it
doesn't put us in a better place then having fragile asm that at least
complies reliably. As I'm looking towards the future and platforms
that are similar to am335x in that they are placing more and more of
the PM state machine burden on the CPU, I'd really like to try to make
this happen.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 0/4] Create infrastructure for running C code from SRAM.
  2013-09-06 16:19     ` Dave Martin
@ 2013-09-06 19:42       ` Russ Dill
  -1 siblings, 0 replies; 38+ messages in thread
From: Russ Dill @ 2013-09-06 19:42 UTC (permalink / raw)
  To: Dave Martin
  Cc: Russell King - ARM Linux, Måns Rullgård, Shawn Guo,
	Grant Likely, Philipp Zabel, linux-omap, Linux ARM Kernel List

On Fri, Sep 6, 2013 at 9:19 AM, Dave Martin <Dave.Martin@arm.com> wrote:
> On Fri, Sep 06, 2013 at 12:12:21PM +0100, Russell King - ARM Linux wrote:
>> On Tue, Sep 03, 2013 at 09:44:21AM -0700, Russ Dill wrote:
>> > SRAM handling code is in the process of being moved from arch directories
>> > into drivers/misc/sram.c using device tree and genalloc [1] [2]. This RFC
>> > patchset builds on that, including the limitation that the SRAM address is
>> > not known at compile time. Because the SRAM address is not known at compile
>> > time, the code that runs from SRAM must be compiled with -fPIC. Even if
>> > the code were loaded to a fixed virtual address, portions of the code must
>> > often be run with the MMU disabled.
>>
>> What are you doing about the various gcc utility functions that may be
>> implicitly called from C code such as memcpy and memset?
>>
>> > The general idea is that for each SRAM user (such as an SoC specific
>> > suspend/resume mechanism) to create a group of sections. The section group
>> > is created with a single macro for each user, but end up looking like this:
>> >
>> > .sram.am33xx : AT(ADDR(.sram.am33xx) - 0) {
>> >   __sram_am33xx_start = .;
>> >   *(.sram.am33xx.*)
>> >   __sram_am33xx_end = .;
>> > }
>> >
>> > Any data or functions that should be copied to SRAM for this use should be
>> > maked with an appropriate __section() attribute. A helper is then added for
>> > translating between the original kernel symbol, and the address of that
>> > function or variable once it has been copied into SRAM. Once control is
>> > passed to a function within the SRAM section grouping, it can access any
>> > variables or functions within that same SRAM section grouping without
>> > translation.
>>
>> What about the relocations which will need to be fixed up - eg, addresses
>> in the literal pool, the GOT table contents, etc?  You say nothing about
>> this.
>
> I was also thinking about this, and there are more problems.
>
> As well as what has already been mentioned:
>
>  * Calls from inside the SRAM code to vmlinux (including lib1funcs etc.)
>    will typically break, except on architectures where function calls are
>    (absolute by default not ARM).

As in the response to RMK, I think compiler flags are enough to
prevent implicit memcpy/memset calls. The code would not be allowed to
do divisions, module, or 64 bit multiplication. A make rule would
check the sram sections for any dynamically relocatable symbols.

>  * The compiler/linker won't detect unsafe constructs or code generation,
>    because it assumes that anything built with -fPIC is going to be patched
>    up later by ld.so or equivalent.

Can you provide examples of what some of these other unsafe constructs might be?

>  * The GOT is generated by the linker, and is a single table.  Yet each
>    SRAM blob needs to be able to refer to its own GOT entries position-
>    independently.  Moving the blobs independently won't work.

Would GOT entries only exist if there are accesses to .data or .bss?
The SRAM C code would not support such a thing, only access to data
and text within the SRAM grouping is allowed. Is there a way to make
the compiler or linker complain if such an access is done? If not,
it'd be another make rule as above.

> In other words: -fPIC does not generate position-independent code.
>
> It generates position-dependent code that is easier to move around than
> non-fPIC code, but you still need a dynamic linker (or equivalent) to
> make it all work.

arch/arm/boot/compressed/ seems to manage it. Hopefully, by allowing
only more limited code, I can get by with less tricks.

> There are various "correct" ways to handle this, the simplest of which
> is probably to build each SRAM blob as a kernel module, embed the result
> in the kernel somehow, and then use the module loader infrastructure
> to handle fixing the module up to the right address.
>
> But this is still likely to be overkill, given the small scale of the
> SRAM code.

Yes, I'm pretty sure several people would scream rather loudly if
getting suspend/resume support on their platform required
CONFIG_MODULES=y.

> Restricting such code to carefully-written assembler (as now) may be
> the more practical approrach, unless there's a good example of somewhere
> that C code would provide a big benefit.

There are currently about 5000 or so lines of assembly code in
arch/arm that are used for suspend/resume stubs. In one stage of
am335x development, the sleep/resume stub for am335x was about 1200
lines long. Since then, a lot of that code has been moved to a
firmware blob, but there has been some pushback on that, which is why
I'm investigating this path. Especially given that there are some
future platforms that will follow the am335x pm model.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [RFC 0/4] Create infrastructure for running C code from SRAM.
@ 2013-09-06 19:42       ` Russ Dill
  0 siblings, 0 replies; 38+ messages in thread
From: Russ Dill @ 2013-09-06 19:42 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Sep 6, 2013 at 9:19 AM, Dave Martin <Dave.Martin@arm.com> wrote:
> On Fri, Sep 06, 2013 at 12:12:21PM +0100, Russell King - ARM Linux wrote:
>> On Tue, Sep 03, 2013 at 09:44:21AM -0700, Russ Dill wrote:
>> > SRAM handling code is in the process of being moved from arch directories
>> > into drivers/misc/sram.c using device tree and genalloc [1] [2]. This RFC
>> > patchset builds on that, including the limitation that the SRAM address is
>> > not known at compile time. Because the SRAM address is not known at compile
>> > time, the code that runs from SRAM must be compiled with -fPIC. Even if
>> > the code were loaded to a fixed virtual address, portions of the code must
>> > often be run with the MMU disabled.
>>
>> What are you doing about the various gcc utility functions that may be
>> implicitly called from C code such as memcpy and memset?
>>
>> > The general idea is that for each SRAM user (such as an SoC specific
>> > suspend/resume mechanism) to create a group of sections. The section group
>> > is created with a single macro for each user, but end up looking like this:
>> >
>> > .sram.am33xx : AT(ADDR(.sram.am33xx) - 0) {
>> >   __sram_am33xx_start = .;
>> >   *(.sram.am33xx.*)
>> >   __sram_am33xx_end = .;
>> > }
>> >
>> > Any data or functions that should be copied to SRAM for this use should be
>> > maked with an appropriate __section() attribute. A helper is then added for
>> > translating between the original kernel symbol, and the address of that
>> > function or variable once it has been copied into SRAM. Once control is
>> > passed to a function within the SRAM section grouping, it can access any
>> > variables or functions within that same SRAM section grouping without
>> > translation.
>>
>> What about the relocations which will need to be fixed up - eg, addresses
>> in the literal pool, the GOT table contents, etc?  You say nothing about
>> this.
>
> I was also thinking about this, and there are more problems.
>
> As well as what has already been mentioned:
>
>  * Calls from inside the SRAM code to vmlinux (including lib1funcs etc.)
>    will typically break, except on architectures where function calls are
>    (absolute by default not ARM).

As in the response to RMK, I think compiler flags are enough to
prevent implicit memcpy/memset calls. The code would not be allowed to
do divisions, module, or 64 bit multiplication. A make rule would
check the sram sections for any dynamically relocatable symbols.

>  * The compiler/linker won't detect unsafe constructs or code generation,
>    because it assumes that anything built with -fPIC is going to be patched
>    up later by ld.so or equivalent.

Can you provide examples of what some of these other unsafe constructs might be?

>  * The GOT is generated by the linker, and is a single table.  Yet each
>    SRAM blob needs to be able to refer to its own GOT entries position-
>    independently.  Moving the blobs independently won't work.

Would GOT entries only exist if there are accesses to .data or .bss?
The SRAM C code would not support such a thing, only access to data
and text within the SRAM grouping is allowed. Is there a way to make
the compiler or linker complain if such an access is done? If not,
it'd be another make rule as above.

> In other words: -fPIC does not generate position-independent code.
>
> It generates position-dependent code that is easier to move around than
> non-fPIC code, but you still need a dynamic linker (or equivalent) to
> make it all work.

arch/arm/boot/compressed/ seems to manage it. Hopefully, by allowing
only more limited code, I can get by with less tricks.

> There are various "correct" ways to handle this, the simplest of which
> is probably to build each SRAM blob as a kernel module, embed the result
> in the kernel somehow, and then use the module loader infrastructure
> to handle fixing the module up to the right address.
>
> But this is still likely to be overkill, given the small scale of the
> SRAM code.

Yes, I'm pretty sure several people would scream rather loudly if
getting suspend/resume support on their platform required
CONFIG_MODULES=y.

> Restricting such code to carefully-written assembler (as now) may be
> the more practical approrach, unless there's a good example of somewhere
> that C code would provide a big benefit.

There are currently about 5000 or so lines of assembly code in
arch/arm that are used for suspend/resume stubs. In one stage of
am335x development, the sleep/resume stub for am335x was about 1200
lines long. Since then, a lot of that code has been moved to a
firmware blob, but there has been some pushback on that, which is why
I'm investigating this path. Especially given that there are some
future platforms that will follow the am335x pm model.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 3/4] Misc: SRAM: Hack for allowing executable code in SRAM.
  2013-09-04 18:06     ` Tony Lindgren
@ 2013-09-06 20:50       ` Russ Dill
  -1 siblings, 0 replies; 38+ messages in thread
From: Russ Dill @ 2013-09-06 20:50 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Måns Rullgård, Shawn Guo, Grant Likely, Philipp Zabel,
	linux-omap, Linux ARM Kernel List

On Wed, Sep 4, 2013 at 11:06 AM, Tony Lindgren <tony@atomide.com> wrote:
> * Russ Dill <Russ.Dill@ti.com> [130903 09:52]:
>> The generic SRAM mechanism does not ioremap memory in a
>> manner that allows code to be executed from SRAM. There is
>> currently no generic way to request ioremap to return a
>> memory area with execution allowed.
>>
>> Insert a temporary hack for proof of concept on ARM.
>>
>> Signed-off-by: Russ Dill <Russ.Dill@ti.com>
>> ---
>>  drivers/misc/sram.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/misc/sram.c b/drivers/misc/sram.c
>> index 08baaab..e059a23 100644
>> --- a/drivers/misc/sram.c
>> +++ b/drivers/misc/sram.c
>> @@ -31,6 +31,7 @@
>>  #include <linux/genalloc.h>
>>  #include <linux/sram.h>
>>  #include <asm-generic/cacheflush.h>
>> +#include <asm/io.h>
>>
>>  #define SRAM_GRANULARITY     32
>>
>> @@ -138,7 +139,7 @@ static int sram_probe(struct platform_device *pdev)
>>       int ret;
>>
>>       res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>> -     virt_base = devm_ioremap_resource(&pdev->dev, res);
>> +     virt_base = __arm_ioremap_exec(res->start, resource_size(res), false);
>>       if (IS_ERR(virt_base))
>>               return PTR_ERR(virt_base);
>
> You can get rid of this hack by defining ioremap_exec in
> include/asm-generic/io.h the same way as ioremap_nocache
> is done:
>
> #ifndef ioremap_exec
> #define ioremap_exec ioremap
> #endif
>
> Then the arch that need ioremap_exec can define and
> implement it. Needs to be reviewed on LKML naturally :)

The similar statement for nocache in asm-generic/io.h appears in an
#ifndef CONFIG_MMU block. I think the better example would be
ioremap_wc, which looks like:

#ifndef ARCH_HAS_IOREMAP_WC
#define ioremap_wc ioremap_nocache
#endif

Course, ioremap_exec on ARM has a slight complication since it has an
extra bool nocache argument.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [RFC 3/4] Misc: SRAM: Hack for allowing executable code in SRAM.
@ 2013-09-06 20:50       ` Russ Dill
  0 siblings, 0 replies; 38+ messages in thread
From: Russ Dill @ 2013-09-06 20:50 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Sep 4, 2013 at 11:06 AM, Tony Lindgren <tony@atomide.com> wrote:
> * Russ Dill <Russ.Dill@ti.com> [130903 09:52]:
>> The generic SRAM mechanism does not ioremap memory in a
>> manner that allows code to be executed from SRAM. There is
>> currently no generic way to request ioremap to return a
>> memory area with execution allowed.
>>
>> Insert a temporary hack for proof of concept on ARM.
>>
>> Signed-off-by: Russ Dill <Russ.Dill@ti.com>
>> ---
>>  drivers/misc/sram.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/misc/sram.c b/drivers/misc/sram.c
>> index 08baaab..e059a23 100644
>> --- a/drivers/misc/sram.c
>> +++ b/drivers/misc/sram.c
>> @@ -31,6 +31,7 @@
>>  #include <linux/genalloc.h>
>>  #include <linux/sram.h>
>>  #include <asm-generic/cacheflush.h>
>> +#include <asm/io.h>
>>
>>  #define SRAM_GRANULARITY     32
>>
>> @@ -138,7 +139,7 @@ static int sram_probe(struct platform_device *pdev)
>>       int ret;
>>
>>       res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>> -     virt_base = devm_ioremap_resource(&pdev->dev, res);
>> +     virt_base = __arm_ioremap_exec(res->start, resource_size(res), false);
>>       if (IS_ERR(virt_base))
>>               return PTR_ERR(virt_base);
>
> You can get rid of this hack by defining ioremap_exec in
> include/asm-generic/io.h the same way as ioremap_nocache
> is done:
>
> #ifndef ioremap_exec
> #define ioremap_exec ioremap
> #endif
>
> Then the arch that need ioremap_exec can define and
> implement it. Needs to be reviewed on LKML naturally :)

The similar statement for nocache in asm-generic/io.h appears in an
#ifndef CONFIG_MMU block. I think the better example would be
ioremap_wc, which looks like:

#ifndef ARCH_HAS_IOREMAP_WC
#define ioremap_wc ioremap_nocache
#endif

Course, ioremap_exec on ARM has a slight complication since it has an
extra bool nocache argument.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 0/4] Create infrastructure for running C code from SRAM.
  2013-09-06 16:40         ` Dave Martin
@ 2013-09-07  8:57           ` Russell King - ARM Linux
  -1 siblings, 0 replies; 38+ messages in thread
From: Russell King - ARM Linux @ 2013-09-07  8:57 UTC (permalink / raw)
  To: Dave Martin
  Cc: Russ Dill, mans, Emilio López, Shawn Guo, Grant Likely,
	Philipp Zabel, linux-omap, Linux ARM Kernel List

On Fri, Sep 06, 2013 at 05:40:59PM +0100, Dave Martin wrote:
> I actually wonder whether fncpy() contains a buglet, whereby
> flush_icache_range() is used instead of coherent_kern_range().
> The SRAM is probably not mapped cached, but at least a DSB would be
> needed before flushing the relevant lines from the I-cache.

flush_icache_range() is correct - it's there to ensure that memory which
has been written will be readable to the instruction stream.  That's it's
whole purpose, and it's used when modules are loaded too.

You're reading too much into the name: it doesn't just touch the I cache.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [RFC 0/4] Create infrastructure for running C code from SRAM.
@ 2013-09-07  8:57           ` Russell King - ARM Linux
  0 siblings, 0 replies; 38+ messages in thread
From: Russell King - ARM Linux @ 2013-09-07  8:57 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Sep 06, 2013 at 05:40:59PM +0100, Dave Martin wrote:
> I actually wonder whether fncpy() contains a buglet, whereby
> flush_icache_range() is used instead of coherent_kern_range().
> The SRAM is probably not mapped cached, but at least a DSB would be
> needed before flushing the relevant lines from the I-cache.

flush_icache_range() is correct - it's there to ensure that memory which
has been written will be readable to the instruction stream.  That's it's
whole purpose, and it's used when modules are loaded too.

You're reading too much into the name: it doesn't just touch the I cache.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 0/4] Create infrastructure for running C code from SRAM.
  2013-09-06 19:32     ` Russ Dill
@ 2013-09-07 16:21       ` Ard Biesheuvel
  -1 siblings, 0 replies; 38+ messages in thread
From: Ard Biesheuvel @ 2013-09-07 16:21 UTC (permalink / raw)
  To: Russ Dill
  Cc: Russell King - ARM Linux, mans, Shawn Guo, Grant Likely,
	Philipp Zabel, linux-omap, Linux ARM Kernel List

On 6 September 2013 21:32, Russ Dill <Russ.Dill@ti.com> wrote:
> On Fri, Sep 6, 2013 at 4:12 AM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
>> On Tue, Sep 03, 2013 at 09:44:21AM -0700, Russ Dill wrote:
>>> SRAM handling code is in the process of being moved from arch directories
>>> into drivers/misc/sram.c using device tree and genalloc [1] [2]. This RFC
>>> patchset builds on that, including the limitation that the SRAM address is
>>> not known at compile time. Because the SRAM address is not known at compile
>>> time, the code that runs from SRAM must be compiled with -fPIC. Even if
>>> the code were loaded to a fixed virtual address, portions of the code must
>>> often be run with the MMU disabled.
>>
>> What are you doing about the various gcc utility functions that may be
>> implicitly called from C code such as memcpy and memset?
>
> That would create a problem. Would '-ffreestanding' be the correct
> flag to add?

No, unfortunately, -ffreestanding won't prevent GCC from generating
implicit calls to memzero() et al. These are mainly issued when using
initialized non-POD stack variables so avoiding those might help you
there.

> As far as the family of __aeabi_*, I need to add
> documentation stating that on ARM, you can't divide, perform modulo,
> and can't do 64 bit multiplications. I can then add a make rule that
> will grep the symbol lists of .sram sections for ^__aeabi_. Is this
> enough?
>

Well, even printk() needs integer division for its %d/%u modifiers, so
this is really not so easy to achieve.

>>> The general idea is that for each SRAM user (such as an SoC specific
>>> suspend/resume mechanism) to create a group of sections. The section group
>>> is created with a single macro for each user, but end up looking like this:
>>>
>>> .sram.am33xx : AT(ADDR(.sram.am33xx) - 0) {
>>>   __sram_am33xx_start = .;
>>>   *(.sram.am33xx.*)
>>>   __sram_am33xx_end = .;
>>> }
>>>
>>> Any data or functions that should be copied to SRAM for this use should be
>>> maked with an appropriate __section() attribute. A helper is then added for
>>> translating between the original kernel symbol, and the address of that
>>> function or variable once it has been copied into SRAM. Once control is
>>> passed to a function within the SRAM section grouping, it can access any
>>> variables or functions within that same SRAM section grouping without
>>> translation.
>>
>> What about the relocations which will need to be fixed up - eg, addresses
>> in the literal pool, the GOT table contents, etc?  You say nothing about
>> this.
>
> The C code would need to be written so that such accesses do not
> occur. From functions that are in the sram text section, only accesses
> to other sram sections in their group would be allowed. And above, a
> compilation step could be added to make the compilation fail when such
> things happen.
>

The point is that, sadly, GCC is just not very good at generating
relocatable code for embedded targets. Playing with -fvisibility may
result in code that contains fewer dynamic relocations, but you will
always end up with a few that need to be fixed up before the code can
run. Another thing to note is that usually, these relocations can only
be fixed up once, as the addend is overwritten by the fixed-up
address. This means that the code can only run in SRAM, and you should
probably best avoid the module loader machinery as it may clobber the
addends before you get to process them.

One thing that remains implicit in this discussion is that you are
executing from SRAM because DRAM is not available (I presume).
Wouldn't it be better to treat the code that lives in the SRAM as a
completely separate executable? You can generate a PIE executable that
supplies minimal memzero et al,  fixup the relocations yourself (look
at the uboot sources for an example of this) and you will be
absolutely sure that the code can run completely autonomously. In
fact, some of this stuff could potentially be reused for other
disjoint execution domains such as TZ secure world.

Regards,
Ard.



> The direction you are going though is good, if this is fragile, it
> doesn't put us in a better place then having fragile asm that at least
> complies reliably. As I'm looking towards the future and platforms
> that are similar to am335x in that they are placing more and more of
> the PM state machine burden on the CPU, I'd really like to try to make
> this happen.
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [RFC 0/4] Create infrastructure for running C code from SRAM.
@ 2013-09-07 16:21       ` Ard Biesheuvel
  0 siblings, 0 replies; 38+ messages in thread
From: Ard Biesheuvel @ 2013-09-07 16:21 UTC (permalink / raw)
  To: linux-arm-kernel

On 6 September 2013 21:32, Russ Dill <Russ.Dill@ti.com> wrote:
> On Fri, Sep 6, 2013 at 4:12 AM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
>> On Tue, Sep 03, 2013 at 09:44:21AM -0700, Russ Dill wrote:
>>> SRAM handling code is in the process of being moved from arch directories
>>> into drivers/misc/sram.c using device tree and genalloc [1] [2]. This RFC
>>> patchset builds on that, including the limitation that the SRAM address is
>>> not known at compile time. Because the SRAM address is not known at compile
>>> time, the code that runs from SRAM must be compiled with -fPIC. Even if
>>> the code were loaded to a fixed virtual address, portions of the code must
>>> often be run with the MMU disabled.
>>
>> What are you doing about the various gcc utility functions that may be
>> implicitly called from C code such as memcpy and memset?
>
> That would create a problem. Would '-ffreestanding' be the correct
> flag to add?

No, unfortunately, -ffreestanding won't prevent GCC from generating
implicit calls to memzero() et al. These are mainly issued when using
initialized non-POD stack variables so avoiding those might help you
there.

> As far as the family of __aeabi_*, I need to add
> documentation stating that on ARM, you can't divide, perform modulo,
> and can't do 64 bit multiplications. I can then add a make rule that
> will grep the symbol lists of .sram sections for ^__aeabi_. Is this
> enough?
>

Well, even printk() needs integer division for its %d/%u modifiers, so
this is really not so easy to achieve.

>>> The general idea is that for each SRAM user (such as an SoC specific
>>> suspend/resume mechanism) to create a group of sections. The section group
>>> is created with a single macro for each user, but end up looking like this:
>>>
>>> .sram.am33xx : AT(ADDR(.sram.am33xx) - 0) {
>>>   __sram_am33xx_start = .;
>>>   *(.sram.am33xx.*)
>>>   __sram_am33xx_end = .;
>>> }
>>>
>>> Any data or functions that should be copied to SRAM for this use should be
>>> maked with an appropriate __section() attribute. A helper is then added for
>>> translating between the original kernel symbol, and the address of that
>>> function or variable once it has been copied into SRAM. Once control is
>>> passed to a function within the SRAM section grouping, it can access any
>>> variables or functions within that same SRAM section grouping without
>>> translation.
>>
>> What about the relocations which will need to be fixed up - eg, addresses
>> in the literal pool, the GOT table contents, etc?  You say nothing about
>> this.
>
> The C code would need to be written so that such accesses do not
> occur. From functions that are in the sram text section, only accesses
> to other sram sections in their group would be allowed. And above, a
> compilation step could be added to make the compilation fail when such
> things happen.
>

The point is that, sadly, GCC is just not very good at generating
relocatable code for embedded targets. Playing with -fvisibility may
result in code that contains fewer dynamic relocations, but you will
always end up with a few that need to be fixed up before the code can
run. Another thing to note is that usually, these relocations can only
be fixed up once, as the addend is overwritten by the fixed-up
address. This means that the code can only run in SRAM, and you should
probably best avoid the module loader machinery as it may clobber the
addends before you get to process them.

One thing that remains implicit in this discussion is that you are
executing from SRAM because DRAM is not available (I presume).
Wouldn't it be better to treat the code that lives in the SRAM as a
completely separate executable? You can generate a PIE executable that
supplies minimal memzero et al,  fixup the relocations yourself (look
at the uboot sources for an example of this) and you will be
absolutely sure that the code can run completely autonomously. In
fact, some of this stuff could potentially be reused for other
disjoint execution domains such as TZ secure world.

Regards,
Ard.

> The direction you are going though is good, if this is fragile, it
> doesn't put us in a better place then having fragile asm that at least
> complies reliably. As I'm looking towards the future and platforms
> that are similar to am335x in that they are placing more and more of
> the PM state machine burden on the CPU, I'd really like to try to make
> this happen.
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 0/4] Create infrastructure for running C code from SRAM.
  2013-09-07 16:21       ` Ard Biesheuvel
@ 2013-09-09 23:10         ` Russ Dill
  -1 siblings, 0 replies; 38+ messages in thread
From: Russ Dill @ 2013-09-09 23:10 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Måns Rullgård, Russell King - ARM Linux, Shawn Guo,
	Grant Likely, Philipp Zabel, linux-omap, Linux ARM Kernel List

On Sat, Sep 7, 2013 at 9:21 AM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> On 6 September 2013 21:32, Russ Dill <Russ.Dill@ti.com> wrote:
>> On Fri, Sep 6, 2013 at 4:12 AM, Russell King - ARM Linux
>> <linux@arm.linux.org.uk> wrote:
>>> On Tue, Sep 03, 2013 at 09:44:21AM -0700, Russ Dill wrote:
>>>> SRAM handling code is in the process of being moved from arch directories
>>>> into drivers/misc/sram.c using device tree and genalloc [1] [2]. This RFC
>>>> patchset builds on that, including the limitation that the SRAM address is
>>>> not known at compile time. Because the SRAM address is not known at compile
>>>> time, the code that runs from SRAM must be compiled with -fPIC. Even if
>>>> the code were loaded to a fixed virtual address, portions of the code must
>>>> often be run with the MMU disabled.
>>>
>>> What are you doing about the various gcc utility functions that may be
>>> implicitly called from C code such as memcpy and memset?
>>
>> That would create a problem. Would '-ffreestanding' be the correct
>> flag to add?
>
> No, unfortunately, -ffreestanding won't prevent GCC from generating
> implicit calls to memzero() et al. These are mainly issued when using
> initialized non-POD stack variables so avoiding those might help you
> there.
>> As far as the family of __aeabi_*, I need to add
>> documentation stating that on ARM, you can't divide, perform modulo,
>> and can't do 64 bit multiplications. I can then add a make rule that
>> will grep the symbol lists of .sram sections for ^__aeabi_. Is this
>> enough?
>>
>
> Well, even printk() needs integer division for its %d/%u modifiers, so
> this is really not so easy to achieve.
>
>>>> The general idea is that for each SRAM user (such as an SoC specific
>>>> suspend/resume mechanism) to create a group of sections. The section group
>>>> is created with a single macro for each user, but end up looking like this:
>>>>
>>>> .sram.am33xx : AT(ADDR(.sram.am33xx) - 0) {
>>>>   __sram_am33xx_start = .;
>>>>   *(.sram.am33xx.*)
>>>>   __sram_am33xx_end = .;
>>>> }
>>>>
>>>> Any data or functions that should be copied to SRAM for this use should be
>>>> maked with an appropriate __section() attribute. A helper is then added for
>>>> translating between the original kernel symbol, and the address of that
>>>> function or variable once it has been copied into SRAM. Once control is
>>>> passed to a function within the SRAM section grouping, it can access any
>>>> variables or functions within that same SRAM section grouping without
>>>> translation.
>>>
>>> What about the relocations which will need to be fixed up - eg, addresses
>>> in the literal pool, the GOT table contents, etc?  You say nothing about
>>> this.
>>
>> The C code would need to be written so that such accesses do not
>> occur. From functions that are in the sram text section, only accesses
>> to other sram sections in their group would be allowed. And above, a
>> compilation step could be added to make the compilation fail when such
>> things happen.
>>
>
> The point is that, sadly, GCC is just not very good at generating
> relocatable code for embedded targets. Playing with -fvisibility may
> result in code that contains fewer dynamic relocations, but you will
> always end up with a few that need to be fixed up before the code can
> run. Another thing to note is that usually, these relocations can only
> be fixed up once, as the addend is overwritten by the fixed-up
> address. This means that the code can only run in SRAM, and you should
> probably best avoid the module loader machinery as it may clobber the
> addends before you get to process them.
>
> One thing that remains implicit in this discussion is that you are
> executing from SRAM because DRAM is not available (I presume).
> Wouldn't it be better to treat the code that lives in the SRAM as a
> completely separate executable? You can generate a PIE executable that
> supplies minimal memzero et al,  fixup the relocations yourself (look
> at the uboot sources for an example of this) and you will be
> absolutely sure that the code can run completely autonomously. In
> fact, some of this stuff could potentially be reused for other
> disjoint execution domains such as TZ secure world.

This is the path I'm going down, but I'm trying to do it without
relocations. I'm following the model of arch/arm/boot/compressed and
generating a relocatable gcc builtin library with weak symbols
containing lib1funcs.S, string.c, ashldi3.S, and some stubs for div0
and the unwind symbols, call in sramlib.o.

I'm then doing an objcopy of the .sramlib section, and the .sram.*
sections into a single object file and performing a link with a linker
script like:

SECTIONS
{
    .text : { *(.sramlib) }

    OVERLAY ALIGN(32) : NOCROSSREFS
    {
        .sram.am33xx { *(.sram.am33xx.*) }
        .sram.am437x { *(.sram.am437x.*) }
    }
}

It produces output without any relocations, but from there I'm a
little fuzzy on how to get the symbols of functions and variables into
the kernel. In the meantime, I'll look into the u-boot methods.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [RFC 0/4] Create infrastructure for running C code from SRAM.
@ 2013-09-09 23:10         ` Russ Dill
  0 siblings, 0 replies; 38+ messages in thread
From: Russ Dill @ 2013-09-09 23:10 UTC (permalink / raw)
  To: linux-arm-kernel

On Sat, Sep 7, 2013 at 9:21 AM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> On 6 September 2013 21:32, Russ Dill <Russ.Dill@ti.com> wrote:
>> On Fri, Sep 6, 2013 at 4:12 AM, Russell King - ARM Linux
>> <linux@arm.linux.org.uk> wrote:
>>> On Tue, Sep 03, 2013 at 09:44:21AM -0700, Russ Dill wrote:
>>>> SRAM handling code is in the process of being moved from arch directories
>>>> into drivers/misc/sram.c using device tree and genalloc [1] [2]. This RFC
>>>> patchset builds on that, including the limitation that the SRAM address is
>>>> not known at compile time. Because the SRAM address is not known at compile
>>>> time, the code that runs from SRAM must be compiled with -fPIC. Even if
>>>> the code were loaded to a fixed virtual address, portions of the code must
>>>> often be run with the MMU disabled.
>>>
>>> What are you doing about the various gcc utility functions that may be
>>> implicitly called from C code such as memcpy and memset?
>>
>> That would create a problem. Would '-ffreestanding' be the correct
>> flag to add?
>
> No, unfortunately, -ffreestanding won't prevent GCC from generating
> implicit calls to memzero() et al. These are mainly issued when using
> initialized non-POD stack variables so avoiding those might help you
> there.
>> As far as the family of __aeabi_*, I need to add
>> documentation stating that on ARM, you can't divide, perform modulo,
>> and can't do 64 bit multiplications. I can then add a make rule that
>> will grep the symbol lists of .sram sections for ^__aeabi_. Is this
>> enough?
>>
>
> Well, even printk() needs integer division for its %d/%u modifiers, so
> this is really not so easy to achieve.
>
>>>> The general idea is that for each SRAM user (such as an SoC specific
>>>> suspend/resume mechanism) to create a group of sections. The section group
>>>> is created with a single macro for each user, but end up looking like this:
>>>>
>>>> .sram.am33xx : AT(ADDR(.sram.am33xx) - 0) {
>>>>   __sram_am33xx_start = .;
>>>>   *(.sram.am33xx.*)
>>>>   __sram_am33xx_end = .;
>>>> }
>>>>
>>>> Any data or functions that should be copied to SRAM for this use should be
>>>> maked with an appropriate __section() attribute. A helper is then added for
>>>> translating between the original kernel symbol, and the address of that
>>>> function or variable once it has been copied into SRAM. Once control is
>>>> passed to a function within the SRAM section grouping, it can access any
>>>> variables or functions within that same SRAM section grouping without
>>>> translation.
>>>
>>> What about the relocations which will need to be fixed up - eg, addresses
>>> in the literal pool, the GOT table contents, etc?  You say nothing about
>>> this.
>>
>> The C code would need to be written so that such accesses do not
>> occur. From functions that are in the sram text section, only accesses
>> to other sram sections in their group would be allowed. And above, a
>> compilation step could be added to make the compilation fail when such
>> things happen.
>>
>
> The point is that, sadly, GCC is just not very good at generating
> relocatable code for embedded targets. Playing with -fvisibility may
> result in code that contains fewer dynamic relocations, but you will
> always end up with a few that need to be fixed up before the code can
> run. Another thing to note is that usually, these relocations can only
> be fixed up once, as the addend is overwritten by the fixed-up
> address. This means that the code can only run in SRAM, and you should
> probably best avoid the module loader machinery as it may clobber the
> addends before you get to process them.
>
> One thing that remains implicit in this discussion is that you are
> executing from SRAM because DRAM is not available (I presume).
> Wouldn't it be better to treat the code that lives in the SRAM as a
> completely separate executable? You can generate a PIE executable that
> supplies minimal memzero et al,  fixup the relocations yourself (look
> at the uboot sources for an example of this) and you will be
> absolutely sure that the code can run completely autonomously. In
> fact, some of this stuff could potentially be reused for other
> disjoint execution domains such as TZ secure world.

This is the path I'm going down, but I'm trying to do it without
relocations. I'm following the model of arch/arm/boot/compressed and
generating a relocatable gcc builtin library with weak symbols
containing lib1funcs.S, string.c, ashldi3.S, and some stubs for div0
and the unwind symbols, call in sramlib.o.

I'm then doing an objcopy of the .sramlib section, and the .sram.*
sections into a single object file and performing a link with a linker
script like:

SECTIONS
{
    .text : { *(.sramlib) }

    OVERLAY ALIGN(32) : NOCROSSREFS
    {
        .sram.am33xx { *(.sram.am33xx.*) }
        .sram.am437x { *(.sram.am437x.*) }
    }
}

It produces output without any relocations, but from there I'm a
little fuzzy on how to get the symbols of functions and variables into
the kernel. In the meantime, I'll look into the u-boot methods.

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2013-09-09 23:10 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-09-03 16:44 [RFC 0/4] Create infrastructure for running C code from SRAM Russ Dill
2013-09-03 16:44 ` Russ Dill
2013-09-03 16:44 ` [RFC 1/4] Misc: SRAM: Create helpers for loading C code into SRAM Russ Dill
2013-09-03 16:44 ` [RFC 2/4] ARM: SRAM: Add macro for generating SRAM resume trampoline Russ Dill
2013-09-03 16:44 ` [RFC 3/4] Misc: SRAM: Hack for allowing executable code in SRAM Russ Dill
2013-09-04 18:06   ` Tony Lindgren
2013-09-04 18:06     ` Tony Lindgren
2013-09-06 20:50     ` Russ Dill
2013-09-06 20:50       ` Russ Dill
2013-09-03 16:44 ` [RFC 4/4] ARM: AM33XX: Move suspend/resume assembly to C Russ Dill
2013-09-04 19:52 ` [RFC 0/4] Create infrastructure for running C code from SRAM Emilio López
2013-09-04 19:52   ` Emilio López
2013-09-04 21:47   ` Russ Dill
2013-09-04 21:47     ` Russ Dill
2013-09-06 11:02     ` Sekhar Nori
2013-09-06 11:02       ` Sekhar Nori
2013-09-06 11:14     ` Russell King - ARM Linux
2013-09-06 11:14       ` Russell King - ARM Linux
2013-09-06 16:40       ` Dave Martin
2013-09-06 16:40         ` Dave Martin
2013-09-06 18:50         ` Russ Dill
2013-09-06 18:50           ` Russ Dill
2013-09-07  8:57         ` Russell King - ARM Linux
2013-09-07  8:57           ` Russell King - ARM Linux
2013-09-06 18:40       ` Russ Dill
2013-09-06 18:40         ` Russ Dill
2013-09-06 11:12 ` Russell King - ARM Linux
2013-09-06 11:12   ` Russell King - ARM Linux
2013-09-06 16:19   ` Dave Martin
2013-09-06 16:19     ` Dave Martin
2013-09-06 19:42     ` Russ Dill
2013-09-06 19:42       ` Russ Dill
2013-09-06 19:32   ` Russ Dill
2013-09-06 19:32     ` Russ Dill
2013-09-07 16:21     ` Ard Biesheuvel
2013-09-07 16:21       ` Ard Biesheuvel
2013-09-09 23:10       ` Russ Dill
2013-09-09 23:10         ` Russ Dill

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.