All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/5] Use more asm-generic headers
@ 2012-08-05  1:23 Rob Herring
  2012-08-05  1:23 ` [PATCH v2 1/5] ARM: use generic version of identical asm headers Rob Herring
                   ` (6 more replies)
  0 siblings, 7 replies; 46+ messages in thread
From: Rob Herring @ 2012-08-05  1:23 UTC (permalink / raw)
  To: linux-arm-kernel

From: Rob Herring <rob.herring@calxeda.com>

Inspired by the AArgh64 claim that it should be separate from ARM and one
reason was being able to use more asm-generic headers. Doing a diff of
arch/arm/include/asm and include/asm-generic there are numerous asm
headers which are functionally identical to their asm-generic counterparts
and others which can use more of the generic versions.

V2 separates unaligned.h and termios.h changes to separate patches. Also,
it adds a fix for build failures on non-gzip decompressor compile.

Rob

Rob Herring (5):
  ARM: use generic version of identical asm headers
  ARM: add strstr declaration for decompressors
  ARM: use generic unaligned.h
  ARM: use generic termios.h
  ARM: convert asm/irqflags.h to use asm-generic/irqflags.h

 arch/arm/boot/compressed/decompress.c |    3 +
 arch/arm/include/asm/Kbuild           |   17 +++
 arch/arm/include/asm/current.h        |   15 ---
 arch/arm/include/asm/exec.h           |    6 -
 arch/arm/include/asm/ipcbuf.h         |    1 -
 arch/arm/include/asm/irqflags.h       |  131 ++++++++--------------
 arch/arm/include/asm/msgbuf.h         |   31 ------
 arch/arm/include/asm/param.h          |   31 ------
 arch/arm/include/asm/parport.h        |   18 ---
 arch/arm/include/asm/segment.h        |   11 --
 arch/arm/include/asm/sembuf.h         |   25 -----
 arch/arm/include/asm/serial.h         |   19 ----
 arch/arm/include/asm/shmbuf.h         |   42 -------
 arch/arm/include/asm/socket.h         |   72 ------------
 arch/arm/include/asm/sockios.h        |   13 ---
 arch/arm/include/asm/termbits.h       |  198 ---------------------------------
 arch/arm/include/asm/termios.h        |   92 ---------------
 arch/arm/include/asm/timex.h          |    6 +-
 arch/arm/include/asm/types.h          |   16 ---
 arch/arm/include/asm/unaligned.h      |   19 ----
 20 files changed, 67 insertions(+), 699 deletions(-)
 delete mode 100644 arch/arm/include/asm/current.h
 delete mode 100644 arch/arm/include/asm/exec.h
 delete mode 100644 arch/arm/include/asm/ipcbuf.h
 delete mode 100644 arch/arm/include/asm/msgbuf.h
 delete mode 100644 arch/arm/include/asm/param.h
 delete mode 100644 arch/arm/include/asm/parport.h
 delete mode 100644 arch/arm/include/asm/segment.h
 delete mode 100644 arch/arm/include/asm/sembuf.h
 delete mode 100644 arch/arm/include/asm/serial.h
 delete mode 100644 arch/arm/include/asm/shmbuf.h
 delete mode 100644 arch/arm/include/asm/socket.h
 delete mode 100644 arch/arm/include/asm/sockios.h
 delete mode 100644 arch/arm/include/asm/termbits.h
 delete mode 100644 arch/arm/include/asm/termios.h
 delete mode 100644 arch/arm/include/asm/types.h
 delete mode 100644 arch/arm/include/asm/unaligned.h

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v2 1/5] ARM: use generic version of identical asm headers
  2012-08-05  1:23 [PATCH v2 0/5] Use more asm-generic headers Rob Herring
@ 2012-08-05  1:23 ` Rob Herring
  2012-08-05  1:23 ` [PATCH v2 2/5] ARM: add strstr declaration for decompressors Rob Herring
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 46+ messages in thread
From: Rob Herring @ 2012-08-05  1:23 UTC (permalink / raw)
  To: linux-arm-kernel

From: Rob Herring <rob.herring@calxeda.com>

Inspired by the AArgh64 claim that it should be separate from ARM and one
reason was being able to use more asm-generic headers. Doing a diff of
arch/arm/include/asm and include/asm-generic there are numerous asm
headers which are functionally identical to their asm-generic counterparts.
Delete the ARM version and use the generic ones.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
---
 arch/arm/include/asm/Kbuild     |   15 +++
 arch/arm/include/asm/current.h  |   15 ---
 arch/arm/include/asm/exec.h     |    6 --
 arch/arm/include/asm/ipcbuf.h   |    1 -
 arch/arm/include/asm/msgbuf.h   |   31 ------
 arch/arm/include/asm/param.h    |   31 ------
 arch/arm/include/asm/parport.h  |   18 ----
 arch/arm/include/asm/segment.h  |   11 ---
 arch/arm/include/asm/sembuf.h   |   25 -----
 arch/arm/include/asm/serial.h   |   19 ----
 arch/arm/include/asm/shmbuf.h   |   42 ---------
 arch/arm/include/asm/socket.h   |   72 --------------
 arch/arm/include/asm/sockios.h  |   13 ---
 arch/arm/include/asm/termbits.h |  198 ---------------------------------------
 arch/arm/include/asm/timex.h    |    6 +-
 arch/arm/include/asm/types.h    |   16 ----
 16 files changed, 17 insertions(+), 502 deletions(-)
 delete mode 100644 arch/arm/include/asm/current.h
 delete mode 100644 arch/arm/include/asm/exec.h
 delete mode 100644 arch/arm/include/asm/ipcbuf.h
 delete mode 100644 arch/arm/include/asm/msgbuf.h
 delete mode 100644 arch/arm/include/asm/param.h
 delete mode 100644 arch/arm/include/asm/parport.h
 delete mode 100644 arch/arm/include/asm/segment.h
 delete mode 100644 arch/arm/include/asm/sembuf.h
 delete mode 100644 arch/arm/include/asm/serial.h
 delete mode 100644 arch/arm/include/asm/shmbuf.h
 delete mode 100644 arch/arm/include/asm/socket.h
 delete mode 100644 arch/arm/include/asm/sockios.h
 delete mode 100644 arch/arm/include/asm/termbits.h
 delete mode 100644 arch/arm/include/asm/types.h

diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild
index 960abce..6cc8a13 100644
--- a/arch/arm/include/asm/Kbuild
+++ b/arch/arm/include/asm/Kbuild
@@ -5,16 +5,31 @@ header-y += hwcap.h
 generic-y += auxvec.h
 generic-y += bitsperlong.h
 generic-y += cputime.h
+generic-y += current.h
 generic-y += emergency-restart.h
 generic-y += errno.h
+generic-y += exec.h
 generic-y += ioctl.h
+generic-y += ipcbuf.h
 generic-y += irq_regs.h
 generic-y += kdebug.h
 generic-y += local.h
 generic-y += local64.h
+generic-y += msgbuf.h
+generic-y += param.h
+generic-y += parport.h
 generic-y += percpu.h
 generic-y += poll.h
 generic-y += resource.h
 generic-y += sections.h
+generic-y += segment.h
+generic-y += sembuf.h
+generic-y += serial.h
+generic-y += shmbuf.h
 generic-y += siginfo.h
 generic-y += sizes.h
+generic-y += socket.h
+generic-y += sockios.h
+generic-y += termbits.h
+generic-y += timex.h
+generic-y += types.h
diff --git a/arch/arm/include/asm/current.h b/arch/arm/include/asm/current.h
deleted file mode 100644
index 75d21e2..0000000
--- a/arch/arm/include/asm/current.h
+++ /dev/null
@@ -1,15 +0,0 @@
-#ifndef _ASMARM_CURRENT_H
-#define _ASMARM_CURRENT_H
-
-#include <linux/thread_info.h>
-
-static inline struct task_struct *get_current(void) __attribute_const__;
-
-static inline struct task_struct *get_current(void)
-{
-	return current_thread_info()->task;
-}
-
-#define current (get_current())
-
-#endif /* _ASMARM_CURRENT_H */
diff --git a/arch/arm/include/asm/exec.h b/arch/arm/include/asm/exec.h
deleted file mode 100644
index 7c4fbef..0000000
--- a/arch/arm/include/asm/exec.h
+++ /dev/null
@@ -1,6 +0,0 @@
-#ifndef __ASM_ARM_EXEC_H
-#define __ASM_ARM_EXEC_H
-
-#define arch_align_stack(x) (x)
-
-#endif /* __ASM_ARM_EXEC_H */
diff --git a/arch/arm/include/asm/ipcbuf.h b/arch/arm/include/asm/ipcbuf.h
deleted file mode 100644
index 84c7e51..0000000
--- a/arch/arm/include/asm/ipcbuf.h
+++ /dev/null
@@ -1 +0,0 @@
-#include <asm-generic/ipcbuf.h>
diff --git a/arch/arm/include/asm/msgbuf.h b/arch/arm/include/asm/msgbuf.h
deleted file mode 100644
index 33b35b9..0000000
--- a/arch/arm/include/asm/msgbuf.h
+++ /dev/null
@@ -1,31 +0,0 @@
-#ifndef _ASMARM_MSGBUF_H
-#define _ASMARM_MSGBUF_H
-
-/* 
- * The msqid64_ds structure for arm architecture.
- * Note extra padding because this structure is passed back and forth
- * between kernel and user space.
- *
- * Pad space is left for:
- * - 64-bit time_t to solve y2038 problem
- * - 2 miscellaneous 32-bit values
- */
-
-struct msqid64_ds {
-	struct ipc64_perm msg_perm;
-	__kernel_time_t msg_stime;	/* last msgsnd time */
-	unsigned long	__unused1;
-	__kernel_time_t msg_rtime;	/* last msgrcv time */
-	unsigned long	__unused2;
-	__kernel_time_t msg_ctime;	/* last change time */
-	unsigned long	__unused3;
-	unsigned long  msg_cbytes;	/* current number of bytes on queue */
-	unsigned long  msg_qnum;	/* number of messages in queue */
-	unsigned long  msg_qbytes;	/* max number of bytes on queue */
-	__kernel_pid_t msg_lspid;	/* pid of last msgsnd */
-	__kernel_pid_t msg_lrpid;	/* last receive pid */
-	unsigned long  __unused4;
-	unsigned long  __unused5;
-};
-
-#endif /* _ASMARM_MSGBUF_H */
diff --git a/arch/arm/include/asm/param.h b/arch/arm/include/asm/param.h
deleted file mode 100644
index 8b24bf94..0000000
--- a/arch/arm/include/asm/param.h
+++ /dev/null
@@ -1,31 +0,0 @@
-/*
- *  arch/arm/include/asm/param.h
- *
- *  Copyright (C) 1995-1999 Russell King
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- */
-#ifndef __ASM_PARAM_H
-#define __ASM_PARAM_H
-
-#ifdef __KERNEL__
-# define HZ		CONFIG_HZ	/* Internal kernel timer frequency */
-# define USER_HZ	100		/* User interfaces are in "ticks" */
-# define CLOCKS_PER_SEC	(USER_HZ)	/* like times() */
-#else
-# define HZ		100
-#endif
-
-#define EXEC_PAGESIZE	4096
-
-#ifndef NOGROUP
-#define NOGROUP         (-1)
-#endif
-
-/* max length of hostname */
-#define MAXHOSTNAMELEN  64
-
-#endif
-
diff --git a/arch/arm/include/asm/parport.h b/arch/arm/include/asm/parport.h
deleted file mode 100644
index 26e94b0..0000000
--- a/arch/arm/include/asm/parport.h
+++ /dev/null
@@ -1,18 +0,0 @@
-/*
- *  arch/arm/include/asm/parport.h: ARM-specific parport initialisation
- *
- *  Copyright (C) 1999, 2000  Tim Waugh <tim@cyberelk.demon.co.uk>
- *
- * This file should only be included by drivers/parport/parport_pc.c.
- */
-
-#ifndef __ASMARM_PARPORT_H
-#define __ASMARM_PARPORT_H
-
-static int __devinit parport_pc_find_isa_ports (int autoirq, int autodma);
-static int __devinit parport_pc_find_nonpci_ports (int autoirq, int autodma)
-{
-	return parport_pc_find_isa_ports (autoirq, autodma);
-}
-
-#endif /* !(_ASMARM_PARPORT_H) */
diff --git a/arch/arm/include/asm/segment.h b/arch/arm/include/asm/segment.h
deleted file mode 100644
index 9e24c21..0000000
--- a/arch/arm/include/asm/segment.h
+++ /dev/null
@@ -1,11 +0,0 @@
-#ifndef __ASM_ARM_SEGMENT_H
-#define __ASM_ARM_SEGMENT_H
-
-#define __KERNEL_CS   0x0
-#define __KERNEL_DS   0x0
-
-#define __USER_CS     0x1
-#define __USER_DS     0x1
-
-#endif /* __ASM_ARM_SEGMENT_H */
-
diff --git a/arch/arm/include/asm/sembuf.h b/arch/arm/include/asm/sembuf.h
deleted file mode 100644
index 1c02839..0000000
--- a/arch/arm/include/asm/sembuf.h
+++ /dev/null
@@ -1,25 +0,0 @@
-#ifndef _ASMARM_SEMBUF_H
-#define _ASMARM_SEMBUF_H
-
-/* 
- * The semid64_ds structure for arm architecture.
- * Note extra padding because this structure is passed back and forth
- * between kernel and user space.
- *
- * Pad space is left for:
- * - 64-bit time_t to solve y2038 problem
- * - 2 miscellaneous 32-bit values
- */
-
-struct semid64_ds {
-	struct ipc64_perm sem_perm;		/* permissions .. see ipc.h */
-	__kernel_time_t	sem_otime;		/* last semop time */
-	unsigned long	__unused1;
-	__kernel_time_t	sem_ctime;		/* last change time */
-	unsigned long	__unused2;
-	unsigned long	sem_nsems;		/* no. of semaphores in array */
-	unsigned long	__unused3;
-	unsigned long	__unused4;
-};
-
-#endif /* _ASMARM_SEMBUF_H */
diff --git a/arch/arm/include/asm/serial.h b/arch/arm/include/asm/serial.h
deleted file mode 100644
index ebb0490..0000000
--- a/arch/arm/include/asm/serial.h
+++ /dev/null
@@ -1,19 +0,0 @@
-/*
- *  arch/arm/include/asm/serial.h
- *
- *  Copyright (C) 1996 Russell King.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- *  Changelog:
- *   15-10-1996	RMK	Created
- */
-
-#ifndef __ASM_SERIAL_H
-#define __ASM_SERIAL_H
-
-#define BASE_BAUD	(1843200 / 16)
-
-#endif
diff --git a/arch/arm/include/asm/shmbuf.h b/arch/arm/include/asm/shmbuf.h
deleted file mode 100644
index 2e5c67b..0000000
--- a/arch/arm/include/asm/shmbuf.h
+++ /dev/null
@@ -1,42 +0,0 @@
-#ifndef _ASMARM_SHMBUF_H
-#define _ASMARM_SHMBUF_H
-
-/* 
- * The shmid64_ds structure for arm architecture.
- * Note extra padding because this structure is passed back and forth
- * between kernel and user space.
- *
- * Pad space is left for:
- * - 64-bit time_t to solve y2038 problem
- * - 2 miscellaneous 32-bit values
- */
-
-struct shmid64_ds {
-	struct ipc64_perm	shm_perm;	/* operation perms */
-	size_t			shm_segsz;	/* size of segment (bytes) */
-	__kernel_time_t		shm_atime;	/* last attach time */
-	unsigned long		__unused1;
-	__kernel_time_t		shm_dtime;	/* last detach time */
-	unsigned long		__unused2;
-	__kernel_time_t		shm_ctime;	/* last change time */
-	unsigned long		__unused3;
-	__kernel_pid_t		shm_cpid;	/* pid of creator */
-	__kernel_pid_t		shm_lpid;	/* pid of last operator */
-	unsigned long		shm_nattch;	/* no. of current attaches */
-	unsigned long		__unused4;
-	unsigned long		__unused5;
-};
-
-struct shminfo64 {
-	unsigned long	shmmax;
-	unsigned long	shmmin;
-	unsigned long	shmmni;
-	unsigned long	shmseg;
-	unsigned long	shmall;
-	unsigned long	__unused1;
-	unsigned long	__unused2;
-	unsigned long	__unused3;
-	unsigned long	__unused4;
-};
-
-#endif /* _ASMARM_SHMBUF_H */
diff --git a/arch/arm/include/asm/socket.h b/arch/arm/include/asm/socket.h
deleted file mode 100644
index 6433cad..0000000
--- a/arch/arm/include/asm/socket.h
+++ /dev/null
@@ -1,72 +0,0 @@
-#ifndef _ASMARM_SOCKET_H
-#define _ASMARM_SOCKET_H
-
-#include <asm/sockios.h>
-
-/* For setsockopt(2) */
-#define SOL_SOCKET	1
-
-#define SO_DEBUG	1
-#define SO_REUSEADDR	2
-#define SO_TYPE		3
-#define SO_ERROR	4
-#define SO_DONTROUTE	5
-#define SO_BROADCAST	6
-#define SO_SNDBUF	7
-#define SO_RCVBUF	8
-#define SO_SNDBUFFORCE	32
-#define SO_RCVBUFFORCE	33
-#define SO_KEEPALIVE	9
-#define SO_OOBINLINE	10
-#define SO_NO_CHECK	11
-#define SO_PRIORITY	12
-#define SO_LINGER	13
-#define SO_BSDCOMPAT	14
-/* To add :#define SO_REUSEPORT 15 */
-#define SO_PASSCRED	16
-#define SO_PEERCRED	17
-#define SO_RCVLOWAT	18
-#define SO_SNDLOWAT	19
-#define SO_RCVTIMEO	20
-#define SO_SNDTIMEO	21
-
-/* Security levels - as per NRL IPv6 - don't actually do anything */
-#define SO_SECURITY_AUTHENTICATION		22
-#define SO_SECURITY_ENCRYPTION_TRANSPORT	23
-#define SO_SECURITY_ENCRYPTION_NETWORK		24
-
-#define SO_BINDTODEVICE 25
-
-/* Socket filtering */
-#define SO_ATTACH_FILTER        26
-#define SO_DETACH_FILTER        27
-
-#define SO_PEERNAME             28
-#define SO_TIMESTAMP		29
-#define SCM_TIMESTAMP		SO_TIMESTAMP
-
-#define SO_ACCEPTCONN		30
-
-#define SO_PEERSEC		31
-#define SO_PASSSEC		34
-#define SO_TIMESTAMPNS		35
-#define SCM_TIMESTAMPNS		SO_TIMESTAMPNS
-
-#define SO_MARK			36
-
-#define SO_TIMESTAMPING		37
-#define SCM_TIMESTAMPING	SO_TIMESTAMPING
-
-#define SO_PROTOCOL		38
-#define SO_DOMAIN		39
-
-#define SO_RXQ_OVFL             40
-
-#define SO_WIFI_STATUS		41
-#define SCM_WIFI_STATUS		SO_WIFI_STATUS
-#define SO_PEEK_OFF		42
-
-/* Instruct lower device to use last 4-bytes of skb data as FCS */
-#define SO_NOFCS		43
-
-#endif /* _ASM_SOCKET_H */
diff --git a/arch/arm/include/asm/sockios.h b/arch/arm/include/asm/sockios.h
deleted file mode 100644
index a2588a2..0000000
--- a/arch/arm/include/asm/sockios.h
+++ /dev/null
@@ -1,13 +0,0 @@
-#ifndef __ARCH_ARM_SOCKIOS_H
-#define __ARCH_ARM_SOCKIOS_H
-
-/* Socket-level I/O control calls. */
-#define FIOSETOWN 	0x8901
-#define SIOCSPGRP	0x8902
-#define FIOGETOWN	0x8903
-#define SIOCGPGRP	0x8904
-#define SIOCATMARK	0x8905
-#define SIOCGSTAMP	0x8906		/* Get stamp (timeval) */
-#define SIOCGSTAMPNS	0x8907		/* Get stamp (timespec) */
-
-#endif
diff --git a/arch/arm/include/asm/termbits.h b/arch/arm/include/asm/termbits.h
deleted file mode 100644
index 704135d..0000000
--- a/arch/arm/include/asm/termbits.h
+++ /dev/null
@@ -1,198 +0,0 @@
-#ifndef __ASM_ARM_TERMBITS_H
-#define __ASM_ARM_TERMBITS_H
-
-typedef unsigned char	cc_t;
-typedef unsigned int	speed_t;
-typedef unsigned int	tcflag_t;
-
-#define NCCS 19
-struct termios {
-	tcflag_t c_iflag;		/* input mode flags */
-	tcflag_t c_oflag;		/* output mode flags */
-	tcflag_t c_cflag;		/* control mode flags */
-	tcflag_t c_lflag;		/* local mode flags */
-	cc_t c_line;			/* line discipline */
-	cc_t c_cc[NCCS];		/* control characters */
-};
-
-struct termios2 {
-	tcflag_t c_iflag;		/* input mode flags */
-	tcflag_t c_oflag;		/* output mode flags */
-	tcflag_t c_cflag;		/* control mode flags */
-	tcflag_t c_lflag;		/* local mode flags */
-	cc_t c_line;			/* line discipline */
-	cc_t c_cc[NCCS];		/* control characters */
-	speed_t c_ispeed;		/* input speed */
-	speed_t c_ospeed;		/* output speed */
-};
-
-struct ktermios {
-	tcflag_t c_iflag;		/* input mode flags */
-	tcflag_t c_oflag;		/* output mode flags */
-	tcflag_t c_cflag;		/* control mode flags */
-	tcflag_t c_lflag;		/* local mode flags */
-	cc_t c_line;			/* line discipline */
-	cc_t c_cc[NCCS];		/* control characters */
-	speed_t c_ispeed;		/* input speed */
-	speed_t c_ospeed;		/* output speed */
-};
-
-
-/* c_cc characters */
-#define VINTR 0
-#define VQUIT 1
-#define VERASE 2
-#define VKILL 3
-#define VEOF 4
-#define VTIME 5
-#define VMIN 6
-#define VSWTC 7
-#define VSTART 8
-#define VSTOP 9
-#define VSUSP 10
-#define VEOL 11
-#define VREPRINT 12
-#define VDISCARD 13
-#define VWERASE 14
-#define VLNEXT 15
-#define VEOL2 16
-
-/* c_iflag bits */
-#define IGNBRK	0000001
-#define BRKINT	0000002
-#define IGNPAR	0000004
-#define PARMRK	0000010
-#define INPCK	0000020
-#define ISTRIP	0000040
-#define INLCR	0000100
-#define IGNCR	0000200
-#define ICRNL	0000400
-#define IUCLC	0001000
-#define IXON	0002000
-#define IXANY	0004000
-#define IXOFF	0010000
-#define IMAXBEL	0020000
-#define IUTF8	0040000
-
-/* c_oflag bits */
-#define OPOST	0000001
-#define OLCUC	0000002
-#define ONLCR	0000004
-#define OCRNL	0000010
-#define ONOCR	0000020
-#define ONLRET	0000040
-#define OFILL	0000100
-#define OFDEL	0000200
-#define NLDLY	0000400
-#define   NL0	0000000
-#define   NL1	0000400
-#define CRDLY	0003000
-#define   CR0	0000000
-#define   CR1	0001000
-#define   CR2	0002000
-#define   CR3	0003000
-#define TABDLY	0014000
-#define   TAB0	0000000
-#define   TAB1	0004000
-#define   TAB2	0010000
-#define   TAB3	0014000
-#define   XTABS	0014000
-#define BSDLY	0020000
-#define   BS0	0000000
-#define   BS1	0020000
-#define VTDLY	0040000
-#define   VT0	0000000
-#define   VT1	0040000
-#define FFDLY	0100000
-#define   FF0	0000000
-#define   FF1	0100000
-
-/* c_cflag bit meaning */
-#define CBAUD	0010017
-#define  B0	0000000		/* hang up */
-#define  B50	0000001
-#define  B75	0000002
-#define  B110	0000003
-#define  B134	0000004
-#define  B150	0000005
-#define  B200	0000006
-#define  B300	0000007
-#define  B600	0000010
-#define  B1200	0000011
-#define  B1800	0000012
-#define  B2400	0000013
-#define  B4800	0000014
-#define  B9600	0000015
-#define  B19200	0000016
-#define  B38400	0000017
-#define EXTA B19200
-#define EXTB B38400
-#define CSIZE	0000060
-#define   CS5	0000000
-#define   CS6	0000020
-#define   CS7	0000040
-#define   CS8	0000060
-#define CSTOPB	0000100
-#define CREAD	0000200
-#define PARENB	0000400
-#define PARODD	0001000
-#define HUPCL	0002000
-#define CLOCAL	0004000
-#define CBAUDEX 0010000
-#define    BOTHER 0010000
-#define    B57600 0010001
-#define   B115200 0010002
-#define   B230400 0010003
-#define   B460800 0010004
-#define   B500000 0010005
-#define   B576000 0010006
-#define   B921600 0010007
-#define  B1000000 0010010
-#define  B1152000 0010011
-#define  B1500000 0010012
-#define  B2000000 0010013
-#define  B2500000 0010014
-#define  B3000000 0010015
-#define  B3500000 0010016
-#define  B4000000 0010017
-#define CIBAUD	  002003600000		/* input baud rate */
-#define CMSPAR    010000000000		/* mark or space (stick) parity */
-#define CRTSCTS	  020000000000		/* flow control */
-
-#define IBSHIFT	   16
-
-/* c_lflag bits */
-#define ISIG	0000001
-#define ICANON	0000002
-#define XCASE	0000004
-#define ECHO	0000010
-#define ECHOE	0000020
-#define ECHOK	0000040
-#define ECHONL	0000100
-#define NOFLSH	0000200
-#define TOSTOP	0000400
-#define ECHOCTL	0001000
-#define ECHOPRT	0002000
-#define ECHOKE	0004000
-#define FLUSHO	0010000
-#define PENDIN	0040000
-#define IEXTEN	0100000
-#define EXTPROC	0200000
-
-/* tcflow() and TCXONC use these */
-#define	TCOOFF		0
-#define	TCOON		1
-#define	TCIOFF		2
-#define	TCION		3
-
-/* tcflush() and TCFLSH use these */
-#define	TCIFLUSH	0
-#define	TCOFLUSH	1
-#define	TCIOFLUSH	2
-
-/* tcsetattr uses these */
-#define	TCSANOW		0
-#define	TCSADRAIN	1
-#define	TCSAFLUSH	2
-
-#endif	/* __ASM_ARM_TERMBITS_H */
diff --git a/arch/arm/include/asm/timex.h b/arch/arm/include/asm/timex.h
index ce11944..5e71172 100644
--- a/arch/arm/include/asm/timex.h
+++ b/arch/arm/include/asm/timex.h
@@ -15,12 +15,10 @@
 #include <asm/arch_timer.h>
 #include <mach/timex.h>
 
-typedef unsigned long cycles_t;
-
 #ifdef ARCH_HAS_READ_CURRENT_TIMER
 #define get_cycles()	({ cycles_t c; read_current_timer(&c) ? 0 : c; })
-#else
-#define get_cycles()	(0)
 #endif
 
+#include <asm-generic/timex.h>
+
 #endif
diff --git a/arch/arm/include/asm/types.h b/arch/arm/include/asm/types.h
deleted file mode 100644
index 28beab9..0000000
--- a/arch/arm/include/asm/types.h
+++ /dev/null
@@ -1,16 +0,0 @@
-#ifndef __ASM_ARM_TYPES_H
-#define __ASM_ARM_TYPES_H
-
-#include <asm-generic/int-ll64.h>
-
-/*
- * These aren't exported outside the kernel to avoid name space clashes
- */
-#ifdef __KERNEL__
-
-#define BITS_PER_LONG 32
-
-#endif /* __KERNEL__ */
-
-#endif
-
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v2 2/5] ARM: add strstr declaration for decompressors
  2012-08-05  1:23 [PATCH v2 0/5] Use more asm-generic headers Rob Herring
  2012-08-05  1:23 ` [PATCH v2 1/5] ARM: use generic version of identical asm headers Rob Herring
@ 2012-08-05  1:23 ` Rob Herring
  2012-08-05  1:23 ` [PATCH v2 3/5] ARM: use generic unaligned.h Rob Herring
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 46+ messages in thread
From: Rob Herring @ 2012-08-05  1:23 UTC (permalink / raw)
  To: linux-arm-kernel

From: Rob Herring <rob.herring@calxeda.com>

With the generic unaligned.h, more kernel headers get pulled in including
dynamic_debug.h which needs strstr. As it is not really used, we only need
a declaration here.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
---
 arch/arm/boot/compressed/decompress.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm/boot/compressed/decompress.c b/arch/arm/boot/compressed/decompress.c
index f41b38c..9deb56a 100644
--- a/arch/arm/boot/compressed/decompress.c
+++ b/arch/arm/boot/compressed/decompress.c
@@ -32,6 +32,9 @@ extern void error(char *);
 #  define Tracecv(c,x)
 #endif
 
+/* Not needed, but used in some headers pulled in by decompressors */
+extern char * strstr(const char * s1, const char *s2);
+
 #ifdef CONFIG_KERNEL_GZIP
 #include "../../../../lib/decompress_inflate.c"
 #endif
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v2 3/5] ARM: use generic unaligned.h
  2012-08-05  1:23 [PATCH v2 0/5] Use more asm-generic headers Rob Herring
  2012-08-05  1:23 ` [PATCH v2 1/5] ARM: use generic version of identical asm headers Rob Herring
  2012-08-05  1:23 ` [PATCH v2 2/5] ARM: add strstr declaration for decompressors Rob Herring
@ 2012-08-05  1:23 ` Rob Herring
  2012-10-08 16:43   ` Shawn Guo
  2012-08-05  1:23 ` [PATCH v2 4/5] ARM: use generic termios.h Rob Herring
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 46+ messages in thread
From: Rob Herring @ 2012-08-05  1:23 UTC (permalink / raw)
  To: linux-arm-kernel

From: Rob Herring <rob.herring@calxeda.com>

This moves ARM over to the asm-generic/unaligned.h header. This has the
benefit of better code generated especially for ARMv7 on gcc 4.7+
compilers.

As Arnd Bergmann, points out: The asm-generic version uses the "struct"
version for native-endian unaligned access and the "byteshift" version
for the opposite endianess. The current ARM version however uses the
"byteshift" implementation for both.

Thanks to Nicolas Pitre for the excellent analysis:

Test case:

int foo (int *x) { return get_unaligned(x); }
long long bar (long long *x) { return get_unaligned(x); }

With the current ARM version:

foo:
	ldrb	r3, [r0, #2]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 2B], MEM[(const u8 *)x_1(D) + 2B]
	ldrb	r1, [r0, #1]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 1B], MEM[(const u8 *)x_1(D) + 1B]
	ldrb	r2, [r0, #0]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D)], MEM[(const u8 *)x_1(D)]
	mov	r3, r3, asl #16	@ tmp154, MEM[(const u8 *)x_1(D) + 2B],
	ldrb	r0, [r0, #3]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 3B], MEM[(const u8 *)x_1(D) + 3B]
	orr	r3, r3, r1, asl #8	@, tmp155, tmp154, MEM[(const u8 *)x_1(D) + 1B],
	orr	r3, r3, r2	@ tmp157, tmp155, MEM[(const u8 *)x_1(D)]
	orr	r0, r3, r0, asl #24	@,, tmp157, MEM[(const u8 *)x_1(D) + 3B],
	bx	lr	@

bar:
	stmfd	sp!, {r4, r5, r6, r7}	@,
	mov	r2, #0	@ tmp184,
	ldrb	r5, [r0, #6]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 6B], MEM[(const u8 *)x_1(D) + 6B]
	ldrb	r4, [r0, #5]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 5B], MEM[(const u8 *)x_1(D) + 5B]
	ldrb	ip, [r0, #2]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 2B], MEM[(const u8 *)x_1(D) + 2B]
	ldrb	r1, [r0, #4]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 4B], MEM[(const u8 *)x_1(D) + 4B]
	mov	r5, r5, asl #16	@ tmp175, MEM[(const u8 *)x_1(D) + 6B],
	ldrb	r7, [r0, #1]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 1B], MEM[(const u8 *)x_1(D) + 1B]
	orr	r5, r5, r4, asl #8	@, tmp176, tmp175, MEM[(const u8 *)x_1(D) + 5B],
	ldrb	r6, [r0, #7]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 7B], MEM[(const u8 *)x_1(D) + 7B]
	orr	r5, r5, r1	@ tmp178, tmp176, MEM[(const u8 *)x_1(D) + 4B]
	ldrb	r4, [r0, #0]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D)], MEM[(const u8 *)x_1(D)]
	mov	ip, ip, asl #16	@ tmp188, MEM[(const u8 *)x_1(D) + 2B],
	ldrb	r1, [r0, #3]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 3B], MEM[(const u8 *)x_1(D) + 3B]
	orr	ip, ip, r7, asl #8	@, tmp189, tmp188, MEM[(const u8 *)x_1(D) + 1B],
	orr	r3, r5, r6, asl #24	@,, tmp178, MEM[(const u8 *)x_1(D) + 7B],
	orr	ip, ip, r4	@ tmp191, tmp189, MEM[(const u8 *)x_1(D)]
	orr	ip, ip, r1, asl #24	@, tmp194, tmp191, MEM[(const u8 *)x_1(D) + 3B],
	mov	r1, r3	@,
	orr	r0, r2, ip	@ tmp171, tmp184, tmp194
	ldmfd	sp!, {r4, r5, r6, r7}
	bx	lr

In both cases the code is slightly suboptimal.  One may wonder why
wasting r2 with the constant 0 in the second case for example.  And all
the mov's could be folded in subsequent orr's, etc.

Now with the asm-generic version:

foo:
	ldr	r0, [r0, #0]	@ unaligned	@,* x
	bx	lr	@

bar:
	mov	r3, r0	@ x, x
	ldr	r0, [r0, #0]	@ unaligned	@,* x
	ldr	r1, [r3, #4]	@ unaligned	@,
	bx	lr	@

This is way better of course, but only because this was compiled for
ARMv7. In this case the compiler knows that the hardware can do
unaligned word access.  This isn't that obvious for foo(), but if we
remove the get_unaligned() from bar as follows:

long long bar (long long *x) {return *x; }

then the resulting code is:

bar:
	ldmia	r0, {r0, r1}	@ x,,
	bx	lr	@

So this proves that the presumed aligned vs unaligned cases does have
influence on the instructions the compiler may use and that the above
unaligned code results are not just an accident.

Still... this isn't fully conclusive without at least looking at the
resulting assembly fron a pre ARMv6 compilation.  Let's see with an
ARMv5 target:

foo:
	ldrb	r3, [r0, #0]	@ zero_extendqisi2	@ tmp139,* x
	ldrb	r1, [r0, #1]	@ zero_extendqisi2	@ tmp140,
	ldrb	r2, [r0, #2]	@ zero_extendqisi2	@ tmp143,
	ldrb	r0, [r0, #3]	@ zero_extendqisi2	@ tmp146,
	orr	r3, r3, r1, asl #8	@, tmp142, tmp139, tmp140,
	orr	r3, r3, r2, asl #16	@, tmp145, tmp142, tmp143,
	orr	r0, r3, r0, asl #24	@,, tmp145, tmp146,
	bx	lr	@

bar:
	stmfd	sp!, {r4, r5, r6, r7}	@,
	ldrb	r2, [r0, #0]	@ zero_extendqisi2	@ tmp139,* x
	ldrb	r7, [r0, #1]	@ zero_extendqisi2	@ tmp140,
	ldrb	r3, [r0, #4]	@ zero_extendqisi2	@ tmp149,
	ldrb	r6, [r0, #5]	@ zero_extendqisi2	@ tmp150,
	ldrb	r5, [r0, #2]	@ zero_extendqisi2	@ tmp143,
	ldrb	r4, [r0, #6]	@ zero_extendqisi2	@ tmp153,
	ldrb	r1, [r0, #7]	@ zero_extendqisi2	@ tmp156,
	ldrb	ip, [r0, #3]	@ zero_extendqisi2	@ tmp146,
	orr	r2, r2, r7, asl #8	@, tmp142, tmp139, tmp140,
	orr	r3, r3, r6, asl #8	@, tmp152, tmp149, tmp150,
	orr	r2, r2, r5, asl #16	@, tmp145, tmp142, tmp143,
	orr	r3, r3, r4, asl #16	@, tmp155, tmp152, tmp153,
	orr	r0, r2, ip, asl #24	@,, tmp145, tmp146,
	orr	r1, r3, r1, asl #24	@,, tmp155, tmp156,
	ldmfd	sp!, {r4, r5, r6, r7}
	bx	lr

Compared to the initial results, this is really nicely optimized and I
couldn't do much better if I were to hand code it myself.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
---
 arch/arm/include/asm/Kbuild      |    1 +
 arch/arm/include/asm/unaligned.h |   19 -------------------
 2 files changed, 1 insertion(+), 19 deletions(-)
 delete mode 100644 arch/arm/include/asm/unaligned.h

diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild
index 6cc8a13..a86cabf 100644
--- a/arch/arm/include/asm/Kbuild
+++ b/arch/arm/include/asm/Kbuild
@@ -33,3 +33,4 @@ generic-y += sockios.h
 generic-y += termbits.h
 generic-y += timex.h
 generic-y += types.h
+generic-y += unaligned.h
diff --git a/arch/arm/include/asm/unaligned.h b/arch/arm/include/asm/unaligned.h
deleted file mode 100644
index 44593a8..0000000
--- a/arch/arm/include/asm/unaligned.h
+++ /dev/null
@@ -1,19 +0,0 @@
-#ifndef _ASM_ARM_UNALIGNED_H
-#define _ASM_ARM_UNALIGNED_H
-
-#include <linux/unaligned/le_byteshift.h>
-#include <linux/unaligned/be_byteshift.h>
-#include <linux/unaligned/generic.h>
-
-/*
- * Select endianness
- */
-#ifndef __ARMEB__
-#define get_unaligned	__get_unaligned_le
-#define put_unaligned	__put_unaligned_le
-#else
-#define get_unaligned	__get_unaligned_be
-#define put_unaligned	__put_unaligned_be
-#endif
-
-#endif /* _ASM_ARM_UNALIGNED_H */
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v2 4/5] ARM: use generic termios.h
  2012-08-05  1:23 [PATCH v2 0/5] Use more asm-generic headers Rob Herring
                   ` (2 preceding siblings ...)
  2012-08-05  1:23 ` [PATCH v2 3/5] ARM: use generic unaligned.h Rob Herring
@ 2012-08-05  1:23 ` Rob Herring
  2012-08-05  1:24 ` [PATCH v2 5/5] ARM: convert asm/irqflags.h to use asm-generic/irqflags.h Rob Herring
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 46+ messages in thread
From: Rob Herring @ 2012-08-05  1:23 UTC (permalink / raw)
  To: linux-arm-kernel

From: Rob Herring <rob.herring@calxeda.com>

As pointed out by Arnd Bergmann, this fixes a couple of issues but will
increase code size:

The original macro user_termio_to_kernel_termios was not endian safe. It
used an unsigned short ptr to access the low bits in a 32-bit word.

Both user_termio_to_kernel_termios and kernel_termios_to_user_termio are
missing error checking on put_user/get_user and copy_to/from_user.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
---
 arch/arm/include/asm/Kbuild    |    1 +
 arch/arm/include/asm/termios.h |   92 ----------------------------------------
 2 files changed, 1 insertion(+), 92 deletions(-)
 delete mode 100644 arch/arm/include/asm/termios.h

diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild
index a86cabf..8a7196c 100644
--- a/arch/arm/include/asm/Kbuild
+++ b/arch/arm/include/asm/Kbuild
@@ -31,6 +31,7 @@ generic-y += sizes.h
 generic-y += socket.h
 generic-y += sockios.h
 generic-y += termbits.h
+generic-y += termios.h
 generic-y += timex.h
 generic-y += types.h
 generic-y += unaligned.h
diff --git a/arch/arm/include/asm/termios.h b/arch/arm/include/asm/termios.h
deleted file mode 100644
index 293e3f1..0000000
--- a/arch/arm/include/asm/termios.h
+++ /dev/null
@@ -1,92 +0,0 @@
-#ifndef __ASM_ARM_TERMIOS_H
-#define __ASM_ARM_TERMIOS_H
-
-#include <asm/termbits.h>
-#include <asm/ioctls.h>
-
-struct winsize {
-	unsigned short ws_row;
-	unsigned short ws_col;
-	unsigned short ws_xpixel;
-	unsigned short ws_ypixel;
-};
-
-#define NCC 8
-struct termio {
-	unsigned short c_iflag;		/* input mode flags */
-	unsigned short c_oflag;		/* output mode flags */
-	unsigned short c_cflag;		/* control mode flags */
-	unsigned short c_lflag;		/* local mode flags */
-	unsigned char c_line;		/* line discipline */
-	unsigned char c_cc[NCC];	/* control characters */
-};
-
-#ifdef __KERNEL__
-/*	intr=^C		quit=^|		erase=del	kill=^U
-	eof=^D		vtime=\0	vmin=\1		sxtc=\0
-	start=^Q	stop=^S		susp=^Z		eol=\0
-	reprint=^R	discard=^U	werase=^W	lnext=^V
-	eol2=\0
-*/
-#define INIT_C_CC "\003\034\177\025\004\0\1\0\021\023\032\0\022\017\027\026\0"
-#endif
-
-/* modem lines */
-#define TIOCM_LE	0x001
-#define TIOCM_DTR	0x002
-#define TIOCM_RTS	0x004
-#define TIOCM_ST	0x008
-#define TIOCM_SR	0x010
-#define TIOCM_CTS	0x020
-#define TIOCM_CAR	0x040
-#define TIOCM_RNG	0x080
-#define TIOCM_DSR	0x100
-#define TIOCM_CD	TIOCM_CAR
-#define TIOCM_RI	TIOCM_RNG
-#define TIOCM_OUT1	0x2000
-#define TIOCM_OUT2	0x4000
-#define TIOCM_LOOP	0x8000
-
-/* ioctl (fd, TIOCSERGETLSR, &result) where result may be as below */
-
-#ifdef __KERNEL__
-
-/*
- * Translate a "termio" structure into a "termios". Ugh.
- */
-#define SET_LOW_TERMIOS_BITS(termios, termio, x) {		\
-	unsigned short __tmp;					\
-	get_user(__tmp,&(termio)->x);				\
-	*(unsigned short *) &(termios)->x = __tmp;		\
-}
-
-#define user_termio_to_kernel_termios(termios, termio) \
-({ \
-	SET_LOW_TERMIOS_BITS(termios, termio, c_iflag); \
-	SET_LOW_TERMIOS_BITS(termios, termio, c_oflag); \
-	SET_LOW_TERMIOS_BITS(termios, termio, c_cflag); \
-	SET_LOW_TERMIOS_BITS(termios, termio, c_lflag); \
-	copy_from_user((termios)->c_cc, (termio)->c_cc, NCC); \
-})
-
-/*
- * Translate a "termios" structure into a "termio". Ugh.
- */
-#define kernel_termios_to_user_termio(termio, termios) \
-({ \
-	put_user((termios)->c_iflag, &(termio)->c_iflag); \
-	put_user((termios)->c_oflag, &(termio)->c_oflag); \
-	put_user((termios)->c_cflag, &(termio)->c_cflag); \
-	put_user((termios)->c_lflag, &(termio)->c_lflag); \
-	put_user((termios)->c_line,  &(termio)->c_line); \
-	copy_to_user((termio)->c_cc, (termios)->c_cc, NCC); \
-})
-
-#define user_termios_to_kernel_termios(k, u) copy_from_user(k, u, sizeof(struct termios2))
-#define kernel_termios_to_user_termios(u, k) copy_to_user(u, k, sizeof(struct termios2))
-#define user_termios_to_kernel_termios_1(k, u) copy_from_user(k, u, sizeof(struct termios))
-#define kernel_termios_to_user_termios_1(u, k) copy_to_user(u, k, sizeof(struct termios))
-
-#endif	/* __KERNEL__ */
-
-#endif	/* __ASM_ARM_TERMIOS_H */
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v2 5/5] ARM: convert asm/irqflags.h to use asm-generic/irqflags.h
  2012-08-05  1:23 [PATCH v2 0/5] Use more asm-generic headers Rob Herring
                   ` (3 preceding siblings ...)
  2012-08-05  1:23 ` [PATCH v2 4/5] ARM: use generic termios.h Rob Herring
@ 2012-08-05  1:24 ` Rob Herring
  2012-08-05  9:34 ` [PATCH v2 0/5] Use more asm-generic headers Thomas Petazzoni
  2012-08-06 14:35 ` Arnd Bergmann
  6 siblings, 0 replies; 46+ messages in thread
From: Rob Herring @ 2012-08-05  1:24 UTC (permalink / raw)
  To: linux-arm-kernel

From: Rob Herring <rob.herring@calxeda.com>

By implmenting arch_local_irq_restore with constant flags, we can use
asm-generic/irqflags.h and remove some of the ARM version of irqflags.h.

Verified the code is equivalent by checking the disassembled code.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
---
 arch/arm/include/asm/irqflags.h |  131 ++++++++++++++-------------------------
 1 file changed, 45 insertions(+), 86 deletions(-)

diff --git a/arch/arm/include/asm/irqflags.h b/arch/arm/include/asm/irqflags.h
index 1e6cca5..67bfb53 100644
--- a/arch/arm/include/asm/irqflags.h
+++ b/arch/arm/include/asm/irqflags.h
@@ -8,88 +8,68 @@
 /*
  * CPU interrupt mask handling.
  */
-#if __LINUX_ARM_ARCH__ >= 6
+#define ARCH_IRQ_DISABLED	PSR_I_BIT
+#define ARCH_IRQ_ENABLED	0
 
-static inline unsigned long arch_local_irq_save(void)
+static inline unsigned long arch_local_save_flags(void)
 {
 	unsigned long flags;
 
 	asm volatile(
-		"	mrs	%0, cpsr	@ arch_local_irq_save\n"
-		"	cpsid	i"
+		"	mrs	%0, cpsr	@ arch_local_save_flags\n"
 		: "=r" (flags) : : "memory", "cc");
 	return flags;
 }
 
-static inline void arch_local_irq_enable(void)
+static inline void constant_arch_local_irq_restore(unsigned long flags)
 {
-	asm volatile(
-		"	cpsie i			@ arch_local_irq_enable"
-		:
-		:
-		: "memory", "cc");
+#if __LINUX_ARM_ARCH__ >= 6
+	if (flags == ARCH_IRQ_ENABLED)
+		asm volatile(
+			"	cpsie i			@ constant_arch_local_irq_restore"
+			: : : "memory", "cc");
+	else if (flags == ARCH_IRQ_DISABLED)
+		asm volatile(
+			"	cpsid i			@ constant_arch_local_irq_restore"
+			: : : "memory", "cc");
+#else
+	unsigned long temp;
+	if (flags == ARCH_IRQ_ENABLED)
+		asm volatile(
+			"	mrs	%0, cpsr	@ constant_arch_local_irq_restore\n"
+			"	bic	%0, %0, #128\n"
+			"	msr	cpsr_c, %0"
+			: "=r" (temp)
+			:
+			: "memory", "cc");
+	else if (flags == ARCH_IRQ_DISABLED)
+		asm volatile(
+			"	mrs	%0, cpsr	@ constant_arch_local_irq_restore\n"
+			"	orr	%0, %0, #128\n"
+			"	msr	cpsr_c, %0"
+			: "=r" (temp)
+			:
+			: "memory", "cc");
+#endif
 }
 
-static inline void arch_local_irq_disable(void)
+static inline void _arch_local_irq_restore(unsigned long flags)
 {
-	asm volatile(
-		"	cpsid i			@ arch_local_irq_disable"
-		:
+		asm volatile(
+		"	msr	cpsr_c, %0	@ _arch_local_irq_restore"
 		:
+		: "r" (flags)
 		: "memory", "cc");
 }
 
+#define arch_local_irq_restore(flags) \
+	(__builtin_constant_p(flags) ? constant_arch_local_irq_restore(flags) : \
+	_arch_local_irq_restore(flags))
+
+#if __LINUX_ARM_ARCH__ >= 6
 #define local_fiq_enable()  __asm__("cpsie f	@ __stf" : : : "memory", "cc")
 #define local_fiq_disable() __asm__("cpsid f	@ __clf" : : : "memory", "cc")
 #else
-
-/*
- * Save the current interrupt enable state & disable IRQs
- */
-static inline unsigned long arch_local_irq_save(void)
-{
-	unsigned long flags, temp;
-
-	asm volatile(
-		"	mrs	%0, cpsr	@ arch_local_irq_save\n"
-		"	orr	%1, %0, #128\n"
-		"	msr	cpsr_c, %1"
-		: "=r" (flags), "=r" (temp)
-		:
-		: "memory", "cc");
-	return flags;
-}
-
-/*
- * Enable IRQs
- */
-static inline void arch_local_irq_enable(void)
-{
-	unsigned long temp;
-	asm volatile(
-		"	mrs	%0, cpsr	@ arch_local_irq_enable\n"
-		"	bic	%0, %0, #128\n"
-		"	msr	cpsr_c, %0"
-		: "=r" (temp)
-		:
-		: "memory", "cc");
-}
-
-/*
- * Disable IRQs
- */
-static inline void arch_local_irq_disable(void)
-{
-	unsigned long temp;
-	asm volatile(
-		"	mrs	%0, cpsr	@ arch_local_irq_disable\n"
-		"	orr	%0, %0, #128\n"
-		"	msr	cpsr_c, %0"
-		: "=r" (temp)
-		:
-		: "memory", "cc");
-}
-
 /*
  * Enable FIQs
  */
@@ -122,34 +102,13 @@ static inline void arch_local_irq_disable(void)
 
 #endif
 
-/*
- * Save the current interrupt enable state.
- */
-static inline unsigned long arch_local_save_flags(void)
-{
-	unsigned long flags;
-	asm volatile(
-		"	mrs	%0, cpsr	@ local_save_flags"
-		: "=r" (flags) : : "memory", "cc");
-	return flags;
-}
-
-/*
- * restore saved IRQ & FIQ state
- */
-static inline void arch_local_irq_restore(unsigned long flags)
-{
-	asm volatile(
-		"	msr	cpsr_c, %0	@ local_irq_restore"
-		:
-		: "r" (flags)
-		: "memory", "cc");
-}
-
 static inline int arch_irqs_disabled_flags(unsigned long flags)
 {
 	return flags & PSR_I_BIT;
 }
+#define arch_irqs_disabled_flags arch_irqs_disabled_flags
+
+#include <asm-generic/irqflags.h>
 
 #endif
 #endif
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v2 0/5] Use more asm-generic headers
  2012-08-05  1:23 [PATCH v2 0/5] Use more asm-generic headers Rob Herring
                   ` (4 preceding siblings ...)
  2012-08-05  1:24 ` [PATCH v2 5/5] ARM: convert asm/irqflags.h to use asm-generic/irqflags.h Rob Herring
@ 2012-08-05  9:34 ` Thomas Petazzoni
  2012-08-06 14:35 ` Arnd Bergmann
  6 siblings, 0 replies; 46+ messages in thread
From: Thomas Petazzoni @ 2012-08-05  9:34 UTC (permalink / raw)
  To: linux-arm-kernel

Le Sat,  4 Aug 2012 20:23:55 -0500,
Rob Herring <robherring2@gmail.com> a ?crit :

> From: Rob Herring <rob.herring@calxeda.com>
> 
> Inspired by the AArgh64 claim that it should be separate from ARM and
> one reason was being able to use more asm-generic headers. Doing a
> diff of arch/arm/include/asm and include/asm-generic there are
> numerous asm headers which are functionally identical to their
> asm-generic counterparts and others which can use more of the generic
> versions.
> 
> V2 separates unaligned.h and termios.h changes to separate patches.
> Also, it adds a fix for build failures on non-gzip decompressor
> compile.
> 
> Rob
> 
> Rob Herring (5):
>   ARM: use generic version of identical asm headers
>   ARM: add strstr declaration for decompressors
>   ARM: use generic unaligned.h
>   ARM: use generic termios.h
>   ARM: convert asm/irqflags.h to use asm-generic/irqflags.h

Built and boot tested on Armada XP evaluation platform.

Tested-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
-- 
Thomas Petazzoni, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v2 0/5] Use more asm-generic headers
  2012-08-05  1:23 [PATCH v2 0/5] Use more asm-generic headers Rob Herring
                   ` (5 preceding siblings ...)
  2012-08-05  9:34 ` [PATCH v2 0/5] Use more asm-generic headers Thomas Petazzoni
@ 2012-08-06 14:35 ` Arnd Bergmann
  6 siblings, 0 replies; 46+ messages in thread
From: Arnd Bergmann @ 2012-08-06 14:35 UTC (permalink / raw)
  To: linux-arm-kernel

On Sunday 05 August 2012, Rob Herring wrote:
> 
> From: Rob Herring <rob.herring@calxeda.com>
> 
> Inspired by the AArgh64 claim that it should be separate from ARM and one
> reason was being able to use more asm-generic headers. Doing a diff of
> arch/arm/include/asm and include/asm-generic there are numerous asm
> headers which are functionally identical to their asm-generic counterparts
> and others which can use more of the generic versions.
> 
> V2 separates unaligned.h and termios.h changes to separate patches. Also,
> it adds a fix for build failures on non-gzip decompressor compile.
> 

Reviewed-by: Arnd Bergmann <arnd@arndb.de>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v2 3/5] ARM: use generic unaligned.h
  2012-08-05  1:23 ` [PATCH v2 3/5] ARM: use generic unaligned.h Rob Herring
@ 2012-10-08 16:43   ` Shawn Guo
  2012-10-08 20:34     ` Rob Herring
  0 siblings, 1 reply; 46+ messages in thread
From: Shawn Guo @ 2012-10-08 16:43 UTC (permalink / raw)
  To: linux-arm-kernel

This patch has been merged into mainline as commit below.

 d25c881 ARM: 7493/1: use generic unaligned.h

It introduces a regression for me.  Check out the commit on mainline,
build a v7 only kernel (imx5/6) with imx_v6_v7_defconfig, the kernel
halts in decompressor.  But v6/v7 kernel (imx3/5/6) works fine.  The
kernel built on the parent commit below works all fine.

 ef1c209 ARM: 7492/1: add strstr declaration for decompressors

Shawn

On Sat, Aug 04, 2012 at 08:23:58PM -0500, Rob Herring wrote:
> From: Rob Herring <rob.herring@calxeda.com>
> 
> This moves ARM over to the asm-generic/unaligned.h header. This has the
> benefit of better code generated especially for ARMv7 on gcc 4.7+
> compilers.
> 
> As Arnd Bergmann, points out: The asm-generic version uses the "struct"
> version for native-endian unaligned access and the "byteshift" version
> for the opposite endianess. The current ARM version however uses the
> "byteshift" implementation for both.
> 
> Thanks to Nicolas Pitre for the excellent analysis:
> 
> Test case:
> 
> int foo (int *x) { return get_unaligned(x); }
> long long bar (long long *x) { return get_unaligned(x); }
> 
> With the current ARM version:
> 
> foo:
> 	ldrb	r3, [r0, #2]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 2B], MEM[(const u8 *)x_1(D) + 2B]
> 	ldrb	r1, [r0, #1]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 1B], MEM[(const u8 *)x_1(D) + 1B]
> 	ldrb	r2, [r0, #0]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D)], MEM[(const u8 *)x_1(D)]
> 	mov	r3, r3, asl #16	@ tmp154, MEM[(const u8 *)x_1(D) + 2B],
> 	ldrb	r0, [r0, #3]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 3B], MEM[(const u8 *)x_1(D) + 3B]
> 	orr	r3, r3, r1, asl #8	@, tmp155, tmp154, MEM[(const u8 *)x_1(D) + 1B],
> 	orr	r3, r3, r2	@ tmp157, tmp155, MEM[(const u8 *)x_1(D)]
> 	orr	r0, r3, r0, asl #24	@,, tmp157, MEM[(const u8 *)x_1(D) + 3B],
> 	bx	lr	@
> 
> bar:
> 	stmfd	sp!, {r4, r5, r6, r7}	@,
> 	mov	r2, #0	@ tmp184,
> 	ldrb	r5, [r0, #6]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 6B], MEM[(const u8 *)x_1(D) + 6B]
> 	ldrb	r4, [r0, #5]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 5B], MEM[(const u8 *)x_1(D) + 5B]
> 	ldrb	ip, [r0, #2]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 2B], MEM[(const u8 *)x_1(D) + 2B]
> 	ldrb	r1, [r0, #4]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 4B], MEM[(const u8 *)x_1(D) + 4B]
> 	mov	r5, r5, asl #16	@ tmp175, MEM[(const u8 *)x_1(D) + 6B],
> 	ldrb	r7, [r0, #1]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 1B], MEM[(const u8 *)x_1(D) + 1B]
> 	orr	r5, r5, r4, asl #8	@, tmp176, tmp175, MEM[(const u8 *)x_1(D) + 5B],
> 	ldrb	r6, [r0, #7]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 7B], MEM[(const u8 *)x_1(D) + 7B]
> 	orr	r5, r5, r1	@ tmp178, tmp176, MEM[(const u8 *)x_1(D) + 4B]
> 	ldrb	r4, [r0, #0]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D)], MEM[(const u8 *)x_1(D)]
> 	mov	ip, ip, asl #16	@ tmp188, MEM[(const u8 *)x_1(D) + 2B],
> 	ldrb	r1, [r0, #3]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 3B], MEM[(const u8 *)x_1(D) + 3B]
> 	orr	ip, ip, r7, asl #8	@, tmp189, tmp188, MEM[(const u8 *)x_1(D) + 1B],
> 	orr	r3, r5, r6, asl #24	@,, tmp178, MEM[(const u8 *)x_1(D) + 7B],
> 	orr	ip, ip, r4	@ tmp191, tmp189, MEM[(const u8 *)x_1(D)]
> 	orr	ip, ip, r1, asl #24	@, tmp194, tmp191, MEM[(const u8 *)x_1(D) + 3B],
> 	mov	r1, r3	@,
> 	orr	r0, r2, ip	@ tmp171, tmp184, tmp194
> 	ldmfd	sp!, {r4, r5, r6, r7}
> 	bx	lr
> 
> In both cases the code is slightly suboptimal.  One may wonder why
> wasting r2 with the constant 0 in the second case for example.  And all
> the mov's could be folded in subsequent orr's, etc.
> 
> Now with the asm-generic version:
> 
> foo:
> 	ldr	r0, [r0, #0]	@ unaligned	@,* x
> 	bx	lr	@
> 
> bar:
> 	mov	r3, r0	@ x, x
> 	ldr	r0, [r0, #0]	@ unaligned	@,* x
> 	ldr	r1, [r3, #4]	@ unaligned	@,
> 	bx	lr	@
> 
> This is way better of course, but only because this was compiled for
> ARMv7. In this case the compiler knows that the hardware can do
> unaligned word access.  This isn't that obvious for foo(), but if we
> remove the get_unaligned() from bar as follows:
> 
> long long bar (long long *x) {return *x; }
> 
> then the resulting code is:
> 
> bar:
> 	ldmia	r0, {r0, r1}	@ x,,
> 	bx	lr	@
> 
> So this proves that the presumed aligned vs unaligned cases does have
> influence on the instructions the compiler may use and that the above
> unaligned code results are not just an accident.
> 
> Still... this isn't fully conclusive without at least looking at the
> resulting assembly fron a pre ARMv6 compilation.  Let's see with an
> ARMv5 target:
> 
> foo:
> 	ldrb	r3, [r0, #0]	@ zero_extendqisi2	@ tmp139,* x
> 	ldrb	r1, [r0, #1]	@ zero_extendqisi2	@ tmp140,
> 	ldrb	r2, [r0, #2]	@ zero_extendqisi2	@ tmp143,
> 	ldrb	r0, [r0, #3]	@ zero_extendqisi2	@ tmp146,
> 	orr	r3, r3, r1, asl #8	@, tmp142, tmp139, tmp140,
> 	orr	r3, r3, r2, asl #16	@, tmp145, tmp142, tmp143,
> 	orr	r0, r3, r0, asl #24	@,, tmp145, tmp146,
> 	bx	lr	@
> 
> bar:
> 	stmfd	sp!, {r4, r5, r6, r7}	@,
> 	ldrb	r2, [r0, #0]	@ zero_extendqisi2	@ tmp139,* x
> 	ldrb	r7, [r0, #1]	@ zero_extendqisi2	@ tmp140,
> 	ldrb	r3, [r0, #4]	@ zero_extendqisi2	@ tmp149,
> 	ldrb	r6, [r0, #5]	@ zero_extendqisi2	@ tmp150,
> 	ldrb	r5, [r0, #2]	@ zero_extendqisi2	@ tmp143,
> 	ldrb	r4, [r0, #6]	@ zero_extendqisi2	@ tmp153,
> 	ldrb	r1, [r0, #7]	@ zero_extendqisi2	@ tmp156,
> 	ldrb	ip, [r0, #3]	@ zero_extendqisi2	@ tmp146,
> 	orr	r2, r2, r7, asl #8	@, tmp142, tmp139, tmp140,
> 	orr	r3, r3, r6, asl #8	@, tmp152, tmp149, tmp150,
> 	orr	r2, r2, r5, asl #16	@, tmp145, tmp142, tmp143,
> 	orr	r3, r3, r4, asl #16	@, tmp155, tmp152, tmp153,
> 	orr	r0, r2, ip, asl #24	@,, tmp145, tmp146,
> 	orr	r1, r3, r1, asl #24	@,, tmp155, tmp156,
> 	ldmfd	sp!, {r4, r5, r6, r7}
> 	bx	lr
> 
> Compared to the initial results, this is really nicely optimized and I
> couldn't do much better if I were to hand code it myself.
> 
> Signed-off-by: Rob Herring <rob.herring@calxeda.com>
> Reviewed-by: Nicolas Pitre <nico@linaro.org>
> ---
>  arch/arm/include/asm/Kbuild      |    1 +
>  arch/arm/include/asm/unaligned.h |   19 -------------------
>  2 files changed, 1 insertion(+), 19 deletions(-)
>  delete mode 100644 arch/arm/include/asm/unaligned.h
> 
> diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild
> index 6cc8a13..a86cabf 100644
> --- a/arch/arm/include/asm/Kbuild
> +++ b/arch/arm/include/asm/Kbuild
> @@ -33,3 +33,4 @@ generic-y += sockios.h
>  generic-y += termbits.h
>  generic-y += timex.h
>  generic-y += types.h
> +generic-y += unaligned.h
> diff --git a/arch/arm/include/asm/unaligned.h b/arch/arm/include/asm/unaligned.h
> deleted file mode 100644
> index 44593a8..0000000
> --- a/arch/arm/include/asm/unaligned.h
> +++ /dev/null
> @@ -1,19 +0,0 @@
> -#ifndef _ASM_ARM_UNALIGNED_H
> -#define _ASM_ARM_UNALIGNED_H
> -
> -#include <linux/unaligned/le_byteshift.h>
> -#include <linux/unaligned/be_byteshift.h>
> -#include <linux/unaligned/generic.h>
> -
> -/*
> - * Select endianness
> - */
> -#ifndef __ARMEB__
> -#define get_unaligned	__get_unaligned_le
> -#define put_unaligned	__put_unaligned_le
> -#else
> -#define get_unaligned	__get_unaligned_be
> -#define put_unaligned	__put_unaligned_be
> -#endif
> -
> -#endif /* _ASM_ARM_UNALIGNED_H */
> -- 
> 1.7.9.5
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v2 3/5] ARM: use generic unaligned.h
  2012-10-08 16:43   ` Shawn Guo
@ 2012-10-08 20:34     ` Rob Herring
  2012-10-08 23:28       ` Shawn Guo
  0 siblings, 1 reply; 46+ messages in thread
From: Rob Herring @ 2012-10-08 20:34 UTC (permalink / raw)
  To: linux-arm-kernel

On 10/08/2012 11:43 AM, Shawn Guo wrote:
> This patch has been merged into mainline as commit below.
> 
>  d25c881 ARM: 7493/1: use generic unaligned.h
> 
> It introduces a regression for me.  Check out the commit on mainline,
> build a v7 only kernel (imx5/6) with imx_v6_v7_defconfig, the kernel
> halts in decompressor.  But v6/v7 kernel (imx3/5/6) works fine.  The
> kernel built on the parent commit below works all fine.

It actually fails in the decompressor or that's the last output you get?
I compared the decompressor disassembly of both cases and get the same
number of ldrb/strb instructions, so I don't think it is directly
related to alignment.

I tried the XY decompressor as that is one difference, but that works
fine for me on highbank.

Does it work with an empty uncompress.h functions? That should be the
only difference in our decompressor code.

Rob

>  ef1c209 ARM: 7492/1: add strstr declaration for decompressors
> 
> Shawn
> 
> On Sat, Aug 04, 2012 at 08:23:58PM -0500, Rob Herring wrote:
>> From: Rob Herring <rob.herring@calxeda.com>
>>
>> This moves ARM over to the asm-generic/unaligned.h header. This has the
>> benefit of better code generated especially for ARMv7 on gcc 4.7+
>> compilers.
>>
>> As Arnd Bergmann, points out: The asm-generic version uses the "struct"
>> version for native-endian unaligned access and the "byteshift" version
>> for the opposite endianess. The current ARM version however uses the
>> "byteshift" implementation for both.
>>
>> Thanks to Nicolas Pitre for the excellent analysis:
>>
>> Test case:
>>
>> int foo (int *x) { return get_unaligned(x); }
>> long long bar (long long *x) { return get_unaligned(x); }
>>
>> With the current ARM version:
>>
>> foo:
>> 	ldrb	r3, [r0, #2]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 2B], MEM[(const u8 *)x_1(D) + 2B]
>> 	ldrb	r1, [r0, #1]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 1B], MEM[(const u8 *)x_1(D) + 1B]
>> 	ldrb	r2, [r0, #0]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D)], MEM[(const u8 *)x_1(D)]
>> 	mov	r3, r3, asl #16	@ tmp154, MEM[(const u8 *)x_1(D) + 2B],
>> 	ldrb	r0, [r0, #3]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 3B], MEM[(const u8 *)x_1(D) + 3B]
>> 	orr	r3, r3, r1, asl #8	@, tmp155, tmp154, MEM[(const u8 *)x_1(D) + 1B],
>> 	orr	r3, r3, r2	@ tmp157, tmp155, MEM[(const u8 *)x_1(D)]
>> 	orr	r0, r3, r0, asl #24	@,, tmp157, MEM[(const u8 *)x_1(D) + 3B],
>> 	bx	lr	@
>>
>> bar:
>> 	stmfd	sp!, {r4, r5, r6, r7}	@,
>> 	mov	r2, #0	@ tmp184,
>> 	ldrb	r5, [r0, #6]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 6B], MEM[(const u8 *)x_1(D) + 6B]
>> 	ldrb	r4, [r0, #5]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 5B], MEM[(const u8 *)x_1(D) + 5B]
>> 	ldrb	ip, [r0, #2]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 2B], MEM[(const u8 *)x_1(D) + 2B]
>> 	ldrb	r1, [r0, #4]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 4B], MEM[(const u8 *)x_1(D) + 4B]
>> 	mov	r5, r5, asl #16	@ tmp175, MEM[(const u8 *)x_1(D) + 6B],
>> 	ldrb	r7, [r0, #1]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 1B], MEM[(const u8 *)x_1(D) + 1B]
>> 	orr	r5, r5, r4, asl #8	@, tmp176, tmp175, MEM[(const u8 *)x_1(D) + 5B],
>> 	ldrb	r6, [r0, #7]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 7B], MEM[(const u8 *)x_1(D) + 7B]
>> 	orr	r5, r5, r1	@ tmp178, tmp176, MEM[(const u8 *)x_1(D) + 4B]
>> 	ldrb	r4, [r0, #0]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D)], MEM[(const u8 *)x_1(D)]
>> 	mov	ip, ip, asl #16	@ tmp188, MEM[(const u8 *)x_1(D) + 2B],
>> 	ldrb	r1, [r0, #3]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 3B], MEM[(const u8 *)x_1(D) + 3B]
>> 	orr	ip, ip, r7, asl #8	@, tmp189, tmp188, MEM[(const u8 *)x_1(D) + 1B],
>> 	orr	r3, r5, r6, asl #24	@,, tmp178, MEM[(const u8 *)x_1(D) + 7B],
>> 	orr	ip, ip, r4	@ tmp191, tmp189, MEM[(const u8 *)x_1(D)]
>> 	orr	ip, ip, r1, asl #24	@, tmp194, tmp191, MEM[(const u8 *)x_1(D) + 3B],
>> 	mov	r1, r3	@,
>> 	orr	r0, r2, ip	@ tmp171, tmp184, tmp194
>> 	ldmfd	sp!, {r4, r5, r6, r7}
>> 	bx	lr
>>
>> In both cases the code is slightly suboptimal.  One may wonder why
>> wasting r2 with the constant 0 in the second case for example.  And all
>> the mov's could be folded in subsequent orr's, etc.
>>
>> Now with the asm-generic version:
>>
>> foo:
>> 	ldr	r0, [r0, #0]	@ unaligned	@,* x
>> 	bx	lr	@
>>
>> bar:
>> 	mov	r3, r0	@ x, x
>> 	ldr	r0, [r0, #0]	@ unaligned	@,* x
>> 	ldr	r1, [r3, #4]	@ unaligned	@,
>> 	bx	lr	@
>>
>> This is way better of course, but only because this was compiled for
>> ARMv7. In this case the compiler knows that the hardware can do
>> unaligned word access.  This isn't that obvious for foo(), but if we
>> remove the get_unaligned() from bar as follows:
>>
>> long long bar (long long *x) {return *x; }
>>
>> then the resulting code is:
>>
>> bar:
>> 	ldmia	r0, {r0, r1}	@ x,,
>> 	bx	lr	@
>>
>> So this proves that the presumed aligned vs unaligned cases does have
>> influence on the instructions the compiler may use and that the above
>> unaligned code results are not just an accident.
>>
>> Still... this isn't fully conclusive without at least looking at the
>> resulting assembly fron a pre ARMv6 compilation.  Let's see with an
>> ARMv5 target:
>>
>> foo:
>> 	ldrb	r3, [r0, #0]	@ zero_extendqisi2	@ tmp139,* x
>> 	ldrb	r1, [r0, #1]	@ zero_extendqisi2	@ tmp140,
>> 	ldrb	r2, [r0, #2]	@ zero_extendqisi2	@ tmp143,
>> 	ldrb	r0, [r0, #3]	@ zero_extendqisi2	@ tmp146,
>> 	orr	r3, r3, r1, asl #8	@, tmp142, tmp139, tmp140,
>> 	orr	r3, r3, r2, asl #16	@, tmp145, tmp142, tmp143,
>> 	orr	r0, r3, r0, asl #24	@,, tmp145, tmp146,
>> 	bx	lr	@
>>
>> bar:
>> 	stmfd	sp!, {r4, r5, r6, r7}	@,
>> 	ldrb	r2, [r0, #0]	@ zero_extendqisi2	@ tmp139,* x
>> 	ldrb	r7, [r0, #1]	@ zero_extendqisi2	@ tmp140,
>> 	ldrb	r3, [r0, #4]	@ zero_extendqisi2	@ tmp149,
>> 	ldrb	r6, [r0, #5]	@ zero_extendqisi2	@ tmp150,
>> 	ldrb	r5, [r0, #2]	@ zero_extendqisi2	@ tmp143,
>> 	ldrb	r4, [r0, #6]	@ zero_extendqisi2	@ tmp153,
>> 	ldrb	r1, [r0, #7]	@ zero_extendqisi2	@ tmp156,
>> 	ldrb	ip, [r0, #3]	@ zero_extendqisi2	@ tmp146,
>> 	orr	r2, r2, r7, asl #8	@, tmp142, tmp139, tmp140,
>> 	orr	r3, r3, r6, asl #8	@, tmp152, tmp149, tmp150,
>> 	orr	r2, r2, r5, asl #16	@, tmp145, tmp142, tmp143,
>> 	orr	r3, r3, r4, asl #16	@, tmp155, tmp152, tmp153,
>> 	orr	r0, r2, ip, asl #24	@,, tmp145, tmp146,
>> 	orr	r1, r3, r1, asl #24	@,, tmp155, tmp156,
>> 	ldmfd	sp!, {r4, r5, r6, r7}
>> 	bx	lr
>>
>> Compared to the initial results, this is really nicely optimized and I
>> couldn't do much better if I were to hand code it myself.
>>
>> Signed-off-by: Rob Herring <rob.herring@calxeda.com>
>> Reviewed-by: Nicolas Pitre <nico@linaro.org>
>> ---
>>  arch/arm/include/asm/Kbuild      |    1 +
>>  arch/arm/include/asm/unaligned.h |   19 -------------------
>>  2 files changed, 1 insertion(+), 19 deletions(-)
>>  delete mode 100644 arch/arm/include/asm/unaligned.h
>>
>> diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild
>> index 6cc8a13..a86cabf 100644
>> --- a/arch/arm/include/asm/Kbuild
>> +++ b/arch/arm/include/asm/Kbuild
>> @@ -33,3 +33,4 @@ generic-y += sockios.h
>>  generic-y += termbits.h
>>  generic-y += timex.h
>>  generic-y += types.h
>> +generic-y += unaligned.h
>> diff --git a/arch/arm/include/asm/unaligned.h b/arch/arm/include/asm/unaligned.h
>> deleted file mode 100644
>> index 44593a8..0000000
>> --- a/arch/arm/include/asm/unaligned.h
>> +++ /dev/null
>> @@ -1,19 +0,0 @@
>> -#ifndef _ASM_ARM_UNALIGNED_H
>> -#define _ASM_ARM_UNALIGNED_H
>> -
>> -#include <linux/unaligned/le_byteshift.h>
>> -#include <linux/unaligned/be_byteshift.h>
>> -#include <linux/unaligned/generic.h>
>> -
>> -/*
>> - * Select endianness
>> - */
>> -#ifndef __ARMEB__
>> -#define get_unaligned	__get_unaligned_le
>> -#define put_unaligned	__put_unaligned_le
>> -#else
>> -#define get_unaligned	__get_unaligned_be
>> -#define put_unaligned	__put_unaligned_be
>> -#endif
>> -
>> -#endif /* _ASM_ARM_UNALIGNED_H */
>> -- 
>> 1.7.9.5
>>
>>
>> _______________________________________________
>> linux-arm-kernel mailing list
>> linux-arm-kernel at lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v2 3/5] ARM: use generic unaligned.h
  2012-10-08 20:34     ` Rob Herring
@ 2012-10-08 23:28       ` Shawn Guo
  2012-10-09  2:27         ` Rob Herring
  0 siblings, 1 reply; 46+ messages in thread
From: Shawn Guo @ 2012-10-08 23:28 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Oct 08, 2012 at 03:34:57PM -0500, Rob Herring wrote:
> On 10/08/2012 11:43 AM, Shawn Guo wrote:
> > This patch has been merged into mainline as commit below.
> > 
> >  d25c881 ARM: 7493/1: use generic unaligned.h
> > 
> > It introduces a regression for me.  Check out the commit on mainline,
> > build a v7 only kernel (imx5/6) with imx_v6_v7_defconfig, the kernel
> > halts in decompressor.  But v6/v7 kernel (imx3/5/6) works fine.  The
> > kernel built on the parent commit below works all fine.
> 
> It actually fails in the decompressor or that's the last output you get?

I think it fails in the decompressor, because what I see is 

Uncompressing Linux...

not

Uncompressing Linux... done, booting the kernel.

> I compared the decompressor disassembly of both cases and get the same
> number of ldrb/strb instructions, so I don't think it is directly
> related to alignment.
> 
> I tried the XY decompressor as that is one difference, but that works
> fine for me on highbank.
> 
> Does it work with an empty uncompress.h functions? That should be the
> only difference in our decompressor code.

No, empty putc() in uncompress.h does not help.

New finding is that it only fails with LZO decompressor while the other
3 Gzip, LZMA and XZ all work good.  We happen to have LZO as the default
one in imx_v6_v7_defconfig.

Shawn

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v2 3/5] ARM: use generic unaligned.h
  2012-10-08 23:28       ` Shawn Guo
@ 2012-10-09  2:27         ` Rob Herring
  2012-10-09  2:45           ` Shawn Guo
  2012-10-09  4:01           ` Nicolas Pitre
  0 siblings, 2 replies; 46+ messages in thread
From: Rob Herring @ 2012-10-09  2:27 UTC (permalink / raw)
  To: linux-arm-kernel

On 10/08/2012 06:28 PM, Shawn Guo wrote:
> On Mon, Oct 08, 2012 at 03:34:57PM -0500, Rob Herring wrote:
>> On 10/08/2012 11:43 AM, Shawn Guo wrote:
>>> This patch has been merged into mainline as commit below.
>>>
>>>  d25c881 ARM: 7493/1: use generic unaligned.h
>>>
>>> It introduces a regression for me.  Check out the commit on mainline,
>>> build a v7 only kernel (imx5/6) with imx_v6_v7_defconfig, the kernel
>>> halts in decompressor.  But v6/v7 kernel (imx3/5/6) works fine.  The
>>> kernel built on the parent commit below works all fine.
>>
>> It actually fails in the decompressor or that's the last output you get?
> 
> I think it fails in the decompressor, because what I see is 
> 
> Uncompressing Linux...
> 
> not
> 
> Uncompressing Linux... done, booting the kernel.
> 
>> I compared the decompressor disassembly of both cases and get the same
>> number of ldrb/strb instructions, so I don't think it is directly
>> related to alignment.
>>
>> I tried the XY decompressor as that is one difference, but that works
>> fine for me on highbank.
>>
>> Does it work with an empty uncompress.h functions? That should be the
>> only difference in our decompressor code.
> 
> No, empty putc() in uncompress.h does not help.
> 
> New finding is that it only fails with LZO decompressor while the other
> 3 Gzip, LZMA and XZ all work good.  We happen to have LZO as the default
> one in imx_v6_v7_defconfig.

I must have had an old config with XZ. LZO fails because of this:
 
lib/decompress_unlzo.c: version = get_unaligned_be16(parse);
lib/decompress_unlzo.c: if (get_unaligned_be32(parse) & HEADER_HAS_FILTER)
lib/decompress_unlzo.c:         dst_len = get_unaligned_be32(in_buf);
lib/decompress_unlzo.c:         src_len = get_unaligned_be32(in_buf);

This was what I was afraid of. The decompressor runs with the sysctrl
register A bit in whatever state the bootloader left it in. In the case
of u-boot it is set, and the maintainers are pretty set on not allowing
unaligned accesses if you've seen the recent discussion.

This should fix things.

Rob

8<---------------------------------------------------------------------

diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
index bc67cbf..1f87d22 100644
--- a/arch/arm/boot/compressed/head.S
+++ b/arch/arm/boot/compressed/head.S
@@ -654,6 +654,7 @@ __armv7_mmu_cache_on:
 #endif
                mrc     p15, 0, r0, c1, c0, 0   @ read control reg
                bic     r0, r0, #1 << 28        @ clear SCTLR.TRE
+               bic     r0, r0, #1 << 1         @ clear SCTLR.A
                orr     r0, r0, #0x5000         @ I-cache enable, RR cache replacement
                orr     r0, r0, #0x003c         @ write buffer
 #ifdef CONFIG_MMU

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v2 3/5] ARM: use generic unaligned.h
  2012-10-09  2:27         ` Rob Herring
@ 2012-10-09  2:45           ` Shawn Guo
  2012-10-09  4:01           ` Nicolas Pitre
  1 sibling, 0 replies; 46+ messages in thread
From: Shawn Guo @ 2012-10-09  2:45 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Oct 08, 2012 at 09:27:31PM -0500, Rob Herring wrote:
> I must have had an old config with XZ. LZO fails because of this:
>  
> lib/decompress_unlzo.c: version = get_unaligned_be16(parse);
> lib/decompress_unlzo.c: if (get_unaligned_be32(parse) & HEADER_HAS_FILTER)
> lib/decompress_unlzo.c:         dst_len = get_unaligned_be32(in_buf);
> lib/decompress_unlzo.c:         src_len = get_unaligned_be32(in_buf);
> 
> This was what I was afraid of. The decompressor runs with the sysctrl
> register A bit in whatever state the bootloader left it in. In the case
> of u-boot it is set, and the maintainers are pretty set on not allowing
> unaligned accesses if you've seen the recent discussion.
> 
> This should fix things.
> 
> Rob
> 
> 8<---------------------------------------------------------------------
> 
> diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
> index bc67cbf..1f87d22 100644
> --- a/arch/arm/boot/compressed/head.S
> +++ b/arch/arm/boot/compressed/head.S
> @@ -654,6 +654,7 @@ __armv7_mmu_cache_on:
>  #endif
>                 mrc     p15, 0, r0, c1, c0, 0   @ read control reg
>                 bic     r0, r0, #1 << 28        @ clear SCTLR.TRE
> +               bic     r0, r0, #1 << 1         @ clear SCTLR.A
>                 orr     r0, r0, #0x5000         @ I-cache enable, RR cache replacement
>                 orr     r0, r0, #0x003c         @ write buffer
>  #ifdef CONFIG_MMU

Tested-by: Shawn Guo <shawn.guo@linaro.org>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v2 3/5] ARM: use generic unaligned.h
  2012-10-09  2:27         ` Rob Herring
  2012-10-09  2:45           ` Shawn Guo
@ 2012-10-09  4:01           ` Nicolas Pitre
  2012-10-10 13:29             ` Rob Herring
  1 sibling, 1 reply; 46+ messages in thread
From: Nicolas Pitre @ 2012-10-09  4:01 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 8 Oct 2012, Rob Herring wrote:

> On 10/08/2012 06:28 PM, Shawn Guo wrote:
> > On Mon, Oct 08, 2012 at 03:34:57PM -0500, Rob Herring wrote:
> >> On 10/08/2012 11:43 AM, Shawn Guo wrote:
> >>> This patch has been merged into mainline as commit below.
> >>>
> >>>  d25c881 ARM: 7493/1: use generic unaligned.h
> >>>
> >>> It introduces a regression for me.  Check out the commit on mainline,
> >>> build a v7 only kernel (imx5/6) with imx_v6_v7_defconfig, the kernel
> >>> halts in decompressor.  But v6/v7 kernel (imx3/5/6) works fine.  The
> >>> kernel built on the parent commit below works all fine.
> >>
> >> It actually fails in the decompressor or that's the last output you get?
> > 
> > I think it fails in the decompressor, because what I see is 
> > 
> > Uncompressing Linux...
> > 
> > not
> > 
> > Uncompressing Linux... done, booting the kernel.
> > 
> >> I compared the decompressor disassembly of both cases and get the same
> >> number of ldrb/strb instructions, so I don't think it is directly
> >> related to alignment.
> >>
> >> I tried the XY decompressor as that is one difference, but that works
> >> fine for me on highbank.
> >>
> >> Does it work with an empty uncompress.h functions? That should be the
> >> only difference in our decompressor code.
> > 
> > No, empty putc() in uncompress.h does not help.
> > 
> > New finding is that it only fails with LZO decompressor while the other
> > 3 Gzip, LZMA and XZ all work good.  We happen to have LZO as the default
> > one in imx_v6_v7_defconfig.
> 
> I must have had an old config with XZ. LZO fails because of this:
>  
> lib/decompress_unlzo.c: version = get_unaligned_be16(parse);
> lib/decompress_unlzo.c: if (get_unaligned_be32(parse) & HEADER_HAS_FILTER)
> lib/decompress_unlzo.c:         dst_len = get_unaligned_be32(in_buf);
> lib/decompress_unlzo.c:         src_len = get_unaligned_be32(in_buf);
> 
> This was what I was afraid of. The decompressor runs with the sysctrl
> register A bit in whatever state the bootloader left it in. In the case
> of u-boot it is set, and the maintainers are pretty set on not allowing
> unaligned accesses if you've seen the recent discussion.

This is not an u-Boot issue.  The kernel code expects misaligned 
accesses to be handled by the hardware on ARMv6+ now.  So it better 
enforce proper A bit state itself.

> This should fix things.
> 
> Rob
> 
> 8<---------------------------------------------------------------------
> 
> diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
> index bc67cbf..1f87d22 100644
> --- a/arch/arm/boot/compressed/head.S
> +++ b/arch/arm/boot/compressed/head.S
> @@ -654,6 +654,7 @@ __armv7_mmu_cache_on:
>  #endif
>                 mrc     p15, 0, r0, c1, c0, 0   @ read control reg
>                 bic     r0, r0, #1 << 28        @ clear SCTLR.TRE
> +               bic     r0, r0, #1 << 1         @ clear SCTLR.A
>                 orr     r0, r0, #0x5000         @ I-cache enable, RR cache replacement
>                 orr     r0, r0, #0x003c         @ write buffer
>  #ifdef CONFIG_MMU
> 

That's OK for ARMv7, but in the ARMv6 case we use __armv4_mmu_cache_on.  
So this patch is incomplete.


Nicolas

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v2 3/5] ARM: use generic unaligned.h
  2012-10-09  4:01           ` Nicolas Pitre
@ 2012-10-10 13:29             ` Rob Herring
  2012-10-10 13:57               ` Nicolas Pitre
  0 siblings, 1 reply; 46+ messages in thread
From: Rob Herring @ 2012-10-10 13:29 UTC (permalink / raw)
  To: linux-arm-kernel

On 10/08/2012 11:01 PM, Nicolas Pitre wrote:
> On Mon, 8 Oct 2012, Rob Herring wrote:
> 
>> On 10/08/2012 06:28 PM, Shawn Guo wrote:
>>> On Mon, Oct 08, 2012 at 03:34:57PM -0500, Rob Herring wrote:
>>>> On 10/08/2012 11:43 AM, Shawn Guo wrote:
>>>>> This patch has been merged into mainline as commit below.
>>>>>
>>>>>  d25c881 ARM: 7493/1: use generic unaligned.h
>>>>>
>>>>> It introduces a regression for me.  Check out the commit on mainline,
>>>>> build a v7 only kernel (imx5/6) with imx_v6_v7_defconfig, the kernel
>>>>> halts in decompressor.  But v6/v7 kernel (imx3/5/6) works fine.  The
>>>>> kernel built on the parent commit below works all fine.
>>>>
>>>> It actually fails in the decompressor or that's the last output you get?
>>>
>>> I think it fails in the decompressor, because what I see is 
>>>
>>> Uncompressing Linux...
>>>
>>> not
>>>
>>> Uncompressing Linux... done, booting the kernel.
>>>
>>>> I compared the decompressor disassembly of both cases and get the same
>>>> number of ldrb/strb instructions, so I don't think it is directly
>>>> related to alignment.
>>>>
>>>> I tried the XY decompressor as that is one difference, but that works
>>>> fine for me on highbank.
>>>>
>>>> Does it work with an empty uncompress.h functions? That should be the
>>>> only difference in our decompressor code.
>>>
>>> No, empty putc() in uncompress.h does not help.
>>>
>>> New finding is that it only fails with LZO decompressor while the other
>>> 3 Gzip, LZMA and XZ all work good.  We happen to have LZO as the default
>>> one in imx_v6_v7_defconfig.
>>
>> I must have had an old config with XZ. LZO fails because of this:
>>  
>> lib/decompress_unlzo.c: version = get_unaligned_be16(parse);
>> lib/decompress_unlzo.c: if (get_unaligned_be32(parse) & HEADER_HAS_FILTER)
>> lib/decompress_unlzo.c:         dst_len = get_unaligned_be32(in_buf);
>> lib/decompress_unlzo.c:         src_len = get_unaligned_be32(in_buf);
>>
>> This was what I was afraid of. The decompressor runs with the sysctrl
>> register A bit in whatever state the bootloader left it in. In the case
>> of u-boot it is set, and the maintainers are pretty set on not allowing
>> unaligned accesses if you've seen the recent discussion.
> 
> This is not an u-Boot issue.  The kernel code expects misaligned 
> accesses to be handled by the hardware on ARMv6+ now.  So it better 
> enforce proper A bit state itself.
> 
>> This should fix things.
>>
>> Rob
>>
>> 8<---------------------------------------------------------------------
>>
>> diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
>> index bc67cbf..1f87d22 100644
>> --- a/arch/arm/boot/compressed/head.S
>> +++ b/arch/arm/boot/compressed/head.S
>> @@ -654,6 +654,7 @@ __armv7_mmu_cache_on:
>>  #endif
>>                 mrc     p15, 0, r0, c1, c0, 0   @ read control reg
>>                 bic     r0, r0, #1 << 28        @ clear SCTLR.TRE
>> +               bic     r0, r0, #1 << 1         @ clear SCTLR.A
>>                 orr     r0, r0, #0x5000         @ I-cache enable, RR cache replacement
>>                 orr     r0, r0, #0x003c         @ write buffer
>>  #ifdef CONFIG_MMU
>>
> 
> That's OK for ARMv7, but in the ARMv6 case we use __armv4_mmu_cache_on.  
> So this patch is incomplete.

It is still optional on v6 and the compiler will not emit unaligned
accesses for v6 (which is why it worked when v6 platforms were enabled).
That does raise a bigger question. If we're doing combined v6 and v7
builds, do we want this to require unaligned accesses are enabled and if
so, can we force that on in the compiler?

Rob

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v2 3/5] ARM: use generic unaligned.h
  2012-10-10 13:29             ` Rob Herring
@ 2012-10-10 13:57               ` Nicolas Pitre
  2012-10-11 12:43                 ` [PATCH] ARM: decompressor: clear SCTLR.A bit for v7 cores Rob Herring
  0 siblings, 1 reply; 46+ messages in thread
From: Nicolas Pitre @ 2012-10-10 13:57 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 10 Oct 2012, Rob Herring wrote:

> On 10/08/2012 11:01 PM, Nicolas Pitre wrote:
> > On Mon, 8 Oct 2012, Rob Herring wrote:
> > 
> >> On 10/08/2012 06:28 PM, Shawn Guo wrote:
> >>> On Mon, Oct 08, 2012 at 03:34:57PM -0500, Rob Herring wrote:
> >>>> On 10/08/2012 11:43 AM, Shawn Guo wrote:
> >>>>> This patch has been merged into mainline as commit below.
> >>>>>
> >>>>>  d25c881 ARM: 7493/1: use generic unaligned.h
> >>>>>
> >>>>> It introduces a regression for me.  Check out the commit on mainline,
> >>>>> build a v7 only kernel (imx5/6) with imx_v6_v7_defconfig, the kernel
> >>>>> halts in decompressor.  But v6/v7 kernel (imx3/5/6) works fine.  The
> >>>>> kernel built on the parent commit below works all fine.
> >>>>
> >>>> It actually fails in the decompressor or that's the last output you get?
> >>>
> >>> I think it fails in the decompressor, because what I see is 
> >>>
> >>> Uncompressing Linux...
> >>>
> >>> not
> >>>
> >>> Uncompressing Linux... done, booting the kernel.
> >>>
> >>>> I compared the decompressor disassembly of both cases and get the same
> >>>> number of ldrb/strb instructions, so I don't think it is directly
> >>>> related to alignment.
> >>>>
> >>>> I tried the XY decompressor as that is one difference, but that works
> >>>> fine for me on highbank.
> >>>>
> >>>> Does it work with an empty uncompress.h functions? That should be the
> >>>> only difference in our decompressor code.
> >>>
> >>> No, empty putc() in uncompress.h does not help.
> >>>
> >>> New finding is that it only fails with LZO decompressor while the other
> >>> 3 Gzip, LZMA and XZ all work good.  We happen to have LZO as the default
> >>> one in imx_v6_v7_defconfig.
> >>
> >> I must have had an old config with XZ. LZO fails because of this:
> >>  
> >> lib/decompress_unlzo.c: version = get_unaligned_be16(parse);
> >> lib/decompress_unlzo.c: if (get_unaligned_be32(parse) & HEADER_HAS_FILTER)
> >> lib/decompress_unlzo.c:         dst_len = get_unaligned_be32(in_buf);
> >> lib/decompress_unlzo.c:         src_len = get_unaligned_be32(in_buf);
> >>
> >> This was what I was afraid of. The decompressor runs with the sysctrl
> >> register A bit in whatever state the bootloader left it in. In the case
> >> of u-boot it is set, and the maintainers are pretty set on not allowing
> >> unaligned accesses if you've seen the recent discussion.
> > 
> > This is not an u-Boot issue.  The kernel code expects misaligned 
> > accesses to be handled by the hardware on ARMv6+ now.  So it better 
> > enforce proper A bit state itself.
> > 
> >> This should fix things.
> >>
> >> Rob
> >>
> >> 8<---------------------------------------------------------------------
> >>
> >> diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
> >> index bc67cbf..1f87d22 100644
> >> --- a/arch/arm/boot/compressed/head.S
> >> +++ b/arch/arm/boot/compressed/head.S
> >> @@ -654,6 +654,7 @@ __armv7_mmu_cache_on:
> >>  #endif
> >>                 mrc     p15, 0, r0, c1, c0, 0   @ read control reg
> >>                 bic     r0, r0, #1 << 28        @ clear SCTLR.TRE
> >> +               bic     r0, r0, #1 << 1         @ clear SCTLR.A
> >>                 orr     r0, r0, #0x5000         @ I-cache enable, RR cache replacement
> >>                 orr     r0, r0, #0x003c         @ write buffer
> >>  #ifdef CONFIG_MMU
> >>
> > 
> > That's OK for ARMv7, but in the ARMv6 case we use __armv4_mmu_cache_on.  
> > So this patch is incomplete.
> 
> It is still optional on v6 and the compiler will not emit unaligned
> accesses for v6 (which is why it worked when v6 platforms were enabled).

Hmmm, right. However if you look at commit 8428e84d42, the A bit is 
cleared whenever we compile for ARMv6 or higher.  Of course, the 
decompressor doesn't have to deal with unknown user space binaries and 
in that case we might trust that the compiler will never emit known to 
be unaligned loads.

In that case...

Acked-by: Nicolas Pitre <nico@linaro.org>

> That does raise a bigger question. If we're doing combined v6 and v7
> builds, do we want this to require unaligned accesses are enabled and if
> so, can we force that on in the compiler?

When we compile ARMv6 and ARMv7 targets together, the compiler is told 
to compile for ARMv6.  We know that unaligned accesses will be done 
usingLDRB/STRB in that case.


Nicolas

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] ARM: decompressor: clear SCTLR.A bit for v7 cores
  2012-10-10 13:57               ` Nicolas Pitre
@ 2012-10-11 12:43                 ` Rob Herring
  2012-10-11 13:09                   ` Russell King - ARM Linux
  2012-10-25  9:34                   ` Johannes Stezenbach
  0 siblings, 2 replies; 46+ messages in thread
From: Rob Herring @ 2012-10-11 12:43 UTC (permalink / raw)
  To: linux-arm-kernel

From: Rob Herring <rob.herring@calxeda.com>

With recent compilers and move to generic unaligned.h in commit d25c881
(ARM: 7493/1: use generic unaligned.h), unaligned accesses will be used
by the LZO decompressor on v7 cores. So we need to make sure unaligned
accesses are allowed by clearing the SCTLR A bit.

While v6 can support unaligned accesses, it is optional and current
compilers won't emit unaligned accesses. So we don't clear the A bit for
v6.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Shawn Guo <shawn.guo@linaro.org>
---

The contents of this were already reviewed on this thread, so I sent this
to the patch system and this was Russell's reply:

> NAK for two reasons.
> 
> 1. It hasn't been on the list (I can't find a match for "clear SCTLR.A"
> in my mailbox)
> 
> 2. The behaviour of unaligned accesses vary depending on CPU.  Some
> fix-up the access, others load the word and then rotate it.  If we have
> decompressors which perform unaligned accesses, we need to fix this
> properly to avoid the CPU specific behaviour, rather than tweaking
> control bits to hide the problem.

I'm simply matching the behavior of the kernel itself. The A bit is cleared
for v7 kernels and compilers only generate unaligned accesses for v7.
Without this the initial state of the A bit is undefined as a bootloader
could have cleared it already. We should document the required state or set
it to what we want.

Rob

 arch/arm/boot/compressed/head.S |    1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
index bc67cbf..b2e30b8 100644
--- a/arch/arm/boot/compressed/head.S
+++ b/arch/arm/boot/compressed/head.S
@@ -654,6 +654,7 @@ __armv7_mmu_cache_on:
 #endif
 		mrc	p15, 0, r0, c1, c0, 0	@ read control reg
 		bic	r0, r0, #1 << 28	@ clear SCTLR.TRE
+		bic	r0, r0, #1 << 1		@ clear SCTLR.A
 		orr	r0, r0, #0x5000		@ I-cache enable, RR cache replacement
 		orr	r0, r0, #0x003c		@ write buffer
 #ifdef CONFIG_MMU
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH] ARM: decompressor: clear SCTLR.A bit for v7 cores
  2012-10-11 12:43                 ` [PATCH] ARM: decompressor: clear SCTLR.A bit for v7 cores Rob Herring
@ 2012-10-11 13:09                   ` Russell King - ARM Linux
  2012-10-11 13:31                     ` Rob Herring
  2012-10-25  9:34                   ` Johannes Stezenbach
  1 sibling, 1 reply; 46+ messages in thread
From: Russell King - ARM Linux @ 2012-10-11 13:09 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Oct 11, 2012 at 07:43:22AM -0500, Rob Herring wrote:
> The contents of this were already reviewed on this thread, so I sent this
> to the patch system and this was Russell's reply:

So that's why I couldn't find it - the mailing list thread has a different
subject line to the patch.  Don't do that.  Given the amount of list
traffic we have today, that's as good as not having been posted at all.

> > NAK for two reasons.
> > 
> > 1. It hasn't been on the list (I can't find a match for "clear SCTLR.A"
> > in my mailbox)
> > 
> > 2. The behaviour of unaligned accesses vary depending on CPU.  Some
> > fix-up the access, others load the word and then rotate it.  If we have
> > decompressors which perform unaligned accesses, we need to fix this
> > properly to avoid the CPU specific behaviour, rather than tweaking
> > control bits to hide the problem.
> 
> I'm simply matching the behavior of the kernel itself. The A bit is cleared
> for v7 kernels and compilers only generate unaligned accesses for v7.
> Without this the initial state of the A bit is undefined as a bootloader
> could have cleared it already. We should document the required state or set
> it to what we want.

Irrespective of this, (2) still stands.  Unaligned accesses in the
decompressor without a fixup (which will be very hard to provide)
will return different data depending on the CPU as I mention in point
2.

So, using unaligned accesses in the decompressor may not give expected
results in all cases, so they're best avoided.

And if they're avoided, then we don't care about the setting of the A
bit.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] ARM: decompressor: clear SCTLR.A bit for v7 cores
  2012-10-11 13:09                   ` Russell King - ARM Linux
@ 2012-10-11 13:31                     ` Rob Herring
  2012-10-11 13:41                       ` Russell King - ARM Linux
  2012-10-11 13:59                       ` Russell King - ARM Linux
  0 siblings, 2 replies; 46+ messages in thread
From: Rob Herring @ 2012-10-11 13:31 UTC (permalink / raw)
  To: linux-arm-kernel

On 10/11/2012 08:09 AM, Russell King - ARM Linux wrote:
> On Thu, Oct 11, 2012 at 07:43:22AM -0500, Rob Herring wrote:
>> The contents of this were already reviewed on this thread, so I sent this
>> to the patch system and this was Russell's reply:
> 
> So that's why I couldn't find it - the mailing list thread has a different
> subject line to the patch.  Don't do that.  Given the amount of list
> traffic we have today, that's as good as not having been posted at all.
> 
>>> NAK for two reasons.
>>>
>>> 1. It hasn't been on the list (I can't find a match for "clear SCTLR.A"
>>> in my mailbox)
>>>
>>> 2. The behaviour of unaligned accesses vary depending on CPU.  Some
>>> fix-up the access, others load the word and then rotate it.  If we have
>>> decompressors which perform unaligned accesses, we need to fix this
>>> properly to avoid the CPU specific behaviour, rather than tweaking
>>> control bits to hide the problem.
>>
>> I'm simply matching the behavior of the kernel itself. The A bit is cleared
>> for v7 kernels and compilers only generate unaligned accesses for v7.
>> Without this the initial state of the A bit is undefined as a bootloader
>> could have cleared it already. We should document the required state or set
>> it to what we want.
> 
> Irrespective of this, (2) still stands.  Unaligned accesses in the
> decompressor without a fixup (which will be very hard to provide)
> will return different data depending on the CPU as I mention in point
> 2.

This only affects v7 cores. It should not vary for v7 cores as unaligned
access is a required feature. So how is it going to vary on v7 CPUs?
We've got bigger problems if there are v7 cores that don't handle
unaligned accesses.

Rob

> So, using unaligned accesses in the decompressor may not give expected
> results in all cases, so they're best avoided.
>
> And if they're avoided, then we don't care about the setting of the A
> bit.
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] ARM: decompressor: clear SCTLR.A bit for v7 cores
  2012-10-11 13:31                     ` Rob Herring
@ 2012-10-11 13:41                       ` Russell King - ARM Linux
  2012-10-11 15:44                         ` Nicolas Pitre
  2012-10-11 13:59                       ` Russell King - ARM Linux
  1 sibling, 1 reply; 46+ messages in thread
From: Russell King - ARM Linux @ 2012-10-11 13:41 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Oct 11, 2012 at 08:31:47AM -0500, Rob Herring wrote:
> On 10/11/2012 08:09 AM, Russell King - ARM Linux wrote:
> > On Thu, Oct 11, 2012 at 07:43:22AM -0500, Rob Herring wrote:
> >> The contents of this were already reviewed on this thread, so I sent this
> >> to the patch system and this was Russell's reply:
> > 
> > So that's why I couldn't find it - the mailing list thread has a different
> > subject line to the patch.  Don't do that.  Given the amount of list
> > traffic we have today, that's as good as not having been posted at all.
> > 
> >>> NAK for two reasons.
> >>>
> >>> 1. It hasn't been on the list (I can't find a match for "clear SCTLR.A"
> >>> in my mailbox)
> >>>
> >>> 2. The behaviour of unaligned accesses vary depending on CPU.  Some
> >>> fix-up the access, others load the word and then rotate it.  If we have
> >>> decompressors which perform unaligned accesses, we need to fix this
> >>> properly to avoid the CPU specific behaviour, rather than tweaking
> >>> control bits to hide the problem.
> >>
> >> I'm simply matching the behavior of the kernel itself. The A bit is cleared
> >> for v7 kernels and compilers only generate unaligned accesses for v7.
> >> Without this the initial state of the A bit is undefined as a bootloader
> >> could have cleared it already. We should document the required state or set
> >> it to what we want.
> > 
> > Irrespective of this, (2) still stands.  Unaligned accesses in the
> > decompressor without a fixup (which will be very hard to provide)
> > will return different data depending on the CPU as I mention in point
> > 2.
> 
> This only affects v7 cores. It should not vary for v7 cores as unaligned
> access is a required feature. So how is it going to vary on v7 CPUs?
> We've got bigger problems if there are v7 cores that don't handle
> unaligned accesses.

Rob,

Your patch may only affect v7 cores, but you've raised the issue of the
decompressor performing unaligned accesses in general.  Shall I re-repeat
my point over that or is the problem here going to finally sink in?

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] ARM: decompressor: clear SCTLR.A bit for v7 cores
  2012-10-11 13:31                     ` Rob Herring
  2012-10-11 13:41                       ` Russell King - ARM Linux
@ 2012-10-11 13:59                       ` Russell King - ARM Linux
  2012-10-11 15:58                         ` Nicolas Pitre
  1 sibling, 1 reply; 46+ messages in thread
From: Russell King - ARM Linux @ 2012-10-11 13:59 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Oct 11, 2012 at 08:31:47AM -0500, Rob Herring wrote:
> This only affects v7 cores. It should not vary for v7 cores as unaligned
> access is a required feature. So how is it going to vary on v7 CPUs?
> We've got bigger problems if there are v7 cores that don't handle
> unaligned accesses.

Oh, and this gives me a third reason to NAK this patch.  Why only ensure
that the A bit is clear for v7 CPUs?  Why not v6, v5, v4 too?  Why does
ARMv7 get this special treatment?

Any argument you can make for clearing the bit on ARMv7 also applies to
the other architectures too, but not even that negates my point (2) which
is far more fundamental.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] ARM: decompressor: clear SCTLR.A bit for v7 cores
  2012-10-11 13:41                       ` Russell King - ARM Linux
@ 2012-10-11 15:44                         ` Nicolas Pitre
  2012-10-23 20:32                           ` Gregory CLEMENT
  0 siblings, 1 reply; 46+ messages in thread
From: Nicolas Pitre @ 2012-10-11 15:44 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 11 Oct 2012, Russell King - ARM Linux wrote:

> On Thu, Oct 11, 2012 at 08:31:47AM -0500, Rob Herring wrote:
> > On 10/11/2012 08:09 AM, Russell King - ARM Linux wrote:
> > > On Thu, Oct 11, 2012 at 07:43:22AM -0500, Rob Herring wrote:
> > >> The contents of this were already reviewed on this thread, so I sent this
> > >> to the patch system and this was Russell's reply:
> > > 
> > > So that's why I couldn't find it - the mailing list thread has a different
> > > subject line to the patch.  Don't do that.  Given the amount of list
> > > traffic we have today, that's as good as not having been posted at all.
> > > 
> > >>> NAK for two reasons.
> > >>>
> > >>> 1. It hasn't been on the list (I can't find a match for "clear SCTLR.A"
> > >>> in my mailbox)
> > >>>
> > >>> 2. The behaviour of unaligned accesses vary depending on CPU.  Some
> > >>> fix-up the access, others load the word and then rotate it.  If we have
> > >>> decompressors which perform unaligned accesses, we need to fix this
> > >>> properly to avoid the CPU specific behaviour, rather than tweaking
> > >>> control bits to hide the problem.
> > >>
> > >> I'm simply matching the behavior of the kernel itself. The A bit is cleared
> > >> for v7 kernels and compilers only generate unaligned accesses for v7.
> > >> Without this the initial state of the A bit is undefined as a bootloader
> > >> could have cleared it already. We should document the required state or set
> > >> it to what we want.
> > > 
> > > Irrespective of this, (2) still stands.  Unaligned accesses in the
> > > decompressor without a fixup (which will be very hard to provide)
> > > will return different data depending on the CPU as I mention in point
> > > 2.
> > 
> > This only affects v7 cores. It should not vary for v7 cores as unaligned
> > access is a required feature. So how is it going to vary on v7 CPUs?
> > We've got bigger problems if there are v7 cores that don't handle
> > unaligned accesses.
> 
> Rob,
> 
> Your patch may only affect v7 cores, but you've raised the issue of the
> decompressor performing unaligned accesses in general.  Shall I re-repeat
> my point over that or is the problem here going to finally sink in?

The decompressor is not performing direct unaligned accesses.  It uses 
the get_unaligned() and put_unaligned() accessors.  That means that 
we're in control of how this is happening.

So let's talk about the how.  On pre ARMv7, those accesses are performed 
with a series of byte accesses.  When compiling for ARMv7, gcc knows and 
that the hardware can do unaligned accesses, and it does optimize its 
output by using ldr/str instructions.  But the A bit has to be cleared 
in that case, and only in that case.  This is why the patch clears the A 
bit only for ARMv7.

So this patch is only setting up the hardware to match gcc's 
expectations when generating code from the use of get_unaligned() and 
put_unaligned() when optimizing for ARMv7.

As always, any code doing unaligned access and _not_ using those 
accessors is broken.


Nicolas

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] ARM: decompressor: clear SCTLR.A bit for v7 cores
  2012-10-11 13:59                       ` Russell King - ARM Linux
@ 2012-10-11 15:58                         ` Nicolas Pitre
  0 siblings, 0 replies; 46+ messages in thread
From: Nicolas Pitre @ 2012-10-11 15:58 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 11 Oct 2012, Russell King - ARM Linux wrote:

> On Thu, Oct 11, 2012 at 08:31:47AM -0500, Rob Herring wrote:
> > This only affects v7 cores. It should not vary for v7 cores as unaligned
> > access is a required feature. So how is it going to vary on v7 CPUs?
> > We've got bigger problems if there are v7 cores that don't handle
> > unaligned accesses.
> 
> Oh, and this gives me a third reason to NAK this patch.  Why only ensure
> that the A bit is clear for v7 CPUs?  Why not v6, v5, v4 too?  Why does
> ARMv7 get this special treatment?

As I said, gcc knows that ARMv7 can perform word sized accesses even 
with misaligned pointers.  So when it is passed a pointer marked with 
the packed attribute, it will generate a series of byte accesses when 
compiling for anything but ARMv7, and use a single ldr or str when 
compiling for ARMv7.


Nicolas

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] ARM: decompressor: clear SCTLR.A bit for v7 cores
  2012-10-11 15:44                         ` Nicolas Pitre
@ 2012-10-23 20:32                           ` Gregory CLEMENT
  2012-10-25 12:07                             ` Russell King - ARM Linux
  0 siblings, 1 reply; 46+ messages in thread
From: Gregory CLEMENT @ 2012-10-23 20:32 UTC (permalink / raw)
  To: linux-arm-kernel

On 10/11/2012 05:44 PM, Nicolas Pitre wrote:
> On Thu, 11 Oct 2012, Russell King - ARM Linux wrote:
> 
>> On Thu, Oct 11, 2012 at 08:31:47AM -0500, Rob Herring wrote:
>>> On 10/11/2012 08:09 AM, Russell King - ARM Linux wrote:
>>>> On Thu, Oct 11, 2012 at 07:43:22AM -0500, Rob Herring wrote:
>>>>> The contents of this were already reviewed on this thread, so I sent this
>>>>> to the patch system and this was Russell's reply:
>>>>
>>>> So that's why I couldn't find it - the mailing list thread has a different
>>>> subject line to the patch.  Don't do that.  Given the amount of list
>>>> traffic we have today, that's as good as not having been posted at all.
>>>>
>>>>>> NAK for two reasons.
>>>>>>
>>>>>> 1. It hasn't been on the list (I can't find a match for "clear SCTLR.A"
>>>>>> in my mailbox)
>>>>>>
>>>>>> 2. The behaviour of unaligned accesses vary depending on CPU.  Some
>>>>>> fix-up the access, others load the word and then rotate it.  If we have
>>>>>> decompressors which perform unaligned accesses, we need to fix this
>>>>>> properly to avoid the CPU specific behaviour, rather than tweaking
>>>>>> control bits to hide the problem.
>>>>>
>>>>> I'm simply matching the behavior of the kernel itself. The A bit is cleared
>>>>> for v7 kernels and compilers only generate unaligned accesses for v7.
>>>>> Without this the initial state of the A bit is undefined as a bootloader
>>>>> could have cleared it already. We should document the required state or set
>>>>> it to what we want.
>>>>
>>>> Irrespective of this, (2) still stands.  Unaligned accesses in the
>>>> decompressor without a fixup (which will be very hard to provide)
>>>> will return different data depending on the CPU as I mention in point
>>>> 2.
>>>
>>> This only affects v7 cores. It should not vary for v7 cores as unaligned
>>> access is a required feature. So how is it going to vary on v7 CPUs?
>>> We've got bigger problems if there are v7 cores that don't handle
>>> unaligned accesses.
>>
>> Rob,
>>
>> Your patch may only affect v7 cores, but you've raised the issue of the
>> decompressor performing unaligned accesses in general.  Shall I re-repeat
>> my point over that or is the problem here going to finally sink in?
> 
> The decompressor is not performing direct unaligned accesses.  It uses 
> the get_unaligned() and put_unaligned() accessors.  That means that 
> we're in control of how this is happening.
> 
> So let's talk about the how.  On pre ARMv7, those accesses are performed 
> with a series of byte accesses.  When compiling for ARMv7, gcc knows and 
> that the hardware can do unaligned accesses, and it does optimize its 
> output by using ldr/str instructions.  But the A bit has to be cleared 
> in that case, and only in that case.  This is why the patch clears the A 
> bit only for ARMv7.
> 
> So this patch is only setting up the hardware to match gcc's 
> expectations when generating code from the use of get_unaligned() and 
> put_unaligned() when optimizing for ARMv7.
> 
> As always, any code doing unaligned access and _not_ using those 
> accessors is broken.

So is there a chance that this patch will be applied for 3.7?

Currently I can't boot anymore Armada XP or Armada 370, if kernel is
compressed in LZO. It's annoying.

Russell, did Nicolas manage to convince you?
If not, what should be the solution to fix this issue?

Thanks,

Gregory

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] ARM: decompressor: clear SCTLR.A bit for v7 cores
  2012-10-11 12:43                 ` [PATCH] ARM: decompressor: clear SCTLR.A bit for v7 cores Rob Herring
  2012-10-11 13:09                   ` Russell King - ARM Linux
@ 2012-10-25  9:34                   ` Johannes Stezenbach
  2012-10-25 12:41                     ` Rob Herring
  1 sibling, 1 reply; 46+ messages in thread
From: Johannes Stezenbach @ 2012-10-25  9:34 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Oct 11, 2012 at 07:43:22AM -0500, Rob Herring wrote:
> 
> With recent compilers and move to generic unaligned.h in commit d25c881
> (ARM: 7493/1: use generic unaligned.h), unaligned accesses will be used
> by the LZO decompressor on v7 cores. So we need to make sure unaligned
> accesses are allowed by clearing the SCTLR A bit.

I just read this in http://gcc.gnu.org/gcc-4.7/changes.html:

  On ARM, when compiling for ARMv6 (but not ARMv6-M), ARMv7-A, ARMv7-R, or
  ARMv7-M, the new option -munaligned-access is active by default, which for
  some source codes generates code that accesses memory on unaligned addresses.
  This will require the kernel of those systems to enable such accesses
  (controlled by CP15 register c1, refer to ARM documentation). Alternatively
  or for compatibility with kernels where unaligned accesses are not supported,
  all code has to be compiled with -mno-unaligned-access. Linux/ARM in official
  releases has automatically and unconditionally supported unaligned accesses
  as emitted by GCC due to this option being active, since Linux version 2.6.28.

My understanding is that gcc, using the same generic unaligned.h
source code, will generate code for ARMv6 and ARMv7 that uses
unaligned access, while for ARMv5 and older it won't.  So it seems
gcc requires Linux to clear SCTLR.A and set SCTLR.U for ARMv6+.

Or add -mno-unaligned-access.

Is my understanding correct?

> While v6 can support unaligned accesses, it is optional and current
> compilers won't emit unaligned accesses. So we don't clear the A bit for
> v6.

not true according to the gcc changes page


Thanks
Johannes

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] ARM: decompressor: clear SCTLR.A bit for v7 cores
  2012-10-23 20:32                           ` Gregory CLEMENT
@ 2012-10-25 12:07                             ` Russell King - ARM Linux
  0 siblings, 0 replies; 46+ messages in thread
From: Russell King - ARM Linux @ 2012-10-25 12:07 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Oct 23, 2012 at 10:32:44PM +0200, Gregory CLEMENT wrote:
> So is there a chance that this patch will be applied for 3.7?
> 
> Currently I can't boot anymore Armada XP or Armada 370, if kernel is
> compressed in LZO. It's annoying.
> 
> Russell, did Nicolas manage to convince you?

Frankly, no - less so now that we have a question mark over ARMv6 which
seems to conflict with Nicolas' justification.  This issue is larger than
just ARMv7...

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] ARM: decompressor: clear SCTLR.A bit for v7 cores
  2012-10-25  9:34                   ` Johannes Stezenbach
@ 2012-10-25 12:41                     ` Rob Herring
  2012-10-25 13:30                       ` Arnd Bergmann
  2012-10-25 14:16                       ` Johannes Stezenbach
  0 siblings, 2 replies; 46+ messages in thread
From: Rob Herring @ 2012-10-25 12:41 UTC (permalink / raw)
  To: linux-arm-kernel

On 10/25/2012 04:34 AM, Johannes Stezenbach wrote:
> On Thu, Oct 11, 2012 at 07:43:22AM -0500, Rob Herring wrote:
>>
>> With recent compilers and move to generic unaligned.h in commit d25c881
>> (ARM: 7493/1: use generic unaligned.h), unaligned accesses will be used
>> by the LZO decompressor on v7 cores. So we need to make sure unaligned
>> accesses are allowed by clearing the SCTLR A bit.
> 
> I just read this in http://gcc.gnu.org/gcc-4.7/changes.html:
> 
>   On ARM, when compiling for ARMv6 (but not ARMv6-M), ARMv7-A, ARMv7-R, or
>   ARMv7-M, the new option -munaligned-access is active by default, which for
>   some source codes generates code that accesses memory on unaligned addresses.
>   This will require the kernel of those systems to enable such accesses
>   (controlled by CP15 register c1, refer to ARM documentation). Alternatively
>   or for compatibility with kernels where unaligned accesses are not supported,
>   all code has to be compiled with -mno-unaligned-access. Linux/ARM in official
>   releases has automatically and unconditionally supported unaligned accesses
>   as emitted by GCC due to this option being active, since Linux version 2.6.28.

I don't think there is such a thing as ARMv6-M.

> My understanding is that gcc, using the same generic unaligned.h
> source code, will generate code for ARMv6 and ARMv7 that uses
> unaligned access, while for ARMv5 and older it won't.  So it seems
> gcc requires Linux to clear SCTLR.A and set SCTLR.U for ARMv6+.
> 
> Or add -mno-unaligned-access.
> 
> Is my understanding correct?
> 
>> While v6 can support unaligned accesses, it is optional and current
>> compilers won't emit unaligned accesses. So we don't clear the A bit for
>> v6.
> 
> not true according to the gcc changes page

What are you going to believe: documentation or what the compiler
emitted? At least for ubuntu/linaro 4.6.3 which has the unaligned access
support backported and 4.7.2, unaligned accesses are emitted for v7
only. I guess default here means it is the default unless you change the
default in your build of gcc.

If we are going to do combined v6k and v7 kernels, then it would be nice
if we could get the compiler to emit unaligned accesses (assuming we
agree we can require that for v6).

Rob

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] ARM: decompressor: clear SCTLR.A bit for v7 cores
  2012-10-25 12:41                     ` Rob Herring
@ 2012-10-25 13:30                       ` Arnd Bergmann
  2012-10-25 14:16                       ` Johannes Stezenbach
  1 sibling, 0 replies; 46+ messages in thread
From: Arnd Bergmann @ 2012-10-25 13:30 UTC (permalink / raw)
  To: linux-arm-kernel

On Thursday 25 October 2012, Rob Herring wrote:
> > On Thu, Oct 11, 2012 at 07:43:22AM -0500, Rob Herring wrote:
> >>
> >> With recent compilers and move to generic unaligned.h in commit d25c881
> >> (ARM: 7493/1: use generic unaligned.h), unaligned accesses will be used
> >> by the LZO decompressor on v7 cores. So we need to make sure unaligned
> >> accesses are allowed by clearing the SCTLR A bit.
> > 
> > I just read this in http://gcc.gnu.org/gcc-4.7/changes.html:
> > 
> >   On ARM, when compiling for ARMv6 (but not ARMv6-M), ARMv7-A, ARMv7-R, or
> >   ARMv7-M, the new option -munaligned-access is active by default, which for
> >   some source codes generates code that accesses memory on unaligned addresses.
> >   This will require the kernel of those systems to enable such accesses
> >   (controlled by CP15 register c1, refer to ARM documentation). Alternatively
> >   or for compatibility with kernels where unaligned accesses are not supported,
> >   all code has to be compiled with -mno-unaligned-access. Linux/ARM in official
> >   releases has automatically and unconditionally supported unaligned accesses
> >   as emitted by GCC due to this option being active, since Linux version 2.6.28.
> 
> I don't think there is such a thing as ARMv6-M.

There is, that would be the Cortex-M0/M0+/M1. You cannot run Linux on these.

	Arnd

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] ARM: decompressor: clear SCTLR.A bit for v7 cores
  2012-10-25 12:41                     ` Rob Herring
  2012-10-25 13:30                       ` Arnd Bergmann
@ 2012-10-25 14:16                       ` Johannes Stezenbach
  2012-10-25 14:25                         ` Rob Herring
  1 sibling, 1 reply; 46+ messages in thread
From: Johannes Stezenbach @ 2012-10-25 14:16 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Oct 25, 2012 at 07:41:45AM -0500, Rob Herring wrote:
> On 10/25/2012 04:34 AM, Johannes Stezenbach wrote:
> > On Thu, Oct 11, 2012 at 07:43:22AM -0500, Rob Herring wrote:
> > 
> >> While v6 can support unaligned accesses, it is optional and current
> >> compilers won't emit unaligned accesses. So we don't clear the A bit for
> >> v6.
> > 
> > not true according to the gcc changes page
> 
> What are you going to believe: documentation or what the compiler
> emitted? At least for ubuntu/linaro 4.6.3 which has the unaligned access
> support backported and 4.7.2, unaligned accesses are emitted for v7
> only. I guess default here means it is the default unless you change the
> default in your build of gcc.

Since ARMv6 can handle unaligned access in the same way as ARMv7
it seems a clear bug in gcc which might hopefully get fixed.
Thus in this case I think it is reasonable to follow the
gcc documentation, otherwise the code would break for ARMv6
when gcc gets fixed.


Johannes

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] ARM: decompressor: clear SCTLR.A bit for v7 cores
  2012-10-25 14:16                       ` Johannes Stezenbach
@ 2012-10-25 14:25                         ` Rob Herring
  2012-10-25 15:02                           ` Nicolas Pitre
  2012-10-25 15:08                           ` Johannes Stezenbach
  0 siblings, 2 replies; 46+ messages in thread
From: Rob Herring @ 2012-10-25 14:25 UTC (permalink / raw)
  To: linux-arm-kernel

On 10/25/2012 09:16 AM, Johannes Stezenbach wrote:
> On Thu, Oct 25, 2012 at 07:41:45AM -0500, Rob Herring wrote:
>> On 10/25/2012 04:34 AM, Johannes Stezenbach wrote:
>>> On Thu, Oct 11, 2012 at 07:43:22AM -0500, Rob Herring wrote:
>>>
>>>> While v6 can support unaligned accesses, it is optional and current
>>>> compilers won't emit unaligned accesses. So we don't clear the A bit for
>>>> v6.
>>>
>>> not true according to the gcc changes page
>>
>> What are you going to believe: documentation or what the compiler
>> emitted? At least for ubuntu/linaro 4.6.3 which has the unaligned access
>> support backported and 4.7.2, unaligned accesses are emitted for v7
>> only. I guess default here means it is the default unless you change the
>> default in your build of gcc.
> 
> Since ARMv6 can handle unaligned access in the same way as ARMv7
> it seems a clear bug in gcc which might hopefully get fixed.
> Thus in this case I think it is reasonable to follow the
> gcc documentation, otherwise the code would break for ARMv6
> when gcc gets fixed.

But the compiler can't assume the state of the U bit. I think it is
still legal on v6 to not support unaligned accesses, but on v7 it is
required. All the standard v6 ARM cores support it, but I'm not sure
about custom cores or if there are SOCs with buses that don't support
unaligned accesses properly.

Rob

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] ARM: decompressor: clear SCTLR.A bit for v7 cores
  2012-10-25 14:25                         ` Rob Herring
@ 2012-10-25 15:02                           ` Nicolas Pitre
  2012-10-25 15:08                           ` Johannes Stezenbach
  1 sibling, 0 replies; 46+ messages in thread
From: Nicolas Pitre @ 2012-10-25 15:02 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 25 Oct 2012, Rob Herring wrote:

> On 10/25/2012 09:16 AM, Johannes Stezenbach wrote:
> > On Thu, Oct 25, 2012 at 07:41:45AM -0500, Rob Herring wrote:
> >> On 10/25/2012 04:34 AM, Johannes Stezenbach wrote:
> >>> On Thu, Oct 11, 2012 at 07:43:22AM -0500, Rob Herring wrote:
> >>>
> >>>> While v6 can support unaligned accesses, it is optional and current
> >>>> compilers won't emit unaligned accesses. So we don't clear the A bit for
> >>>> v6.
> >>>
> >>> not true according to the gcc changes page
> >>
> >> What are you going to believe: documentation or what the compiler
> >> emitted? At least for ubuntu/linaro 4.6.3 which has the unaligned access
> >> support backported and 4.7.2, unaligned accesses are emitted for v7
> >> only. I guess default here means it is the default unless you change the
> >> default in your build of gcc.
> > 
> > Since ARMv6 can handle unaligned access in the same way as ARMv7
> > it seems a clear bug in gcc which might hopefully get fixed.
> > Thus in this case I think it is reasonable to follow the
> > gcc documentation, otherwise the code would break for ARMv6
> > when gcc gets fixed.
> 
> But the compiler can't assume the state of the U bit. I think it is
> still legal on v6 to not support unaligned accesses, but on v7 it is
> required. All the standard v6 ARM cores support it, but I'm not sure
> about custom cores or if there are SOCs with buses that don't support
> unaligned accesses properly.

The fact is that gcc assumes the A bit is cleared when generating code 
for ARMv7, period.  We need to conform our environment to gcc 
expectations, or always compile the decompressor using a lower 
architecture level than ARMv7.  My preference is the former.


Nicolas

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] ARM: decompressor: clear SCTLR.A bit for v7 cores
  2012-10-25 14:25                         ` Rob Herring
  2012-10-25 15:02                           ` Nicolas Pitre
@ 2012-10-25 15:08                           ` Johannes Stezenbach
  2012-11-05 10:48                             ` Dave Martin
  1 sibling, 1 reply; 46+ messages in thread
From: Johannes Stezenbach @ 2012-10-25 15:08 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Oct 25, 2012 at 09:25:06AM -0500, Rob Herring wrote:
> On 10/25/2012 09:16 AM, Johannes Stezenbach wrote:
> > On Thu, Oct 25, 2012 at 07:41:45AM -0500, Rob Herring wrote:
> >> On 10/25/2012 04:34 AM, Johannes Stezenbach wrote:
> >>> On Thu, Oct 11, 2012 at 07:43:22AM -0500, Rob Herring wrote:
> >>>
> >>>> While v6 can support unaligned accesses, it is optional and current
> >>>> compilers won't emit unaligned accesses. So we don't clear the A bit for
> >>>> v6.
> >>>
> >>> not true according to the gcc changes page
> >>
> >> What are you going to believe: documentation or what the compiler
> >> emitted? At least for ubuntu/linaro 4.6.3 which has the unaligned access
> >> support backported and 4.7.2, unaligned accesses are emitted for v7
> >> only. I guess default here means it is the default unless you change the
> >> default in your build of gcc.
> > 
> > Since ARMv6 can handle unaligned access in the same way as ARMv7
> > it seems a clear bug in gcc which might hopefully get fixed.
> > Thus in this case I think it is reasonable to follow the
> > gcc documentation, otherwise the code would break for ARMv6
> > when gcc gets fixed.
> 
> But the compiler can't assume the state of the U bit. I think it is
> still legal on v6 to not support unaligned accesses, but on v7 it is
> required. All the standard v6 ARM cores support it, but I'm not sure
> about custom cores or if there are SOCs with buses that don't support
> unaligned accesses properly.

Well, I read the "...since Linux version 2.6.28" comment
in the gcc changes page in the way that they assume the
U-bit is set. (Although I'm not sure it really is???)

Looking at it from another side, if using the hw unaligned
access capability gives a performance benefit then it
would be useful to support it even if gcc doesn't
behave as documented.  One could still use a special
unaligned.h for ARMv6 to get the performance benefit.
(If it doesn't give performance benfit, then why
bother, let's just use -mno-unaligned-access for v7, too.)

In the ARMv6 ARM unaligned support and the U-bit is
not optional, so you could use the same SoC bus argument
for some hypothetical v7 SoCs.


Johannes

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] ARM: decompressor: clear SCTLR.A bit for v7 cores
  2012-10-25 15:08                           ` Johannes Stezenbach
@ 2012-11-05 10:48                             ` Dave Martin
  2012-11-05 11:13                               ` Russell King - ARM Linux
  0 siblings, 1 reply; 46+ messages in thread
From: Dave Martin @ 2012-11-05 10:48 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Oct 25, 2012 at 05:08:16PM +0200, Johannes Stezenbach wrote:
> On Thu, Oct 25, 2012 at 09:25:06AM -0500, Rob Herring wrote:
> > On 10/25/2012 09:16 AM, Johannes Stezenbach wrote:
> > > On Thu, Oct 25, 2012 at 07:41:45AM -0500, Rob Herring wrote:
> > >> On 10/25/2012 04:34 AM, Johannes Stezenbach wrote:
> > >>> On Thu, Oct 11, 2012 at 07:43:22AM -0500, Rob Herring wrote:
> > >>>
> > >>>> While v6 can support unaligned accesses, it is optional and current
> > >>>> compilers won't emit unaligned accesses. So we don't clear the A bit for
> > >>>> v6.
> > >>>
> > >>> not true according to the gcc changes page
> > >>
> > >> What are you going to believe: documentation or what the compiler
> > >> emitted? At least for ubuntu/linaro 4.6.3 which has the unaligned access
> > >> support backported and 4.7.2, unaligned accesses are emitted for v7
> > >> only. I guess default here means it is the default unless you change the
> > >> default in your build of gcc.
> > > 
> > > Since ARMv6 can handle unaligned access in the same way as ARMv7
> > > it seems a clear bug in gcc which might hopefully get fixed.
> > > Thus in this case I think it is reasonable to follow the
> > > gcc documentation, otherwise the code would break for ARMv6
> > > when gcc gets fixed.
> > 
> > But the compiler can't assume the state of the U bit. I think it is
> > still legal on v6 to not support unaligned accesses, but on v7 it is
> > required. All the standard v6 ARM cores support it, but I'm not sure
> > about custom cores or if there are SOCs with buses that don't support
> > unaligned accesses properly.
> 
> Well, I read the "...since Linux version 2.6.28" comment
> in the gcc changes page in the way that they assume the
> U-bit is set. (Although I'm not sure it really is???)

Actually, the kernel checks the arch version and the U bit on boot,
and chooses the appropriate setting for the A bit depending on the
result.  (See arch/arm/mm/alignment.c:alignment_init().)

Currently, we depend on the CPU reset behaviour or firmware/
bootloader to set the U bit for v6, but the behaviour should be
correct either way, though unaligned accesses will obviously
perform (much) better with U=1.

>From the compiler's point of view we have always implemented the
U=1 behaviour, but it has to be done via the alignment fault
handler prior to v6 or with U=0.

> Looking at it from another side, if using the hw unaligned
> access capability gives a performance benefit then it
> would be useful to support it even if gcc doesn't
> behave as documented.  One could still use a special
> unaligned.h for ARMv6 to get the performance benefit.
> (If it doesn't give performance benfit, then why
> bother, let's just use -mno-unaligned-access for v7, too.)

For v7, we should definitely use -munaligned-access where available
(unless it's the default?)

For v6, the question is whether there is any legitimate reason
ever to run the kernel with U=0.  If not, could we explicitly
set it early and build with -munaligned-access where the compiler
supports this?

The only counterargument I can think of is that some people might
be running some ancient userspace which actually relies on the U=0
behaviour.  I don't know whether anyone is actually doing that on
v6, though.

Cheers
---Dave

> In the ARMv6 ARM unaligned support and the U-bit is
> not optional, so you could use the same SoC bus argument
> for some hypothetical v7 SoCs.
> 
> 
> Johannes
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] ARM: decompressor: clear SCTLR.A bit for v7 cores
  2012-11-05 10:48                             ` Dave Martin
@ 2012-11-05 11:13                               ` Russell King - ARM Linux
  2012-11-05 13:02                                 ` Dave Martin
  2012-11-05 13:48                                 ` Rob Herring
  0 siblings, 2 replies; 46+ messages in thread
From: Russell King - ARM Linux @ 2012-11-05 11:13 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Nov 05, 2012 at 10:48:50AM +0000, Dave Martin wrote:
> On Thu, Oct 25, 2012 at 05:08:16PM +0200, Johannes Stezenbach wrote:
> > On Thu, Oct 25, 2012 at 09:25:06AM -0500, Rob Herring wrote:
> > > On 10/25/2012 09:16 AM, Johannes Stezenbach wrote:
> > > > On Thu, Oct 25, 2012 at 07:41:45AM -0500, Rob Herring wrote:
> > > >> On 10/25/2012 04:34 AM, Johannes Stezenbach wrote:
> > > >>> On Thu, Oct 11, 2012 at 07:43:22AM -0500, Rob Herring wrote:
> > > >>>
> > > >>>> While v6 can support unaligned accesses, it is optional and current
> > > >>>> compilers won't emit unaligned accesses. So we don't clear the A bit for
> > > >>>> v6.
> > > >>>
> > > >>> not true according to the gcc changes page
> > > >>
> > > >> What are you going to believe: documentation or what the compiler
> > > >> emitted? At least for ubuntu/linaro 4.6.3 which has the unaligned access
> > > >> support backported and 4.7.2, unaligned accesses are emitted for v7
> > > >> only. I guess default here means it is the default unless you change the
> > > >> default in your build of gcc.
> > > > 
> > > > Since ARMv6 can handle unaligned access in the same way as ARMv7
> > > > it seems a clear bug in gcc which might hopefully get fixed.
> > > > Thus in this case I think it is reasonable to follow the
> > > > gcc documentation, otherwise the code would break for ARMv6
> > > > when gcc gets fixed.
> > > 
> > > But the compiler can't assume the state of the U bit. I think it is
> > > still legal on v6 to not support unaligned accesses, but on v7 it is
> > > required. All the standard v6 ARM cores support it, but I'm not sure
> > > about custom cores or if there are SOCs with buses that don't support
> > > unaligned accesses properly.
> > 
> > Well, I read the "...since Linux version 2.6.28" comment
> > in the gcc changes page in the way that they assume the
> > U-bit is set. (Although I'm not sure it really is???)
> 
> Actually, the kernel checks the arch version and the U bit on boot,
> and chooses the appropriate setting for the A bit depending on the
> result.  (See arch/arm/mm/alignment.c:alignment_init().)

That is in the kernel itself, _after_ the decompressor has run.  It is
not relevant to any discussion about the decompressor.

> Currently, we depend on the CPU reset behaviour or firmware/
> bootloader to set the U bit for v6, but the behaviour should be
> correct either way, though unaligned accesses will obviously
> perform (much) better with U=1.

Will someone _PLEASE_ address my initial comments against this patch
in light of the fact that it's now been proven _NOT_ to be just a V7
issue, rather than everyone seemingly buring their heads in the sand
over this.

The fact is, unaligned accesses in the decompressor are *undefined* at
present.

> For v7, we should definitely use -munaligned-access where available
> (unless it's the default?)

No such option on my compiler - according to the manual I have, the only
option there is starting -munaligned is on SPARC for -munaligned-doubles.

However, I believe GCC does believe that unaligned accesses are fine on
V6 and above.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] ARM: decompressor: clear SCTLR.A bit for v7 cores
  2012-11-05 11:13                               ` Russell King - ARM Linux
@ 2012-11-05 13:02                                 ` Dave Martin
  2012-11-05 13:43                                   ` Johannes Stezenbach
  2012-11-05 15:39                                   ` Russell King - ARM Linux
  2012-11-05 13:48                                 ` Rob Herring
  1 sibling, 2 replies; 46+ messages in thread
From: Dave Martin @ 2012-11-05 13:02 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Nov 05, 2012 at 11:13:46AM +0000, Russell King - ARM Linux wrote:
> On Mon, Nov 05, 2012 at 10:48:50AM +0000, Dave Martin wrote:
> > On Thu, Oct 25, 2012 at 05:08:16PM +0200, Johannes Stezenbach wrote:
> > > On Thu, Oct 25, 2012 at 09:25:06AM -0500, Rob Herring wrote:
> > > > On 10/25/2012 09:16 AM, Johannes Stezenbach wrote:
> > > > > On Thu, Oct 25, 2012 at 07:41:45AM -0500, Rob Herring wrote:
> > > > >> On 10/25/2012 04:34 AM, Johannes Stezenbach wrote:
> > > > >>> On Thu, Oct 11, 2012 at 07:43:22AM -0500, Rob Herring wrote:
> > > > >>>
> > > > >>>> While v6 can support unaligned accesses, it is optional and current
> > > > >>>> compilers won't emit unaligned accesses. So we don't clear the A bit for
> > > > >>>> v6.
> > > > >>>
> > > > >>> not true according to the gcc changes page
> > > > >>
> > > > >> What are you going to believe: documentation or what the compiler
> > > > >> emitted? At least for ubuntu/linaro 4.6.3 which has the unaligned access
> > > > >> support backported and 4.7.2, unaligned accesses are emitted for v7
> > > > >> only. I guess default here means it is the default unless you change the
> > > > >> default in your build of gcc.
> > > > > 
> > > > > Since ARMv6 can handle unaligned access in the same way as ARMv7
> > > > > it seems a clear bug in gcc which might hopefully get fixed.
> > > > > Thus in this case I think it is reasonable to follow the
> > > > > gcc documentation, otherwise the code would break for ARMv6
> > > > > when gcc gets fixed.
> > > > 
> > > > But the compiler can't assume the state of the U bit. I think it is
> > > > still legal on v6 to not support unaligned accesses, but on v7 it is
> > > > required. All the standard v6 ARM cores support it, but I'm not sure
> > > > about custom cores or if there are SOCs with buses that don't support
> > > > unaligned accesses properly.
> > > 
> > > Well, I read the "...since Linux version 2.6.28" comment
> > > in the gcc changes page in the way that they assume the
> > > U-bit is set. (Although I'm not sure it really is???)
> > 
> > Actually, the kernel checks the arch version and the U bit on boot,
> > and chooses the appropriate setting for the A bit depending on the
> > result.  (See arch/arm/mm/alignment.c:alignment_init().)
> 
> That is in the kernel itself, _after_ the decompressor has run.  It is
> not relevant to any discussion about the decompressor.

This was merely meant as an argument that the kernel does not make
assumptions about the U bit, and so the zImage decompressor probably
shouldn't either.

> > Currently, we depend on the CPU reset behaviour or firmware/
> > bootloader to set the U bit for v6, but the behaviour should be
> > correct either way, though unaligned accesses will obviously
> > perform (much) better with U=1.
> 
> Will someone _PLEASE_ address my initial comments against this patch
> in light of the fact that it's now been proven _NOT_ to be just a V7
> issue, rather than everyone seemingly buring their heads in the sand
> over this.
> 
> The fact is, unaligned accesses in the decompressor are *undefined* at
> present.

Why not allow unaligned accesses in the decompressor, though, both
for v6 and v7?

The decompressors run with the cache on, so we should have fault-free
unaligned access on all CPUs which support it, provided that we set the
SCTLR bits appropriately, and provided that the decompressers are
written in correct C.

The counterargument is that zImage would then just fall over if booted
on a CPU older than the baseline architecture the kernel was built for.
However, that should be detected before executing any C code, giving
the zImage code a chance to bail out with a suitable error message.

A kernel which contains support for v6/v7 platforms cannot work on older
CPUs anyway, because the kernel itself doesn't support such combinations.

> > For v7, we should definitely use -munaligned-access where available
> > (unless it's the default?)
> 
> No such option on my compiler - according to the manual I have, the only
> option there is starting -munaligned is on SPARC for -munaligned-doubles.

OK, I guess that's something backported into the Linaro toolchain I'm
currently using then.  But it seems a good idea to use this if available,
because it allows the compiler to generate better code in some situations,
especially for packed struct access.
 
> However, I believe GCC does believe that unaligned accesses are fine on
> V6 and above.

Possibly, but I've never seen it use them deliberately, prior to the
-munaligned-access support.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] ARM: decompressor: clear SCTLR.A bit for v7 cores
  2012-11-05 13:02                                 ` Dave Martin
@ 2012-11-05 13:43                                   ` Johannes Stezenbach
  2012-11-05 15:39                                   ` Russell King - ARM Linux
  1 sibling, 0 replies; 46+ messages in thread
From: Johannes Stezenbach @ 2012-11-05 13:43 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Nov 05, 2012 at 01:02:55PM +0000, Dave Martin wrote:
> On Mon, Nov 05, 2012 at 11:13:46AM +0000, Russell King - ARM Linux wrote:
> > On Mon, Nov 05, 2012 at 10:48:50AM +0000, Dave Martin wrote:
> > > For v7, we should definitely use -munaligned-access where available
> > > (unless it's the default?)
> > 
> > No such option on my compiler - according to the manual I have, the only
> > option there is starting -munaligned is on SPARC for -munaligned-doubles.
> 
> OK, I guess that's something backported into the Linaro toolchain I'm
> currently using then.  But it seems a good idea to use this if available,
> because it allows the compiler to generate better code in some situations,
> especially for packed struct access.
>  
> > However, I believe GCC does believe that unaligned accesses are fine on
> > V6 and above.
> 
> Possibly, but I've never seen it use them deliberately, prior to the
> -munaligned-access support.

http://gcc.gnu.org/gcc-4.7/changes.html says -munaligned-access is
enabled by default.

I haven't had a chance to try gcc-4.7 yet, but gcc-4.6+linaro
has the option but fails to generate the expected code for ARMv6
while it works for ARMv7.  However it does

#define __ARM_FEATURE_UNALIGNED 1

and

        .eabi_attribute 34, 1

This seems to be the commit which introduced unaligned support in gcc:
http://repo.or.cz/w/official-gcc.git/commitdiff/eb04cafba3a6f1eddbdb5ec031d8a7074930d5b9
I cannot figure out why this works for v7 but not for v6.

(I used gcc-4.6.4 20121001 (prerelease) toolchain built with crosstool-NG.)


Johannes

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] ARM: decompressor: clear SCTLR.A bit for v7 cores
  2012-11-05 11:13                               ` Russell King - ARM Linux
  2012-11-05 13:02                                 ` Dave Martin
@ 2012-11-05 13:48                                 ` Rob Herring
  2012-11-05 22:02                                   ` Michael Hope
  1 sibling, 1 reply; 46+ messages in thread
From: Rob Herring @ 2012-11-05 13:48 UTC (permalink / raw)
  To: linux-arm-kernel

On 11/05/2012 05:13 AM, Russell King - ARM Linux wrote:
> On Mon, Nov 05, 2012 at 10:48:50AM +0000, Dave Martin wrote:
>> On Thu, Oct 25, 2012 at 05:08:16PM +0200, Johannes Stezenbach wrote:
>>> On Thu, Oct 25, 2012 at 09:25:06AM -0500, Rob Herring wrote:
>>>> On 10/25/2012 09:16 AM, Johannes Stezenbach wrote:
>>>>> On Thu, Oct 25, 2012 at 07:41:45AM -0500, Rob Herring wrote:
>>>>>> On 10/25/2012 04:34 AM, Johannes Stezenbach wrote:
>>>>>>> On Thu, Oct 11, 2012 at 07:43:22AM -0500, Rob Herring wrote:
>>>>>>>
>>>>>>>> While v6 can support unaligned accesses, it is optional and current
>>>>>>>> compilers won't emit unaligned accesses. So we don't clear the A bit for
>>>>>>>> v6.
>>>>>>>
>>>>>>> not true according to the gcc changes page
>>>>>>
>>>>>> What are you going to believe: documentation or what the compiler
>>>>>> emitted? At least for ubuntu/linaro 4.6.3 which has the unaligned access
>>>>>> support backported and 4.7.2, unaligned accesses are emitted for v7
>>>>>> only. I guess default here means it is the default unless you change the
>>>>>> default in your build of gcc.
>>>>>
>>>>> Since ARMv6 can handle unaligned access in the same way as ARMv7
>>>>> it seems a clear bug in gcc which might hopefully get fixed.
>>>>> Thus in this case I think it is reasonable to follow the
>>>>> gcc documentation, otherwise the code would break for ARMv6
>>>>> when gcc gets fixed.
>>>>
>>>> But the compiler can't assume the state of the U bit. I think it is
>>>> still legal on v6 to not support unaligned accesses, but on v7 it is
>>>> required. All the standard v6 ARM cores support it, but I'm not sure
>>>> about custom cores or if there are SOCs with buses that don't support
>>>> unaligned accesses properly.
>>>
>>> Well, I read the "...since Linux version 2.6.28" comment
>>> in the gcc changes page in the way that they assume the
>>> U-bit is set. (Although I'm not sure it really is???)
>>
>> Actually, the kernel checks the arch version and the U bit on boot,
>> and chooses the appropriate setting for the A bit depending on the
>> result.  (See arch/arm/mm/alignment.c:alignment_init().)
> 
> That is in the kernel itself, _after_ the decompressor has run.  It is
> not relevant to any discussion about the decompressor.
> 
>> Currently, we depend on the CPU reset behaviour or firmware/
>> bootloader to set the U bit for v6, but the behaviour should be
>> correct either way, though unaligned accesses will obviously
>> perform (much) better with U=1.
> 
> Will someone _PLEASE_ address my initial comments against this patch
> in light of the fact that it's now been proven _NOT_ to be just a V7
> issue, rather than everyone seemingly buring their heads in the sand
> over this.

I tried adding -munaligned-accesses on a v6 build and still get byte
accesses rather than unaligned word accesses. So this does seem to be a
v7 only issue based on what gcc will currently produce. Copying Michael
Hope who can hopefully provide some insight on why v6 unaligned accesses
are not enabled.

> The fact is, unaligned accesses in the decompressor are *undefined* at
> present.
> 
>> For v7, we should definitely use -munaligned-access where available
>> (unless it's the default?)
> 
> No such option on my compiler - according to the manual I have, the only
> option there is starting -munaligned is on SPARC for -munaligned-doubles.

It's only added in 4.7 and backported to Linaro 4.6.3.

Rob

> However, I believe GCC does believe that unaligned accesses are fine on
> V6 and above.
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] ARM: decompressor: clear SCTLR.A bit for v7 cores
  2012-11-05 13:02                                 ` Dave Martin
  2012-11-05 13:43                                   ` Johannes Stezenbach
@ 2012-11-05 15:39                                   ` Russell King - ARM Linux
  2012-11-05 16:13                                     ` Nicolas Pitre
  1 sibling, 1 reply; 46+ messages in thread
From: Russell King - ARM Linux @ 2012-11-05 15:39 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Nov 05, 2012 at 01:02:55PM +0000, Dave Martin wrote:
> Why not allow unaligned accesses in the decompressor, though, both
> for v6 and v7?

EXACTLY.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] ARM: decompressor: clear SCTLR.A bit for v7 cores
  2012-11-05 15:39                                   ` Russell King - ARM Linux
@ 2012-11-05 16:13                                     ` Nicolas Pitre
  2012-11-05 17:26                                       ` Dave Martin
  0 siblings, 1 reply; 46+ messages in thread
From: Nicolas Pitre @ 2012-11-05 16:13 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 5 Nov 2012, Russell King - ARM Linux wrote:

> On Mon, Nov 05, 2012 at 01:02:55PM +0000, Dave Martin wrote:
> > Why not allow unaligned accesses in the decompressor, though, both
> > for v6 and v7?
> 
> EXACTLY.

I have no objections to that.  In fact, I made a remark to this effect 
in my initial review of this patch.  Whether or not gcc does take 
advantage of this hardware ability in the end is orthogonal.


Nicolas

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] ARM: decompressor: clear SCTLR.A bit for v7 cores
  2012-11-05 16:13                                     ` Nicolas Pitre
@ 2012-11-05 17:26                                       ` Dave Martin
  2012-11-05 17:44                                         ` Rob Herring
  2012-11-05 17:46                                         ` Nicolas Pitre
  0 siblings, 2 replies; 46+ messages in thread
From: Dave Martin @ 2012-11-05 17:26 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Nov 05, 2012 at 11:13:51AM -0500, Nicolas Pitre wrote:
> On Mon, 5 Nov 2012, Russell King - ARM Linux wrote:
> 
> > On Mon, Nov 05, 2012 at 01:02:55PM +0000, Dave Martin wrote:
> > > Why not allow unaligned accesses in the decompressor, though, both
> > > for v6 and v7?
> > 
> > EXACTLY.
> 
> I have no objections to that.  In fact, I made a remark to this effect 
> in my initial review of this patch.  Whether or not gcc does take 
> advantage of this hardware ability in the end is orthogonal.

For the sake of argument, here's how it might look.

Currently, I make no attempt to restore the original state of the U bit.
The A bit if forced later by the kernel during boot, after a short window
during which we should only run low-level arch code and therefore where
no unaligned accesses should happen.

Does anyone think these issues are likely to be important?

Cheers
---Dave


>From 160a5576b53264951ff8164775146b2d4feddecb Mon Sep 17 00:00:00 2001
From: Dave Martin <dave.martin@linaro.org>
Date: Mon, 5 Nov 2012 16:34:57 +0000
Subject: [PATCH] ARM: decompressor: Enable unaligned memory access for v6 and above

Modern GCC can generate code which makes use of the CPU's native
unaligned memory access capabilities.  This is useful for the C
decompressor implementations used for unpacking compressed kernels.

This patch disables the alignment faults and enabled the v6
unaligned access on CPUs which support these features (i.e., v6 and
later), allowing full unaligned access support for C code in the
decompressor.

The decompressor C code must not be built to assume that unaligned
access works if support for v5 or older platforms is included in
the kernel.

Signed-off-by: Dave Martin <dave.martin@linaro.org>
---
Note: I have only build-tested this so far.

 arch/arm/boot/compressed/head.S |   12 +++++++++++-
 1 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
index 90275f0..cfbb4c9 100644
--- a/arch/arm/boot/compressed/head.S
+++ b/arch/arm/boot/compressed/head.S
@@ -652,6 +652,15 @@ __setup_mmu:	sub	r3, r4, #16384		@ Page directory size
 		mov	pc, lr
 ENDPROC(__setup_mmu)
 
+@ Enable unaligned access on v6, to allow better code generation
+@ for the decompressor C code:
+__armv6_mmu_cache_on:
+		mrc	p15, 0, r0, c1, c0, 0	@ read SCTLR
+		bic	r0, r0, #2		@ A (no unaligned access fault)
+		orr	r0, r0, #1 << 22	@ U (v6 unaligned access model)
+		mcr	p15, 0, r0, c1, c0, 0	@ write SCTLR
+		b	__armv4_mmu_cache_on
+
 __arm926ejs_mmu_cache_on:
 #ifdef CONFIG_CPU_DCACHE_WRITETHROUGH
 		mov	r0, #4			@ put dcache in WT mode
@@ -694,6 +703,7 @@ __armv7_mmu_cache_on:
 		bic	r0, r0, #1 << 28	@ clear SCTLR.TRE
 		orr	r0, r0, #0x5000		@ I-cache enable, RR cache replacement
 		orr	r0, r0, #0x003c		@ write buffer
+		bic	r0, r0, #2		@ A (no unaligned access fault)
 #ifdef CONFIG_MMU
 #ifdef CONFIG_CPU_ENDIAN_BE8
 		orr	r0, r0, #1 << 25	@ big-endian page tables
@@ -914,7 +924,7 @@ proc_types:
 
 		.word	0x0007b000		@ ARMv6
 		.word	0x000ff000
-		W(b)	__armv4_mmu_cache_on
+		W(b)	__armv6_mmu_cache_on
 		W(b)	__armv4_mmu_cache_off
 		W(b)	__armv6_mmu_cache_flush
 
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH] ARM: decompressor: clear SCTLR.A bit for v7 cores
  2012-11-05 17:26                                       ` Dave Martin
@ 2012-11-05 17:44                                         ` Rob Herring
  2012-11-05 20:02                                           ` Nicolas Pitre
  2012-11-05 17:46                                         ` Nicolas Pitre
  1 sibling, 1 reply; 46+ messages in thread
From: Rob Herring @ 2012-11-05 17:44 UTC (permalink / raw)
  To: linux-arm-kernel

On 11/05/2012 11:26 AM, Dave Martin wrote:
> On Mon, Nov 05, 2012 at 11:13:51AM -0500, Nicolas Pitre wrote:
>> On Mon, 5 Nov 2012, Russell King - ARM Linux wrote:
>>
>>> On Mon, Nov 05, 2012 at 01:02:55PM +0000, Dave Martin wrote:
>>>> Why not allow unaligned accesses in the decompressor, though, both
>>>> for v6 and v7?
>>>
>>> EXACTLY.
>>
>> I have no objections to that.  In fact, I made a remark to this effect 
>> in my initial review of this patch.  Whether or not gcc does take 
>> advantage of this hardware ability in the end is orthogonal.
> 
> For the sake of argument, here's how it might look.
> 
> Currently, I make no attempt to restore the original state of the U bit.
> The A bit if forced later by the kernel during boot, after a short window
> during which we should only run low-level arch code and therefore where
> no unaligned accesses should happen.
> 
> Does anyone think these issues are likely to be important?
> 

And here is my updated version that does v6 somewhat differently:

8<------------------------------------------------------------------
>From 76c2b7685397f13aa53f426822128430fc24b8a0 Mon Sep 17 00:00:00 2001
From: Rob Herring <rob.herring@calxeda.com>
Date: Mon, 5 Nov 2012 11:39:48 -0600
Subject: [PATCH v2] ARM: decompressor: clear SCTLR.A bit for v6 and v7 cores

With recent compilers and move to generic unaligned.h in commit d25c881
(ARM: 7493/1: use generic unaligned.h), unaligned accesses will be used
by the LZO decompressor on v7 cores. So we need to make sure unaligned
accesses are allowed by clearing the SCTLR A bit.

While v6 can support unaligned accesses, it is optional and current
compilers won't emit unaligned accesses. In case this changes and to align
with the kernel behavior, we clear the A bit and set the U bit.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Shawn Guo <shawn.guo@linaro.org>
---
 arch/arm/boot/compressed/head.S |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
index bc67cbf..f14d7ec 100644
--- a/arch/arm/boot/compressed/head.S
+++ b/arch/arm/boot/compressed/head.S
@@ -629,6 +629,11 @@ __armv4_mmu_cache_on:
 		mcr	p15, 0, r0, c7, c10, 4	@ drain write buffer
 		mcr	p15, 0, r0, c8, c7, 0	@ flush I,D TLBs
 		mrc	p15, 0, r0, c1, c0, 0	@ read control reg
+		mrc	p15, 0, r11, c0, c0	@ get processor ID
+		and	r11, r11, #0xf0000
+		tst	r11, #0x70000		@ ARMv6
+		orreq	r0, r0, #1 << 22	@ set SCTLR.U
+		biceq	r0, r0, #1 << 1		@ clear SCTLR.A
 		orr	r0, r0, #0x5000		@ I-cache enable, RR cache replacement
 		orr	r0, r0, #0x0030
 #ifdef CONFIG_CPU_ENDIAN_BE8
@@ -654,6 +659,7 @@ __armv7_mmu_cache_on:
 #endif
 		mrc	p15, 0, r0, c1, c0, 0	@ read control reg
 		bic	r0, r0, #1 << 28	@ clear SCTLR.TRE
+		bic	r0, r0, #1 << 1		@ clear SCTLR.A
 		orr	r0, r0, #0x5000		@ I-cache enable, RR cache replacement
 		orr	r0, r0, #0x003c		@ write buffer
 #ifdef CONFIG_MMU
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH] ARM: decompressor: clear SCTLR.A bit for v7 cores
  2012-11-05 17:26                                       ` Dave Martin
  2012-11-05 17:44                                         ` Rob Herring
@ 2012-11-05 17:46                                         ` Nicolas Pitre
  1 sibling, 0 replies; 46+ messages in thread
From: Nicolas Pitre @ 2012-11-05 17:46 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 5 Nov 2012, Dave Martin wrote:

> On Mon, Nov 05, 2012 at 11:13:51AM -0500, Nicolas Pitre wrote:
> > On Mon, 5 Nov 2012, Russell King - ARM Linux wrote:
> > 
> > > On Mon, Nov 05, 2012 at 01:02:55PM +0000, Dave Martin wrote:
> > > > Why not allow unaligned accesses in the decompressor, though, both
> > > > for v6 and v7?
> > > 
> > > EXACTLY.
> > 
> > I have no objections to that.  In fact, I made a remark to this effect 
> > in my initial review of this patch.  Whether or not gcc does take 
> > advantage of this hardware ability in the end is orthogonal.
> 
> For the sake of argument, here's how it might look.
> 
> Currently, I make no attempt to restore the original state of the U bit.
> The A bit if forced later by the kernel during boot, after a short window
> during which we should only run low-level arch code and therefore where
> no unaligned accesses should happen.
> 
> Does anyone think these issues are likely to be important?
> 
> Cheers
> ---Dave
> 
> 
> >From 160a5576b53264951ff8164775146b2d4feddecb Mon Sep 17 00:00:00 2001
> From: Dave Martin <dave.martin@linaro.org>
> Date: Mon, 5 Nov 2012 16:34:57 +0000
> Subject: [PATCH] ARM: decompressor: Enable unaligned memory access for v6 and above
> 
> Modern GCC can generate code which makes use of the CPU's native
> unaligned memory access capabilities.  This is useful for the C
> decompressor implementations used for unpacking compressed kernels.
> 
> This patch disables the alignment faults and enabled the v6
> unaligned access on CPUs which support these features (i.e., v6 and
> later), allowing full unaligned access support for C code in the
> decompressor.
> 
> The decompressor C code must not be built to assume that unaligned
> access works if support for v5 or older platforms is included in
> the kernel.

I'd suggest here that the code must always use the get_unaligned and 
put_unaligned accessors when dealing with unaligned pointers, regardless 
of this patch.  The compiler will optimize the access performed via 
those accessors when possible.

> Signed-off-by: Dave Martin <dave.martin@linaro.org>

Acked-by: Nicolas Pitre <nico@linaro.org>


> ---
> Note: I have only build-tested this so far.
> 
>  arch/arm/boot/compressed/head.S |   12 +++++++++++-
>  1 files changed, 11 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
> index 90275f0..cfbb4c9 100644
> --- a/arch/arm/boot/compressed/head.S
> +++ b/arch/arm/boot/compressed/head.S
> @@ -652,6 +652,15 @@ __setup_mmu:	sub	r3, r4, #16384		@ Page directory size
>  		mov	pc, lr
>  ENDPROC(__setup_mmu)
>  
> +@ Enable unaligned access on v6, to allow better code generation
> +@ for the decompressor C code:
> +__armv6_mmu_cache_on:
> +		mrc	p15, 0, r0, c1, c0, 0	@ read SCTLR
> +		bic	r0, r0, #2		@ A (no unaligned access fault)
> +		orr	r0, r0, #1 << 22	@ U (v6 unaligned access model)
> +		mcr	p15, 0, r0, c1, c0, 0	@ write SCTLR
> +		b	__armv4_mmu_cache_on
> +
>  __arm926ejs_mmu_cache_on:
>  #ifdef CONFIG_CPU_DCACHE_WRITETHROUGH
>  		mov	r0, #4			@ put dcache in WT mode
> @@ -694,6 +703,7 @@ __armv7_mmu_cache_on:
>  		bic	r0, r0, #1 << 28	@ clear SCTLR.TRE
>  		orr	r0, r0, #0x5000		@ I-cache enable, RR cache replacement
>  		orr	r0, r0, #0x003c		@ write buffer
> +		bic	r0, r0, #2		@ A (no unaligned access fault)
>  #ifdef CONFIG_MMU
>  #ifdef CONFIG_CPU_ENDIAN_BE8
>  		orr	r0, r0, #1 << 25	@ big-endian page tables
> @@ -914,7 +924,7 @@ proc_types:
>  
>  		.word	0x0007b000		@ ARMv6
>  		.word	0x000ff000
> -		W(b)	__armv4_mmu_cache_on
> +		W(b)	__armv6_mmu_cache_on
>  		W(b)	__armv4_mmu_cache_off
>  		W(b)	__armv6_mmu_cache_flush
>  
> -- 
> 1.7.4.1
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] ARM: decompressor: clear SCTLR.A bit for v7 cores
  2012-11-05 17:44                                         ` Rob Herring
@ 2012-11-05 20:02                                           ` Nicolas Pitre
  0 siblings, 0 replies; 46+ messages in thread
From: Nicolas Pitre @ 2012-11-05 20:02 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 5 Nov 2012, Rob Herring wrote:

> On 11/05/2012 11:26 AM, Dave Martin wrote:
> > On Mon, Nov 05, 2012 at 11:13:51AM -0500, Nicolas Pitre wrote:
> >> On Mon, 5 Nov 2012, Russell King - ARM Linux wrote:
> >>
> >>> On Mon, Nov 05, 2012 at 01:02:55PM +0000, Dave Martin wrote:
> >>>> Why not allow unaligned accesses in the decompressor, though, both
> >>>> for v6 and v7?
> >>>
> >>> EXACTLY.
> >>
> >> I have no objections to that.  In fact, I made a remark to this effect 
> >> in my initial review of this patch.  Whether or not gcc does take 
> >> advantage of this hardware ability in the end is orthogonal.
> > 
> > For the sake of argument, here's how it might look.
> > 
> > Currently, I make no attempt to restore the original state of the U bit.
> > The A bit if forced later by the kernel during boot, after a short window
> > during which we should only run low-level arch code and therefore where
> > no unaligned accesses should happen.
> > 
> > Does anyone think these issues are likely to be important?
> > 
> 
> And here is my updated version that does v6 somewhat differently:

If I had to choose, I'd prefer Dave's version as being a bit cleaner.

> 
> 8<------------------------------------------------------------------
> >From 76c2b7685397f13aa53f426822128430fc24b8a0 Mon Sep 17 00:00:00 2001
> From: Rob Herring <rob.herring@calxeda.com>
> Date: Mon, 5 Nov 2012 11:39:48 -0600
> Subject: [PATCH v2] ARM: decompressor: clear SCTLR.A bit for v6 and v7 cores
> 
> With recent compilers and move to generic unaligned.h in commit d25c881
> (ARM: 7493/1: use generic unaligned.h), unaligned accesses will be used
> by the LZO decompressor on v7 cores. So we need to make sure unaligned
> accesses are allowed by clearing the SCTLR A bit.
> 
> While v6 can support unaligned accesses, it is optional and current
> compilers won't emit unaligned accesses. In case this changes and to align
> with the kernel behavior, we clear the A bit and set the U bit.
> 
> Signed-off-by: Rob Herring <rob.herring@calxeda.com>
> Acked-by: Nicolas Pitre <nico@linaro.org>
> Tested-by: Shawn Guo <shawn.guo@linaro.org>
> ---
>  arch/arm/boot/compressed/head.S |    6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
> index bc67cbf..f14d7ec 100644
> --- a/arch/arm/boot/compressed/head.S
> +++ b/arch/arm/boot/compressed/head.S
> @@ -629,6 +629,11 @@ __armv4_mmu_cache_on:
>  		mcr	p15, 0, r0, c7, c10, 4	@ drain write buffer
>  		mcr	p15, 0, r0, c8, c7, 0	@ flush I,D TLBs
>  		mrc	p15, 0, r0, c1, c0, 0	@ read control reg
> +		mrc	p15, 0, r11, c0, c0	@ get processor ID
> +		and	r11, r11, #0xf0000
> +		tst	r11, #0x70000		@ ARMv6
> +		orreq	r0, r0, #1 << 22	@ set SCTLR.U
> +		biceq	r0, r0, #1 << 1		@ clear SCTLR.A
>  		orr	r0, r0, #0x5000		@ I-cache enable, RR cache replacement
>  		orr	r0, r0, #0x0030
>  #ifdef CONFIG_CPU_ENDIAN_BE8
> @@ -654,6 +659,7 @@ __armv7_mmu_cache_on:
>  #endif
>  		mrc	p15, 0, r0, c1, c0, 0	@ read control reg
>  		bic	r0, r0, #1 << 28	@ clear SCTLR.TRE
> +		bic	r0, r0, #1 << 1		@ clear SCTLR.A
>  		orr	r0, r0, #0x5000		@ I-cache enable, RR cache replacement
>  		orr	r0, r0, #0x003c		@ write buffer
>  #ifdef CONFIG_MMU
> -- 
> 1.7.10.4
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] ARM: decompressor: clear SCTLR.A bit for v7 cores
  2012-11-05 13:48                                 ` Rob Herring
@ 2012-11-05 22:02                                   ` Michael Hope
  2013-02-20 14:56                                     ` Johannes Stezenbach
  0 siblings, 1 reply; 46+ messages in thread
From: Michael Hope @ 2012-11-05 22:02 UTC (permalink / raw)
  To: linux-arm-kernel

On 6 November 2012 02:48, Rob Herring <robherring2@gmail.com> wrote:
>
> On 11/05/2012 05:13 AM, Russell King - ARM Linux wrote:
> > On Mon, Nov 05, 2012 at 10:48:50AM +0000, Dave Martin wrote:
> >> On Thu, Oct 25, 2012 at 05:08:16PM +0200, Johannes Stezenbach wrote:
> >>> On Thu, Oct 25, 2012 at 09:25:06AM -0500, Rob Herring wrote:
> >>>> On 10/25/2012 09:16 AM, Johannes Stezenbach wrote:
> >>>>> On Thu, Oct 25, 2012 at 07:41:45AM -0500, Rob Herring wrote:
> >>>>>> On 10/25/2012 04:34 AM, Johannes Stezenbach wrote:
> >>>>>>> On Thu, Oct 11, 2012 at 07:43:22AM -0500, Rob Herring wrote:
> >>>>>>>
> >>>>>>>> While v6 can support unaligned accesses, it is optional and current
> >>>>>>>> compilers won't emit unaligned accesses. So we don't clear the A bit for
> >>>>>>>> v6.
> >>>>>>>
> >>>>>>> not true according to the gcc changes page
> >>>>>>
> >>>>>> What are you going to believe: documentation or what the compiler
> >>>>>> emitted? At least for ubuntu/linaro 4.6.3 which has the unaligned access
> >>>>>> support backported and 4.7.2, unaligned accesses are emitted for v7
> >>>>>> only. I guess default here means it is the default unless you change the
> >>>>>> default in your build of gcc.
> >>>>>
> >>>>> Since ARMv6 can handle unaligned access in the same way as ARMv7
> >>>>> it seems a clear bug in gcc which might hopefully get fixed.
> >>>>> Thus in this case I think it is reasonable to follow the
> >>>>> gcc documentation, otherwise the code would break for ARMv6
> >>>>> when gcc gets fixed.
> >>>>
> >>>> But the compiler can't assume the state of the U bit. I think it is
> >>>> still legal on v6 to not support unaligned accesses, but on v7 it is
> >>>> required. All the standard v6 ARM cores support it, but I'm not sure
> >>>> about custom cores or if there are SOCs with buses that don't support
> >>>> unaligned accesses properly.
> >>>
> >>> Well, I read the "...since Linux version 2.6.28" comment
> >>> in the gcc changes page in the way that they assume the
> >>> U-bit is set. (Although I'm not sure it really is???)
> >>
> >> Actually, the kernel checks the arch version and the U bit on boot,
> >> and chooses the appropriate setting for the A bit depending on the
> >> result.  (See arch/arm/mm/alignment.c:alignment_init().)
> >
> > That is in the kernel itself, _after_ the decompressor has run.  It is
> > not relevant to any discussion about the decompressor.
> >
> >> Currently, we depend on the CPU reset behaviour or firmware/
> >> bootloader to set the U bit for v6, but the behaviour should be
> >> correct either way, though unaligned accesses will obviously
> >> perform (much) better with U=1.
> >
> > Will someone _PLEASE_ address my initial comments against this patch
> > in light of the fact that it's now been proven _NOT_ to be just a V7
> > issue, rather than everyone seemingly buring their heads in the sand
> > over this.
>
> I tried adding -munaligned-accesses on a v6 build and still get byte
> accesses rather than unaligned word accesses. So this does seem to be a
> v7 only issue based on what gcc will currently produce. Copying Michael
> Hope who can hopefully provide some insight on why v6 unaligned accesses
> are not enabled.

This looks like a bug.  Unaligned access is enabled for armv6 but
seems to only take effect for cores with Thumb-2.  Here's a test case
both with unaligned field access and unaligned block copy:

struct foo
{
  char a;
  int b;
  struct
  {
    int x[3];
  } c;
} __attribute__((packed));

int get_field(struct foo *p)
{
  return p->b;
}

int copy_block(struct foo *p, struct foo *q)
{
  p->c = q->c;
}

With -march=armv7-a you get the correct:

bar:
	ldr	r0, [r0, #1]	@ unaligned	@ 11	unaligned_loadsi/2	[length = 4]
	bx	lr	@ 21	*arm_return	[length = 12]

baz:
	str	r4, [sp, #-4]!	@ 25	*push_multi	[length = 4]
	mov	r2, r0	@ 2	*arm_movsi_vfp/1	[length = 4]
	ldr	r4, [r1, #5]!	@ unaligned	@ 9	unaligned_loadsi/2	[length = 4]
	ldr	ip, [r1, #4]	@ unaligned	@ 10	unaligned_loadsi/2	[length = 4]
	ldr	r1, [r1, #8]	@ unaligned	@ 11	unaligned_loadsi/2	[length = 4]
	str	r4, [r2, #5]	@ unaligned	@ 12	unaligned_storesi/2	[length = 4]
	str	ip, [r2, #9]	@ unaligned	@ 13	unaligned_storesi/2	[length = 4]
	str	r1, [r2, #13]	@ unaligned	@ 14	unaligned_storesi/2	[length = 4]
	ldmfd	sp!, {r4}
	bx	lr

With -march=armv6 you get a byte-by-byte field access and a correct
unaligned block copy:

bar:
	ldrb	r1, [r0, #2]	@ zero_extendqisi2
	ldrb	r3, [r0, #1]	@ zero_extendqisi2
	ldrb	r2, [r0, #3]	@ zero_extendqisi2
	ldrb	r0, [r0, #4]	@ zero_extendqisi2
	orr	r3, r3, r1, asl #8
	orr	r3, r3, r2, asl #16
	orr	r0, r3, r0, asl #24
	bx	lr

baz:
	str	r4, [sp, #-4]!
	mov	r2, r0
	ldr	r4, [r1, #5]!	@ unaligned
	ldr	ip, [r1, #4]	@ unaligned
	ldr	r1, [r1, #8]	@ unaligned
	str	r4, [r2, #5]	@ unaligned
	str	ip, [r2, #9]	@ unaligned
	str	r1, [r2, #13]	@ unaligned
	ldmfd	sp!, {r4}
	bx	lr

readelf -A shows that the compiler planned to use unaligned access in
both.  My suspicion is that GCC is using the extv pattern to extract
the field from memory, and that pattern is only enabled for Thumb-2
capable cores.

I've logged PR55218.  We'll discuss it at our next meeting.

-- Michael

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] ARM: decompressor: clear SCTLR.A bit for v7 cores
  2012-11-05 22:02                                   ` Michael Hope
@ 2013-02-20 14:56                                     ` Johannes Stezenbach
  2013-02-20 15:07                                       ` Johannes Stezenbach
  0 siblings, 1 reply; 46+ messages in thread
From: Johannes Stezenbach @ 2013-02-20 14:56 UTC (permalink / raw)
  To: linux-arm-kernel

Replying to old thread, for full context see
http://thread.gmane.org/gmane.linux.ports.arm.kernel/180226/focus=197914

On Tue, Nov 06, 2012 at 11:02:27AM +1300, Michael Hope wrote:
> On 6 November 2012 02:48, Rob Herring <robherring2@gmail.com> wrote:
> >
> > I tried adding -munaligned-accesses on a v6 build and still get byte
> > accesses rather than unaligned word accesses. So this does seem to be a
> > v7 only issue based on what gcc will currently produce. Copying Michael
> > Hope who can hopefully provide some insight on why v6 unaligned accesses
> > are not enabled.
> 
> This looks like a bug.  Unaligned access is enabled for armv6 but
> seems to only take effect for cores with Thumb-2.  Here's a test case
> both with unaligned field access and unaligned block copy:
> 
> struct foo
> {
>   char a;
>   int b;
>   struct
>   {
>     int x[3];
>   } c;
> } __attribute__((packed));
> 
> int get_field(struct foo *p)
> {
>   return p->b;
> }
> 
> int copy_block(struct foo *p, struct foo *q)
> {
>   p->c = q->c;
> }
> 
> With -march=armv7-a you get the correct:
> 
> bar:
> 	ldr	r0, [r0, #1]	@ unaligned	@ 11	unaligned_loadsi/2	[length = 4]
> 	bx	lr	@ 21	*arm_return	[length = 12]
> 
> baz:
> 	str	r4, [sp, #-4]!	@ 25	*push_multi	[length = 4]
> 	mov	r2, r0	@ 2	*arm_movsi_vfp/1	[length = 4]
> 	ldr	r4, [r1, #5]!	@ unaligned	@ 9	unaligned_loadsi/2	[length = 4]
> 	ldr	ip, [r1, #4]	@ unaligned	@ 10	unaligned_loadsi/2	[length = 4]
> 	ldr	r1, [r1, #8]	@ unaligned	@ 11	unaligned_loadsi/2	[length = 4]
> 	str	r4, [r2, #5]	@ unaligned	@ 12	unaligned_storesi/2	[length = 4]
> 	str	ip, [r2, #9]	@ unaligned	@ 13	unaligned_storesi/2	[length = 4]
> 	str	r1, [r2, #13]	@ unaligned	@ 14	unaligned_storesi/2	[length = 4]
> 	ldmfd	sp!, {r4}
> 	bx	lr
> 
> With -march=armv6 you get a byte-by-byte field access and a correct
> unaligned block copy:
> 
> bar:
> 	ldrb	r1, [r0, #2]	@ zero_extendqisi2
> 	ldrb	r3, [r0, #1]	@ zero_extendqisi2
> 	ldrb	r2, [r0, #3]	@ zero_extendqisi2
> 	ldrb	r0, [r0, #4]	@ zero_extendqisi2
> 	orr	r3, r3, r1, asl #8
> 	orr	r3, r3, r2, asl #16
> 	orr	r0, r3, r0, asl #24
> 	bx	lr
> 
> baz:
> 	str	r4, [sp, #-4]!
> 	mov	r2, r0
> 	ldr	r4, [r1, #5]!	@ unaligned
> 	ldr	ip, [r1, #4]	@ unaligned
> 	ldr	r1, [r1, #8]	@ unaligned
> 	str	r4, [r2, #5]	@ unaligned
> 	str	ip, [r2, #9]	@ unaligned
> 	str	r1, [r2, #13]	@ unaligned
> 	ldmfd	sp!, {r4}
> 	bx	lr
> 
> readelf -A shows that the compiler planned to use unaligned access in
> both.  My suspicion is that GCC is using the extv pattern to extract
> the field from memory, and that pattern is only enabled for Thumb-2
> capable cores.
> 
> I've logged PR55218.  We'll discuss it at our next meeting.

Just tried with gcc-linaro-4.7-2013.01 (gcc-4.7.3 20130102 (prerelease)),
the issue is still unfixed.  Do you have any idea how to fix it?
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55218

Thanks,
Johannes

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] ARM: decompressor: clear SCTLR.A bit for v7 cores
  2013-02-20 14:56                                     ` Johannes Stezenbach
@ 2013-02-20 15:07                                       ` Johannes Stezenbach
  0 siblings, 0 replies; 46+ messages in thread
From: Johannes Stezenbach @ 2013-02-20 15:07 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Feb 20, 2013 at 03:56:54PM +0100, Johannes Stezenbach wrote:
> Replying to old thread, for full context see
> http://thread.gmane.org/gmane.linux.ports.arm.kernel/180226/focus=197914
> 
> On Tue, Nov 06, 2012 at 11:02:27AM +1300, Michael Hope wrote:

Oh, mail to michael.hope at linaro.org bounces (account disabled).
That might explain why the bug is not fixed :-(

Johannes

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2013-02-20 15:07 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-05  1:23 [PATCH v2 0/5] Use more asm-generic headers Rob Herring
2012-08-05  1:23 ` [PATCH v2 1/5] ARM: use generic version of identical asm headers Rob Herring
2012-08-05  1:23 ` [PATCH v2 2/5] ARM: add strstr declaration for decompressors Rob Herring
2012-08-05  1:23 ` [PATCH v2 3/5] ARM: use generic unaligned.h Rob Herring
2012-10-08 16:43   ` Shawn Guo
2012-10-08 20:34     ` Rob Herring
2012-10-08 23:28       ` Shawn Guo
2012-10-09  2:27         ` Rob Herring
2012-10-09  2:45           ` Shawn Guo
2012-10-09  4:01           ` Nicolas Pitre
2012-10-10 13:29             ` Rob Herring
2012-10-10 13:57               ` Nicolas Pitre
2012-10-11 12:43                 ` [PATCH] ARM: decompressor: clear SCTLR.A bit for v7 cores Rob Herring
2012-10-11 13:09                   ` Russell King - ARM Linux
2012-10-11 13:31                     ` Rob Herring
2012-10-11 13:41                       ` Russell King - ARM Linux
2012-10-11 15:44                         ` Nicolas Pitre
2012-10-23 20:32                           ` Gregory CLEMENT
2012-10-25 12:07                             ` Russell King - ARM Linux
2012-10-11 13:59                       ` Russell King - ARM Linux
2012-10-11 15:58                         ` Nicolas Pitre
2012-10-25  9:34                   ` Johannes Stezenbach
2012-10-25 12:41                     ` Rob Herring
2012-10-25 13:30                       ` Arnd Bergmann
2012-10-25 14:16                       ` Johannes Stezenbach
2012-10-25 14:25                         ` Rob Herring
2012-10-25 15:02                           ` Nicolas Pitre
2012-10-25 15:08                           ` Johannes Stezenbach
2012-11-05 10:48                             ` Dave Martin
2012-11-05 11:13                               ` Russell King - ARM Linux
2012-11-05 13:02                                 ` Dave Martin
2012-11-05 13:43                                   ` Johannes Stezenbach
2012-11-05 15:39                                   ` Russell King - ARM Linux
2012-11-05 16:13                                     ` Nicolas Pitre
2012-11-05 17:26                                       ` Dave Martin
2012-11-05 17:44                                         ` Rob Herring
2012-11-05 20:02                                           ` Nicolas Pitre
2012-11-05 17:46                                         ` Nicolas Pitre
2012-11-05 13:48                                 ` Rob Herring
2012-11-05 22:02                                   ` Michael Hope
2013-02-20 14:56                                     ` Johannes Stezenbach
2013-02-20 15:07                                       ` Johannes Stezenbach
2012-08-05  1:23 ` [PATCH v2 4/5] ARM: use generic termios.h Rob Herring
2012-08-05  1:24 ` [PATCH v2 5/5] ARM: convert asm/irqflags.h to use asm-generic/irqflags.h Rob Herring
2012-08-05  9:34 ` [PATCH v2 0/5] Use more asm-generic headers Thomas Petazzoni
2012-08-06 14:35 ` Arnd Bergmann

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.