linux-kernel.vger.kernel.org archive mirror
* [PATCH 00/28] ARCv2 port to Linux - (B) ISA / Core / platform support
@ 2015-06-09 11:48 Vineet Gupta
  2015-06-09 11:48 ` [PATCH 01/28] ARCv2: [intc] HS38 core interrupt controller Vineet Gupta
                   ` (27 more replies)
  0 siblings, 28 replies; 66+ messages in thread
From: Vineet Gupta @ 2015-06-09 11:48 UTC
  To: linux-arch, linux-kernel; +Cc: arnd, arc-linux-dev, Vineet Gupta

Hi,

ARCv2 is the next-generation ISA from Synopsys and the basis for the
HS3{4,6,8} families of processors, which retain the traditional ARC mantra of
low power and configurability while being more performant and feature rich.

Linux has been ported to the HS38x series, a 10-stage pipeline core which
supports an MMU (with huge pages) and SMP (up to 4 cores), among other features.

 - www.synopsys.com/dw/ipdir.php?ds=arc-hs38-processor
 - http://news.synopsys.com/2014-10-14-New-DesignWare-ARC-HS38-Processor-Doubles-Performance-for-Embedded-Linux-Applications
 - http://www.embedded.com/electronics-news/4435975/Synopsys-ARC-HS38-core-gives-2X-boost-to-Linux-based-apps

This sub-series builds upon the preparatory patches posted previously [1] and
implements the actual ARC HS38x core support.

Please review!

Thx,
-Vineet

[1] https://lkml.org/lkml/2015/6/7/25

Alexey Brodkin (1):
  ARC: [axs101] Prepare for AXS103

Claudiu Zissulescu (1):
  ARCv2: optimised string/mem lib routines

Ruud Derwig (1):
  ARCv2: [vdk] dts files and defconfig for HS38 VDK

Vineet Gupta (25):
  ARCv2: [intc] HS38 core interrupt controller
  ARCv2: Support for ARCv2 ISA and HS38x cores
  ARCv2: STAR 9000793984: Handle return from intr to Delay Slot
  ARCv2: STAR 9000808988: signals involving Delay Slot
  ARCv2: STAR 9000814690: Really Re-enable interrupts to avoid deadlocks
  ARCv2: MMUv4: TLB programming Model changes
  ARCv2: MMUv4: cache programming model changes
  ARCv2: MMUv4: support aliasing icache config
  ARCv2: Adhere to Zero Delay loop restriction
  ARCv2: extable: Enable sorting at build time
  ARCv2: clocksource: Introduce 64bit local RTC counter
  ARC: make plat_smp_ops weak to allow over-rides
  ARCv2: SMP: ARConnect debug/robustness
  ARCv2: SMP: clocksource: Enable Global Real Time counter
  ARCv2: SMP: intc: IDU 2nd level intc for dynamic IRQ distribution
  ARC: add compiler barrier to LLSC based cmpxchg
  ARC: add smp barriers around atomics per memory-barrriers.txt
  arch: conditionally define smp_{mb,rmb,wmb}
  ARCv2: barriers
  ARC: Reduce bitops lines of code using macros
  ARCv2: STAR 9000837815 workaround hardware exclusive transactions
    livelock
  ARCv2: SLC: Handle explcit flush for DMA ops (w/o IO-coherency)
  ARCv2: All bits in place, allow ARCv2 builds
  ARCv2: [nsim*hs*] Support simulation platforms for HS38x cores
  ARCv2: [axs103] Support ARC SDP FPGA platform for HS38x cores

 .../devicetree/bindings/arc/archs-idu-intc.txt     |  46 ++
 .../devicetree/bindings/arc/archs-intc.txt         |  22 +
 Documentation/devicetree/bindings/arc/axs103.txt   |   8 +
 arch/arc/Kconfig                                   | 130 ++++-
 arch/arc/Makefile                                  |   8 +-
 arch/arc/boot/dts/axc001.dtsi                      |  21 +
 arch/arc/boot/dts/axc003.dtsi                      | 102 ++++
 arch/arc/boot/dts/axc003_idu.dtsi                  | 126 +++++
 arch/arc/boot/dts/axs103.dts                       |  24 +
 arch/arc/boot/dts/axs103_idu.dts                   |  24 +
 arch/arc/boot/dts/axs10x_mb.dtsi                   |  17 -
 arch/arc/boot/dts/nsim_hs.dts                      |  53 +++
 arch/arc/boot/dts/nsim_hs_idu.dts                  |  72 +++
 arch/arc/boot/dts/nsimosci_hs.dts                  |  80 ++++
 arch/arc/boot/dts/nsimosci_hs_idu.dts              | 101 ++++
 arch/arc/boot/dts/vdk_axc003.dtsi                  |  61 +++
 arch/arc/boot/dts/vdk_axc003_idu.dtsi              |  76 +++
 arch/arc/boot/dts/vdk_axs10x_mb.dtsi               |  93 ++++
 arch/arc/boot/dts/vdk_hs38.dts                     |  21 +
 arch/arc/boot/dts/vdk_hs38_smp.dts                 |  21 +
 arch/arc/configs/axs103_defconfig                  | 117 +++++
 arch/arc/configs/axs103_smp_defconfig              | 118 +++++
 arch/arc/configs/nsim_hs_defconfig                 |  64 +++
 arch/arc/configs/nsim_hs_smp_defconfig             |  63 +++
 arch/arc/configs/nsimosci_hs_defconfig             |  73 +++
 arch/arc/configs/nsimosci_hs_smp_defconfig         |  93 ++++
 arch/arc/configs/vdk_hs38_defconfig                | 102 ++++
 arch/arc/configs/vdk_hs38_smp_defconfig            | 104 ++++
 arch/arc/include/asm/Kbuild                        |   1 -
 arch/arc/include/asm/arcregs.h                     |  60 ++-
 arch/arc/include/asm/atomic.h                      |  24 +-
 arch/arc/include/asm/barrier.h                     |  48 ++
 arch/arc/include/asm/bitops.h                      | 522 ++++++++-------------
 arch/arc/include/asm/cache.h                       |  18 +-
 arch/arc/include/asm/cmpxchg.h                     |  19 +-
 arch/arc/include/asm/delay.h                       |   9 +-
 arch/arc/include/asm/elf.h                         |   5 +
 arch/arc/include/asm/entry-arcv2.h                 | 190 ++++++++
 arch/arc/include/asm/entry.h                       |  21 +-
 arch/arc/include/asm/irq.h                         |   6 +
 arch/arc/include/asm/irqflags-arcv2.h              | 124 +++++
 arch/arc/include/asm/irqflags-compact.h            |   2 +
 arch/arc/include/asm/irqflags.h                    |   4 +
 arch/arc/include/asm/mcip.h                        |  94 ++++
 arch/arc/include/asm/mmu.h                         |  24 +-
 arch/arc/include/asm/pgtable.h                     |  10 +
 arch/arc/include/asm/ptrace.h                      |  43 ++
 arch/arc/include/asm/spinlock.h                    |  10 +
 arch/arc/include/asm/thread_info.h                 |   1 +
 arch/arc/include/asm/uaccess.h                     |  17 +-
 arch/arc/kernel/Makefile                           |   4 +-
 arch/arc/kernel/asm-offsets.c                      |   5 +
 arch/arc/kernel/devtree.c                          |   2 +-
 arch/arc/kernel/entry-arcv2.S                      | 239 ++++++++++
 arch/arc/kernel/head.S                             |   2 -
 arch/arc/kernel/intc-arcv2.c                       | 126 +++++
 arch/arc/kernel/mcip.c                             | 341 ++++++++++++++
 arch/arc/kernel/process.c                          |  12 +-
 arch/arc/kernel/ptrace.c                           |   2 +-
 arch/arc/kernel/setup.c                            |  56 ++-
 arch/arc/kernel/signal.c                           |   6 +-
 arch/arc/kernel/smp.c                              |  22 +-
 arch/arc/kernel/time.c                             |  95 ++++
 arch/arc/kernel/troubleshoot.c                     |  33 +-
 arch/arc/lib/Makefile                              |   6 +-
 arch/arc/lib/memcmp.S                              |  30 +-
 arch/arc/lib/memcpy-archs.S                        | 236 ++++++++++
 arch/arc/lib/memset-archs.S                        |  93 ++++
 arch/arc/lib/strcmp-archs.S                        |  78 +++
 arch/arc/mm/cache.c                                | 188 +++++++-
 arch/arc/mm/dma.c                                  |  12 +
 arch/arc/mm/tlb.c                                  |  54 ++-
 arch/arc/mm/tlbex.S                                |  28 +-
 arch/arc/plat-axs10x/Kconfig                       |  13 +-
 arch/arc/plat-axs10x/axs10x.c                      | 206 +++++++-
 arch/arc/plat-sim/platform.c                       |   7 +
 include/asm-generic/barrier.h                      |  25 +
 scripts/sortextable.c                              |   5 +
 78 files changed, 4567 insertions(+), 451 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/arc/archs-idu-intc.txt
 create mode 100644 Documentation/devicetree/bindings/arc/archs-intc.txt
 create mode 100644 Documentation/devicetree/bindings/arc/axs103.txt
 create mode 100644 arch/arc/boot/dts/axc003.dtsi
 create mode 100644 arch/arc/boot/dts/axc003_idu.dtsi
 create mode 100644 arch/arc/boot/dts/axs103.dts
 create mode 100644 arch/arc/boot/dts/axs103_idu.dts
 create mode 100644 arch/arc/boot/dts/nsim_hs.dts
 create mode 100644 arch/arc/boot/dts/nsim_hs_idu.dts
 create mode 100644 arch/arc/boot/dts/nsimosci_hs.dts
 create mode 100644 arch/arc/boot/dts/nsimosci_hs_idu.dts
 create mode 100644 arch/arc/boot/dts/vdk_axc003.dtsi
 create mode 100644 arch/arc/boot/dts/vdk_axc003_idu.dtsi
 create mode 100644 arch/arc/boot/dts/vdk_axs10x_mb.dtsi
 create mode 100644 arch/arc/boot/dts/vdk_hs38.dts
 create mode 100644 arch/arc/boot/dts/vdk_hs38_smp.dts
 create mode 100644 arch/arc/configs/axs103_defconfig
 create mode 100644 arch/arc/configs/axs103_smp_defconfig
 create mode 100644 arch/arc/configs/nsim_hs_defconfig
 create mode 100644 arch/arc/configs/nsim_hs_smp_defconfig
 create mode 100644 arch/arc/configs/nsimosci_hs_defconfig
 create mode 100644 arch/arc/configs/nsimosci_hs_smp_defconfig
 create mode 100644 arch/arc/configs/vdk_hs38_defconfig
 create mode 100644 arch/arc/configs/vdk_hs38_smp_defconfig
 create mode 100644 arch/arc/include/asm/barrier.h
 create mode 100644 arch/arc/include/asm/entry-arcv2.h
 create mode 100644 arch/arc/include/asm/irqflags-arcv2.h
 create mode 100644 arch/arc/include/asm/mcip.h
 create mode 100644 arch/arc/kernel/entry-arcv2.S
 create mode 100644 arch/arc/kernel/intc-arcv2.c
 create mode 100644 arch/arc/kernel/mcip.c
 create mode 100644 arch/arc/lib/memcpy-archs.S
 create mode 100644 arch/arc/lib/memset-archs.S
 create mode 100644 arch/arc/lib/strcmp-archs.S

-- 
1.9.1



* [PATCH 01/28] ARCv2: [intc] HS38 core interrupt controller
  2015-06-09 11:48 [PATCH 00/28] ARCv2 port to Linux - (B) ISA / Core / platform support Vineet Gupta
@ 2015-06-09 11:48 ` Vineet Gupta
  2015-06-09 11:48 ` [PATCH 02/28] ARCv2: Support for ARCv2 ISA and HS38x cores Vineet Gupta
                   ` (26 subsequent siblings)
  27 siblings, 0 replies; 66+ messages in thread
From: Vineet Gupta @ 2015-06-09 11:48 UTC
  To: linux-arch, linux-kernel
  Cc: arnd, arc-linux-dev, Vineet Gupta, Jason Cooper, Thomas Gleixner

Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 .../devicetree/bindings/arc/archs-intc.txt         |  22 ++++
 arch/arc/include/asm/irqflags-arcv2.h              | 116 +++++++++++++++++++
 arch/arc/kernel/intc-arcv2.c                       | 126 +++++++++++++++++++++
 3 files changed, 264 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arc/archs-intc.txt
 create mode 100644 arch/arc/include/asm/irqflags-arcv2.h
 create mode 100644 arch/arc/kernel/intc-arcv2.c
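
For reference, here is a minimal user-space sketch of the STATUS32
bookkeeping that irqflags-arcv2.h and arc_init_IRQ() in this patch rely on.
It models the register as a plain uint32_t. The bit numbers are taken from
the patch; the function names boot_status32() and irqs_disabled_flags() are
illustrative assumptions, and the masks are built from unsigned constants so
that the shift into bit 31 is well defined in standard C.

```c
#include <stdint.h>

/* Bit numbers as defined in irqflags-arcv2.h in this patch */
#define STATUS_AD_BIT		19	/* disable alignment check */
#define STATUS_IE_BIT		31	/* interrupts enabled */

#define STATUS_AD_MASK		((uint32_t)1 << STATUS_AD_BIT)
#define STATUS_IE_MASK		((uint32_t)1 << STATUS_IE_BIT)

#define ARCV2_IRQ_DEF_PRIO	0

/* Seed value for the status register, same composition as the patch */
#define ISA_INIT_STATUS_BITS	(STATUS_IE_MASK | STATUS_AD_MASK | \
				 ((uint32_t)ARCV2_IRQ_DEF_PRIO << 1))

/* Hypothetical helper: mirrors arch_irqs_disabled_flags() on a plain value */
static inline int irqs_disabled_flags(uint32_t flags)
{
	return !(flags & STATUS_IE_MASK);
}

/*
 * Hypothetical helper: the value arc_init_IRQ() would hand to FLAG, given
 * the current STATUS32 contents. The seed bits are OR-ed in, then IE is
 * cleared again so interrupts stay off during early boot.
 */
static inline uint32_t boot_status32(uint32_t cur)
{
	cur |= ISA_INIT_STATUS_BITS;
	cur &= ~STATUS_IE_MASK;
	return cur;
}
```

With these definitions, boot_status32(0) yields 0x00080000 (AD set, IE
clear), so irqs_disabled_flags() reports interrupts as disabled until bit 31
is set later on.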

diff --git a/Documentation/devicetree/bindings/arc/archs-intc.txt b/Documentation/devicetree/bindings/arc/archs-intc.txt
new file mode 100644
index 000000000000..69f326d6a5ad
--- /dev/null
+++ b/Documentation/devicetree/bindings/arc/archs-intc.txt
@@ -0,0 +1,22 @@
+* ARC-HS incore Interrupt Controller (Provided by cores implementing ARCv2 ISA)
+
+Properties:
+
+- compatible: "snps,archs-intc"
+- interrupt-controller: This is an interrupt controller.
+- #interrupt-cells: Must be <1>.
+
+  The single-cell "interrupts" property of a device specifies the IRQ number,
+  in the range 16 to 256.
+
+  The intc is accessed via the special ARC AUX register interface, hence the
+  "reg" property is not specified.
+
+Example:
+
+	intc: interrupt-controller {
+		compatible = "snps,archs-intc";
+		interrupt-controller;
+		#interrupt-cells = <1>;
+		interrupts = <16 17 18 19 20 21 22 23 24 25>;
+	};
diff --git a/arch/arc/include/asm/irqflags-arcv2.h b/arch/arc/include/asm/irqflags-arcv2.h
new file mode 100644
index 000000000000..c946c56f141c
--- /dev/null
+++ b/arch/arc/include/asm/irqflags-arcv2.h
@@ -0,0 +1,116 @@
+/*
+ * Copyright (C) 2014-15 Synopsys, Inc. (www.synopsys.com)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef __ASM_IRQFLAGS_ARCV2_H
+#define __ASM_IRQFLAGS_ARCV2_H
+
+#include <asm/arcregs.h>
+
+/* status32 Bits */
+#define STATUS_AD_BIT	19   /* Disable Align chk: core supports non-aligned */
+#define STATUS_IE_BIT	31
+
+#define STATUS_AD_MASK		(1<<STATUS_AD_BIT)
+#define STATUS_IE_MASK		(1<<STATUS_IE_BIT)
+
+#define AUX_USER_SP		0x00D
+#define AUX_IRQ_CTRL		0x00E
+#define AUX_IRQ_ACT		0x043	/* Active Intr across all levels */
+#define AUX_IRQ_LVL_PEND	0x200	/* Pending Intr across all levels */
+#define AUX_IRQ_PRIORITY	0x206
+#define ICAUSE			0x40a
+#define AUX_IRQ_SELECT		0x40b
+#define AUX_IRQ_ENABLE		0x40c
+
+/* 0 is highest level, but taken by FIRQs, if present in design */
+#define ARCV2_IRQ_DEF_PRIO		0
+
+/* seed value for status register */
+#define ISA_INIT_STATUS_BITS	(STATUS_IE_MASK | STATUS_AD_MASK | \
+					(ARCV2_IRQ_DEF_PRIO << 1))
+
+#ifndef __ASSEMBLY__
+
+/*
+ * Save IRQ state and disable IRQs
+ */
+static inline long arch_local_irq_save(void)
+{
+	unsigned long flags;
+
+	__asm__ __volatile__("	clri %0	\n" : "=r" (flags) : : "memory");
+
+	return flags;
+}
+
+/*
+ * restore saved IRQ state
+ */
+static inline void arch_local_irq_restore(unsigned long flags)
+{
+	__asm__ __volatile__("	seti %0	\n" : : "r" (flags) : "memory");
+}
+
+/*
+ * Unconditionally Enable IRQs
+ */
+static inline void arch_local_irq_enable(void)
+{
+	__asm__ __volatile__("	seti	\n" : : : "memory");
+}
+
+/*
+ * Unconditionally Disable IRQs
+ */
+static inline void arch_local_irq_disable(void)
+{
+	__asm__ __volatile__("	clri	\n" : : : "memory");
+}
+
+/*
+ * save IRQ state
+ */
+static inline long arch_local_save_flags(void)
+{
+	unsigned long temp;
+
+	__asm__ __volatile__(
+	"	lr  %0, [status32]	\n"
+	: "=&r"(temp)
+	:
+	: "memory");
+
+	return temp;
+}
+
+/*
+ * Query IRQ state
+ */
+static inline int arch_irqs_disabled_flags(unsigned long flags)
+{
+	return !(flags & (STATUS_IE_MASK));
+}
+
+static inline int arch_irqs_disabled(void)
+{
+	return arch_irqs_disabled_flags(arch_local_save_flags());
+}
+
+#else
+
+.macro IRQ_DISABLE  scratch
+	clri
+.endm
+
+.macro IRQ_ENABLE  scratch
+	seti
+.endm
+
+#endif	/* __ASSEMBLY__ */
+
+#endif
diff --git a/arch/arc/kernel/intc-arcv2.c b/arch/arc/kernel/intc-arcv2.c
new file mode 100644
index 000000000000..3876e11d4553
--- /dev/null
+++ b/arch/arc/kernel/intc-arcv2.c
@@ -0,0 +1,126 @@
+/*
+ * Copyright (C) 2014 Synopsys, Inc. (www.synopsys.com)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+
+#include <linux/interrupt.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/irqdomain.h>
+#include <linux/irqchip.h>
+#include "../../drivers/irqchip/irqchip.h"
+#include <asm/irq.h>
+
+/*
+ * Early Hardware specific Interrupt setup
+ * -Called very early (start_kernel -> setup_arch -> setup_processor)
+ * -Platform Independent (must for any ARC Core)
+ * -Needed for each CPU (hence not foldable into init_IRQ)
+ */
+void arc_init_IRQ(void)
+{
+	unsigned int tmp;
+
+	struct aux_irq_ctrl {
+#ifdef CONFIG_CPU_BIG_ENDIAN
+		unsigned int res3:18, save_idx_regs:1, res2:1,
+			     save_u_to_u:1, save_lp_regs:1, save_blink:1,
+			     res:4, save_nr_gpr_pairs:5;
+#else
+		unsigned int save_nr_gpr_pairs:5, res:4,
+			     save_blink:1, save_lp_regs:1, save_u_to_u:1,
+			     res2:1, save_idx_regs:1, res3:18;
+#endif
+	} ictrl;
+
+	*(unsigned int *)&ictrl = 0;
+
+	ictrl.save_nr_gpr_pairs = 6;	/* r0 to r11 (r12 saved manually) */
+	ictrl.save_blink = 1;
+	ictrl.save_lp_regs = 1;		/* LP_COUNT, LP_START, LP_END */
+	ictrl.save_u_to_u = 0;		/* user ctxt saved on kernel stack */
+	ictrl.save_idx_regs = 1;	/* JLI, LDI, EI */
+
+	WRITE_AUX(AUX_IRQ_CTRL, ictrl);
+
+	/* setup status32, don't enable intr yet as kernel doesn't want */
+	tmp = read_aux_reg(0xa);
+	tmp |= ISA_INIT_STATUS_BITS;
+	tmp &= ~STATUS_IE_MASK;
+	asm volatile("flag %0	\n"::"r"(tmp));
+}
+
+static void arcv2_irq_mask(struct irq_data *data)
+{
+	write_aux_reg(AUX_IRQ_SELECT, data->irq);
+	write_aux_reg(AUX_IRQ_ENABLE, 0);
+}
+
+static void arcv2_irq_unmask(struct irq_data *data)
+{
+	write_aux_reg(AUX_IRQ_SELECT, data->irq);
+	write_aux_reg(AUX_IRQ_ENABLE, 1);
+}
+
+void arcv2_irq_enable(struct irq_data *data)
+{
+	/* set default priority */
+	write_aux_reg(AUX_IRQ_SELECT, data->irq);
+	write_aux_reg(AUX_IRQ_PRIORITY, ARCV2_IRQ_DEF_PRIO);
+
+	/*
+	 * hw auto enables (linux unmask) all by default
+	 * So no need to do IRQ_ENABLE here
+	 * XXX: However OSCI LAN need it
+	 */
+	write_aux_reg(AUX_IRQ_ENABLE, 1);
+}
+
+static struct irq_chip arcv2_irq_chip = {
+	.name           = "ARCv2 core Intc",
+	.irq_mask	= arcv2_irq_mask,
+	.irq_unmask	= arcv2_irq_unmask,
+	.irq_enable	= arcv2_irq_enable
+};
+
+static int arcv2_irq_map(struct irq_domain *d, unsigned int irq,
+			 irq_hw_number_t hw)
+{
+	if (irq == TIMER0_IRQ)
+		irq_set_chip_and_handler(irq, &arcv2_irq_chip, handle_percpu_irq);
+	else
+		irq_set_chip_and_handler(irq, &arcv2_irq_chip, handle_level_irq);
+
+	return 0;
+}
+
+static const struct irq_domain_ops arcv2_irq_ops = {
+	.xlate = irq_domain_xlate_onecell,
+	.map = arcv2_irq_map,
+};
+
+static struct irq_domain *root_domain;
+
+static int __init
+init_onchip_IRQ(struct device_node *intc, struct device_node *parent)
+{
+	if (parent)
+		panic("DeviceTree incore intc not a root irq controller\n");
+
+	root_domain = irq_domain_add_legacy(intc, NR_CPU_IRQS, 0, 0,
+					    &arcv2_irq_ops, NULL);
+
+	if (!root_domain)
+		panic("root irq domain not avail\n");
+
+	/* with this we don't need to export root_domain */
+	irq_set_default_host(root_domain);
+
+	return 0;
+}
+
+IRQCHIP_DECLARE(arc_intc, "snps,archs-intc", init_onchip_IRQ);
-- 
1.9.1
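
For reference, the word that arc_init_IRQ() above writes to AUX_IRQ_CTRL can
be cross-checked with explicit shifts instead of bitfields. The sketch below
is a user-space illustration, not kernel code: the bit positions are derived
from the field order in struct aux_irq_ctrl (counting from bit 0 in the
little-endian variant, which assumes the compiler allocates bitfields
LSB-first there), and the macro and function names are hypothetical.

```c
#include <stdint.h>

/*
 * Field positions, counted from bit 0, as implied by the little-endian
 * variant of struct aux_irq_ctrl in arc_init_IRQ(). Macro and function
 * names here are illustrative only.
 */
#define IRQ_CTRL_NR_GPR_PAIRS_MASK	0x1fu		/* bits 4:0 */
#define IRQ_CTRL_SAVE_BLINK		(1u << 9)
#define IRQ_CTRL_SAVE_LP_REGS		(1u << 10)
#define IRQ_CTRL_SAVE_U_TO_U		(1u << 11)
#define IRQ_CTRL_SAVE_IDX_REGS		(1u << 13)

/* Pack the auto-save configuration into a single 32-bit word. */
static inline uint32_t irq_ctrl_word(unsigned int nr_gpr_pairs, int save_blink,
				     int save_lp_regs, int save_u_to_u,
				     int save_idx_regs)
{
	uint32_t v = nr_gpr_pairs & IRQ_CTRL_NR_GPR_PAIRS_MASK;

	if (save_blink)
		v |= IRQ_CTRL_SAVE_BLINK;
	if (save_lp_regs)
		v |= IRQ_CTRL_SAVE_LP_REGS;
	if (save_u_to_u)
		v |= IRQ_CTRL_SAVE_U_TO_U;
	if (save_idx_regs)
		v |= IRQ_CTRL_SAVE_IDX_REGS;

	return v;
}
```

With the settings used in the patch (6 register pairs, blink, loop registers
and index registers saved, save_u_to_u left at 0), irq_ctrl_word(6, 1, 1, 0, 1)
evaluates to 0x2606.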



* [PATCH 02/28] ARCv2: Support for ARCv2 ISA and HS38x cores
  2015-06-09 11:48 [PATCH 00/28] ARCv2 port to Linux - (B) ISA / Core / platform support Vineet Gupta
  2015-06-09 11:48 ` [PATCH 01/28] ARCv2: [intc] HS38 core interrupt controller Vineet Gupta
@ 2015-06-09 11:48 ` Vineet Gupta
  2015-06-09 11:48 ` [PATCH 03/28] ARCv2: STAR 9000793984: Handle return from intr to Delay Slot Vineet Gupta
                   ` (25 subsequent siblings)
  27 siblings, 0 replies; 66+ messages in thread
From: Vineet Gupta @ 2015-06-09 11:48 UTC
  To: linux-arch, linux-kernel; +Cc: arnd, arc-linux-dev, Vineet Gupta

The notable features are:
    - SMP configurations of up to 4 cores with coherency
    - Optional L2 Cache and IO-Coherency
    - Revised Interrupt Architecture (multiple priorities, reg banks,
        auto stack switch, auto regfile save/restore)
    - MMUv4 (PIPT dcache, Huge Pages)
    - Instructions for
	* 64bit load/store: LDD, STD
	* Hardware assisted divide/remainder: DIV, REM
	* Function prologue/epilogue: ENTER_S, LEAVE_S
	* IRQ enable/disable: CLRI, SETI
	* Find first/last set bit: FFS, FLS
	* SETcc, BMSKN, XBFU...

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/Kconfig                        |  80 +++++++++++++-
 arch/arc/Makefile                       |   8 +-
 arch/arc/include/asm/arcregs.h          |  53 ++++++++-
 arch/arc/include/asm/bitops.h           |  71 ++++++++++++
 arch/arc/include/asm/elf.h              |   5 +
 arch/arc/include/asm/entry-arcv2.h      | 190 ++++++++++++++++++++++++++++++++
 arch/arc/include/asm/entry.h            |   4 +
 arch/arc/include/asm/irq.h              |   5 +
 arch/arc/include/asm/irqflags-arcv2.h   |   3 +
 arch/arc/include/asm/irqflags-compact.h |   2 +
 arch/arc/include/asm/irqflags.h         |   4 +
 arch/arc/include/asm/ptrace.h           |  43 ++++++++
 arch/arc/include/asm/thread_info.h      |   1 +
 arch/arc/kernel/Makefile                |   3 +-
 arch/arc/kernel/entry-arcv2.S           | 189 +++++++++++++++++++++++++++++++
 arch/arc/kernel/head.S                  |   2 -
 arch/arc/kernel/process.c               |  12 +-
 arch/arc/kernel/ptrace.c                |   2 +-
 arch/arc/kernel/setup.c                 |  45 +++++++-
 arch/arc/kernel/signal.c                |   6 +-
 arch/arc/kernel/troubleshoot.c          |  33 +++++-
 arch/arc/mm/tlbex.S                     |   2 -
 22 files changed, 730 insertions(+), 33 deletions(-)
 create mode 100644 arch/arc/include/asm/entry-arcv2.h
 create mode 100644 arch/arc/kernel/entry-arcv2.S
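
For reference, the return-value conventions of the four bit-scan helpers this
patch adds to bitops.h for ARCv2 can be summarized with a portable model. This
is a sketch in plain C for checking expectations on a host machine, not the
kernel implementation (which uses the FLS/FFS instructions). The ref_ prefixed
names are hypothetical, and the zero-input results follow the comments and the
mov.z fixups in the patch.

```c
#include <stdint.h>

/* fls: 1-based index of the most significant set bit, 0 if x == 0 */
static inline int ref_fls(uint32_t x)
{
	int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/* __fls: 0-based variant of fls, also 0 if no bit is set */
static inline int ref___fls(uint32_t x)
{
	return x ? ref_fls(x) - 1 : 0;
}

/* ffs: 1-based index of the least significant set bit, 0 if x == 0 */
static inline int ref_ffs(uint32_t x)
{
	int n = 1;

	if (!x)
		return 0;

	while (!(x & 1)) {
		x >>= 1;
		n++;
	}
	return n;
}

/* __ffs: 0-based variant of ffs; the patch's asm yields 0 for x == 0 */
static inline int ref___ffs(uint32_t x)
{
	return x ? ref_ffs(x) - 1 : 0;
}
```

A quick way to use this is to compare each helper against the model for a
handful of inputs such as 0, 1, 0x80000000 and a value with several bits set.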

diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index 1eeefd9763d1..f72398847b5b 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -88,11 +88,31 @@ source "arch/arc/plat-axs10x/Kconfig"
 
 endmenu
 
+choice
+	prompt "ARC Instruction Set"
+	default ISA_ARCOMPACT
+
+config ISA_ARCOMPACT
+	bool "ARCompact ISA"
+	help
+	  The original ARC ISA of ARC600/700 cores
+
+### For bisectability, disable ARCv2 support until we have all the bits in place
+#config ISA_ARCV2
+#	bool "ARC ISA v2"
+#	help
+#	  ISA for the Next Generation ARC-HS cores
+
+endchoice
+
 menu "ARC CPU Configuration"
 
 choice
 	prompt "ARC Core"
-	default ARC_CPU_770
+	default ARC_CPU_770 if ISA_ARCOMPACT
+	default ARC_CPU_HS if ISA_ARCV2
+
+if ISA_ARCOMPACT
 
 config ARC_CPU_750D
 	bool "ARC750D"
@@ -110,6 +130,27 @@ config ARC_CPU_770
 	  -Caches: New Prog Model, Region Flush
 	  -Insns: endian swap, load-locked/store-conditional, time-stamp-ctr
 
+endif	#ISA_ARCOMPACT
+
+config ARC_CPU_HS
+	bool "ARC-HS"
+	depends on ISA_ARCV2
+	help
+	  Support for ARC HS38x Cores based on ARCv2 ISA
+	  The notable features are:
+	    - SMP configurations of up to 4 cores with coherency
+	    - Optional L2 Cache and IO-Coherency
+	    - Revised Interrupt Architecture (multiple priorities, reg banks,
+	        auto stack switch, auto regfile save/restore)
+	    - MMUv4 (PIPT dcache, Huge Pages)
+	    - Instructions for
+		* 64bit load/store: LDD, STD
+		* Hardware assisted divide/remainder: DIV, REM
+		* Function prologue/epilogue: ENTER_S, LEAVE_S
+		* IRQ enable/disable: CLRI, SETI
+		* Find first/last set bit: FFS, FLS
+		* SETcc, BMSKN, XBFU...
+
 endchoice
 
 config CPU_BIG_ENDIAN
@@ -134,7 +175,7 @@ config ARC_HAS_COH_CACHES
 config ARC_HAS_REENTRANT_IRQ_LV2
 	def_bool n
 
-endif
+endif	#SMP
 
 config NR_CPUS
 	int "Maximum number of CPUs (2-4096)"
@@ -223,7 +264,7 @@ config ARC_HAS_HW_MPY
 	  Multipler. Otherwise software multipy lib is used
 
 choice
-	prompt "ARC700 MMU Version"
+	prompt "MMU Version"
 	default ARC_MMU_V3 if ARC_CPU_770
 	default ARC_MMU_V2 if ARC_CPU_750D
 
@@ -268,6 +309,8 @@ config ARC_PAGE_SIZE_4K
 
 endchoice
 
+if ISA_ARCOMPACT
+
 config ARC_COMPACT_IRQ_LEVELS
 	bool "ARCompact IRQ Priorities: High(2)/Low(1)"
 	default n
@@ -287,7 +330,7 @@ config ARC_IRQ5_LV2
 config ARC_IRQ6_LV2
 	bool
 
-endif
+endif	#ARC_COMPACT_IRQ_LEVELS
 
 config ARC_FPU_SAVE_RESTORE
 	bool "Enable FPU state persistence across context switch"
@@ -300,18 +343,43 @@ config ARC_FPU_SAVE_RESTORE
 	  based on actual usage of FPU by a task. Thus our implemn does
 	  this for all tasks in system.
 
+endif	#ISA_ARCOMPACT
+
 config ARC_CANT_LLSC
 	def_bool n
 
 config ARC_HAS_LLSC
 	bool "Insn: LLOCK/SCOND (efficient atomic ops)"
 	default y
-	depends on ARC_CPU_770 && !ARC_CANT_LLSC
+	depends on !ARC_CPU_750D && !ARC_CANT_LLSC
 
 config ARC_HAS_SWAPE
 	bool "Insn: SWAPE (endian-swap)"
 	default y
 
+if ISA_ARCV2
+
+config ARC_HAS_LL64
+	bool "Insn: 64bit LDD/STD"
+	help
+	  Enable gcc to generate 64-bit load/store instructions
+	  ISA mandates even/odd registers to allow encoding of two
+	  dest operands with 2 possible source operands.
+	default y
+
+config ARC_NUMBER_OF_INTERRUPTS
+	int "Number of interrupts"
+	range 8 240
+	default 32
+	help
+	  This defines the number of interrupts on the ARCv2HS core.
+	  It affects the size of vector table.
+	  The initial 8 IRQs are fixed (Timer, ICI etc) and although configurable
+	  in hardware, it keeps things simple for Linux to assume they are always
+	  present.
+
+endif	# ISA_ARCV2
+
 endmenu   # "ARC CPU Configuration"
 
 config LINUX_LINK_BASE
@@ -337,8 +405,10 @@ config ARC_CURR_IN_REG
 
 config ARC_EMUL_UNALIGNED
 	bool "Emulate unaligned memory access (userspace only)"
+	default N
 	select SYSCTL_ARCH_UNALIGN_NO_WARN
 	select SYSCTL_ARCH_UNALIGN_ALLOW
+	depends on ISA_ARCOMPACT
 	help
 	  This enables misaligned 16 & 32 bit memory access from user space.
 	  Use ONLY-IF-ABS-NECESSARY as it will be very slow and also can hide
diff --git a/arch/arc/Makefile b/arch/arc/Makefile
index 86c71b2089d2..bf68dc5a08be 100644
--- a/arch/arc/Makefile
+++ b/arch/arc/Makefile
@@ -14,7 +14,9 @@ endif
 
 KBUILD_DEFCONFIG := nsim_700_defconfig
 
-cflags-y	+= -mA7 -fno-common -pipe -fno-builtin -D__linux__
+cflags-y	+= -fno-common -pipe -fno-builtin -D__linux__
+cflags-${CONFIG_ISA_ARCOMPACT}	+= -mA7
+cflags-${CONFIG_ISA_ARCV2}	+= -mcpu=archs
 
 ifdef CONFIG_ARC_CURR_IN_REG
 # For a global register defintion, make sure it gets passed to every file
@@ -34,6 +36,10 @@ cflags-$(atleast_gcc44)			+= -fsection-anchors
 cflags-$(CONFIG_ARC_HAS_LLSC)		+= -mlock
 cflags-$(CONFIG_ARC_HAS_SWAPE)		+= -mswape
 
+ifndef CONFIG_ARC_HAS_LL64
+cflags-y				+= -mno-ll64
+endif
+
 cflags-$(CONFIG_ARC_DW2_UNWIND)		+= -fasynchronous-unwind-tables
 
 # By default gcc 4.8 generates dwarf4 which kernel unwinder can't grok
diff --git a/arch/arc/include/asm/arcregs.h b/arch/arc/include/asm/arcregs.h
index 336a9f694c2e..d77362dbb864 100644
--- a/arch/arc/include/asm/arcregs.h
+++ b/arch/arc/include/asm/arcregs.h
@@ -16,6 +16,7 @@
 #define ARC_REG_PERIBASE_BCR	0x69
 #define ARC_REG_FP_BCR		0x6B	/* ARCompact: Single-Precision FPU */
 #define ARC_REG_DPFP_BCR	0x6C	/* ARCompact: Dbl Precision FPU */
+#define ARC_REG_FP_V2_BCR	0xc8	/* ARCv2 FPU */
 #define ARC_REG_DCCM_BCR	0x74	/* DCCM Present + SZ */
 #define ARC_REG_TIMERS_BCR	0x75
 #define ARC_REG_AP_BCR		0x76
@@ -51,6 +52,7 @@
  * [15: 8] = Exception Cause Code
  * [ 7: 0] = Exception Parameters (for certain types only)
  */
+#ifdef CONFIG_ISA_ARCOMPACT
 #define ECR_V_MEM_ERR			0x01
 #define ECR_V_INSN_ERR			0x02
 #define ECR_V_MACH_CHK			0x20
@@ -58,6 +60,15 @@
 #define ECR_V_DTLB_MISS			0x22
 #define ECR_V_PROTV			0x23
 #define ECR_V_TRAP			0x25
+#else
+#define ECR_V_MEM_ERR			0x01
+#define ECR_V_INSN_ERR			0x02
+#define ECR_V_MACH_CHK			0x03
+#define ECR_V_ITLB_MISS			0x04
+#define ECR_V_DTLB_MISS			0x05
+#define ECR_V_PROTV			0x06
+#define ECR_V_TRAP			0x09
+#endif
 
 /* DTLB Miss and Protection Violation Cause Codes */
 
@@ -201,9 +212,11 @@ struct bcr_identity {
 
 struct bcr_isa {
 #ifdef CONFIG_CPU_BIG_ENDIAN
-	unsigned int pad1:23, atomic1:1, ver:8;
+	unsigned int div_rem:4, pad2:4, ldd:1, unalign:1, atomic:1, be:1,
+		     pad1:11, atomic1:1, ver:8;
 #else
-	unsigned int ver:8, atomic1:1, pad1:23;
+	unsigned int ver:8, atomic1:1, pad1:11, be:1, atomic:1, unalign:1,
+		     ldd:1, pad2:4, div_rem:4;
 #endif
 };
 
@@ -266,11 +279,19 @@ struct bcr_fp_arcompact {
 #endif
 };
 
+struct bcr_fp_arcv2 {
+#ifdef CONFIG_CPU_BIG_ENDIAN
+	unsigned int pad2:15, dp:1, pad1:7, sp:1, ver:8;
+#else
+	unsigned int ver:8, sp:1, pad1:7, dp:1, pad2:15;
+#endif
+};
+
 struct bcr_timer {
 #ifdef CONFIG_CPU_BIG_ENDIAN
-	unsigned int pad2:15, rtsc:1, pad1:6, t1:1, t0:1, ver:8;
+	unsigned int pad2:15, rtsc:1, pad1:5, rtc:1, t1:1, t0:1, ver:8;
 #else
-	unsigned int ver:8, t0:1, t1:1, pad1:6, rtsc:1, pad2:15;
+	unsigned int ver:8, t0:1, t1:1, rtc:1, pad1:5, rtsc:1, pad2:15;
 #endif
 };
 
@@ -282,6 +303,14 @@ struct bcr_bpu_arcompact {
 #endif
 };
 
+struct bcr_bpu_arcv2 {
+#ifdef CONFIG_CPU_BIG_ENDIAN
+	unsigned int pad:6, fbe:2, tqe:2, ts:4, ft:1, rse:2, pte:3, bce:3, ver:8;
+#else
+	unsigned int ver:8, bce:3, pte:3, rse:2, ft:1, ts:4, tqe:2, fbe:2, pad:6;
+#endif
+};
+
 struct bcr_generic {
 #ifdef CONFIG_CPU_BIG_ENDIAN
 	unsigned int pad:24, ver:8;
@@ -333,6 +362,22 @@ struct cpuinfo_arc {
 
 extern struct cpuinfo_arc cpuinfo_arc700[];
 
+static inline int is_isa_arcv2(void)
+{
+	return IS_ENABLED(CONFIG_ISA_ARCV2);
+}
+
+static inline int is_isa_arcompact(void)
+{
+	return IS_ENABLED(CONFIG_ISA_ARCOMPACT);
+}
+
+#if defined(CONFIG_ISA_ARCOMPACT) && !defined(_CPU_DEFAULT_A7)
+#error "Toolchain not configured for ARCompact builds"
+#elif defined(CONFIG_ISA_ARCV2) && !defined(_CPU_DEFAULT_HS)
+#error "Toolchain not configured for ARCv2 builds"
+#endif
+
 #endif /* __ASEMBLY__ */
 
 #endif /* _ASM_ARC_ARCREGS_H */
diff --git a/arch/arc/include/asm/bitops.h b/arch/arc/include/asm/bitops.h
index 4051e9525939..829a8a2e9704 100644
--- a/arch/arc/include/asm/bitops.h
+++ b/arch/arc/include/asm/bitops.h
@@ -402,6 +402,8 @@ test_bit(unsigned int nr, const volatile unsigned long *addr)
 	return ((mask & *addr) != 0);
 }
 
+#ifdef CONFIG_ISA_ARCOMPACT
+
 /*
  * Count the number of zeros, starting from MSB
  * Helper for fls( ) friends
@@ -494,6 +496,75 @@ static inline __attribute__ ((const)) int __ffs(unsigned long word)
 	return ffs(word) - 1;
 }
 
+#else	/* CONFIG_ISA_ARCV2 */
+
+/*
+ * fls = Find Last Set in word
+ * @result: [1-32]
+ * fls(1) = 1, fls(0x80000000) = 32, fls(0) = 0
+ */
+static inline __attribute__ ((const)) int fls(unsigned long x)
+{
+	int n;
+
+	asm volatile(
+	"	fls.f	%0, %1		\n"  /* 0:31; 0(Z) if src 0 */
+	"	add.nz	%0, %0, 1	\n"  /* 0:31 -> 1:32 */
+	: "=r"(n)	/* Early clobber not needed */
+	: "r"(x)
+	: "cc");
+
+	return n;
+}
+
+/*
+ * __fls: Similar to fls, but zero based (0-31). Also 0 if no bit set
+ */
+static inline __attribute__ ((const)) int __fls(unsigned long x)
+{
+	/* FLS insn has exactly same semantics as the API */
+	return	__builtin_arc_fls(x);
+}
+
+/*
+ * ffs = Find First Set in word (LSB to MSB)
+ * @result: [1-32], 0 if all 0's
+ */
+static inline __attribute__ ((const)) int ffs(unsigned long x)
+{
+	int n;
+
+	asm volatile(
+	"	ffs.f	%0, %1		\n"  /* 0:31; 31(Z) if src 0 */
+	"	add.nz	%0, %0, 1	\n"  /* 0:31 -> 1:32 */
+	"	mov.z	%0, 0		\n"  /* 31(Z)-> 0 */
+	: "=r"(n)	/* Early clobber not needed */
+	: "r"(x)
+	: "cc");
+
+	return n;
+}
+
+/*
+ * __ffs: Similar to ffs, but zero based (0-31)
+ */
+static inline __attribute__ ((const)) int __ffs(unsigned long x)
+{
+	int n;
+
+	asm volatile(
+	"	ffs.f	%0, %1		\n"  /* 0:31; 31(Z) if src 0 */
+	"	mov.z	%0, 0		\n"  /* 31(Z)-> 0 */
+	: "=r"(n)
+	: "r"(x)
+	: "cc");
+
+	return n;
+
+}
+
+#endif	/* CONFIG_ISA_ARCOMPACT */
+
 /*
  * ffz = Find First Zero in word.
  * @return:[0-31], 32 if all 1's
diff --git a/arch/arc/include/asm/elf.h b/arch/arc/include/asm/elf.h
index a26282857683..51a99e25fe33 100644
--- a/arch/arc/include/asm/elf.h
+++ b/arch/arc/include/asm/elf.h
@@ -15,6 +15,11 @@
 /* These ELF defines belong to uapi but libc elf.h already defines them */
 #define EM_ARCOMPACT		93
 
+#define EM_ARCV2		195	/* ARCv2 Cores */
+
+#define EM_ARC_INUSE		(IS_ENABLED(CONFIG_ISA_ARCOMPACT) ? \
+					EM_ARCOMPACT : EM_ARCV2)
+
 /* ARC Relocations (kernel Modules only) */
 #define  R_ARC_32		0x4
 #define  R_ARC_32_ME		0x1B
diff --git a/arch/arc/include/asm/entry-arcv2.h b/arch/arc/include/asm/entry-arcv2.h
new file mode 100644
index 000000000000..b5ff87e6f4b7
--- /dev/null
+++ b/arch/arc/include/asm/entry-arcv2.h
@@ -0,0 +1,190 @@
+
+#ifndef __ASM_ARC_ENTRY_ARCV2_H
+#define __ASM_ARC_ENTRY_ARCV2_H
+
+#include <asm/asm-offsets.h>
+#include <asm/irqflags-arcv2.h>
+#include <asm/thread_info.h>	/* For THREAD_SIZE */
+
+/*------------------------------------------------------------------------*/
+.macro INTERRUPT_PROLOGUE	called_from
+
+	; Before jumping to Interrupt Vector, hardware micro-ops did following:
+	;   1. SP auto-switched to kernel mode stack
+	;   2. STATUS32.Z flag set to U mode at time of interrupt (U:1, K:0)
+	;   3. Auto saved: r0-r11, blink, LPE,LPS,LPC, JLI,LDI,EI, PC, STAT32
+	;
+	; Now manually save: r12, sp, fp, gp, r25
+
+	PUSH	r12
+
+	; Saving pt_regs->sp correctly requires some extra work due to the way
+	; Auto stack switch works
+	;  - U mode: retrieve it from AUX_USER_SP
+	;  - K mode: add the offset from current SP where H/w starts auto push
+	;
+	; Utilize the fact that Z bit is set if Intr taken in U mode
+	mov.nz	r9, sp
+	add.nz	r9, r9, SZ_PT_REGS - PT_sp - 4
+	bnz	1f
+
+	lr	r9, [AUX_USER_SP]
+1:
+	PUSH	r9	; SP
+
+	PUSH	fp
+	PUSH	gp
+
+#ifdef CONFIG_ARC_CURR_IN_REG
+	PUSH	r25			; user_r25
+	GET_CURR_TASK_ON_CPU	r25
+#else
+	sub	sp, sp, 4
+#endif
+
+.ifnc \called_from, exception
+	sub	sp, sp, 12	; BTA/ECR/orig_r0 placeholder per pt_regs
+.endif
+
+.endm
+
+/*------------------------------------------------------------------------*/
+.macro INTERRUPT_EPILOGUE	called_from
+
+.ifnc \called_from, exception
+	add	sp, sp, 12	; skip BTA/ECR/orig_r0 placeholders
+.endif
+
+#ifdef CONFIG_ARC_CURR_IN_REG
+	POP	r25
+#else
+	add	sp, sp, 4
+#endif
+
+	POP	gp
+	POP	fp
+
+	; Don't touch AUX_USER_SP if returning to K mode (Z bit set)
+	; (Z bit set on K mode is inverse of INTERRUPT_PROLOGUE)
+	add.z	sp, sp, 4
+	bz	1f
+
+	POPAX	AUX_USER_SP
+1:
+	POP	r12
+
+.endm
+
+/*------------------------------------------------------------------------*/
+.macro EXCEPTION_PROLOGUE
+
+	; Before jumping to Exception Vector, hardware micro-ops did following:
+	;   1. SP auto-switched to kernel mode stack
+	;   2. STATUS32.Z flag set to U mode at time of interrupt (U:1,K:0)
+	;
+	; Now manually save the complete reg file
+
+	PUSH	r9		; freeup a register: slot of erstatus
+
+	PUSHAX	eret
+	sub	sp, sp, 12	; skip JLI, LDI, EI
+	PUSH	lp_count
+	PUSHAX	lp_start
+	PUSHAX	lp_end
+	PUSH	blink
+
+	PUSH	r11
+	PUSH	r10
+
+	ld.as	r9,  [sp, 10]	; load stashed r9 (status32 stack slot)
+	lr	r10, [erstatus]
+	st.as	r10, [sp, 10]	; save status32 at its right stack slot
+
+	PUSH	r9
+	PUSH	r8
+	PUSH	r7
+	PUSH	r6
+	PUSH	r5
+	PUSH	r4
+	PUSH	r3
+	PUSH	r2
+	PUSH	r1
+	PUSH	r0
+
+	; -- for interrupts, regs above are auto-saved by h/w in that order --
+	; Now do what ISR prologue does (manually save r12, sp, fp, gp, r25)
+	;
+	; Set Z flag if this was from U mode (expected by INTERRUPT_PROLOGUE)
+	; Although H/w exception micro-ops do set Z flag for U mode (just like
+	; for interrupts), it could get clobbered in case we soft land here from
+	; a TLB Miss exception handler (tlbex.S)
+
+	and	r10, r10, STATUS_U_MASK
+	xor.f	0, r10, STATUS_U_MASK
+
+	INTERRUPT_PROLOGUE  exception
+
+	PUSHAX	erbta
+	PUSHAX	ecr		; r9 contains ECR, expected by EV_Trap
+
+	PUSH	r0		; orig_r0
+.endm
+
+/*------------------------------------------------------------------------*/
+.macro EXCEPTION_EPILOGUE
+
+	; Assumes r0 has PT_status32
+	btst   r0, STATUS_U_BIT	; Z flag set if K, used in INTERRUPT_EPILOGUE
+
+	add	sp, sp, 8	; orig_r0/ECR don't need restoring
+	POPAX	erbta
+
+	INTERRUPT_EPILOGUE  exception
+
+	POP	r0
+	POP	r1
+	POP	r2
+	POP	r3
+	POP	r4
+	POP	r5
+	POP	r6
+	POP	r7
+	POP	r8
+	POP	r9
+	POP	r10
+	POP	r11
+
+	POP	blink
+	POPAX	lp_end
+	POPAX	lp_start
+
+	POP	r9
+	mov	lp_count, r9
+
+	add	sp, sp, 12	; skip JLI, LDI, EI
+	POPAX	eret
+	POPAX	erstatus
+
+	ld.as	r9, [sp, -12]	; reload r9 which got clobbered
+.endm
+
+.macro FAKE_RET_FROM_EXCPN
+	lr      r9, [status32]
+	bic     r9, r9, (STATUS_U_MASK|STATUS_DE_MASK|STATUS_AE_MASK)
+	or      r9, r9, (STATUS_L_MASK|STATUS_IE_MASK)
+	kflag   r9
+.endm
+
+/* Get thread_info of "current" tsk */
+.macro GET_CURR_THR_INFO_FROM_SP  reg
+	bmskn \reg, sp, THREAD_SHIFT - 1
+.endm
+
+/* Get CPU-ID of this core */
+.macro  GET_CPU_ID  reg
+	lr  \reg, [identity]
+	xbfu \reg, \reg, 0xE8	/* 00111    01000 */
+				/* M = 8-1  N = 8 */
+.endm
+
+#endif
diff --git a/arch/arc/include/asm/entry.h b/arch/arc/include/asm/entry.h
index f61032c53d51..29d0ab6e10f5 100644
--- a/arch/arc/include/asm/entry.h
+++ b/arch/arc/include/asm/entry.h
@@ -16,7 +16,11 @@
 #include <asm/processor.h>	/* For VMALLOC_START */
 #include <asm/mmu.h>
 
+#ifdef CONFIG_ISA_ARCOMPACT
 #include <asm/entry-compact.h>	/* ISA specific bits */
+#else
+#include <asm/entry-arcv2.h>
+#endif
 
 /* Note on the LD/ST addr modes with addr reg wback
  *
diff --git a/arch/arc/include/asm/irq.h b/arch/arc/include/asm/irq.h
index f38652fb2ed7..49014f0ef36d 100644
--- a/arch/arc/include/asm/irq.h
+++ b/arch/arc/include/asm/irq.h
@@ -13,8 +13,13 @@
 #define NR_IRQS		128 /* allow some CPU external IRQ handling */
 
 /* Platform Independent IRQs */
+#ifdef CONFIG_ISA_ARCOMPACT
 #define TIMER0_IRQ      3
 #define TIMER1_IRQ      4
+#else
+#define TIMER0_IRQ      16
+#define TIMER1_IRQ      17
+#endif
 
 #include <linux/interrupt.h>
 #include <asm-generic/irq.h>
diff --git a/arch/arc/include/asm/irqflags-arcv2.h b/arch/arc/include/asm/irqflags-arcv2.h
index c946c56f141c..1eb41b00aac5 100644
--- a/arch/arc/include/asm/irqflags-arcv2.h
+++ b/arch/arc/include/asm/irqflags-arcv2.h
@@ -27,6 +27,9 @@
 #define AUX_IRQ_SELECT		0x40b
 #define AUX_IRQ_ENABLE		0x40c
 
+/* Was Intr taken in User Mode */
+#define AUX_IRQ_ACT_BIT_U	31
+
 /* 0 is highest level, but taken by FIRQs, if present in design */
 #define ARCV2_IRQ_DEF_PRIO		0
 
diff --git a/arch/arc/include/asm/irqflags-compact.h b/arch/arc/include/asm/irqflags-compact.h
index 18f3634ac347..aa805575c320 100644
--- a/arch/arc/include/asm/irqflags-compact.h
+++ b/arch/arc/include/asm/irqflags-compact.h
@@ -39,6 +39,8 @@
 #define AUX_ITRIGGER		0x40d
 #define AUX_IPULSE		0x415
 
+#define ISA_INIT_STATUS_BITS	STATUS_IE_MASK
+
 #ifndef __ASSEMBLY__
 
 /******************************************************************
diff --git a/arch/arc/include/asm/irqflags.h b/arch/arc/include/asm/irqflags.h
index 333972600680..59bc6a64f75d 100644
--- a/arch/arc/include/asm/irqflags.h
+++ b/arch/arc/include/asm/irqflags.h
@@ -10,6 +10,10 @@
 #ifndef __ASM_ARC_IRQFLAGS_H
 #define __ASM_ARC_IRQFLAGS_H
 
+#ifdef CONFIG_ISA_ARCOMPACT
 #include <asm/irqflags-compact.h>
+#else
+#include <asm/irqflags-arcv2.h>
+#endif
 
 #endif
diff --git a/arch/arc/include/asm/ptrace.h b/arch/arc/include/asm/ptrace.h
index 1bfeec2c0558..91755972b9a2 100644
--- a/arch/arc/include/asm/ptrace.h
+++ b/arch/arc/include/asm/ptrace.h
@@ -16,6 +16,7 @@
 
 /* THE pt_regs: Defines how regs are saved during entry into kernel */
 
+#ifdef CONFIG_ISA_ARCOMPACT
 struct pt_regs {
 
 	/* Real registers */
@@ -56,6 +57,48 @@ struct pt_regs {
 
 	long user_r25;
 };
+#else
+
+struct pt_regs {
+
+	long orig_r0;
+
+	union {
+		struct {
+#ifdef CONFIG_CPU_BIG_ENDIAN
+			unsigned long state:8, ecr_vec:8,
+				      ecr_cause:8, ecr_param:8;
+#else
+			unsigned long ecr_param:8, ecr_cause:8,
+				      ecr_vec:8, state:8;
+#endif
+		};
+		unsigned long event;
+	};
+
+	long bta;	/* bta_l1, bta_l2, erbta */
+
+	long user_r25;
+
+	long r26;	/* gp */
+	long fp;
+	long sp;	/* user/kernel sp depending on where we came from  */
+
+	long r12;
+
+	/*------- Below list auto saved by h/w -----------*/
+	long r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11;
+
+	long blink;
+	long lp_end, lp_start, lp_count;
+
+	long ei, ldi, jli;
+
+	long ret;
+	long status32;
+};
+
+#endif
 
 /* Callee saved registers - need to be saved only when you are scheduled out */
 
diff --git a/arch/arc/include/asm/thread_info.h b/arch/arc/include/asm/thread_info.h
index aca0d5a45c7b..3af67455659a 100644
--- a/arch/arc/include/asm/thread_info.h
+++ b/arch/arc/include/asm/thread_info.h
@@ -25,6 +25,7 @@
 #endif
 
 #define THREAD_SIZE     (PAGE_SIZE << THREAD_SIZE_ORDER)
+#define THREAD_SHIFT	(PAGE_SHIFT + THREAD_SIZE_ORDER)
 
 #ifndef __ASSEMBLY__
 
diff --git a/arch/arc/kernel/Makefile b/arch/arc/kernel/Makefile
index cc929c0e2133..0be7ba087260 100644
--- a/arch/arc/kernel/Makefile
+++ b/arch/arc/kernel/Makefile
@@ -10,7 +10,8 @@ CFLAGS_ptrace.o		+= -DUTS_MACHINE='"$(UTS_MACHINE)"'
 
 obj-y	:= arcksyms.o setup.o irq.o time.o reset.o ptrace.o process.o devtree.o
 obj-y	+= signal.o traps.o sys.o troubleshoot.o stacktrace.o disasm.o clk.o
-obj-y	+= entry-compact.o intc-compact.o
+obj-$(CONFIG_ISA_ARCOMPACT)		+= entry-compact.o intc-compact.o
+obj-$(CONFIG_ISA_ARCV2)			+= entry-arcv2.o intc-arcv2.o
 
 obj-$(CONFIG_MODULES)			+= arcksyms.o module.o
 obj-$(CONFIG_SMP) 			+= smp.o
diff --git a/arch/arc/kernel/entry-arcv2.S b/arch/arc/kernel/entry-arcv2.S
new file mode 100644
index 000000000000..c59a396b7b98
--- /dev/null
+++ b/arch/arc/kernel/entry-arcv2.S
@@ -0,0 +1,189 @@
+/*
+ * ARCv2 ISA based core Low Level Intr/Traps/Exceptions(non-TLB) Handling
+ *
+ * Copyright (C) 2013 Synopsys, Inc. (www.synopsys.com)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/linkage.h>   /* ARC_{ENTRY,EXIT} */
+#include <asm/entry.h>       /* SAVE_ALL_{INT1,INT2,TRAP...} */
+#include <asm/errno.h>
+#include <asm/arcregs.h>
+#include <asm/irqflags.h>
+
+	.cpu HS
+
+#define VECTOR	.word
+
+;############################ Vector Table #################################
+
+	.section .vector,"a",@progbits
+	.align 4
+
+# Initial 16 slots are Exception Vectors
+VECTOR	stext			; Restart Vector (jump to entry point)
+VECTOR	mem_service		; Mem exception
+VECTOR	instr_service		; Instrn Error
+VECTOR	EV_MachineCheck		; Fatal Machine check
+VECTOR	EV_TLBMissI		; Instruction TLB miss
+VECTOR	EV_TLBMissD		; Data TLB miss
+VECTOR	EV_TLBProtV		; Protection Violation
+VECTOR	EV_PrivilegeV		; Privilege Violation
+VECTOR	EV_SWI			; Software Breakpoint
+VECTOR	EV_Trap			; Trap exception
+VECTOR	EV_Extension		; Extn Instruction Exception
+VECTOR	EV_DivZero		; Divide by Zero
+VECTOR	EV_DCError		; Data Cache Error
+VECTOR	EV_Misaligned		; Misaligned Data Access
+VECTOR	reserved		; Reserved slots
+VECTOR	reserved		; Reserved slots
+
+# Begin Interrupt Vectors
+VECTOR	handle_interrupt	; (16) Timer0
+VECTOR	handle_interrupt	; unused (Timer1)
+VECTOR	handle_interrupt	; unused (WDT)
+VECTOR	handle_interrupt	; (19) ICI (inter core interrupt)
+VECTOR	handle_interrupt
+VECTOR	handle_interrupt
+VECTOR	handle_interrupt
+VECTOR	handle_interrupt	; (23) End of fixed IRQs
+
+.rept CONFIG_ARC_NUMBER_OF_INTERRUPTS - 8
+	VECTOR	handle_interrupt
+.endr
+
+	.section .text, "ax",@progbits
+
+res_service:		; processor restart
+	flag    0x1     ; not implemented
+	nop
+	nop
+
+reserved:		; processor restart
+	rtie            ; jump to processor initializations
+
+;##################### Interrupt Handling ##############################
+
+ENTRY(handle_interrupt)
+
+	INTERRUPT_PROLOGUE  irq
+
+	clri		; To make status32.IE agree with CPU internal state
+
+	lr  r0, [ICAUSE]
+
+	mov   blink, ret_from_exception
+
+	b.d  arch_do_IRQ
+	mov r1, sp
+
+END(handle_interrupt)
+
+;################### Non TLB Exception Handling #############################
+
+ENTRY(EV_SWI)
+	flag 1
+END(EV_SWI)
+
+ENTRY(EV_DivZero)
+	flag 1
+END(EV_DivZero)
+
+ENTRY(EV_DCError)
+	flag 1
+END(EV_DCError)
+
+ENTRY(EV_Misaligned)
+
+	EXCEPTION_PROLOGUE
+
+	lr  r0, [efa]	; Faulting Data address
+	mov r1, sp
+
+	FAKE_RET_FROM_EXCPN
+
+	SAVE_CALLEE_SAVED_USER
+	mov r2, sp              ; callee_regs
+
+	bl  do_misaligned_access
+
+	; TBD: optimize - do this only if a callee reg was involved
+	; either a dst of emulated LD/ST or src with address-writeback
+	RESTORE_CALLEE_SAVED_USER
+
+	b   ret_from_exception
+END(EV_Misaligned)
+
+; ---------------------------------------------
+; Protection Violation Exception Handler
+; ---------------------------------------------
+
+ENTRY(EV_TLBProtV)
+
+	EXCEPTION_PROLOGUE
+
+	lr  r0, [efa]	; Faulting Data address
+	mov r1, sp	; pt_regs
+
+	FAKE_RET_FROM_EXCPN
+
+	mov blink, ret_from_exception
+	b   do_page_fault
+
+END(EV_TLBProtV)
+
+; From Linux standpoint Slow Path I/D TLB Miss is same as ProtV as they
+; need to call do_page_fault().
+; ECR in pt_regs provides whether access was R/W/X
+
+.global        call_do_page_fault
+.set call_do_page_fault, EV_TLBProtV
+
+;############# Common Handlers for ARCompact and ARCv2 ##############
+
+#include "entry.S"
+
+;############# Return from Intr/Excp/Trap (ARCv2 ISA Specifics) ##############
+;
+; Restore the saved sys context (common exit-path for EXCPN/IRQ/Trap)
+; IRQ shd definitely not happen between now and rtie
+; All 2 entry points to here already disable interrupts
+
+.Lrestore_regs:
+
+	ld	r0, [sp, PT_status32]	; U/K mode at time of entry
+	lr	r10, [AUX_IRQ_ACT]
+
+	bmsk	r11, r10, 15	; AUX_IRQ_ACT.ACTIVE
+	breq	r11, 0, .Lexcept_ret	; No intr active, ret from Exception
+
+;####### Return from Intr #######
+
+debug_marker_l1:
+	; Handle special case #1: (Entry via Exception, Return via IRQ)
+	;
+	; Exception in U mode, preempted in kernel, Intr taken (K mode), orig
+	; task now returning to U mode (riding the Intr)
+	; AUX_IRQ_ACT won't have U bit set (since intr in K mode), hence SP
+	; won't be switched to correct U mode value (from AUX_USER_SP)
+	; So force AUX_IRQ_ACT.U for such a case
+
+	btst	r0, STATUS_U_BIT		; Z flag set if K (Z clear for U)
+	bset.nz	r11, r11, AUX_IRQ_ACT_BIT_U	; NZ means U
+	sr	r11, [AUX_IRQ_ACT]
+
+	INTERRUPT_EPILOGUE  irq
+	rtie
+
+;####### Return from Exception / pure kernel mode #######
+
+.Lexcept_ret:	; Expects r0 has PT_status32
+
+debug_marker_syscall:
+	EXCEPTION_EPILOGUE
+	rtie
+
+END(ret_from_exception)
diff --git a/arch/arc/kernel/head.S b/arch/arc/kernel/head.S
index 64a92e0b1e53..812f95e6ae69 100644
--- a/arch/arc/kernel/head.S
+++ b/arch/arc/kernel/head.S
@@ -49,8 +49,6 @@
 1:
 .endm
 
-	.cpu A7
-
 	.section .init.text, "ax",@progbits
 	.type stext, @function
 	.globl stext
diff --git a/arch/arc/kernel/process.c b/arch/arc/kernel/process.c
index b5426babd3c8..6abe5e7cac8f 100644
--- a/arch/arc/kernel/process.c
+++ b/arch/arc/kernel/process.c
@@ -44,7 +44,10 @@ SYSCALL_DEFINE0(arc_gettls)
 void arch_cpu_idle(void)
 {
 	/* sleep, but enable all interrupts before committing */
-	__asm__("sleep 0x3");
+	if (is_isa_arcompact())
+		__asm__("sleep 0x3");
+	else
+		__asm__("sleep 0x10");
 }
 
 asmlinkage void ret_from_fork(void);
@@ -166,7 +169,7 @@ void start_thread(struct pt_regs * regs, unsigned long pc, unsigned long usp)
 	 * [L] ZOL loop inhibited to begin with - cleared by a LP insn
 	 * Interrupts enabled
 	 */
-	regs->status32 = STATUS_U_MASK | STATUS_L_MASK | STATUS_IE_MASK;
+	regs->status32 = STATUS_U_MASK | STATUS_L_MASK | ISA_INIT_STATUS_BITS;
 
 	/* bogus seed values for debugging */
 	regs->lp_start = 0x10;
@@ -196,8 +199,11 @@ int elf_check_arch(const struct elf32_hdr *x)
 {
 	unsigned int eflags;
 
-	if (x->e_machine != EM_ARCOMPACT)
+	if (x->e_machine != EM_ARC_INUSE) {
+		pr_err("ELF not built for %s ISA\n",
+			is_isa_arcompact() ? "ARCompact":"ARCv2");
 		return 0;
+	}
 
 	eflags = x->e_flags;
 	if ((eflags & EF_ARC_OSABI_MSK) < EF_ARC_OSABI_CURRENT) {
diff --git a/arch/arc/kernel/ptrace.c b/arch/arc/kernel/ptrace.c
index 4dd9e3a8c2da..4442204fe238 100644
--- a/arch/arc/kernel/ptrace.c
+++ b/arch/arc/kernel/ptrace.c
@@ -200,7 +200,7 @@ static const struct user_regset arc_regsets[] = {
 
 static const struct user_regset_view user_arc_view = {
 	.name		= UTS_MACHINE,
-	.e_machine	= EM_ARCOMPACT,
+	.e_machine	= EM_ARC_INUSE,
 	.regsets	= arc_regsets,
 	.n		= ARRAY_SIZE(arc_regsets)
 };
diff --git a/arch/arc/kernel/setup.c b/arch/arc/kernel/setup.c
index 96d44805ea56..d6fe80070bbf 100644
--- a/arch/arc/kernel/setup.c
+++ b/arch/arc/kernel/setup.c
@@ -96,7 +96,7 @@ static void read_arc_build_cfg_regs(void)
 	read_decode_mmu_bcr();
 	read_decode_cache_bcr();
 
-	{
+	if (is_isa_arcompact()) {
 		struct bcr_fp_arcompact sp, dp;
 		struct bcr_bpu_arcompact bpu;
 
@@ -112,6 +112,19 @@ static void read_arc_build_cfg_regs(void)
 			cpu->bpu.num_cache = 256 << (bpu.ent - 1);
 			cpu->bpu.num_pred = 256 << (bpu.ent - 1);
 		}
+	} else {
+		struct bcr_fp_arcv2 spdp;
+		struct bcr_bpu_arcv2 bpu;
+
+		READ_BCR(ARC_REG_FP_V2_BCR, spdp);
+		cpu->extn.fpu_sp = spdp.sp ? 1 : 0;
+		cpu->extn.fpu_dp = spdp.dp ? 1 : 0;
+
+		READ_BCR(ARC_REG_BPU_BCR, bpu);
+		cpu->bpu.ver = bpu.ver;
+		cpu->bpu.full = bpu.ft;
+		cpu->bpu.num_cache = 256 << bpu.bce;
+		cpu->bpu.num_pred = 2048 << bpu.pte;
 	}
 
 	READ_BCR(ARC_REG_AP_BCR, bcr);
@@ -131,6 +144,7 @@ static const struct cpuinfo_data arc_cpu_tbl[] = {
 	{ {0x30, "ARC 700"      }, 0x33},
 	{ {0x34, "ARC 700 R4.10"}, 0x34},
 	{ {0x35, "ARC 700 R4.11"}, 0x35},
+	{ {0x50, "ARC HS38"	}, 0x51},
 	{ {0x00, NULL		} }
 };
 
@@ -149,13 +163,17 @@ static char *arc_cpu_mumbojumbo(int cpu_id, char *buf, int len)
 
 	FIX_PTR(cpu);
 
-	{
+	if (is_isa_arcompact()) {
 		isa_nm = "ARCompact";
 		be = IS_ENABLED(CONFIG_CPU_BIG_ENDIAN);
 
 		atomic = cpu->isa.atomic1;
 		if (!cpu->isa.ver)	/* ISA BCR absent, use Kconfig info */
 			atomic = IS_ENABLED(CONFIG_ARC_HAS_LLSC);
+	} else {
+		isa_nm = "ARCv2";
+		be = cpu->isa.be;
+		atomic = cpu->isa.atomic;
 	}
 
 	n += scnprintf(buf + n, len - n,
@@ -184,14 +202,31 @@ static char *arc_cpu_mumbojumbo(int cpu_id, char *buf, int len)
 		       IS_AVAIL1(cpu->timers.t0, "Timer0 "),
 		       IS_AVAIL1(cpu->timers.t1, "Timer1 "));
 
-	n += i = scnprintf(buf + n, len - n, "%s%s",
-			   IS_AVAIL2(atomic, "atomic ", CONFIG_ARC_HAS_LLSC));
+	n += i = scnprintf(buf + n, len - n, "%s%s%s%s%s",
+			   IS_AVAIL2(atomic, "atomic ", CONFIG_ARC_HAS_LLSC),
+			   IS_AVAIL2(cpu->isa.ldd, "ll64 ", CONFIG_ARC_HAS_LL64),
+			   IS_AVAIL1(cpu->isa.unalign, "unalign (not used)"));
 
 	if (i)
 		n += scnprintf(buf + n, len - n, "\n\t\t: ");
 
+	if (cpu->extn_mpy.ver) {
+		if (cpu->extn_mpy.ver <= 0x2) {	/* ARCompact */
+			n += scnprintf(buf + n, len - n, "mpy ");
+		} else {
+			int opt = 2;	/* stock MPY/MPYH */
+
+			if (cpu->extn_mpy.dsp)	/* OPT 7-9 */
+				opt = cpu->extn_mpy.dsp + 6;
+
+			n += scnprintf(buf + n, len - n, "mpy[opt %d] ", opt);
+		}
+		n += scnprintf(buf + n, len - n, "%s",
+			       IS_USED(CONFIG_ARC_HAS_HW_MPY));
+	}
+
 	n += scnprintf(buf + n, len - n, "%s%s%s%s%s%s%s%s\n",
-		       IS_AVAIL1(cpu->extn_mpy.ver, "mpy "),
+		       IS_AVAIL1(cpu->isa.div_rem, "div_rem "),
 		       IS_AVAIL1(cpu->extn.norm, "norm "),
 		       IS_AVAIL1(cpu->extn.barrel, "barrel-shift "),
 		       IS_AVAIL1(cpu->extn.swap, "swap "),
diff --git a/arch/arc/kernel/signal.c b/arch/arc/kernel/signal.c
index b15d2fe9c461..004b7f0bc76c 100644
--- a/arch/arc/kernel/signal.c
+++ b/arch/arc/kernel/signal.c
@@ -336,7 +336,7 @@ static void arc_restart_syscall(struct k_sigaction *ka, struct pt_regs *regs)
 		 * their orig user space value when we ret from kernel
 		 */
 		regs->r0 = regs->orig_r0;
-		regs->ret -= 4;
+		regs->ret -= is_isa_arcv2() ? 2 : 4;
 		break;
 	}
 }
@@ -377,10 +377,10 @@ void do_signal(struct pt_regs *regs)
 		if (regs->r0 == -ERESTARTNOHAND ||
 		    regs->r0 == -ERESTARTSYS || regs->r0 == -ERESTARTNOINTR) {
 			regs->r0 = regs->orig_r0;
-			regs->ret -= 4;
+			regs->ret -= is_isa_arcv2() ? 2 : 4;
 		} else if (regs->r0 == -ERESTART_RESTARTBLOCK) {
 			regs->r8 = __NR_restart_syscall;
-			regs->ret -= 4;
+			regs->ret -= is_isa_arcv2() ? 2 : 4;
 		}
 		syscall_wont_restart(regs);	/* No more restarts */
 	}
diff --git a/arch/arc/kernel/troubleshoot.c b/arch/arc/kernel/troubleshoot.c
index e00a01879025..e0cf99893212 100644
--- a/arch/arc/kernel/troubleshoot.c
+++ b/arch/arc/kernel/troubleshoot.c
@@ -14,6 +14,7 @@
 #include <linux/proc_fs.h>
 #include <linux/file.h>
 #include <asm/arcregs.h>
+#include <asm/irqflags.h>
 
 /*
  * Common routine to print scratch regs (r0-r12) or callee regs (r13-r25)
@@ -34,7 +35,10 @@ static noinline void print_reg_file(long *reg_rev, int start_num)
 			n += scnprintf(buf + n, len - n, "\n");
 
 		/* because pt_regs has regs reversed: r12..r0, r25..r13 */
-		reg_rev--;
+		if (is_isa_arcv2() && start_num == 0)
+			reg_rev++;
+		else
+			reg_rev--;
 	}
 
 	if (start_num != 0)
@@ -152,6 +156,15 @@ static void show_ecr_verbose(struct pt_regs *regs)
 				((cause_code == 0x02) ? "Write" : "EX"));
 	} else if (vec == ECR_V_INSN_ERR) {
 		pr_cont("Illegal Insn\n");
+#ifdef CONFIG_ISA_ARCV2
+	} else if (vec == ECR_V_MEM_ERR) {
+		if (cause_code == 0x00)
+			pr_cont("Bus Error from Insn Mem\n");
+		else if (cause_code == 0x10)
+			pr_cont("Bus Error from Data Mem\n");
+		else
+			pr_cont("Bus Error, check PRM\n");
+#endif
 	} else {
 		pr_cont("Check Programmer's Manual\n");
 	}
@@ -185,12 +198,20 @@ void show_regs(struct pt_regs *regs)
 
 	pr_info("[STAT32]: 0x%08lx", regs->status32);
 
-#define STS_BIT(r, bit)	r->status32 & STATUS_##bit##_MASK ? #bit : ""
-	if (!user_mode(regs))
-		pr_cont(" : %2s %2s %2s %2s %2s\n",
-			STS_BIT(regs, AE), STS_BIT(regs, A2), STS_BIT(regs, A1),
-			STS_BIT(regs, E2), STS_BIT(regs, E1));
+#define STS_BIT(r, bit)	r->status32 & STATUS_##bit##_MASK ? #bit" " : ""
 
+#ifdef CONFIG_ISA_ARCOMPACT
+	pr_cont(" : %2s%2s%2s%2s%2s%2s%2s\n",
+			(regs->status32 & STATUS_U_MASK) ? "U " : "K ",
+			STS_BIT(regs, DE), STS_BIT(regs, AE),
+			STS_BIT(regs, A2), STS_BIT(regs, A1),
+			STS_BIT(regs, E2), STS_BIT(regs, E1));
+#else
+	pr_cont(" : %2s%2s%2s%2s\n",
+			STS_BIT(regs, IE),
+			(regs->status32 & STATUS_U_MASK) ? "U " : "K ",
+			STS_BIT(regs, DE), STS_BIT(regs, AE));
+#endif
 	pr_info("BTA: 0x%08lx\t SP: 0x%08lx\t FP: 0x%08lx\n",
 		regs->bta, regs->sp, regs->fp);
 	pr_info("LPS: 0x%08lx\tLPE: 0x%08lx\tLPC: 0x%08lx\n",
diff --git a/arch/arc/mm/tlbex.S b/arch/arc/mm/tlbex.S
index d224bf0feefc..00c8d7f772bc 100644
--- a/arch/arc/mm/tlbex.S
+++ b/arch/arc/mm/tlbex.S
@@ -35,8 +35,6 @@
  * Rahul Trivedi, Amit Bhor: Codito Technologies 2004
  */
 
-	.cpu A7
-
 #include <linux/linkage.h>
 #include <asm/entry.h>
 #include <asm/mmu.h>
-- 
1.9.1



* [PATCH 03/28] ARCv2: STAR 9000793984: Handle return from intr to Delay Slot
  2015-06-09 11:48 [PATCH 00/28] ARCv2 port to Linux - (B) ISA / Core / platform support Vineet Gupta
  2015-06-09 11:48 ` [PATCH 01/28] ARCv2: [intc] HS38 core interrupt controller Vineet Gupta
  2015-06-09 11:48 ` [PATCH 02/28] ARCv2: Support for ARCv2 ISA and HS38x cores Vineet Gupta
@ 2015-06-09 11:48 ` Vineet Gupta
  2015-06-09 11:48 ` [PATCH 04/28] ARCv2: STAR 9000808988: signals involving " Vineet Gupta
                   ` (24 subsequent siblings)
  27 siblings, 0 replies; 66+ messages in thread
From: Vineet Gupta @ 2015-06-09 11:48 UTC (permalink / raw)
  To: linux-arch, linux-kernel; +Cc: arnd, arc-linux-dev, Vineet Gupta

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/kernel/asm-offsets.c |  1 +
 arch/arc/kernel/entry-arcv2.S | 50 +++++++++++++++++++++++++++++++++++++++++++
 arch/arc/kernel/setup.c       |  2 ++
 3 files changed, 53 insertions(+)

diff --git a/arch/arc/kernel/asm-offsets.c b/arch/arc/kernel/asm-offsets.c
index 6c3aa0edb9b5..b9cf23313273 100644
--- a/arch/arc/kernel/asm-offsets.c
+++ b/arch/arc/kernel/asm-offsets.c
@@ -56,6 +56,7 @@ int main(void)
 	DEFINE(PT_r5, offsetof(struct pt_regs, r5));
 	DEFINE(PT_r6, offsetof(struct pt_regs, r6));
 	DEFINE(PT_r7, offsetof(struct pt_regs, r7));
+	DEFINE(PT_ret, offsetof(struct pt_regs, ret));
 
 	DEFINE(SZ_CALLEE_REGS, sizeof(struct callee_regs));
 	DEFINE(SZ_PT_REGS, sizeof(struct pt_regs));
diff --git a/arch/arc/kernel/entry-arcv2.S b/arch/arc/kernel/entry-arcv2.S
index c59a396b7b98..bd7105d3172f 100644
--- a/arch/arc/kernel/entry-arcv2.S
+++ b/arch/arc/kernel/entry-arcv2.S
@@ -163,6 +163,9 @@ END(EV_TLBProtV)
 ;####### Return from Intr #######
 
 debug_marker_l1:
+	bbit1.nt r0, STATUS_DE_BIT, .Lintr_ret_to_delay_slot
+
+.Lisr_ret_fast_path:
 	; Handle special case #1: (Entry via Exception, Return via IRQ)
 	;
 	; Exception in U mode, preempted in kernel, Intr taken (K mode), orig
@@ -186,4 +189,51 @@ debug_marker_syscall:
 	EXCEPTION_EPILOGUE
 	rtie
 
+;####### Return from Intr to insn in delay slot #######
+
+; Handle special case #2: (Entry via Exception in Delay Slot, Return via IRQ)
+;
+; Intr returning to a Delay Slot (DS) insn
+; (since IRQ NOT allowed in DS in ARCv2, this can only happen if orig
+; entry was via Exception in DS which got preempted in kernel).
+;
+; IRQ RTIE won't reliably restore DE bit and/or BTA, needs handling
+.Lintr_ret_to_delay_slot:
+debug_marker_ds:
+
+	ld	r2, [@intr_to_DE_cnt]
+	add	r2, r2, 1
+	st	r2, [@intr_to_DE_cnt]
+
+	ld	r2, [sp, PT_ret]
+	ld	r3, [sp, PT_status32]
+
+	bic  	r0, r3, STATUS_U_MASK|STATUS_DE_MASK|STATUS_IE_MASK|STATUS_L_MASK
+	st	r0, [sp, PT_status32]
+
+	mov	r1, .Lintr_ret_to_delay_slot_2
+	st	r1, [sp, PT_ret]
+
+	st	r2, [sp, 0]
+	st	r3, [sp, 4]
+
+	b	.Lisr_ret_fast_path
+
+.Lintr_ret_to_delay_slot_2:
+	sub	sp, sp, SZ_PT_REGS
+	st	r9, [sp, -4]
+
+	ld	r9, [sp, 0]
+	sr	r9, [eret]
+
+	ld	r9, [sp, 4]
+	sr	r9, [erstatus]
+
+	ld	r9, [sp, 8]
+	sr	r9, [erbta]
+
+	ld	r9, [sp, -4]
+	add	sp, sp, SZ_PT_REGS
+	rtie
+
 END(ret_from_exception)
diff --git a/arch/arc/kernel/setup.c b/arch/arc/kernel/setup.c
index d6fe80070bbf..ca71cef4bafd 100644
--- a/arch/arc/kernel/setup.c
+++ b/arch/arc/kernel/setup.c
@@ -30,6 +30,8 @@
 
 #define FIX_PTR(x)  __asm__ __volatile__(";" : "+r"(x))
 
+unsigned int intr_to_DE_cnt;
+
 /* Part of U-boot ABI: see head.S */
 int __initdata uboot_tag;
 char __initdata *uboot_arg;
-- 
1.9.1



* [PATCH 04/28] ARCv2: STAR 9000808988: signals involving Delay Slot
  2015-06-09 11:48 [PATCH 00/28] ARCv2 port to Linux - (B) ISA / Core / platform support Vineet Gupta
                   ` (2 preceding siblings ...)
  2015-06-09 11:48 ` [PATCH 03/28] ARCv2: STAR 9000793984: Handle return from intr to Delay Slot Vineet Gupta
@ 2015-06-09 11:48 ` Vineet Gupta
  2015-06-09 11:48 ` [PATCH 05/28] ARCv2: STAR 9000814690: Really Re-enable interrupts to avoid deadlocks Vineet Gupta
                   ` (23 subsequent siblings)
  27 siblings, 0 replies; 66+ messages in thread
From: Vineet Gupta @ 2015-06-09 11:48 UTC (permalink / raw)
  To: linux-arch, linux-kernel; +Cc: arnd, arc-linux-dev, Vineet Gupta

Reported by Anton as LTP:munmap01 failing with Illegal Instruction
Exception.

   --------------------->8--------------------------------------
   mmap2(NULL, 24576, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0x200d2000
   munmap(0x200d2000, 24576)               = 0
   --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x200d2000}
   ---
   potentially unexpected fatal signal 4.
   Path: /munmap01
   CPU: 0 PID: 61 Comm: munmap01 Not tainted 3.13.0-g5d5c46d9a556 #8
   task: 9f1a8000 ti: 9f154000 task.ti: 9f154000

   [ECR   ]: 0x00020100 => Illegal Insn
   [EFA   ]: 0x0001354c
   [BLINK ]: 0x200515d4
   [ERET  ]: 0x1354c
       @off 0x1354c in [/munmap01]
       VMA: 0x00010000 to 0x00018000
   [STAT32]: 0x800802c0
   ...
   --------------------->8--------------------------------------

The issue was
1. munmap01 accessed unmapped memory (on purpose) with signal handler
   installed for SIGSEGV

2. The faulting instruction happened to be in Delay Slot
   00011864 <main>:
      11908:	bl.d       13284 <tst_resm>
      1190c:	stb        r16,[r2]

3. kernel sets up the reg file for signal handler and correctly clears
   the DE bit in pt_regs->status32 placeholder

4. However the RESTORE_CALLEE_SAVED_USER macro was not adjusted for ARCv2,
   and it overwrites the above with the original (stale) value of status32

5. After RTIE, userspace signal handler executes a non branch
   instruction with DE bit set, triggering Illegal Instruction Exception.

Reported-by: Anton Kolesov <akolesov@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/include/asm/entry.h  | 17 ++++++++++-------
 arch/arc/kernel/asm-offsets.c |  2 ++
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/arc/include/asm/entry.h b/arch/arc/include/asm/entry.h
index 29d0ab6e10f5..ad7860c5ce15 100644
--- a/arch/arc/include/asm/entry.h
+++ b/arch/arc/include/asm/entry.h
@@ -125,8 +125,6 @@
 	POP	r13
 .endm
 
-#define OFF_USER_R25_FROM_R24	(SZ_CALLEE_REGS + SZ_PT_REGS - 8)/4
-
 /*--------------------------------------------------------------
  * Collect User Mode callee regs as struct callee_regs - needed by
  * fork/do_signal/unaligned-access-emulation.
@@ -139,12 +137,13 @@
  *-------------------------------------------------------------*/
 .macro SAVE_CALLEE_SAVED_USER
 
+	mov	r12, sp		; save SP as ref to pt_regs
 	SAVE_R13_TO_R24
 
 #ifdef CONFIG_ARC_CURR_IN_REG
-	; Retrieve orig r25 and save it on stack
-	ld.as   r12, [sp, OFF_USER_R25_FROM_R24]
-	st.a    r12, [sp, -4]
+	; Retrieve orig r25 and save it with rest of callee_regs
+	ld	r12, [r12, PT_user_r25]
+	PUSH	r12
 #else
 	PUSH	r25
 #endif
@@ -191,12 +190,16 @@
 .macro RESTORE_CALLEE_SAVED_USER
 
 #ifdef CONFIG_ARC_CURR_IN_REG
-	ld.ab   r12, [sp, 4]
-	st.as   r12, [sp, OFF_USER_R25_FROM_R24]
+	POP	r12
 #else
 	POP	r25
 #endif
 	RESTORE_R24_TO_R13
+
+	; SP is back to start of pt_regs
+#ifdef CONFIG_ARC_CURR_IN_REG
+	st	r12, [sp, PT_user_r25]
+#endif
 .endm
 
 /*--------------------------------------------------------------
diff --git a/arch/arc/kernel/asm-offsets.c b/arch/arc/kernel/asm-offsets.c
index b9cf23313273..605281f5b301 100644
--- a/arch/arc/kernel/asm-offsets.c
+++ b/arch/arc/kernel/asm-offsets.c
@@ -60,5 +60,7 @@ int main(void)
 
 	DEFINE(SZ_CALLEE_REGS, sizeof(struct callee_regs));
 	DEFINE(SZ_PT_REGS, sizeof(struct pt_regs));
+	DEFINE(PT_user_r25, offsetof(struct pt_regs, user_r25));
+
 	return 0;
 }
-- 
1.9.1



* [PATCH 05/28] ARCv2: STAR 9000814690: Really Re-enable interrupts to avoid deadlocks
  2015-06-09 11:48 [PATCH 00/28] ARCv2 port to Linux - (B) ISA / Core / platform support Vineet Gupta
                   ` (3 preceding siblings ...)
  2015-06-09 11:48 ` [PATCH 04/28] ARCv2: STAR 9000808988: signals involving " Vineet Gupta
@ 2015-06-09 11:48 ` Vineet Gupta
  2015-06-09 11:48 ` [PATCH 06/28] ARCv2: MMUv4: TLB programming Model changes Vineet Gupta
                   ` (22 subsequent siblings)
  27 siblings, 0 replies; 66+ messages in thread
From: Vineet Gupta @ 2015-06-09 11:48 UTC (permalink / raw)
  To: linux-arch, linux-kernel; +Cc: arnd, arc-linux-dev, Vineet Gupta

The issue: on HS, when an interrupt is taken, IRQ_ACT is set and it is
NOT cleared until we do RTIE (or clear it manually). Linux interrupt
handling has top and bottom halves. The latter leads to softirqs (which can
reschedule) AND expects interrupts to be REALLY re-enabled, which was NOT
happening for us since we only SETI and don't clear IRQ_ACT.

So we can reach a state where both cores have taken an interrupt (IRQ_ACT
set), get rescheduled, both send an IPI and wait on the CSD lock, which will
never be released as neither core can take the pending IPI IRQ while its
IRQ_ACT is still set.

So local_irq_enable() now drops the IRQ_ACT.act bit to re-enable IRQs.
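
The stuck state and the fix can be illustrated with a toy model in C. This is
only a sketch under simplifying assumptions: struct toy_core and the toy_*
helpers are hypothetical names, a single priority level is modelled, and a
core is assumed able to take a pending interrupt only when interrupts are
enabled and none of the low 16 bits of IRQ_ACT are set.

```c
#include <stdbool.h>
#include <stdint.h>

#define IRQ_ACT_ACTIVE_MASK	0xffffu	/* low 16 bits: active priority levels */

/* Toy model of one core's interrupt state (not real hardware semantics). */
struct toy_core {
	uint32_t irq_act;	/* models AUX_IRQ_ACT */
	bool ie;		/* models STATUS32.IE */
	bool ipi_pending;	/* an IPI is waiting to be delivered */
};

/* Taking an interrupt marks it active and disables further interrupts. */
static void toy_enter_irq(struct toy_core *c)
{
	c->irq_act |= 1u;	/* priority 0 becomes active */
	c->ie = false;
}

/* Old behaviour: only SETI, IRQ_ACT is left untouched. */
static void toy_irq_enable_old(struct toy_core *c)
{
	c->ie = true;
}

/* New behaviour: drop the active bits first, then SETI. */
static void toy_irq_enable_new(struct toy_core *c)
{
	if (c->irq_act & IRQ_ACT_ACTIVE_MASK)
		c->irq_act &= ~IRQ_ACT_ACTIVE_MASK;
	c->ie = true;
}

/* Returns true if the pending IPI could be delivered to this core. */
static bool toy_take_irq(struct toy_core *c)
{
	if (!c->ipi_pending || !c->ie || (c->irq_act & IRQ_ACT_ACTIVE_MASK))
		return false;
	c->ipi_pending = false;
	return true;
}
```

In this model a core that entered an interrupt and then called the old enable
helper still refuses the pending IPI, while the new helper lets it through.
Bits above the low 16 (such as the U bit) are left as they were.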

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/include/asm/irqflags-arcv2.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/arc/include/asm/irqflags-arcv2.h b/arch/arc/include/asm/irqflags-arcv2.h
index 1eb41b00aac5..ad481c24070d 100644
--- a/arch/arc/include/asm/irqflags-arcv2.h
+++ b/arch/arc/include/asm/irqflags-arcv2.h
@@ -64,6 +64,11 @@ static inline void arch_local_irq_restore(unsigned long flags)
  */
 static inline void arch_local_irq_enable(void)
 {
+	unsigned int irqact = read_aux_reg(AUX_IRQ_ACT);
+
+	if (irqact & 0xffff)
+		write_aux_reg(AUX_IRQ_ACT, irqact & ~0xffff);
+
 	__asm__ __volatile__("	seti	\n" : : : "memory");
 }
 
-- 
1.9.1



* [PATCH 06/28] ARCv2: MMUv4: TLB programming Model changes
  2015-06-09 11:48 [PATCH 00/28] ARCv2 port to Linux - (B) ISA / Core / platform support Vineet Gupta
                   ` (4 preceding siblings ...)
  2015-06-09 11:48 ` [PATCH 05/28] ARCv2: STAR 9000814690: Really Re-enable interrupts to avoid deadlocks Vineet Gupta
@ 2015-06-09 11:48 ` Vineet Gupta
  2015-06-09 11:48 ` [PATCH 07/28] ARCv2: MMUv4: cache programming model changes Vineet Gupta
                   ` (21 subsequent siblings)
  27 siblings, 0 replies; 66+ messages in thread
From: Vineet Gupta @ 2015-06-09 11:48 UTC (permalink / raw)
  To: linux-arch, linux-kernel; +Cc: arnd, arc-linux-dev, Vineet Gupta

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/Kconfig               |  5 ++++
 arch/arc/include/asm/arcregs.h |  2 +-
 arch/arc/include/asm/mmu.h     | 24 ++++++++++++++++++-
 arch/arc/include/asm/pgtable.h | 10 ++++++++
 arch/arc/mm/tlb.c              | 54 +++++++++++++++++++++++++++++++++++++++---
 arch/arc/mm/tlbex.S            | 24 +++++++++++++++++++
 6 files changed, 114 insertions(+), 5 deletions(-)

diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index f72398847b5b..6568977bb302 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -267,6 +267,7 @@ choice
 	prompt "MMU Version"
 	default ARC_MMU_V3 if ARC_CPU_770
 	default ARC_MMU_V2 if ARC_CPU_750D
+	default ARC_MMU_V4 if ARC_CPU_HS
 
 config ARC_MMU_V1
 	bool "MMU v1"
@@ -287,6 +288,10 @@ config ARC_MMU_V3
 	  Variable Page size (1k-16k), var JTLB size 128 x (2 or 4)
 	  Shared Address Spaces (SASID)
 
+config ARC_MMU_V4
+	bool "MMU v4"
+	depends on ISA_ARCV2
+
 endchoice
 
 
diff --git a/arch/arc/include/asm/arcregs.h b/arch/arc/include/asm/arcregs.h
index d77362dbb864..3a56cd00c59e 100644
--- a/arch/arc/include/asm/arcregs.h
+++ b/arch/arc/include/asm/arcregs.h
@@ -325,7 +325,7 @@ struct bcr_generic {
  */
 
 struct cpuinfo_arc_mmu {
-	unsigned int ver:4, pg_sz_k:8, pad:8, u_dtlb:6, u_itlb:6;
+	unsigned int ver:4, pg_sz_k:8, s_pg_sz_m:8, u_dtlb:6, u_itlb:6;
 	unsigned int num_tlb:16, sets:12, ways:4;
 };
 
diff --git a/arch/arc/include/asm/mmu.h b/arch/arc/include/asm/mmu.h
index 8c84ae98c337..0f9c3eb5327e 100644
--- a/arch/arc/include/asm/mmu.h
+++ b/arch/arc/include/asm/mmu.h
@@ -15,24 +15,41 @@
 #define CONFIG_ARC_MMU_VER 2
 #elif defined(CONFIG_ARC_MMU_V3)
 #define CONFIG_ARC_MMU_VER 3
+#elif defined(CONFIG_ARC_MMU_V4)
+#define CONFIG_ARC_MMU_VER 4
 #endif
 
 /* MMU Management regs */
 #define ARC_REG_MMU_BCR		0x06f
+#if (CONFIG_ARC_MMU_VER < 4)
 #define ARC_REG_TLBPD0		0x405
 #define ARC_REG_TLBPD1		0x406
 #define ARC_REG_TLBINDEX	0x407
 #define ARC_REG_TLBCOMMAND	0x408
 #define ARC_REG_PID		0x409
 #define ARC_REG_SCRATCH_DATA0	0x418
+#else
+#define ARC_REG_TLBPD0		0x460
+#define ARC_REG_TLBPD1		0x461
+#define ARC_REG_TLBINDEX	0x464
+#define ARC_REG_TLBCOMMAND	0x465
+#define ARC_REG_PID		0x468
+#define ARC_REG_SCRATCH_DATA0	0x46c
+#endif
 
 /* Bits in MMU PID register */
-#define MMU_ENABLE		(1 << 31)	/* Enable MMU for process */
+#define __TLB_ENABLE		(1 << 31)
+#define __PROG_ENABLE		(1 << 30)
+#define MMU_ENABLE		(__TLB_ENABLE | __PROG_ENABLE)
 
 /* Error code if probe fails */
 #define TLB_LKUP_ERR		0x80000000
 
+#if (CONFIG_ARC_MMU_VER < 4)
 #define TLB_DUP_ERR	(TLB_LKUP_ERR | 0x00000001)
+#else
+#define TLB_DUP_ERR	(TLB_LKUP_ERR | 0x40000000)
+#endif
 
 /* TLB Commands */
 #define TLBWrite    0x1
@@ -45,6 +62,11 @@
 #define TLBIVUTLB   0x6		/* explicitly inv uTLBs */
 #endif
 
+#if (CONFIG_ARC_MMU_VER >= 4)
+#define TLBInsertEntry	0x7
+#define TLBDeleteEntry	0x8
+#endif
+
 #ifndef __ASSEMBLY__
 
 typedef struct {
diff --git a/arch/arc/include/asm/pgtable.h b/arch/arc/include/asm/pgtable.h
index 9615fe1701c6..1281718802f7 100644
--- a/arch/arc/include/asm/pgtable.h
+++ b/arch/arc/include/asm/pgtable.h
@@ -72,8 +72,18 @@
 #define _PAGE_READ          (1<<3)	/* Page has user read perm (H) */
 #define _PAGE_ACCESSED      (1<<4)	/* Page is accessed (S) */
 #define _PAGE_MODIFIED      (1<<5)	/* Page modified (dirty) (S) */
+
+#if (CONFIG_ARC_MMU_VER >= 4)
+#define _PAGE_WTHRU         (1<<7)	/* Page cache mode write-thru (H) */
+#endif
+
 #define _PAGE_GLOBAL        (1<<8)	/* Page is global (H) */
 #define _PAGE_PRESENT       (1<<9)	/* TLB entry is valid (H) */
+
+#if (CONFIG_ARC_MMU_VER >= 4)
+#define _PAGE_SZ            (1<<10)	/* Page Size indicator (H) */
+#endif
+
 #define _PAGE_SHARED_CODE   (1<<11)	/* Shared Code page with cmn vaddr
 					   usable for shared TLB entries (H) */
 #endif
diff --git a/arch/arc/mm/tlb.c b/arch/arc/mm/tlb.c
index 914d8e0c0318..2c7ce8bb7475 100644
--- a/arch/arc/mm/tlb.c
+++ b/arch/arc/mm/tlb.c
@@ -113,6 +113,8 @@ static inline void __tlb_entry_erase(void)
 	write_aux_reg(ARC_REG_TLBCOMMAND, TLBWrite);
 }
 
+#if (CONFIG_ARC_MMU_VER < 4)
+
 static inline unsigned int tlb_entry_lkup(unsigned long vaddr_n_asid)
 {
 	unsigned int idx;
@@ -210,6 +212,28 @@ static void tlb_entry_insert(unsigned int pd0, unsigned int pd1)
 	write_aux_reg(ARC_REG_TLBCOMMAND, TLBWrite);
 }
 
+#else	/* CONFIG_ARC_MMU_VER >= 4 */
+
+static void utlb_invalidate(void)
+{
+	/* No need since uTLB is always in sync with JTLB */
+}
+
+static void tlb_entry_erase(unsigned int vaddr_n_asid)
+{
+	write_aux_reg(ARC_REG_TLBPD0, vaddr_n_asid | _PAGE_PRESENT);
+	write_aux_reg(ARC_REG_TLBCOMMAND, TLBDeleteEntry);
+}
+
+static void tlb_entry_insert(unsigned int pd0, unsigned int pd1)
+{
+	write_aux_reg(ARC_REG_TLBPD0, pd0);
+	write_aux_reg(ARC_REG_TLBPD1, pd1);
+	write_aux_reg(ARC_REG_TLBCOMMAND, TLBInsertEntry);
+}
+
+#endif
+
 /*
  * Un-conditionally (without lookup) erase the entire MMU contents
  */
@@ -582,6 +606,17 @@ void read_decode_mmu_bcr(void)
 #endif
 	} *mmu3;
 
+	struct bcr_mmu_4 {
+#ifdef CONFIG_CPU_BIG_ENDIAN
+	unsigned int ver:8, sasid:1, sz1:4, sz0:4, res:2, pae:1,
+		     n_ways:2, n_entry:2, n_super:2, u_itlb:3, u_dtlb:3;
+#else
+	/*           DTLB      ITLB      JES        JE         JA      */
+	unsigned int u_dtlb:3, u_itlb:3, n_super:2, n_entry:2, n_ways:2,
+		     pae:1, res:2, sz0:4, sz1:4, sasid:1, ver:8;
+#endif
+	} *mmu4;
+
 	tmp = read_aux_reg(ARC_REG_MMU_BCR);
 	mmu->ver = (tmp >> 24);
 
@@ -592,13 +627,21 @@ void read_decode_mmu_bcr(void)
 		mmu->ways = 1 << mmu2->ways;
 		mmu->u_dtlb = mmu2->u_dtlb;
 		mmu->u_itlb = mmu2->u_itlb;
-	} else {
+	} else if (mmu->ver == 3) {
 		mmu3 = (struct bcr_mmu_3 *)&tmp;
 		mmu->pg_sz_k = 1 << (mmu3->pg_sz - 1);
 		mmu->sets = 1 << mmu3->sets;
 		mmu->ways = 1 << mmu3->ways;
 		mmu->u_dtlb = mmu3->u_dtlb;
 		mmu->u_itlb = mmu3->u_itlb;
+	} else {
+		mmu4 = (struct bcr_mmu_4 *)&tmp;
+		mmu->pg_sz_k = 1 << (mmu4->sz0 - 1);
+		mmu->s_pg_sz_m = 1 << (mmu4->sz1 - 11);
+		mmu->sets = 64 << mmu4->n_entry;
+		mmu->ways = mmu4->n_ways * 2;
+		mmu->u_dtlb = mmu4->u_dtlb * 4;
+		mmu->u_itlb = mmu4->u_itlb * 4;
 	}
 
 	mmu->num_tlb = mmu->sets * mmu->ways;
@@ -608,10 +651,15 @@ char *arc_mmu_mumbojumbo(int cpu_id, char *buf, int len)
 {
 	int n = 0;
 	struct cpuinfo_arc_mmu *p_mmu = &cpuinfo_arc700[cpu_id].mmu;
+	char super_pg[64] = "";
+
+	if (p_mmu->s_pg_sz_m)
+		scnprintf(super_pg, 64, "%dM Super Page%s, ",
+			  p_mmu->s_pg_sz_m, " (not used)");
 
 	n += scnprintf(buf + n, len - n,
-		      "MMU [v%x]\t: %dk PAGE, JTLB %d (%dx%d), uDTLB %d, uITLB %d %s\n",
-		       p_mmu->ver, p_mmu->pg_sz_k,
+		      "MMU [v%x]\t: %dk PAGE, %sJTLB %d (%dx%d), uDTLB %d, uITLB %d %s\n",
+		       p_mmu->ver, p_mmu->pg_sz_k, super_pg,
 		       p_mmu->num_tlb, p_mmu->sets, p_mmu->ways,
 		       p_mmu->u_dtlb, p_mmu->u_itlb,
 		       IS_ENABLED(CONFIG_ARC_MMU_SASID) ? ",SASID" : "");
diff --git a/arch/arc/mm/tlbex.S b/arch/arc/mm/tlbex.S
index 00c8d7f772bc..8624ebd7114e 100644
--- a/arch/arc/mm/tlbex.S
+++ b/arch/arc/mm/tlbex.S
@@ -44,6 +44,7 @@
 #include <asm/processor.h>
 #include <asm/tlb-mmu1.h>
 
+#ifdef CONFIG_ISA_ARCOMPACT
 ;-----------------------------------------------------------------
 ; ARC700 Exception Handling doesn't auto-switch stack and it only provides
 ; ONE scratch AUX reg "ARC_REG_SCRATCH_DATA0"
@@ -121,6 +122,24 @@ ex_saved_reg1:
 #endif
 .endm
 
+#else	/* ARCv2 */
+
+.macro TLBMISS_FREEUP_REGS
+	PUSH  r0
+	PUSH  r1
+	PUSH  r2
+	PUSH  r3
+.endm
+
+.macro TLBMISS_RESTORE_REGS
+	POP   r3
+	POP   r2
+	POP   r1
+	POP   r0
+.endm
+
+#endif
+
 ;============================================================================
 ;  Troubleshooting Stuff
 ;============================================================================
@@ -239,6 +258,7 @@ ex_saved_reg1:
 ; Commit the TLB entry into MMU
 
 .macro COMMIT_ENTRY_TO_MMU
+#if (CONFIG_ARC_MMU_VER < 4)
 
 	/* Get free TLB slot: Set = computed from vaddr, way = random */
 	sr  TLBGetIndex, [ARC_REG_TLBCOMMAND]
@@ -249,6 +269,10 @@ ex_saved_reg1:
 #else
 	sr TLBWrite, [ARC_REG_TLBCOMMAND]
 #endif
+
+#else
+	sr TLBInsertEntry, [ARC_REG_TLBCOMMAND]
+#endif
 .endm
 
 
-- 
1.9.1



* [PATCH 07/28] ARCv2: MMUv4: cache programming model changes
  2015-06-09 11:48 [PATCH 00/28] ARCv2 port to Linux - (B) ISA / Core / platform support Vineet Gupta
                   ` (5 preceding siblings ...)
  2015-06-09 11:48 ` [PATCH 06/28] ARCv2: MMUv4: TLB programming Model changes Vineet Gupta
@ 2015-06-09 11:48 ` Vineet Gupta
  2015-06-09 11:48 ` [PATCH 08/28] ARCv2: MMUv4: support aliasing icache config Vineet Gupta
                   ` (20 subsequent siblings)
  27 siblings, 0 replies; 66+ messages in thread
From: Vineet Gupta @ 2015-06-09 11:48 UTC (permalink / raw)
  To: linux-arch, linux-kernel; +Cc: arnd, arc-linux-dev, Vineet Gupta

Caveats about cache flush on ARCv2 based cores

- dcache is PIPT so paddr is sufficient for cache maintenance ops (no
  need to setup PTAG reg)

- icache is still VIPT but only aliasing configs need PTAG setup

So basically this is a departure from MMU-v3, which always needs vaddr in
the line ops registers (DC_IVDL, DC_FLDL, IC_IVIL) and paddr in DC_PTAG,
IC_PTAG respectively.
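
Since only paddr is involved in the non-aliasing case, the line loop reduces
to simple address arithmetic. The sketch below models just that arithmetic
in user space; MODEL_LINE_BYTES (assumed to be 64 here) and
model_line_span() are hypothetical names standing in for L1_CACHE_BYTES and
the floor/ceil step of the v4 line loop, and no aux register is touched.

```c
#define MODEL_LINE_BYTES	64UL	/* assumed cache line size */
#define MODEL_LINE_MASK		(~(MODEL_LINE_BYTES - 1))

/*
 * Return the number of cache lines covering [paddr, paddr + sz) and store
 * the line-aligned start address in *start.
 */
static unsigned long model_line_span(unsigned long paddr, unsigned long sz,
				     unsigned long *start)
{
	/* grow the size by the offset of paddr within its line ... */
	sz += paddr & ~MODEL_LINE_MASK;
	/* ... then floor paddr to the start of that line */
	*start = paddr & MODEL_LINE_MASK;

	/* round up to a whole number of lines (DIV_ROUND_UP in the kernel) */
	return (sz + MODEL_LINE_BYTES - 1) / MODEL_LINE_BYTES;
}
```

For example, a 0x40 byte request at paddr 0x1004 straddles two lines, so the
model reports 2 lines starting at 0x1000; each of those would get one write
of the running paddr to the line op register.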

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/Kconfig               |   2 +-
 arch/arc/include/asm/arcregs.h |   5 +-
 arch/arc/include/asm/cache.h   |   3 ++
 arch/arc/mm/cache.c            | 112 +++++++++++++++++++++++++++++++++++------
 4 files changed, 104 insertions(+), 18 deletions(-)

diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index 6568977bb302..974ed9058018 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -223,7 +223,7 @@ config ARC_CACHE_PAGES
 
 config ARC_CACHE_VIPT_ALIASING
 	bool "Support VIPT Aliasing D$"
-	depends on ARC_HAS_DCACHE
+	depends on ARC_HAS_DCACHE && ISA_ARCOMPACT
 	default n
 
 endif	#ARC_CACHE
diff --git a/arch/arc/include/asm/arcregs.h b/arch/arc/include/asm/arcregs.h
index 3a56cd00c59e..4db1af296454 100644
--- a/arch/arc/include/asm/arcregs.h
+++ b/arch/arc/include/asm/arcregs.h
@@ -17,6 +17,7 @@
 #define ARC_REG_FP_BCR		0x6B	/* ARCompact: Single-Precision FPU */
 #define ARC_REG_DPFP_BCR	0x6C	/* ARCompact: Dbl Precision FPU */
 #define ARC_REG_FP_V2_BCR	0xc8	/* ARCv2 FPU */
+#define ARC_REG_SLC_BCR		0xce
 #define ARC_REG_DCCM_BCR	0x74	/* DCCM Present + SZ */
 #define ARC_REG_TIMERS_BCR	0x75
 #define ARC_REG_AP_BCR		0x76
@@ -330,7 +331,7 @@ struct cpuinfo_arc_mmu {
 };
 
 struct cpuinfo_arc_cache {
-	unsigned int sz_k:8, line_len:8, assoc:4, ver:4, alias:1, vipt:1, pad:6;
+	unsigned int sz_k:14, line_len:8, assoc:4, ver:4, alias:1, vipt:1;
 };
 
 struct cpuinfo_arc_bpu {
@@ -342,7 +343,7 @@ struct cpuinfo_arc_ccm {
 };
 
 struct cpuinfo_arc {
-	struct cpuinfo_arc_cache icache, dcache;
+	struct cpuinfo_arc_cache icache, dcache, slc;
 	struct cpuinfo_arc_mmu mmu;
 	struct cpuinfo_arc_bpu bpu;
 	struct bcr_identity core;
diff --git a/arch/arc/include/asm/cache.h b/arch/arc/include/asm/cache.h
index 7861255da32d..e54977a7d006 100644
--- a/arch/arc/include/asm/cache.h
+++ b/arch/arc/include/asm/cache.h
@@ -82,4 +82,7 @@ extern void read_decode_cache_bcr(void);
 #define DC_CTRL_INV_MODE_FLUSH  0x40
 #define DC_CTRL_FLUSH_STATUS    0x100
 
+/* System-level cache (L2 cache) related Auxiliary registers */
+#define ARC_REG_SLC_CFG		0x901
+
 #endif /* _ASM_CACHE_H */
diff --git a/arch/arc/mm/cache.c b/arch/arc/mm/cache.c
index 6fa5f0f7f549..7a898f57d84b 100644
--- a/arch/arc/mm/cache.c
+++ b/arch/arc/mm/cache.c
@@ -24,6 +24,7 @@
 char *arc_cache_mumbojumbo(int c, char *buf, int len)
 {
 	int n = 0;
+	struct cpuinfo_arc_cache *p;
 
 #define PR_CACHE(p, cfg, str)						\
 	if (!(p)->ver)							\
@@ -39,6 +40,11 @@ char *arc_cache_mumbojumbo(int c, char *buf, int len)
 	PR_CACHE(&cpuinfo_arc700[c].icache, CONFIG_ARC_HAS_ICACHE, "I-Cache");
 	PR_CACHE(&cpuinfo_arc700[c].dcache, CONFIG_ARC_HAS_DCACHE, "D-Cache");
 
+	p = &cpuinfo_arc700[c].slc;
+	if (p->ver)
+		n += scnprintf(buf + n, len - n,
+			"SLC\t\t: %uK, %uB Line\n", p->sz_k, p->line_len);
+
 	return buf;
 }
 
@@ -49,7 +55,7 @@ char *arc_cache_mumbojumbo(int c, char *buf, int len)
  */
 void read_decode_cache_bcr(void)
 {
-	struct cpuinfo_arc_cache *p_ic, *p_dc;
+	struct cpuinfo_arc_cache *p_ic, *p_dc, *p_slc;
 	unsigned int cpu = smp_processor_id();
 	struct bcr_cache {
 #ifdef CONFIG_CPU_BIG_ENDIAN
@@ -59,14 +65,29 @@ void read_decode_cache_bcr(void)
 #endif
 	} ibcr, dbcr;
 
+	struct bcr_generic sbcr;
+
+	struct bcr_slc_cfg {
+#ifdef CONFIG_CPU_BIG_ENDIAN
+		unsigned int pad:24, way:2, lsz:2, sz:4;
+#else
+		unsigned int sz:4, lsz:2, way:2, pad:24;
+#endif
+	} slc_cfg;
+
 	p_ic = &cpuinfo_arc700[cpu].icache;
 	READ_BCR(ARC_REG_IC_BCR, ibcr);
 
 	if (!ibcr.ver)
 		goto dc_chk;
 
-	BUG_ON(ibcr.config != 3);
-	p_ic->assoc = 2;		/* Fixed to 2w set assoc */
+	if (ibcr.ver <= 3) {
+		BUG_ON(ibcr.config != 3);
+		p_ic->assoc = 2;		/* Fixed to 2w set assoc */
+	} else if (ibcr.ver >= 4) {
+		p_ic->assoc = 1 << ibcr.config;	/* 1,2,4,8 */
+	}
+
 	p_ic->line_len = 8 << ibcr.line_len;
 	p_ic->sz_k = 1 << (ibcr.sz - 1);
 	p_ic->ver = ibcr.ver;
@@ -78,15 +99,32 @@ dc_chk:
 	READ_BCR(ARC_REG_DC_BCR, dbcr);
 
 	if (!dbcr.ver)
-		return;
+		goto slc_chk;
+
+	if (dbcr.ver <= 3) {
+		BUG_ON(dbcr.config != 2);
+		p_dc->assoc = 4;		/* Fixed to 4w set assoc */
+		p_dc->vipt = 1;
+		p_dc->alias = p_dc->sz_k/p_dc->assoc/TO_KB(PAGE_SIZE) > 1;
+	} else if (dbcr.ver >= 4) {
+		p_dc->assoc = 1 << dbcr.config;	/* 1,2,4,8 */
+		p_dc->vipt = 0;
+		p_dc->alias = 0;		/* PIPT so can't VIPT alias */
+	}
 
-	BUG_ON(dbcr.config != 2);
-	p_dc->assoc = 4;		/* Fixed to 4w set assoc */
 	p_dc->line_len = 16 << dbcr.line_len;
 	p_dc->sz_k = 1 << (dbcr.sz - 1);
 	p_dc->ver = dbcr.ver;
-	p_dc->vipt = 1;
-	p_dc->alias = p_dc->sz_k/p_dc->assoc/TO_KB(PAGE_SIZE) > 1;
+
+slc_chk:
+	p_slc = &cpuinfo_arc700[cpu].slc;
+	READ_BCR(ARC_REG_SLC_BCR, sbcr);
+	if (sbcr.ver) {
+		READ_BCR(ARC_REG_SLC_CFG, slc_cfg);
+		p_slc->ver = sbcr.ver;
+		p_slc->sz_k = 128 << slc_cfg.sz;
+		p_slc->line_len = (slc_cfg.lsz == 0) ? 128 : 64;
+	}
 }
 
 /*
@@ -225,10 +263,53 @@ void __cache_line_loop_v3(unsigned long paddr, unsigned long vaddr,
 	}
 }
 
+/*
+ * In HS38x (MMU v4), although icache is VIPT, only paddr is needed for cache
+ * maintenance ops (in IVIL reg), as long as icache doesn't alias.
+ *
+ * For Aliasing icache, vaddr is also needed (in IVIL), while paddr is
+ * specified in PTAG (similar to MMU v3)
+ */
+static inline
+void __cache_line_loop_v4(unsigned long paddr, unsigned long vaddr,
+			  unsigned long sz, const int cacheop)
+{
+	unsigned int aux_cmd;
+	int num_lines;
+	const int full_page_op = __builtin_constant_p(sz) && sz == PAGE_SIZE;
+
+	if (cacheop == OP_INV_IC) {
+		aux_cmd = ARC_REG_IC_IVIL;
+	} else {
+		/* d$ cmd: INV (discard or wback-n-discard) OR FLUSH (wback) */
+		aux_cmd = cacheop & OP_INV ? ARC_REG_DC_IVDL : ARC_REG_DC_FLDL;
+	}
+
+	/* Ensure we properly floor/ceil the non-line aligned/sized requests
+	 * and have @paddr - aligned to cache line and integral @num_lines.
+	 * This however can be avoided for page sized since:
+	 *  -@paddr will be cache-line aligned already (being page aligned)
+	 *  -@sz will be integral multiple of line size (being page sized).
+	 */
+	if (!full_page_op) {
+		sz += paddr & ~CACHE_LINE_MASK;
+		paddr &= CACHE_LINE_MASK;
+	}
+
+	num_lines = DIV_ROUND_UP(sz, L1_CACHE_BYTES);
+
+	while (num_lines-- > 0) {
+		write_aux_reg(aux_cmd, paddr);
+		paddr += L1_CACHE_BYTES;
+	}
+}
+
 #if (CONFIG_ARC_MMU_VER < 3)
 #define __cache_line_loop	__cache_line_loop_v2
 #elif (CONFIG_ARC_MMU_VER == 3)
 #define __cache_line_loop	__cache_line_loop_v3
+#elif (CONFIG_ARC_MMU_VER > 3)
+#define __cache_line_loop	__cache_line_loop_v4
 #endif
 
 #ifdef CONFIG_ARC_HAS_DCACHE
@@ -669,7 +750,6 @@ void arc_cache_init(void)
 
 	if (IS_ENABLED(CONFIG_ARC_HAS_DCACHE)) {
 		struct cpuinfo_arc_cache *dc = &cpuinfo_arc700[cpu].dcache;
-		int handled;
 
 		if (!dc->ver)
 			panic("cache support enabled but non-existent cache\n");
@@ -678,12 +758,14 @@ void arc_cache_init(void)
 			panic("DCache line [%d] != kernel Config [%d]",
 			      dc->line_len, L1_CACHE_BYTES);
 
-		/* check for D-Cache aliasing */
-		handled = IS_ENABLED(CONFIG_ARC_CACHE_VIPT_ALIASING);
+		/* check for D-Cache aliasing on ARCompact: ARCv2 has PIPT */
+		if (is_isa_arcompact()) {
+			int handled = IS_ENABLED(CONFIG_ARC_CACHE_VIPT_ALIASING);
 
-		if (dc->alias && !handled)
-			panic("Enable CONFIG_ARC_CACHE_VIPT_ALIASING\n");
-		else if (!dc->alias && handled)
-			panic("Disable CONFIG_ARC_CACHE_VIPT_ALIASING\n");
+			if (dc->alias && !handled)
+				panic("Enable CONFIG_ARC_CACHE_VIPT_ALIASING\n");
+			else if (!dc->alias && handled)
+				panic("Disable CONFIG_ARC_CACHE_VIPT_ALIASING\n");
+		}
 	}
 }
-- 
1.9.1



* [PATCH 08/28] ARCv2: MMUv4: support aliasing icache config
  2015-06-09 11:48 [PATCH 00/28] ARCv2 port to Linux - (B) ISA / Core / platform support Vineet Gupta
                   ` (6 preceding siblings ...)
  2015-06-09 11:48 ` [PATCH 07/28] ARCv2: MMUv4: cache programming model changes Vineet Gupta
@ 2015-06-09 11:48 ` Vineet Gupta
  2015-06-09 11:48 ` [PATCH 09/28] ARCv2: optimised string/mem lib routines Vineet Gupta
                   ` (19 subsequent siblings)
  27 siblings, 0 replies; 66+ messages in thread
From: Vineet Gupta @ 2015-06-09 11:48 UTC (permalink / raw)
  To: linux-arch, linux-kernel; +Cc: arnd, arc-linux-dev, Vineet Gupta

This is also the default config for the AXS103 release.
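
The patch picks the icache line loop once at init time through a function
pointer. A rough user-space sketch of that selection follows; the model_*
functions and last_loop_used are placeholders invented for illustration and
only record which variant ran, they perform no cache ops.

```c
typedef void (*line_loop_fn)(unsigned long paddr, unsigned long vaddr,
			     unsigned long sz, int cacheop);

static int last_loop_used;	/* 3 or 4: which model variant ran last */

/* Placeholder for the v3 style loop (vaddr in IVIL, paddr in PTAG) */
static void model_line_loop_v3(unsigned long paddr, unsigned long vaddr,
			       unsigned long sz, int cacheop)
{
	(void)paddr; (void)vaddr; (void)sz; (void)cacheop;
	last_loop_used = 3;
}

/* Placeholder for the v4 style loop (paddr only) */
static void model_line_loop_v4(unsigned long paddr, unsigned long vaddr,
			       unsigned long sz, int cacheop)
{
	(void)paddr; (void)vaddr; (void)sz; (void)cacheop;
	last_loop_used = 4;
}

/* An aliasing icache falls back to the v3 style loop, otherwise use v4 */
static line_loop_fn model_pick_ic_loop(int icache_aliases)
{
	return icache_aliases ? model_line_loop_v3 : model_line_loop_v4;
}
```

The indirection costs one pointer load per invalidate, but lets a single
kernel image cope with both aliasing and non-aliasing icache configs.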

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/include/asm/cache.h |  4 +---
 arch/arc/mm/cache.c          | 14 +++++++++++++-
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/arc/include/asm/cache.h b/arch/arc/include/asm/cache.h
index e54977a7d006..d21c76d6b054 100644
--- a/arch/arc/include/asm/cache.h
+++ b/arch/arc/include/asm/cache.h
@@ -60,7 +60,7 @@ extern void read_decode_cache_bcr(void);
 #define ARC_REG_IC_IVIC		0x10
 #define ARC_REG_IC_CTRL		0x11
 #define ARC_REG_IC_IVIL		0x19
-#if defined(CONFIG_ARC_MMU_V3)
+#if defined(CONFIG_ARC_MMU_V3) || defined(CONFIG_ARC_MMU_V4)
 #define ARC_REG_IC_PTAG		0x1E
 #endif
 
@@ -74,9 +74,7 @@ extern void read_decode_cache_bcr(void);
 #define ARC_REG_DC_IVDL		0x4A
 #define ARC_REG_DC_FLSH		0x4B
 #define ARC_REG_DC_FLDL		0x4C
-#if defined(CONFIG_ARC_MMU_V3)
 #define ARC_REG_DC_PTAG		0x5C
-#endif
 
 /* Bit val in DC_CTRL */
 #define DC_CTRL_INV_MODE_FLUSH  0x40
diff --git a/arch/arc/mm/cache.c b/arch/arc/mm/cache.c
index 7a898f57d84b..0eaaee60fd0b 100644
--- a/arch/arc/mm/cache.c
+++ b/arch/arc/mm/cache.c
@@ -21,6 +21,9 @@
 #include <asm/cachectl.h>
 #include <asm/setup.h>
 
+void (*_cache_line_loop_ic_fn)(unsigned long paddr, unsigned long vaddr,
+			       unsigned long sz, const int cacheop);
+
 char *arc_cache_mumbojumbo(int c, char *buf, int len)
 {
 	int n = 0;
@@ -414,7 +417,7 @@ __ic_line_inv_vaddr_local(unsigned long paddr, unsigned long vaddr,
 	unsigned long flags;
 
 	local_irq_save(flags);
-	__cache_line_loop(paddr, vaddr, sz, OP_INV_IC);
+	(*_cache_line_loop_ic_fn)(paddr, vaddr, sz, OP_INV_IC);
 	local_irq_restore(flags);
 }
 
@@ -746,6 +749,15 @@ void arc_cache_init(void)
 		if (ic->ver != CONFIG_ARC_MMU_VER)
 			panic("Cache ver [%d] doesn't match MMU ver [%d]\n",
 			      ic->ver, CONFIG_ARC_MMU_VER);
+
+		/*
+		 * In MMU v4 (HS38x) the aliasing icache config uses IVIL/PTAG
+		 * pair to provide vaddr/paddr respectively, just as in MMU v3
+		 */
+		if (is_isa_arcv2() && ic->alias)
+			_cache_line_loop_ic_fn = __cache_line_loop_v3;
+		else
+			_cache_line_loop_ic_fn = __cache_line_loop;
 	}
 
 	if (IS_ENABLED(CONFIG_ARC_HAS_DCACHE)) {
-- 
1.9.1



* [PATCH 09/28] ARCv2: optimised string/mem lib routines
  2015-06-09 11:48 [PATCH 00/28] ARCv2 port to Linux - (B) ISA / Core / platform support Vineet Gupta
                   ` (7 preceding siblings ...)
  2015-06-09 11:48 ` [PATCH 08/28] ARCv2: MMUv4: support aliasing icache config Vineet Gupta
@ 2015-06-09 11:48 ` Vineet Gupta
  2015-06-09 11:48 ` [PATCH 10/28] ARCv2: Adhere to Zero Delay loop restriction Vineet Gupta
                   ` (18 subsequent siblings)
  27 siblings, 0 replies; 66+ messages in thread
From: Vineet Gupta @ 2015-06-09 11:48 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: arnd, arc-linux-dev, Claudiu Zissulescu, Vineet Gupta

From: Claudiu Zissulescu <claziss@synopsys.com>

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/lib/Makefile       |   6 +-
 arch/arc/lib/memcpy-archs.S | 236 ++++++++++++++++++++++++++++++++++++++++++++
 arch/arc/lib/memset-archs.S |  93 +++++++++++++++++
 arch/arc/lib/strcmp-archs.S |  78 +++++++++++++++
 4 files changed, 411 insertions(+), 2 deletions(-)
 create mode 100644 arch/arc/lib/memcpy-archs.S
 create mode 100644 arch/arc/lib/memset-archs.S
 create mode 100644 arch/arc/lib/strcmp-archs.S

diff --git a/arch/arc/lib/Makefile b/arch/arc/lib/Makefile
index db46e200baba..b1656d156097 100644
--- a/arch/arc/lib/Makefile
+++ b/arch/arc/lib/Makefile
@@ -5,5 +5,7 @@
 # it under the terms of the GNU General Public License version 2 as
 # published by the Free Software Foundation.
 
-lib-y	:= strchr-700.o strcmp.o strcpy-700.o strlen.o
-lib-y	+= memcmp.o memcpy-700.o memset.o
+lib-y	:= strchr-700.o strcpy-700.o strlen.o memcmp.o
+
+lib-$(CONFIG_ISA_ARCOMPACT)	+= memcpy-700.o memset.o strcmp.o
+lib-$(CONFIG_ISA_ARCV2)		+= memcpy-archs.o memset-archs.o strcmp-archs.o
diff --git a/arch/arc/lib/memcpy-archs.S b/arch/arc/lib/memcpy-archs.S
new file mode 100644
index 000000000000..1b2b3acfed52
--- /dev/null
+++ b/arch/arc/lib/memcpy-archs.S
@@ -0,0 +1,236 @@
+/*
+ * Copyright (C) 2014-15 Synopsys, Inc. (www.synopsys.com)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/linkage.h>
+
+#ifdef __LITTLE_ENDIAN__
+# define SHIFT_1(RX,RY,IMM)	asl	RX, RY, IMM	; <<
+# define SHIFT_2(RX,RY,IMM)	lsr	RX, RY, IMM	; >>
+# define MERGE_1(RX,RY,IMM)	asl	RX, RY, IMM
+# define MERGE_2(RX,RY,IMM)
+# define EXTRACT_1(RX,RY,IMM)	and	RX, RY, 0xFFFF
+# define EXTRACT_2(RX,RY,IMM)	lsr	RX, RY, IMM
+#else
+# define SHIFT_1(RX,RY,IMM)	lsr	RX, RY, IMM	; >>
+# define SHIFT_2(RX,RY,IMM)	asl	RX, RY, IMM	; <<
+# define MERGE_1(RX,RY,IMM)	asl	RX, RY, IMM	; <<
+# define MERGE_2(RX,RY,IMM)	asl	RX, RY, IMM	; <<
+# define EXTRACT_1(RX,RY,IMM)	lsr	RX, RY, IMM
+# define EXTRACT_2(RX,RY,IMM)	lsr	RX, RY, 0x08
+#endif
+
+#ifdef CONFIG_ARC_HAS_LL64
+# define PREFETCH_READ(RX)	prefetch    [RX, 56]
+# define PREFETCH_WRITE(RX)	prefetchw   [RX, 64]
+# define LOADX(DST,RX)		ldd.ab	DST, [RX, 8]
+# define STOREX(SRC,RX)		std.ab	SRC, [RX, 8]
+# define ZOLSHFT		5
+# define ZOLAND			0x1F
+#else
+# define PREFETCH_READ(RX)	prefetch    [RX, 28]
+# define PREFETCH_WRITE(RX)	prefetchw   [RX, 32]
+# define LOADX(DST,RX)		ld.ab	DST, [RX, 4]
+# define STOREX(SRC,RX)		st.ab	SRC, [RX, 4]
+# define ZOLSHFT		4
+# define ZOLAND			0xF
+#endif
+
+ENTRY(memcpy)
+	prefetch [r1]		; Prefetch the read location
+	prefetchw [r0]		; Prefetch the write location
+	mov.f	0, r2
+;;; if size is zero
+	jz.d	[blink]
+	mov	r3, r0		; don't clobber ret val
+
+;;; if size <= 8
+	cmp	r2, 8
+	bls.d	@smallchunk
+	mov.f	lp_count, r2
+
+	and.f	r4, r0, 0x03
+	rsub	lp_count, r4, 4
+	lpnz	@aligndestination
+	;; LOOP BEGIN
+	ldb.ab	r5, [r1,1]
+	sub	r2, r2, 1
+	stb.ab	r5, [r3,1]
+aligndestination:
+
+;;; Check the alignment of the source
+	and.f	r4, r1, 0x03
+	bnz.d	@sourceunaligned
+
+;;; CASE 0: Both source and destination are 32bit aligned
+;;; Convert len to Dwords, unfold x4
+	lsr.f	lp_count, r2, ZOLSHFT
+	lpnz	@copy32_64bytes
+	;; LOOP START
+	LOADX (r6, r1)
+	PREFETCH_READ (r1)
+	PREFETCH_WRITE (r3)
+	LOADX (r8, r1)
+	LOADX (r10, r1)
+	LOADX (r4, r1)
+	STOREX (r6, r3)
+	STOREX (r8, r3)
+	STOREX (r10, r3)
+	STOREX (r4, r3)
+copy32_64bytes:
+
+	and.f	lp_count, r2, ZOLAND ;Last remaining 31 bytes
+smallchunk:
+	lpnz	@copyremainingbytes
+	;; LOOP START
+	ldb.ab	r5, [r1,1]
+	stb.ab	r5, [r3,1]
+copyremainingbytes:
+
+	j	[blink]
+;;; END CASE 0
+
+sourceunaligned:
+	cmp	r4, 2
+	beq.d	@unalignedOffby2
+	sub	r2, r2, 1
+
+	bhi.d	@unalignedOffby3
+	ldb.ab	r5, [r1, 1]
+
+;;; CASE 1: The source is unaligned, off by 1
+	;; Hence I need to read 1 byte for a 16bit alignment
+	;; and 2bytes to reach 32bit alignment
+	ldh.ab	r6, [r1, 2]
+	sub	r2, r2, 2
+	;; Convert to words, unfold x2
+	lsr.f	lp_count, r2, 3
+	MERGE_1 (r6, r6, 8)
+	MERGE_2 (r5, r5, 24)
+	or	r5, r5, r6
+
+	;; Both src and dst are aligned
+	lpnz	@copy8bytes_1
+	;; LOOP START
+	ld.ab	r6, [r1, 4]
+	prefetch [r1, 28]	;Prefetch the next read location
+	ld.ab	r8, [r1,4]
+	prefetchw [r3, 32]	;Prefetch the next write location
+
+	SHIFT_1	(r7, r6, 24)
+	or	r7, r7, r5
+	SHIFT_2	(r5, r6, 8)
+
+	SHIFT_1	(r9, r8, 24)
+	or	r9, r9, r5
+	SHIFT_2	(r5, r8, 8)
+
+	st.ab	r7, [r3, 4]
+	st.ab	r9, [r3, 4]
+copy8bytes_1:
+
+	;; Write back the remaining 16bits
+	EXTRACT_1 (r6, r5, 16)
+	sth.ab	r6, [r3, 2]
+	;; Write back the remaining 8bits
+	EXTRACT_2 (r5, r5, 16)
+	stb.ab	r5, [r3, 1]
+
+	and.f	lp_count, r2, 0x07 ;Last 8bytes
+	lpnz	@copybytewise_1
+	;; LOOP START
+	ldb.ab	r6, [r1,1]
+	stb.ab	r6, [r3,1]
+copybytewise_1:
+	j	[blink]
+
+unalignedOffby2:
+;;; CASE 2: The source is unaligned, off by 2
+	ldh.ab	r5, [r1, 2]
+	sub	r2, r2, 1
+
+	;; Both src and dst are aligned
+	;; Convert to words, unfold x2
+	lsr.f	lp_count, r2, 3
+#ifdef __BIG_ENDIAN__
+	asl.nz	r5, r5, 16
+#endif
+	lpnz	@copy8bytes_2
+	;; LOOP START
+	ld.ab	r6, [r1, 4]
+	prefetch [r1, 28]	;Prefetch the next read location
+	ld.ab	r8, [r1,4]
+	prefetchw [r3, 32]	;Prefetch the next write location
+
+	SHIFT_1	(r7, r6, 16)
+	or	r7, r7, r5
+	SHIFT_2	(r5, r6, 16)
+
+	SHIFT_1	(r9, r8, 16)
+	or	r9, r9, r5
+	SHIFT_2	(r5, r8, 16)
+
+	st.ab	r7, [r3, 4]
+	st.ab	r9, [r3, 4]
+copy8bytes_2:
+
+#ifdef __BIG_ENDIAN__
+	lsr.nz	r5, r5, 16
+#endif
+	sth.ab	r5, [r3, 2]
+
+	and.f	lp_count, r2, 0x07 ;Last 8bytes
+	lpnz	@copybytewise_2
+	;; LOOP START
+	ldb.ab	r6, [r1,1]
+	stb.ab	r6, [r3,1]
+copybytewise_2:
+	j	[blink]
+
+unalignedOffby3:
+;;; CASE 3: The source is unaligned, off by 3
+;;; Hence, I need to read 1 byte to achieve the 32bit alignment
+
+	;; Both src and dst are aligned
+	;; Convert to words, unfold x2
+	lsr.f	lp_count, r2, 3
+#ifdef __BIG_ENDIAN__
+	asl.ne	r5, r5, 24
+#endif
+	lpnz	@copy8bytes_3
+	;; LOOP START
+	ld.ab	r6, [r1, 4]
+	prefetch [r1, 28]	;Prefetch the next read location
+	ld.ab	r8, [r1,4]
+	prefetchw [r3, 32]	;Prefetch the next write location
+
+	SHIFT_1	(r7, r6, 8)
+	or	r7, r7, r5
+	SHIFT_2	(r5, r6, 24)
+
+	SHIFT_1	(r9, r8, 8)
+	or	r9, r9, r5
+	SHIFT_2	(r5, r8, 24)
+
+	st.ab	r7, [r3, 4]
+	st.ab	r9, [r3, 4]
+copy8bytes_3:
+
+#ifdef __BIG_ENDIAN__
+	lsr.nz	r5, r5, 24
+#endif
+	stb.ab	r5, [r3, 1]
+
+	and.f	lp_count, r2, 0x07 ;Last 8bytes
+	lpnz	@copybytewise_3
+	;; LOOP START
+	ldb.ab	r6, [r1,1]
+	stb.ab	r6, [r3,1]
+copybytewise_3:
+	j	[blink]
+
+END(memcpy)
diff --git a/arch/arc/lib/memset-archs.S b/arch/arc/lib/memset-archs.S
new file mode 100644
index 000000000000..92d573c734b5
--- /dev/null
+++ b/arch/arc/lib/memset-archs.S
@@ -0,0 +1,93 @@
+/*
+ * Copyright (C) 2014-15 Synopsys, Inc. (www.synopsys.com)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/linkage.h>
+
+#undef PREALLOC_NOT_AVAIL
+
+#ifdef PREALLOC_NOT_AVAIL
+#define PREWRITE(A,B)	prefetchw [(A),(B)]
+#else
+#define PREWRITE(A,B)	prealloc [(A),(B)]
+#endif
+
+ENTRY(memset)
+	prefetchw [r0]		; Prefetch the write location
+	mov.f	0, r2
+;;; if size is zero
+	jz.d	[blink]
+	mov	r3, r0		; don't clobber ret val
+
+;;; if length <= 8
+	brls.d.nt	r2, 8, .Lsmallchunk
+	mov.f	lp_count,r2
+
+	and.f	r4, r0, 0x03
+	rsub	lp_count, r4, 4
+	lpnz	@.Laligndestination
+	;; LOOP BEGIN
+	stb.ab	r1, [r3,1]
+	sub	r2, r2, 1
+.Laligndestination:
+
+;;; Destination is aligned
+	and	r1, r1, 0xFF
+	asl	r4, r1, 8
+	or	r4, r4, r1
+	asl	r5, r4, 16
+	or	r5, r5, r4
+	mov	r4, r5
+
+	sub3	lp_count, r2, 8
+	cmp     r2, 64
+	bmsk.hi	r2, r2, 5
+	mov.ls	lp_count, 0
+	add3.hi	r2, r2, 8
+
+;;; Convert len to Dwords, unfold x8
+	lsr.f	lp_count, lp_count, 6
+	lpnz	@.Lset64bytes
+	;; LOOP START
+	PREWRITE(r3, 64)	;Prefetch the next write location
+	std.ab	r4, [r3, 8]
+	std.ab	r4, [r3, 8]
+	std.ab	r4, [r3, 8]
+	std.ab	r4, [r3, 8]
+	std.ab	r4, [r3, 8]
+	std.ab	r4, [r3, 8]
+	std.ab	r4, [r3, 8]
+	std.ab	r4, [r3, 8]
+.Lset64bytes:
+
+	lsr.f	lp_count, r2, 5 ;Last remaining  max 124 bytes
+	lpnz	.Lset32bytes
+	;; LOOP START
+	prefetchw   [r3, 32]	;Prefetch the next write location
+	std.ab	r4, [r3, 8]
+	std.ab	r4, [r3, 8]
+	std.ab	r4, [r3, 8]
+	std.ab	r4, [r3, 8]
+.Lset32bytes:
+
+	and.f	lp_count, r2, 0x1F ;Last remaining 31 bytes
+.Lsmallchunk:
+	lpnz	.Lcopy3bytes
+	;; LOOP START
+	stb.ab	r1, [r3, 1]
+.Lcopy3bytes:
+
+	j	[blink]
+
+END(memset)
+
+ENTRY(memzero)
+    ; adjust bzero args to memset args
+    mov r2, r1
+    b.d  memset    ;tail call so no need to tinker with blink
+    mov r1, 0
+END(memzero)
diff --git a/arch/arc/lib/strcmp-archs.S b/arch/arc/lib/strcmp-archs.S
new file mode 100644
index 000000000000..4f338eec3365
--- /dev/null
+++ b/arch/arc/lib/strcmp-archs.S
@@ -0,0 +1,78 @@
+/*
+ * Copyright (C) 2014-15 Synopsys, Inc. (www.synopsys.com)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/linkage.h>
+
+ENTRY(strcmp)
+	or	r2, r0, r1
+	bmsk_s	r2, r2, 1
+	brne	r2, 0, @.Lcharloop
+
+;;; s1 and s2 are word aligned
+	ld.ab	r2, [r0, 4]
+
+	mov_s	r12, 0x01010101
+	ror	r11, r12
+	.align  4
+.LwordLoop:
+	ld.ab	r3, [r1, 4]
+	;; Detect NULL char in str1
+	sub	r4, r2, r12
+	ld.ab	r5, [r0, 4]
+	bic	r4, r4, r2
+	and	r4, r4, r11
+	brne.d.nt	r4, 0, .LfoundNULL
+	;; Check if the read locations are the same
+	cmp	r2, r3
+	beq.d	.LwordLoop
+	mov.eq	r2, r5
+
+	;; A match is found, spot it out
+#ifdef __LITTLE_ENDIAN__
+	swape	r3, r3
+	mov_s	r0, 1
+	swape	r2, r2
+#else
+	mov_s	r0, 1
+#endif
+	cmp_s	r2, r3
+	j_s.d	[blink]
+	bset.lo	r0, r0, 31
+
+	.align 4
+.LfoundNULL:
+#ifdef __BIG_ENDIAN__
+	swape	r4, r4
+	swape	r2, r2
+	swape	r3, r3
+#endif
+	;; Find null byte
+	ffs	r0, r4
+	bmsk	r2, r2, r0
+	bmsk	r3, r3, r0
+	swape	r2, r2
+	swape	r3, r3
+	;; make the return value
+	sub.f	r0, r2, r3
+	mov.hi	r0, 1
+	j_s.d	[blink]
+	bset.lo	r0, r0, 31
+
+	.align 4
+.Lcharloop:
+	ldb.ab	r2, [r0, 1]
+	ldb.ab	r3, [r1, 1]
+	nop
+	breq	r2, 0, .Lcmpend
+	breq	r2, r3, .Lcharloop
+
+	.align 4
+.Lcmpend:
+	j_s.d	[blink]
+	sub	r0, r2, r3
+END(strcmp)
-- 
1.9.1



* [PATCH 10/28] ARCv2: Adhere to Zero Delay loop restriction
  2015-06-09 11:48 [PATCH 00/28] ARCv2 port to Linux - (B) ISA / Core / platform support Vineet Gupta
                   ` (8 preceding siblings ...)
  2015-06-09 11:48 ` [PATCH 09/28] ARCv2: optimised string/mem lib routines Vineet Gupta
@ 2015-06-09 11:48 ` Vineet Gupta
  2015-06-09 11:48 ` [PATCH 11/28] ARCv2: extable: Enable sorting at build time Vineet Gupta
                   ` (17 subsequent siblings)
  27 siblings, 0 replies; 66+ messages in thread
From: Vineet Gupta @ 2015-06-09 11:48 UTC (permalink / raw)
  To: linux-arch, linux-kernel; +Cc: arnd, arc-linux-dev, Vineet Gupta

A branch insn can't be scheduled as the last insn of a Zero Overhead loop.
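
For the reworked __arc_strncpy_from_user() loop, the intended result can be
modelled in plain C as below. This is a sketch of my reading of the asm, not
the kernel code: model_strncpy_from_user() is a made-up name, there is no
fault handling, and ordinary pointers stand in for __user memory. The
trailing increment is what ends up as the last insn of the loop body, after
the branch and its delay slot.

```c
/*
 * Copy at most count bytes from src to dst, stopping after a NUL.
 * Returns the number of non-NUL bytes copied.
 */
static long model_strncpy_from_user(char *dst, const char *src, long count)
{
	long res = 0;

	while (count-- > 0) {
		char val = *src++;	/* ldb.ab */

		*dst++ = val;		/* stb.ab in the delay slot: NUL is stored too */
		if (val == '\0')	/* breq.d exits the loop */
			break;
		res++;			/* add: counts only non-NUL bytes */
	}

	return res;
}
```

Copying "abc" with count 8 returns 3 and leaves dst NUL terminated, while
copying "abcdef" with count 4 returns 4 and leaves dst unterminated, which
is the usual strncpy_from_user() contract when the limit is hit.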

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/include/asm/delay.h   |  9 ++++-----
 arch/arc/include/asm/uaccess.h | 17 ++++++++---------
 arch/arc/lib/memcmp.S          | 30 +++++++++++++++++++++++++++++-
 3 files changed, 41 insertions(+), 15 deletions(-)

diff --git a/arch/arc/include/asm/delay.h b/arch/arc/include/asm/delay.h
index 43de30256981..08e7e2a16ac1 100644
--- a/arch/arc/include/asm/delay.h
+++ b/arch/arc/include/asm/delay.h
@@ -22,11 +22,10 @@
 static inline void __delay(unsigned long loops)
 {
 	__asm__ __volatile__(
-	"1:	sub.f %0, %0, 1	\n"
-	"	jpnz 1b		\n"
-	: "+r"(loops)
-	:
-	: "cc");
+	"	lp  1f	\n"
+	"	nop	\n"
+	"1:		\n"
+	: "+l"(loops));
 }
 
 extern void __bad_udelay(void);
diff --git a/arch/arc/include/asm/uaccess.h b/arch/arc/include/asm/uaccess.h
index 30c9baffa96f..d1da6032b715 100644
--- a/arch/arc/include/asm/uaccess.h
+++ b/arch/arc/include/asm/uaccess.h
@@ -659,31 +659,30 @@ static inline unsigned long __arc_clear_user(void __user *to, unsigned long n)
 static inline long
 __arc_strncpy_from_user(char *dst, const char __user *src, long count)
 {
-	long res = count;
+	long res = 0;
 	char val;
-	unsigned int hw_count;
 
 	if (count == 0)
 		return 0;
 
 	__asm__ __volatile__(
-	"	lp 2f		\n"
+	"	lp	3f			\n"
 	"1:	ldb.ab  %3, [%2, 1]		\n"
-	"	breq.d  %3, 0, 2f		\n"
+	"	breq.d	%3, 0, 3f               \n"
 	"	stb.ab  %3, [%1, 1]		\n"
-	"2:	sub %0, %6, %4			\n"
-	"3:	;nop				\n"
+	"	add	%0, %0, 1	# Num of NON NULL bytes copied	\n"
+	"3:								\n"
 	"	.section .fixup, \"ax\"		\n"
 	"	.align 4			\n"
-	"4:	mov %0, %5			\n"
+	"4:	mov %0, %4		# sets @res as -EFAULT	\n"
 	"	j   3b				\n"
 	"	.previous			\n"
 	"	.section __ex_table, \"a\"	\n"
 	"	.align 4			\n"
 	"	.word   1b, 4b			\n"
 	"	.previous			\n"
-	: "=r"(res), "+r"(dst), "+r"(src), "=&r"(val), "=l"(hw_count)
-	: "g"(-EFAULT), "ir"(count), "4"(count)	/* this "4" seeds lp_count */
+	: "+r"(res), "+r"(dst), "+r"(src), "=r"(val)
+	: "g"(-EFAULT), "l"(count)
 	: "memory");
 
 	return res;
diff --git a/arch/arc/lib/memcmp.S b/arch/arc/lib/memcmp.S
index 978bf8314dfb..a4015e7d9ab7 100644
--- a/arch/arc/lib/memcmp.S
+++ b/arch/arc/lib/memcmp.S
@@ -24,14 +24,32 @@ ENTRY(memcmp)
 	ld	r4,[r0,0]
 	ld	r5,[r1,0]
 	lsr.f	lp_count,r3,3
+#ifdef CONFIG_ISA_ARCV2
+	/* In ARCv2 a branch can't be the last instruction in a zero overhead
+	 * loop.
+	 * So we move the branch to the start of the loop, duplicate it
+	 * after the end, and set up r12 so that the branch isn't taken
+	 *  initially.
+	 */
+	mov_s	r12,WORD2
+	lpne	.Loop_end
+	brne	WORD2,r12,.Lodd
+	ld	WORD2,[r0,4]
+#else
 	lpne	.Loop_end
 	ld_s	WORD2,[r0,4]
+#endif
 	ld_s	r12,[r1,4]
 	brne	r4,r5,.Leven
 	ld.a	r4,[r0,8]
 	ld.a	r5,[r1,8]
+#ifdef CONFIG_ISA_ARCV2
+.Loop_end:
+	brne	WORD2,r12,.Lodd
+#else
 	brne	WORD2,r12,.Lodd
 .Loop_end:
+#endif
 	asl_s	SHIFT,SHIFT,3
 	bhs_s	.Last_cmp
 	brne	r4,r5,.Leven
@@ -89,7 +107,6 @@ ENTRY(memcmp)
 	bset.cs	r0,r0,31
 .Lodd:
 	cmp_s	WORD2,r12
-
 	mov_s	r0,1
 	j_s.d	[blink]
 	bset.cs	r0,r0,31
@@ -100,14 +117,25 @@ ENTRY(memcmp)
 	ldb	r4,[r0,0]
 	ldb	r5,[r1,0]
 	lsr.f	lp_count,r3
+#ifdef CONFIG_ISA_ARCV2
+	mov	r12,r3
 	lpne	.Lbyte_end
+	brne	r3,r12,.Lbyte_odd
+#else
+	lpne	.Lbyte_end
+#endif
 	ldb_s	r3,[r0,1]
 	ldb	r12,[r1,1]
 	brne	r4,r5,.Lbyte_even
 	ldb.a	r4,[r0,2]
 	ldb.a	r5,[r1,2]
+#ifdef CONFIG_ISA_ARCV2
+.Lbyte_end:
+	brne	r3,r12,.Lbyte_odd
+#else
 	brne	r3,r12,.Lbyte_odd
 .Lbyte_end:
+#endif
 	bcc	.Lbyte_even
 	brne	r4,r5,.Lbyte_even
 	ldb_s	r3,[r0,1]
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 11/28] ARCv2: extable: Enable sorting at build time
  2015-06-09 11:48 [PATCH 00/28] ARCv2 port to Linux - (B) ISA / Core / platform support Vineet Gupta
                   ` (9 preceding siblings ...)
  2015-06-09 11:48 ` [PATCH 10/28] ARCv2: Adhere to Zero Delay loop restriction Vineet Gupta
@ 2015-06-09 11:48 ` Vineet Gupta
  2015-06-24  5:51   ` Vineet Gupta
  2015-06-09 11:48 ` [PATCH 12/28] ARCv2: clocksource: Introduce 64bit local RTC counter Vineet Gupta
                   ` (16 subsequent siblings)
  27 siblings, 1 reply; 66+ messages in thread
From: Vineet Gupta @ 2015-06-09 11:48 UTC (permalink / raw)
  To: linux-arch, linux-kernel; +Cc: arnd, arc-linux-dev, Vineet Gupta

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 scripts/sortextable.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/scripts/sortextable.c b/scripts/sortextable.c
index 1052d4834a44..c2423d913b46 100644
--- a/scripts/sortextable.c
+++ b/scripts/sortextable.c
@@ -47,6 +47,10 @@
 #define EM_MICROBLAZE	189
 #endif
 
+#ifndef EM_ARCV2
+#define EM_ARCV2	195
+#endif
+
 static int fd_map;	/* File descriptor for file being modified. */
 static int mmap_failed; /* Boolean flag. */
 static void *ehdr_curr; /* current ElfXX_Ehdr *  for resource cleanup */
@@ -281,6 +285,7 @@ do_file(char const *const fname)
 		custom_sort = sort_relative_table;
 		break;
 	case EM_ARCOMPACT:
+	case EM_ARCV2:
 	case EM_ARM:
 	case EM_AARCH64:
 	case EM_MICROBLAZE:
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 12/28] ARCv2: clocksource: Introduce 64bit local RTC counter
  2015-06-09 11:48 [PATCH 00/28] ARCv2 port to Linux - (B) ISA / Core / platform support Vineet Gupta
                   ` (10 preceding siblings ...)
  2015-06-09 11:48 ` [PATCH 11/28] ARCv2: extable: Enable sorting at build time Vineet Gupta
@ 2015-06-09 11:48 ` Vineet Gupta
  2015-06-09 11:48 ` [PATCH 13/28] ARC: make plat_smp_ops weak to allow over-rides Vineet Gupta
                   ` (15 subsequent siblings)
  27 siblings, 0 replies; 66+ messages in thread
From: Vineet Gupta @ 2015-06-09 11:48 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: arnd, arc-linux-dev, Vineet Gupta, Daniel Lezcano, Thomas Gleixner

Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
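The clocksource read in this patch builds the 64-bit stamp from two 32-bit
aux register reads via a union. A minimal host-side sketch of just that
combining step is below. The names stamp_demo and combine_halves() are
illustrative only, the compiler's __BYTE_ORDER__ macro stands in for
CONFIG_CPU_BIG_ENDIAN, and the retry on RTC_CTRL bit 31 is not modelled.

```c
#include <stdint.h>

/* Hypothetical stand-in for the union used in arc_counter_read() */
union stamp_demo {
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	struct { uint32_t high, low; };
#else
	struct { uint32_t low, high; };
#endif
	uint64_t full;
};

/* Combine two 32-bit halves into one 64-bit cycle count */
static uint64_t combine_halves(uint32_t low, uint32_t high)
{
	union stamp_demo s;

	s.low = low;
	s.high = high;

	return s.full;
}
```

The union lets the compiler place each half directly in the right register
of the 64-bit pair rather than shifting and or-ing them together.
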
 arch/arc/Kconfig        |  5 +++++
 arch/arc/kernel/setup.c |  9 +++++++--
 arch/arc/kernel/time.c  | 50 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index 974ed9058018..f09e03a0d604 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -372,6 +372,11 @@ config ARC_HAS_LL64
 	  dest operands with 2 possible source operands.
 	default y
 
+config ARC_HAS_RTC
+	bool "Local 64-bit r/o cycle counter"
+	default n
+	depends on !SMP
+
 config ARC_NUMBER_OF_INTERRUPTS
 	int "Number of interrupts"
 	range 8 240
diff --git a/arch/arc/kernel/setup.c b/arch/arc/kernel/setup.c
index ca71cef4bafd..a3d186211ed3 100644
--- a/arch/arc/kernel/setup.c
+++ b/arch/arc/kernel/setup.c
@@ -200,9 +200,11 @@ static char *arc_cpu_mumbojumbo(int cpu_id, char *buf, int len)
 		       (unsigned int)(arc_get_core_freq() / 1000000),
 		       (unsigned int)(arc_get_core_freq() / 10000) % 100);
 
-	n += scnprintf(buf + n, len - n, "Timers\t\t: %s%s\nISA Extn\t: ",
+	n += scnprintf(buf + n, len - n, "Timers\t\t: %s%s%s%s\nISA Extn\t: ",
 		       IS_AVAIL1(cpu->timers.t0, "Timer0 "),
-		       IS_AVAIL1(cpu->timers.t1, "Timer1 "));
+		       IS_AVAIL1(cpu->timers.t1, "Timer1 "),
+		       IS_AVAIL2(cpu->timers.rtc, "64-bit RTC ",
+				 CONFIG_ARC_HAS_RTC));
 
 	n += i = scnprintf(buf + n, len - n, "%s%s%s%s%s",
 			   IS_AVAIL2(atomic, "atomic ", CONFIG_ARC_HAS_LLSC),
@@ -290,6 +292,9 @@ static void arc_chk_core_config(void)
 	if (!cpu->timers.t1)
 		panic("Timer1 is not present!\n");
 
+	if (IS_ENABLED(CONFIG_ARC_HAS_RTC) && !cpu->timers.rtc)
+		panic("RTC is not present\n");
+
 #ifdef CONFIG_ARC_HAS_DCCM
 	/*
 	 * DCCM can be arbit placed in hardware.
diff --git a/arch/arc/kernel/time.c b/arch/arc/kernel/time.c
index 71493f75ae6b..da495478a40b 100644
--- a/arch/arc/kernel/time.c
+++ b/arch/arc/kernel/time.c
@@ -60,6 +60,54 @@
 
 /********** Clock Source Device *********/
 
+#ifdef CONFIG_ARC_HAS_RTC
+
+#define AUX_RTC_CTRL	0x103
+#define AUX_RTC_LOW	0x104
+#define AUX_RTC_HIGH	0x105
+
+int arc_counter_setup(void)
+{
+	write_aux_reg(AUX_RTC_CTRL, 1);
+
+	/* Not usable in SMP */
+	return !IS_ENABLED(CONFIG_SMP);
+}
+
+static cycle_t arc_counter_read(struct clocksource *cs)
+{
+	unsigned long status;
+	union {
+#ifdef CONFIG_CPU_BIG_ENDIAN
+		struct { u32 high, low; };
+#else
+		struct { u32 low, high; };
+#endif
+		cycle_t  full;
+	} stamp;
+
+
+	__asm__ __volatile(
+	"1:						\n"
+	"	lr		%0, [AUX_RTC_LOW]	\n"
+	"	lr		%1, [AUX_RTC_HIGH]	\n"
+	"	lr		%2, [AUX_RTC_CTRL]	\n"
+	"	bbit0.nt	%2, 31, 1b		\n"
+	: "=r" (stamp.low), "=r" (stamp.high), "=r" (status));
+
+	return stamp.full;
+}
+
+static struct clocksource arc_counter = {
+	.name   = "ARCv2 RTC",
+	.rating = 350,
+	.read   = arc_counter_read,
+	.mask   = CLOCKSOURCE_MASK(64),
+	.flags  = CLOCK_SOURCE_IS_CONTINUOUS,
+};
+
+#else /* !CONFIG_ARC_HAS_RTC */
+
 /*
  * set 32bit TIMER1 to keep counting monotonically and wraparound
  */
@@ -86,6 +134,8 @@ static struct clocksource arc_counter = {
 	.flags  = CLOCK_SOURCE_IS_CONTINUOUS,
 };
 
+#endif
+
 /********** Clock Event Device *********/
 
 /*
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 13/28] ARC: make plat_smp_ops weak to allow over-rides
  2015-06-09 11:48 [PATCH 00/28] ARCv2 port to Linux - (B) ISA / Core / platform support Vineet Gupta
                   ` (11 preceding siblings ...)
  2015-06-09 11:48 ` [PATCH 12/28] ARCv2: clocksource: Introduce 64bit local RTC counter Vineet Gupta
@ 2015-06-09 11:48 ` Vineet Gupta
  2015-06-09 11:48 ` [PATCH 14/28] ARCv2: SMP: ARConnect debug/robustness Vineet Gupta
                   ` (14 subsequent siblings)
  27 siblings, 0 replies; 66+ messages in thread
From: Vineet Gupta @ 2015-06-09 11:48 UTC (permalink / raw)
  To: linux-arch, linux-kernel; +Cc: arnd, arc-linux-dev, Vineet Gupta

This allows platforms to provide their own cpu wakeup routines
as well as IPI send / clear backends, while still allowing an SMP kernel
without any such backend to build and boot.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
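The weak-definition mechanism relied on here can be tried out in user space
with the sketch below. Everything in it is hypothetical (plat_ops_demo,
demo_ops, demo_send_ipi() are not kernel names), and the NULL check in the
caller is an assumption of the sketch, not something this patch adds.

```c
#include <stddef.h>

/* Hypothetical cut-down analogue of struct plat_smp_ops */
struct plat_ops_demo {
	const char *info;
	void (*ipi_send)(int cpu);
};

/*
 * Weak, zero-initialised default. A strong definition of demo_ops in
 * another object file would replace it at link time; without one the
 * program still links and every callback is NULL.
 */
struct plat_ops_demo demo_ops __attribute__((weak));

/* Returns 0 if a backend handled the IPI, -1 if none is registered */
static int demo_send_ipi(int cpu)
{
	if (!demo_ops.ipi_send)
		return -1;

	demo_ops.ipi_send(cpu);
	return 0;
}
```

Built on its own this reports "no backend"; linking in a file that defines
demo_ops with a non-NULL .ipi_send switches the behaviour without touching
the caller.
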
 arch/arc/Kconfig             |  21 +++++---
 arch/arc/include/asm/irq.h   |   1 +
 arch/arc/include/asm/mcip.h  |  91 +++++++++++++++++++++++++++++++++
 arch/arc/kernel/Makefile     |   1 +
 arch/arc/kernel/intc-arcv2.c |   2 +-
 arch/arc/kernel/mcip.c       | 117 +++++++++++++++++++++++++++++++++++++++++++
 arch/arc/kernel/smp.c        |   2 +-
 arch/arc/plat-sim/platform.c |   5 ++
 8 files changed, 231 insertions(+), 9 deletions(-)
 create mode 100644 arch/arc/include/asm/mcip.h
 create mode 100644 arch/arc/kernel/mcip.c

diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index f09e03a0d604..301525020af7 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -160,12 +160,12 @@ config CPU_BIG_ENDIAN
 	  Build kernel for Big Endian Mode of ARC CPU
 
 config SMP
-	bool "Symmetric Multi-Processing (Incomplete)"
+	bool "Symmetric Multi-Processing"
 	default n
+	select ARC_HAS_COH_CACHES if ISA_ARCV2
+	select ARC_MCIP if ISA_ARCV2
 	help
-	  This enables support for systems with more than one CPU. If you have
-	  a system with only one CPU, say N. If you have a system with more
-	  than one CPU, say Y.
+	  This enables support for systems with more than one CPU.
 
 if SMP
 
@@ -175,13 +175,20 @@ config ARC_HAS_COH_CACHES
 config ARC_HAS_REENTRANT_IRQ_LV2
 	def_bool n
 
-endif	#SMP
+config ARC_MCIP
+	bool "ARConnect Multicore IP (MCIP) Support "
+	depends on ISA_ARCV2
+	help
+	  This IP block enables SMP in ARC-HS38 cores.
+	  It provides for cross-core interrupts, multi-core debug,
+	  hardware semaphores, shared memory, ...
 
 config NR_CPUS
 	int "Maximum number of CPUs (2-4096)"
 	range 2 4096
-	depends on SMP
-	default "2"
+	default "4"
+
+endif	#SMP
 
 menuconfig ARC_CACHE
 	bool "Enable Cache Support"
diff --git a/arch/arc/include/asm/irq.h b/arch/arc/include/asm/irq.h
index 49014f0ef36d..bc5103637326 100644
--- a/arch/arc/include/asm/irq.h
+++ b/arch/arc/include/asm/irq.h
@@ -19,6 +19,7 @@
 #else
 #define TIMER0_IRQ      16
 #define TIMER1_IRQ      17
+#define IPI_IRQ         19
 #endif
 
 #include <linux/interrupt.h>
diff --git a/arch/arc/include/asm/mcip.h b/arch/arc/include/asm/mcip.h
new file mode 100644
index 000000000000..31f9bac77a27
--- /dev/null
+++ b/arch/arc/include/asm/mcip.h
@@ -0,0 +1,91 @@
+/*
+ * ARConnect IP Support (Multi core enabler: Cross core IPI, RTC ...)
+ *
+ * Copyright (C) 2014-15 Synopsys, Inc. (www.synopsys.com)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef __ASM_MCIP_H
+#define __ASM_MCIP_H
+
+#ifdef CONFIG_ISA_ARCV2
+
+#include <asm/arcregs.h>
+
+#define ARC_REG_MCIP_BCR	0x0d0
+#define ARC_REG_MCIP_CMD	0x600
+#define ARC_REG_MCIP_WDATA	0x601
+#define ARC_REG_MCIP_READBACK	0x602
+
+struct mcip_cmd {
+#ifdef CONFIG_CPU_BIG_ENDIAN
+	unsigned int pad:8, param:16, cmd:8;
+#else
+	unsigned int cmd:8, param:16, pad:8;
+#endif
+
+#define CMD_INTRPT_GENERATE_IRQ		0x01
+#define CMD_INTRPT_GENERATE_ACK		0x02
+#define CMD_INTRPT_READ_STATUS		0x03
+#define CMD_INTRPT_CHECK_SOURCE		0x04
+
+/* Semaphore Commands */
+#define CMD_SEMA_CLAIM_AND_READ		0x11
+#define CMD_SEMA_RELEASE		0x12
+
+#define CMD_DEBUG_SET_MASK		0x34
+#define CMD_DEBUG_SET_SELECT		0x36
+
+#define CMD_IDU_ENABLE			0x71
+#define CMD_IDU_DISABLE			0x72
+#define CMD_IDU_SET_MODE		0x74
+#define CMD_IDU_SET_DEST		0x76
+#define CMD_IDU_SET_MASK		0x7C
+
+#define IDU_M_TRIG_LEVEL		0x0
+#define IDU_M_TRIG_EDGE			0x1
+
+#define IDU_M_DISTRI_RR			0x0
+#define IDU_M_DISTRI_DEST		0x2
+};
+
+/*
+ * MCIP programming model
+ *
+ * - Simple commands write {cmd:8,param:16} to MCIP_CMD aux reg
+ *   (param could be irq, common_irq, core_id ...)
+ * - More involved commands setup MCIP_WDATA with cmd specific data
+ *   before invoking the simple command
+ */
+static inline void __mcip_cmd(unsigned int cmd, unsigned int param)
+{
+	struct mcip_cmd buf;
+
+	buf.pad = 0;
+	buf.cmd = cmd;
+	buf.param = param;
+
+	WRITE_AUX(ARC_REG_MCIP_CMD, buf);
+}
+
+/*
+ * Setup additional data for a cmd
+ * Callers need to lock to ensure atomicity
+ */
+static inline void __mcip_cmd_data(unsigned int cmd, unsigned int param,
+				   unsigned int data)
+{
+	write_aux_reg(ARC_REG_MCIP_WDATA, data);
+
+	__mcip_cmd(cmd, param);
+}
+
+extern void mcip_init_early_smp(void);
+extern void mcip_init_smp(unsigned int cpu);
+
+#endif
+
+#endif
diff --git a/arch/arc/kernel/Makefile b/arch/arc/kernel/Makefile
index 0be7ba087260..e7f3625a19b5 100644
--- a/arch/arc/kernel/Makefile
+++ b/arch/arc/kernel/Makefile
@@ -15,6 +15,7 @@ obj-$(CONFIG_ISA_ARCV2)			+= entry-arcv2.o intc-arcv2.o
 
 obj-$(CONFIG_MODULES)			+= arcksyms.o module.o
 obj-$(CONFIG_SMP) 			+= smp.o
+obj-$(CONFIG_ARC_MCIP)			+= mcip.o
 obj-$(CONFIG_ARC_DW2_UNWIND)		+= unwind.o
 obj-$(CONFIG_KPROBES)      		+= kprobes.o
 obj-$(CONFIG_ARC_EMUL_UNALIGNED) 	+= unaligned.o
diff --git a/arch/arc/kernel/intc-arcv2.c b/arch/arc/kernel/intc-arcv2.c
index 3876e11d4553..945162c1242d 100644
--- a/arch/arc/kernel/intc-arcv2.c
+++ b/arch/arc/kernel/intc-arcv2.c
@@ -90,7 +90,7 @@ static struct irq_chip arcv2_irq_chip = {
 static int arcv2_irq_map(struct irq_domain *d, unsigned int irq,
 			 irq_hw_number_t hw)
 {
-	if (irq == TIMER0_IRQ)
+	if (irq == TIMER0_IRQ || irq == IPI_IRQ)
 		irq_set_chip_and_handler(irq, &arcv2_irq_chip, handle_percpu_irq);
 	else
 		irq_set_chip_and_handler(irq, &arcv2_irq_chip, handle_level_irq);
diff --git a/arch/arc/kernel/mcip.c b/arch/arc/kernel/mcip.c
new file mode 100644
index 000000000000..e6ad6e64440a
--- /dev/null
+++ b/arch/arc/kernel/mcip.c
@@ -0,0 +1,117 @@
+/*
+ * ARC ARConnect (MultiCore IP) support (formerly known as MCIP)
+ *
+ * Copyright (C) 2013 Synopsys, Inc. (www.synopsys.com)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/smp.h>
+#include <linux/irq.h>
+#include <linux/spinlock.h>
+#include <asm/mcip.h>
+
+static char smp_cpuinfo_buf[128];
+
+static DEFINE_RAW_SPINLOCK(mcip_lock);
+
+
+/*
+ * Any SMP specific init any CPU does when it comes up.
+ * Here we setup the CPU to enable Inter-Processor-Interrupts
+ * Called for each CPU
+ * -Master      : init_IRQ()
+ * -Other(s)    : start_kernel_secondary()
+ */
+void mcip_init_smp(unsigned int cpu)
+{
+	smp_ipi_irq_setup(cpu, IPI_IRQ);
+}
+
+static void mcip_ipi_send(int cpu)
+{
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&mcip_lock, flags);
+	__mcip_cmd(CMD_INTRPT_GENERATE_IRQ, cpu);
+	raw_spin_unlock_irqrestore(&mcip_lock, flags);
+}
+
+static void mcip_ipi_clear(int irq)
+{
+	unsigned int cpu;
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&mcip_lock, flags);
+
+	/* Who sent the IPI */
+	__mcip_cmd(CMD_INTRPT_CHECK_SOURCE, 0);
+
+	cpu = read_aux_reg(ARC_REG_MCIP_READBACK);	/* 1,2,4,8... */
+
+	__mcip_cmd(CMD_INTRPT_GENERATE_ACK, __ffs(cpu)); /* 0,1,2,3... */
+
+	raw_spin_unlock_irqrestore(&mcip_lock, flags);
+}
+
+volatile int wake_flag;
+
+static void mcip_wakeup_cpu(int cpu, unsigned long pc)
+{
+	BUG_ON(cpu == 0);
+	wake_flag = cpu;
+}
+
+void arc_platform_smp_wait_to_boot(int cpu)
+{
+	while (wake_flag != cpu)
+		;
+
+	wake_flag = 0;
+	__asm__ __volatile__("j @first_lines_of_secondary	\n");
+}
+
+struct plat_smp_ops plat_smp_ops = {
+	.info		= smp_cpuinfo_buf,
+	.cpu_kick	= mcip_wakeup_cpu,
+	.ipi_send	= mcip_ipi_send,
+	.ipi_clear	= mcip_ipi_clear,
+};
+
+void mcip_init_early_smp(void)
+{
+#define IS_AVAIL1(var, str)    ((var) ? str : "")
+
+	struct mcip_bcr {
+#ifdef CONFIG_CPU_BIG_ENDIAN
+		unsigned int pad3:8,
+			     idu:1, llm:1, num_cores:6,
+			     iocoh:1,  grtc:1, dbg:1, pad2:1,
+			     msg:1, sem:1, ipi:1, pad:1,
+			     ver:8;
+#else
+		unsigned int ver:8,
+			     pad:1, ipi:1, sem:1, msg:1,
+			     pad2:1, dbg:1, grtc:1, iocoh:1,
+			     num_cores:6, llm:1, idu:1,
+			     pad3:8;
+#endif
+	} mp;
+
+	READ_BCR(ARC_REG_MCIP_BCR, mp);
+
+	sprintf(smp_cpuinfo_buf,
+		"Extn [SMP]\t: ARConnect (v%d): %d cores with %s%s%s%s\n",
+		mp.ver, mp.num_cores,
+		IS_AVAIL1(mp.ipi, "IPI "),
+		IS_AVAIL1(mp.idu, "IDU "),
+		IS_AVAIL1(mp.dbg, "DEBUG "),
+		IS_AVAIL1(mp.grtc, "GRTC"));
+
+	if (mp.dbg) {
+		__mcip_cmd_data(CMD_DEBUG_SET_SELECT, 0, 0xf);
+		__mcip_cmd_data(CMD_DEBUG_SET_MASK, 0xf, 0xf);
+	}
+}
diff --git a/arch/arc/kernel/smp.c b/arch/arc/kernel/smp.c
index ee2982dda5a6..d07cb53d7641 100644
--- a/arch/arc/kernel/smp.c
+++ b/arch/arc/kernel/smp.c
@@ -31,7 +31,7 @@ arch_spinlock_t smp_atomic_ops_lock = __ARCH_SPIN_LOCK_UNLOCKED;
 arch_spinlock_t smp_bitops_lock = __ARCH_SPIN_LOCK_UNLOCKED;
 #endif
 
-struct plat_smp_ops  plat_smp_ops;
+struct plat_smp_ops  __weak plat_smp_ops;
 
 /* XXX: per cpu ? Only needed once in early seconday boot */
 struct task_struct *secondary_idle_tsk;
diff --git a/arch/arc/plat-sim/platform.c b/arch/arc/plat-sim/platform.c
index 114fdc30941c..8795ae2ef48a 100644
--- a/arch/arc/plat-sim/platform.c
+++ b/arch/arc/plat-sim/platform.c
@@ -10,6 +10,7 @@
 
 #include <linux/init.h>
 #include <asm/mach_desc.h>
+#include <asm/mcip.h>
 
 /*----------------------- Machine Descriptions ------------------------------
  *
@@ -27,4 +28,8 @@ static const char *simulation_compat[] __initconst = {
 
 MACHINE_START(SIMULATION, "simulation")
 	.dt_compat	= simulation_compat,
+#ifdef CONFIG_ARC_MCIP
+	.init_early	= mcip_init_early_smp,
+	.init_smp	= mcip_init_smp,
+#endif
 MACHINE_END
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 14/28] ARCv2: SMP: ARConnect debug/robustness
  2015-06-09 11:48 [PATCH 00/28] ARCv2 port to Linux - (B) ISA / Core / platform support Vineet Gupta
                   ` (12 preceding siblings ...)
  2015-06-09 11:48 ` [PATCH 13/28] ARC: make plat_smp_ops weak to allow over-rides Vineet Gupta
@ 2015-06-09 11:48 ` Vineet Gupta
  2015-06-09 11:48 ` [PATCH 15/28] ARCv2: SMP: clocksource: Enable Global Real Time counter Vineet Gupta
                   ` (13 subsequent siblings)
  27 siblings, 0 replies; 66+ messages in thread
From: Vineet Gupta @ 2015-06-09 11:48 UTC (permalink / raw)
  To: linux-arch, linux-kernel; +Cc: arnd, arc-linux-dev, Vineet Gupta

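- Handle possible coalescing of concurrent IPIs by MCIP: ack every
  source reported in the readback, not just the first one
- Check that the previous IPI has been acked by the target before
  sending a new one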

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
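The ack loop for coalesced IPIs boils down to walking the set bits of the
source mask. A stand-alone sketch follows; fake_ack(), ack_log[] and
ack_all_sources() are made up for illustration, and __builtin_ctz() stands
in for the kernel's __ffs().

```c
#include <stdint.h>

static unsigned int ack_log[32];	/* cores acked, in order */
static unsigned int ack_count;

/* Hypothetical stand-in for __mcip_cmd(CMD_INTRPT_GENERATE_ACK, core) */
static void fake_ack(unsigned int core)
{
	ack_log[ack_count++] = core;
}

/* Ack every sender whose bit is set in @mask; returns number of acks */
static unsigned int ack_all_sources(uint32_t mask)
{
	unsigned int n = 0;

	while (mask) {
		unsigned int c = (unsigned int)__builtin_ctz(mask);

		fake_ack(c);			/* 0,1,2,3... */
		mask &= ~(UINT32_C(1) << c);	/* drop the bit just acked */
		n++;
	}

	return n;
}
```

The sketch uses while() so that an empty mask is a no-op. The patch itself
uses do/while, presumably because at least one source bit is always set
when the IPI handler runs.
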
 arch/arc/Kconfig       | 15 ++++++++++++---
 arch/arc/kernel/mcip.c | 48 ++++++++++++++++++++++++++++++++++++++++++++----
 arch/arc/kernel/smp.c  | 20 ++++++++++++++++----
 3 files changed, 72 insertions(+), 11 deletions(-)

diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index 301525020af7..ef5ca5969eaf 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -448,9 +448,10 @@ menuconfig ARC_DBG
 	bool "ARC debugging"
 	default y
 
+if ARC_DBG
+
 config ARC_DW2_UNWIND
 	bool "Enable DWARF specific kernel stack unwind"
-	depends on ARC_DBG
 	default y
 	select KALLSYMS
 	help
@@ -464,18 +465,26 @@ config ARC_DW2_UNWIND
 
 config ARC_DBG_TLB_PARANOIA
 	bool "Paranoia Checks in Low Level TLB Handlers"
-	depends on ARC_DBG
 	default n
 
 config ARC_DBG_TLB_MISS_COUNT
 	bool "Profile TLB Misses"
 	default n
 	select DEBUG_FS
-	depends on ARC_DBG
 	help
 	  Counts number of I and D TLB Misses and exports them via Debugfs
 	  The counters can be cleared via Debugfs as well
 
+if SMP
+
+config ARC_IPI_DBG
+	bool "Debug Inter Core interrupts"
+	default n
+
+endif
+
+endif
+
 config ARC_UBOOT_SUPPORT
 	bool "Support uboot arg Handling"
 	default n
diff --git a/arch/arc/kernel/mcip.c b/arch/arc/kernel/mcip.c
index e6ad6e64440a..35921c3ab394 100644
--- a/arch/arc/kernel/mcip.c
+++ b/arch/arc/kernel/mcip.c
@@ -33,27 +33,67 @@ void mcip_init_smp(unsigned int cpu)
 static void mcip_ipi_send(int cpu)
 {
 	unsigned long flags;
+	int ipi_was_pending;
+
+	/*
+	 * NOTE: We must spin here if the other cpu hasn't yet
+	 * serviced a previous message. This can burn lots
+	 * of time, but we MUST follow this protocol or
+	 * ipi messages can be lost!!!
+	 * Also, we must release the lock in this loop because
+	 * the other side may get to this same loop and not
+	 * be able to ack -- thus causing deadlock.
+	 */
+
+	do {
+		raw_spin_lock_irqsave(&mcip_lock, flags);
+		__mcip_cmd(CMD_INTRPT_READ_STATUS, cpu);
+		ipi_was_pending = read_aux_reg(ARC_REG_MCIP_READBACK);
+		if (ipi_was_pending == 0)
+			break; /* break out but keep lock */
+		raw_spin_unlock_irqrestore(&mcip_lock, flags);
+	} while (1);
 
-	raw_spin_lock_irqsave(&mcip_lock, flags);
 	__mcip_cmd(CMD_INTRPT_GENERATE_IRQ, cpu);
 	raw_spin_unlock_irqrestore(&mcip_lock, flags);
+
+#ifdef CONFIG_ARC_IPI_DBG
+	if (ipi_was_pending)
+		pr_info("IPI ACK delayed from cpu %d\n", cpu);
+#endif
 }
 
 static void mcip_ipi_clear(int irq)
 {
-	unsigned int cpu;
+	unsigned int cpu, c;
 	unsigned long flags;
+	unsigned int __maybe_unused copy;
 
 	raw_spin_lock_irqsave(&mcip_lock, flags);
 
 	/* Who sent the IPI */
 	__mcip_cmd(CMD_INTRPT_CHECK_SOURCE, 0);
 
-	cpu = read_aux_reg(ARC_REG_MCIP_READBACK);	/* 1,2,4,8... */
+	copy = cpu = read_aux_reg(ARC_REG_MCIP_READBACK);	/* 1,2,4,8... */
 
-	__mcip_cmd(CMD_INTRPT_GENERATE_ACK, __ffs(cpu)); /* 0,1,2,3... */
+	/*
+	 * In rare cases, multiple concurrent IPIs sent to the same target can
+	 * possibly be coalesced by MCIP into 1 asserted IRQ, so @cpu can be
+	 * "vectored" (multiple bits set) as opposed to typical single bit
+	 */
+	do {
+		c = __ffs(cpu);			/* 0,1,2,3 */
+		__mcip_cmd(CMD_INTRPT_GENERATE_ACK, c);
+		cpu &= ~(1U << c);
+	} while (cpu);
 
 	raw_spin_unlock_irqrestore(&mcip_lock, flags);
+
+#ifdef CONFIG_ARC_IPI_DBG
+	if (c != __ffs(copy))
+		pr_info("IPIs from %x coalesced to %x\n",
+			copy, raw_smp_processor_id());
+#endif
 }
 
 volatile int wake_flag;
diff --git a/arch/arc/kernel/smp.c b/arch/arc/kernel/smp.c
index d07cb53d7641..be13d12420ba 100644
--- a/arch/arc/kernel/smp.c
+++ b/arch/arc/kernel/smp.c
@@ -278,8 +278,10 @@ static void ipi_cpu_stop(void)
 	machine_halt();
 }
 
-static inline void __do_IPI(unsigned long msg)
+static inline int __do_IPI(unsigned long msg)
 {
+	int rc = 0;
+
 	switch (msg) {
 	case IPI_RESCHEDULE:
 		scheduler_ipi();
@@ -294,8 +296,10 @@ static inline void __do_IPI(unsigned long msg)
 		break;
 
 	default:
-		pr_warn("IPI with unexpected msg %ld\n", msg);
+		rc = 1;
 	}
+
+	return rc;
 }
 
 /*
@@ -305,6 +309,7 @@ static inline void __do_IPI(unsigned long msg)
 irqreturn_t do_IPI(int irq, void *dev_id)
 {
 	unsigned long pending;
+	unsigned long __maybe_unused copy;
 
 	pr_debug("IPI [%ld] received on cpu %d\n",
 		 *this_cpu_ptr(&ipi_data), smp_processor_id());
@@ -316,11 +321,18 @@ irqreturn_t do_IPI(int irq, void *dev_id)
 	 * "dequeue" the msg corresponding to this IPI (and possibly other
 	 * piggybacked msg from elided IPIs: see ipi_send_msg_one() above)
 	 */
-	pending = xchg(this_cpu_ptr(&ipi_data), 0);
+	copy = pending = xchg(this_cpu_ptr(&ipi_data), 0);
 
 	do {
 		unsigned long msg = __ffs(pending);
-		__do_IPI(msg);
+		int rc;
+
+		rc = __do_IPI(msg);
+#ifdef CONFIG_ARC_IPI_DBG
+		/* IPI received but no valid @msg */
+		if (rc)
+			pr_info("IPI with bogus msg %ld in %ld\n", msg, copy);
+#endif
 		pending &= ~(1U << msg);
 	} while (pending);
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 15/28] ARCv2: SMP: clocksource: Enable Global Real Time counter
  2015-06-09 11:48 [PATCH 00/28] ARCv2 port to Linux - (B) ISA / Core / platform support Vineet Gupta
                   ` (13 preceding siblings ...)
  2015-06-09 11:48 ` [PATCH 14/28] ARCv2: SMP: ARConnect debug/robustness Vineet Gupta
@ 2015-06-09 11:48 ` Vineet Gupta
  2015-06-09 11:48 ` [PATCH 16/28] ARCv2: SMP: intc: IDU 2nd level intc for dynamic IRQ distribution Vineet Gupta
                   ` (12 subsequent siblings)
  27 siblings, 0 replies; 66+ messages in thread
From: Vineet Gupta @ 2015-06-09 11:48 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: arnd, arc-linux-dev, Vineet Gupta, Daniel Lezcano, Thomas Gleixner

Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/Kconfig            |  5 +++++
 arch/arc/include/asm/mcip.h |  3 +++
 arch/arc/kernel/mcip.c      |  3 +++
 arch/arc/kernel/time.c      | 45 +++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 56 insertions(+)

diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index ef5ca5969eaf..1b684595e258 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -384,6 +384,11 @@ config ARC_HAS_RTC
 	default n
 	depends on !SMP
 
+config ARC_HAS_GRTC
+	bool "SMP synchronized 64-bit cycle counter"
+	default y
+	depends on SMP
+
 config ARC_NUMBER_OF_INTERRUPTS
 	int "Number of interrupts"
 	range 8 240
diff --git a/arch/arc/include/asm/mcip.h b/arch/arc/include/asm/mcip.h
index 31f9bac77a27..52c11f0bb0e5 100644
--- a/arch/arc/include/asm/mcip.h
+++ b/arch/arc/include/asm/mcip.h
@@ -39,6 +39,9 @@ struct mcip_cmd {
 #define CMD_DEBUG_SET_MASK		0x34
 #define CMD_DEBUG_SET_SELECT		0x36
 
+#define CMD_GRTC_READ_LO		0x42
+#define CMD_GRTC_READ_HI		0x43
+
 #define CMD_IDU_ENABLE			0x71
 #define CMD_IDU_DISABLE			0x72
 #define CMD_IDU_SET_MODE		0x74
diff --git a/arch/arc/kernel/mcip.c b/arch/arc/kernel/mcip.c
index 35921c3ab394..ad7e90b97f6e 100644
--- a/arch/arc/kernel/mcip.c
+++ b/arch/arc/kernel/mcip.c
@@ -154,4 +154,7 @@ void mcip_init_early_smp(void)
 		__mcip_cmd_data(CMD_DEBUG_SET_SELECT, 0, 0xf);
 		__mcip_cmd_data(CMD_DEBUG_SET_MASK, 0xf, 0xf);
 	}
+
+	if (IS_ENABLED(CONFIG_ARC_HAS_GRTC) && !mp.grtc)
+		panic("kernel trying to use non-existent GRTC\n");
 }
diff --git a/arch/arc/kernel/time.c b/arch/arc/kernel/time.c
index da495478a40b..3364d2bbc515 100644
--- a/arch/arc/kernel/time.c
+++ b/arch/arc/kernel/time.c
@@ -45,6 +45,8 @@
 #include <asm/clk.h>
 #include <asm/mach_desc.h>
 
+#include <asm/mcip.h>
+
 /* Timer related Aux registers */
 #define ARC_REG_TIMER0_LIMIT	0x23	/* timer 0 limit */
 #define ARC_REG_TIMER0_CTRL	0x22	/* timer 0 control */
@@ -60,6 +62,48 @@
 
 /********** Clock Source Device *********/
 
+#ifdef CONFIG_ARC_HAS_GRTC
+
+static int arc_counter_setup(void)
+{
+	return 1;
+}
+
+static cycle_t arc_counter_read(struct clocksource *cs)
+{
+	unsigned long flags;
+	union {
+#ifdef CONFIG_CPU_BIG_ENDIAN
+		struct { u32 h, l; };
+#else
+		struct { u32 l, h; };
+#endif
+		cycle_t  full;
+	} stamp;
+
+	local_irq_save(flags);
+
+	__mcip_cmd(CMD_GRTC_READ_LO, 0);
+	stamp.l = read_aux_reg(ARC_REG_MCIP_READBACK);
+
+	__mcip_cmd(CMD_GRTC_READ_HI, 0);
+	stamp.h = read_aux_reg(ARC_REG_MCIP_READBACK);
+
+	local_irq_restore(flags);
+
+	return stamp.full;
+}
+
+static struct clocksource arc_counter = {
+	.name   = "ARConnect GRTC",
+	.rating = 400,
+	.read   = arc_counter_read,
+	.mask   = CLOCKSOURCE_MASK(64),
+	.flags  = CLOCK_SOURCE_IS_CONTINUOUS,
+};
+
+#else
+
 #ifdef CONFIG_ARC_HAS_RTC
 
 #define AUX_RTC_CTRL	0x103
@@ -135,6 +179,7 @@ static struct clocksource arc_counter = {
 };
 
 #endif
+#endif
 
 /********** Clock Event Device *********/
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 16/28] ARCv2: SMP: intc: IDU 2nd level intc for dynamic IRQ distribution
  2015-06-09 11:48 [PATCH 00/28] ARCv2 port to Linux - (B) ISA / Core / platform support Vineet Gupta
                   ` (14 preceding siblings ...)
  2015-06-09 11:48 ` [PATCH 15/28] ARCv2: SMP: clocksource: Enable Global Real Time counter Vineet Gupta
@ 2015-06-09 11:48 ` Vineet Gupta
  2015-06-09 11:48 ` [PATCH 17/28] ARC: add compiler barrier to LLSC based cmpxchg Vineet Gupta
                   ` (11 subsequent siblings)
  27 siblings, 0 replies; 66+ messages in thread
From: Vineet Gupta @ 2015-06-09 11:48 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: arnd, arc-linux-dev, Vineet Gupta, Jason Cooper, Thomas Gleixner

Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
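For reference, the data word that idu_set_mode() hands to CMD_IDU_SET_MODE
can be written out with explicit shifts, as sketched below. This assumes
the little-endian bit-field allocation GCC uses (first field in the least
significant bits), i.e. distr in bits 1:0 and lvl in bit 4. The helper name
idu_mode_word() is invented for the sketch.

```c
#include <stdint.h>

/* Values as defined in asm/mcip.h by this series */
#define IDU_M_TRIG_LEVEL	0x0
#define IDU_M_TRIG_EDGE		0x1
#define IDU_M_DISTRI_RR		0x0
#define IDU_M_DISTRI_DEST	0x2

/*
 * Hypothetical helper: pack trigger mode and distribution mode the way
 * the { distr:2, pad:2, lvl:1, pad2:27 } bit-field would on little endian.
 */
static uint32_t idu_mode_word(unsigned int lvl, unsigned int distr)
{
	return (distr & 0x3u) | ((lvl & 0x1u) << 4);
}
```

For example, edge triggered with explicit destination gives 0x12, while
level triggered with Round Robin gives 0.
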
 .../devicetree/bindings/arc/archs-idu-intc.txt     |  46 ++++++
 arch/arc/kernel/mcip.c                             | 183 ++++++++++++++++++++-
 2 files changed, 228 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/devicetree/bindings/arc/archs-idu-intc.txt

diff --git a/Documentation/devicetree/bindings/arc/archs-idu-intc.txt b/Documentation/devicetree/bindings/arc/archs-idu-intc.txt
new file mode 100644
index 000000000000..0dcb7c7d3e40
--- /dev/null
+++ b/Documentation/devicetree/bindings/arc/archs-idu-intc.txt
@@ -0,0 +1,46 @@
+* ARC-HS Interrupt Distribution Unit
+
+  This optional 2nd level interrupt controller can be used in SMP configurations for
+  dynamic IRQ routing, load balancing of common/external IRQs towards core intc.
+
+Properties:
+
+- compatible: "snps,archs-idu-intc"
+- interrupt-controller: This is an interrupt controller.
+- interrupt-parent: <reference to parent core intc>
+- #interrupt-cells: Must be <2>.
+- interrupts: <...> specifies the upstream core irqs
+
+  First cell specifies the "common" IRQ from peripheral to IDU
+  Second cell specifies the irq distribution mode to cores
+     0=Round Robin; 1=cpu0, 2=cpu1, 4=cpu2, 8=cpu3
+
+  intc accessed via the special ARC AUX register interface, hence "reg" property
+  is not specified.
+
+Example:
+	core_intc: core-interrupt-controller {
+		compatible = "snps,archs-intc";
+		interrupt-controller;
+		#interrupt-cells = <1>;
+	};
+
+	idu_intc: idu-interrupt-controller {
+		compatible = "snps,archs-idu-intc";
+		interrupt-controller;
+		interrupt-parent = <&core_intc>;
+
+		/*
+		 * <hwirq  distribution>
+		 * distribution: 0=RR; 1=cpu0, 2=cpu1, 4=cpu2, 8=cpu3
+		 */
+		#interrupt-cells = <2>;
+
+		/* upstream core irqs: downstream these are "COMMON" irq 0,1..  */
+		interrupts = <24 25 26 27 28 29 30 31>;
+	};
+
+	some_device: serial@c0fc1000 {
+		interrupt-parent = <&idu_intc>;
+		interrupts = <0 0>;	/* upstream idu IRQ #24, Round Robin */
+	};
diff --git a/arch/arc/kernel/mcip.c b/arch/arc/kernel/mcip.c
index ad7e90b97f6e..30284e8de6ff 100644
--- a/arch/arc/kernel/mcip.c
+++ b/arch/arc/kernel/mcip.c
@@ -14,10 +14,10 @@
 #include <asm/mcip.h>
 
 static char smp_cpuinfo_buf[128];
+static int idu_detected;
 
 static DEFINE_RAW_SPINLOCK(mcip_lock);
 
-
 /*
  * Any SMP specific init any CPU does when it comes up.
  * Here we setup the CPU to enable Inter-Processor-Interrupts
@@ -150,6 +150,8 @@ void mcip_init_early_smp(void)
 		IS_AVAIL1(mp.dbg, "DEBUG "),
 		IS_AVAIL1(mp.grtc, "GRTC"));
 
+	idu_detected = mp.idu;
+
 	if (mp.dbg) {
 		__mcip_cmd_data(CMD_DEBUG_SET_SELECT, 0, 0xf);
 		__mcip_cmd_data(CMD_DEBUG_SET_MASK, 0xf, 0xf);
@@ -158,3 +160,182 @@ void mcip_init_early_smp(void)
 	if (IS_ENABLED(CONFIG_ARC_HAS_GRTC) && !mp.grtc)
 		panic("kernel trying to use non-existent GRTC\n");
 }
+
+/***************************************************************************
+ * ARCv2 Interrupt Distribution Unit (IDU)
+ *
+ * Connects external "COMMON" IRQs to core intc, providing:
+ *  -dynamic routing (IRQ affinity)
+ *  -load balancing (Round Robin interrupt distribution)
+ *  -1:N distribution
+ *
+ * It physically resides in the MCIP hw block
+ */
+
+#include <linux/irqchip.h>
+#include <linux/of.h>
+#include <linux/of_irq.h>
+#include "../../drivers/irqchip/irqchip.h"
+
+/*
+ * Set the DEST for @cmn_irq to @cpu_mask (1 bit per core)
+ */
+static void idu_set_dest(unsigned int cmn_irq, unsigned int cpu_mask)
+{
+	__mcip_cmd_data(CMD_IDU_SET_DEST, cmn_irq, cpu_mask);
+}
+
+static void idu_set_mode(unsigned int cmn_irq, unsigned int lvl,
+			   unsigned int distr)
+{
+	union {
+		unsigned int word;
+		struct {
+			unsigned int distr:2, pad:2, lvl:1, pad2:27;
+		};
+	} data;
+
+	data.distr = distr;
+	data.lvl = lvl;
+	__mcip_cmd_data(CMD_IDU_SET_MODE, cmn_irq, data.word);
+}
+
+static void idu_irq_mask(struct irq_data *data)
+{
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&mcip_lock, flags);
+	__mcip_cmd_data(CMD_IDU_SET_MASK, data->hwirq, 1);
+	raw_spin_unlock_irqrestore(&mcip_lock, flags);
+}
+
+static void idu_irq_unmask(struct irq_data *data)
+{
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&mcip_lock, flags);
+	__mcip_cmd_data(CMD_IDU_SET_MASK, data->hwirq, 0);
+	raw_spin_unlock_irqrestore(&mcip_lock, flags);
+}
+
+static int
+idu_irq_set_affinity(struct irq_data *d, const struct cpumask *cpumask, bool f)
+{
+	return IRQ_SET_MASK_OK;
+}
+
+static struct irq_chip idu_irq_chip = {
+	.name			= "MCIP IDU Intc",
+	.irq_mask		= idu_irq_mask,
+	.irq_unmask		= idu_irq_unmask,
+#ifdef CONFIG_SMP
+	.irq_set_affinity       = idu_irq_set_affinity,
+#endif
+
+};
+
+static int idu_first_irq;
+
+static void idu_cascade_isr(unsigned int core_irq, struct irq_desc *desc)
+{
+	struct irq_domain *domain = irq_desc_get_handler_data(desc);
+	unsigned int idu_irq;
+
+	idu_irq = core_irq - idu_first_irq;
+	generic_handle_irq(irq_find_mapping(domain, idu_irq));
+}
+
+static int idu_irq_map(struct irq_domain *d, unsigned int virq, irq_hw_number_t hwirq)
+{
+	irq_set_chip_and_handler(virq, &idu_irq_chip, handle_level_irq);
+	irq_set_status_flags(virq, IRQ_MOVE_PCNTXT);
+
+	return 0;
+}
+
+static int idu_irq_xlate(struct irq_domain *d, struct device_node *n,
+			 const u32 *intspec, unsigned int intsize,
+			 irq_hw_number_t *out_hwirq, unsigned int *out_type)
+{
+	irq_hw_number_t hwirq = *out_hwirq = intspec[0];
+	int distri = intspec[1];
+	unsigned long flags;
+
+	*out_type = IRQ_TYPE_NONE;
+
+	/* XXX: validate distribution scheme against online cpu mask */
+	if (distri == 0) {
+		/* 0 - Round Robin to all cpus, otherwise 1 bit per core */
+		raw_spin_lock_irqsave(&mcip_lock, flags);
+		idu_set_dest(hwirq, BIT(num_online_cpus()) - 1);
+		idu_set_mode(hwirq, IDU_M_TRIG_LEVEL, IDU_M_DISTRI_RR);
+		raw_spin_unlock_irqrestore(&mcip_lock, flags);
+	} else {
+		/*
+		 * DEST based distribution for Level Triggered intr can only
+		 * have 1 CPU, so generalize it to always contain 1 cpu
+		 */
+		int cpu = ffs(distri);
+
+		if (cpu != fls(distri))
+			pr_warn("IDU irq %lx distri mode set to cpu %x\n",
+				hwirq, cpu);
+
+		raw_spin_lock_irqsave(&mcip_lock, flags);
+		idu_set_dest(hwirq, BIT(cpu - 1));	/* ffs() is 1-based, DEST is a bitmask */
+		idu_set_mode(hwirq, IDU_M_TRIG_LEVEL, IDU_M_DISTRI_DEST);
+		raw_spin_unlock_irqrestore(&mcip_lock, flags);
+	}
+
+	return 0;
+}
+
+static const struct irq_domain_ops idu_irq_ops = {
+	.xlate	= idu_irq_xlate,
+	.map	= idu_irq_map,
+};
+
+/*
+ * [16, 23]: Statically assigned always private-per-core (Timers, WDT, IPI)
+ * [24, 23+C]: If C > 0 then "C" common IRQs
+ * [24+C, N]: Not statically assigned, private-per-core
+ */
+
+
+static int __init
+idu_of_init(struct device_node *intc, struct device_node *parent)
+{
+	struct irq_domain *domain;
+	/* Read IDU BCR to confirm nr_irqs */
+	int nr_irqs = of_irq_count(intc);
+	int i, irq;
+
+	if (!idu_detected)
+		panic("IDU not detected, but DeviceTree using it");
+
+	pr_info("MCIP: IDU referenced from Devicetree %d irqs\n", nr_irqs);
+
+	domain = irq_domain_add_linear(intc, nr_irqs, &idu_irq_ops, NULL);
+
+	/* Parent interrupts (core-intc) are already mapped */
+
+	for (i = 0; i < nr_irqs; i++) {
+		/*
+		 * Return parent uplink IRQs (towards core intc) 24,25,.....
+		 * this step has been done before already
+		 * however we need it to get the parent virq and set IDU handler
+		 * as first level isr
+		 */
+		irq = irq_of_parse_and_map(intc, i);
+		if (!i)
+			idu_first_irq = irq;
+
+		irq_set_handler_data(irq, domain);
+		irq_set_chained_handler(irq, idu_cascade_isr);
+	}
+
+	__mcip_cmd(CMD_IDU_ENABLE, 0);
+
+	return 0;
+}
+IRQCHIP_DECLARE(arcv2_idu_intc, "snps,archs-idu-intc", idu_of_init);
-- 
1.9.1



* [PATCH 17/28] ARC: add compiler barrier to LLSC based cmpxchg
  2015-06-09 11:48 [PATCH 00/28] ARCv2 port to Linux - (B) ISA / Core / platform support Vineet Gupta
                   ` (15 preceding siblings ...)
  2015-06-09 11:48 ` [PATCH 16/28] ARCv2: SMP: intc: IDU 2nd level intc for dynamic IRQ distribution Vineet Gupta
@ 2015-06-09 11:48 ` Vineet Gupta
  2015-06-09 12:23   ` Peter Zijlstra
  2015-06-09 11:48 ` [PATCH 18/28] ARC: add smp barriers around atomics per memory-barrriers.txt Vineet Gupta
                   ` (10 subsequent siblings)
  27 siblings, 1 reply; 66+ messages in thread
From: Vineet Gupta @ 2015-06-09 11:48 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: arnd, arc-linux-dev, Vineet Gupta, Peter Zijlstra (Intel)

When auditing cmpxchg call sites, Chuck noted that gcc was optimizing
away some of the desired LDs.

|	do {
|		new = old = *ipi_data_ptr;
|		new |= 1U << msg;
|	} while (cmpxchg(ipi_data_ptr, old, new) != old);

was compiled to the code below

| 8015cef8:	ld         r2,[r4,0]  <-- First LD
| 8015cefc:	bset       r1,r2,r1
|
| 8015cf00:	llock      r3,[r4]  <-- atomic op
| 8015cf04:	brne       r3,r2,8015cf10
| 8015cf08:	scond      r1,[r4]
| 8015cf0c:	bnz        8015cf00
|
| 8015cf10:	brne       r3,r2,8015cf00  <-- Branch doesn't go to orig LD

Although this was fixed by adding an ACCESS_ONCE at this call site, it
seems safer (for now at least) to add a compiler barrier to the LLSC based
cmpxchg.
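
As a rough user-space illustration of the retry loop above (not the ARC
implementation): sketch_cmpxchg() and sketch_set_msg() are hypothetical
names, and gcc's __sync_val_compare_and_swap() builtin stands in for the
LLOCK/SCOND sequence. The builtin already implies a barrier, so the explicit
barrier() only mirrors the role of the "memory" clobber added by this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Compiler-only barrier: gcc may not cache memory values across it. */
#define barrier()	__asm__ __volatile__("" : : : "memory")

/*
 * Hypothetical stand-in for cmpxchg(): returns the value found at *ptr and
 * stores new_val only if that value equals expected.
 */
static inline uint32_t sketch_cmpxchg(uint32_t *ptr, uint32_t expected,
				      uint32_t new_val)
{
	uint32_t prev = __sync_val_compare_and_swap(ptr, expected, new_val);

	barrier();	/* same role as the "memory" clobber in the patch */
	return prev;
}

/* The retry loop from the commit message, returning the pre-update value. */
static inline uint32_t sketch_set_msg(uint32_t *ipi_data_ptr, unsigned int msg)
{
	uint32_t old, new_val;

	assert(msg < 32);

	do {
		/* must be re-read on every retry, not hoisted out of the loop */
		new_val = old = *ipi_data_ptr;
		new_val |= 1U << msg;
	} while (sketch_cmpxchg(ipi_data_ptr, old, new_val) != old);

	return old;
}
```

With the barrier in place the compiler has to treat *ipi_data_ptr as possibly
changed after each failed attempt, so the branch target is the load itself.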

Reported-by: Chuck Jordan <cjordan@synopsys.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/include/asm/cmpxchg.h | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/arc/include/asm/cmpxchg.h b/arch/arc/include/asm/cmpxchg.h
index 03cd6894855d..90de5c528da2 100644
--- a/arch/arc/include/asm/cmpxchg.h
+++ b/arch/arc/include/asm/cmpxchg.h
@@ -25,10 +25,11 @@ __cmpxchg(volatile void *ptr, unsigned long expected, unsigned long new)
 	"	scond   %3, [%1]	\n"
 	"	bnz     1b		\n"
 	"2:				\n"
-	: "=&r"(prev)
-	: "r"(ptr), "ir"(expected),
-	  "r"(new) /* can't be "ir". scond can't take limm for "b" */
-	: "cc");
+	: "=&r"(prev)	/* Early clobber, to prevent reg reuse */
+	: "r"(ptr),	/* Not "m": llock only supports reg direct addr mode */
+	  "ir"(expected),
+	  "r"(new)	/* can't be "ir". scond can't take LIMM for "b" */
+	: "cc", "memory"); /* so that gcc knows memory is being written here */
 
 	return prev;
 }
-- 
1.9.1



* [PATCH 18/28] ARC: add smp barriers around atomics per memory-barrriers.txt
  2015-06-09 11:48 [PATCH 00/28] ARCv2 port to Linux - (B) ISA / Core / platform support Vineet Gupta
                   ` (16 preceding siblings ...)
  2015-06-09 11:48 ` [PATCH 17/28] ARC: add compiler barrier to LLSC based cmpxchg Vineet Gupta
@ 2015-06-09 11:48 ` Vineet Gupta
  2015-06-09 12:30   ` Peter Zijlstra
  2015-06-12 12:15   ` [PATCH v2] ARC: add smp barriers around atomics per Documentation/atomic_ops.txt Vineet Gupta
  2015-06-09 11:48 ` [PATCH 19/28] arch: conditionally define smp_{mb,rmb,wmb} Vineet Gupta
                   ` (9 subsequent siblings)
  27 siblings, 2 replies; 66+ messages in thread
From: Vineet Gupta @ 2015-06-09 11:48 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: arnd, arc-linux-dev, Vineet Gupta, Paul E. McKenney,
	Peter Zijlstra (Intel)

Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/include/asm/atomic.h   | 10 ++++++++++
 arch/arc/include/asm/bitops.h   | 12 ++++++++++++
 arch/arc/include/asm/cmpxchg.h  | 10 ++++++++++
 arch/arc/include/asm/spinlock.h | 10 ++++++++++
 4 files changed, 42 insertions(+)

diff --git a/arch/arc/include/asm/atomic.h b/arch/arc/include/asm/atomic.h
index 9917a45fc430..6fc968f78500 100644
--- a/arch/arc/include/asm/atomic.h
+++ b/arch/arc/include/asm/atomic.h
@@ -43,6 +43,8 @@ static inline int atomic_##op##_return(int i, atomic_t *v)		\
 {									\
 	unsigned int temp;						\
 									\
+	smp_mb();							\
+									\
 	__asm__ __volatile__(						\
 	"1:	llock   %0, [%1]	\n"				\
 	"	" #asm_op " %0, %0, %2	\n"				\
@@ -52,6 +54,8 @@ static inline int atomic_##op##_return(int i, atomic_t *v)		\
 	: "r"(&v->counter), "ir"(i)					\
 	: "cc");							\
 									\
+	smp_mb();							\
+									\
 	return temp;							\
 }
 
@@ -142,9 +146,15 @@ ATOMIC_OP(and, &=, and)
 #define __atomic_add_unless(v, a, u)					\
 ({									\
 	int c, old;							\
+									\
+	smp_mb();							\
+									\
 	c = atomic_read(v);						\
 	while (c != (u) && (old = atomic_cmpxchg((v), c, c + (a))) != c)\
 		c = old;						\
+									\
+	smp_mb();							\
+									\
 	c;								\
 })
 
diff --git a/arch/arc/include/asm/bitops.h b/arch/arc/include/asm/bitops.h
index 829a8a2e9704..47878d85e3a3 100644
--- a/arch/arc/include/asm/bitops.h
+++ b/arch/arc/include/asm/bitops.h
@@ -117,6 +117,8 @@ static inline int test_and_set_bit(unsigned long nr, volatile unsigned long *m)
 	if (__builtin_constant_p(nr))
 		nr &= 0x1f;
 
+	smp_mb();
+
 	__asm__ __volatile__(
 	"1:	llock   %0, [%2]	\n"
 	"	bset    %1, %0, %3	\n"
@@ -126,6 +128,8 @@ static inline int test_and_set_bit(unsigned long nr, volatile unsigned long *m)
 	: "r"(m), "ir"(nr)
 	: "cc");
 
+	smp_mb();
+
 	return (old & (1 << nr)) != 0;
 }
 
@@ -139,6 +143,8 @@ test_and_clear_bit(unsigned long nr, volatile unsigned long *m)
 	if (__builtin_constant_p(nr))
 		nr &= 0x1f;
 
+	smp_mb();
+
 	__asm__ __volatile__(
 	"1:	llock   %0, [%2]	\n"
 	"	bclr    %1, %0, %3	\n"
@@ -148,6 +154,8 @@ test_and_clear_bit(unsigned long nr, volatile unsigned long *m)
 	: "r"(m), "ir"(nr)
 	: "cc");
 
+	smp_mb();
+
 	return (old & (1 << nr)) != 0;
 }
 
@@ -161,6 +169,8 @@ test_and_change_bit(unsigned long nr, volatile unsigned long *m)
 	if (__builtin_constant_p(nr))
 		nr &= 0x1f;
 
+	smp_mb();
+
 	__asm__ __volatile__(
 	"1:	llock   %0, [%2]	\n"
 	"	bxor    %1, %0, %3	\n"
@@ -170,6 +180,8 @@ test_and_change_bit(unsigned long nr, volatile unsigned long *m)
 	: "r"(m), "ir"(nr)
 	: "cc");
 
+	smp_mb();
+
 	return (old & (1 << nr)) != 0;
 }
 
diff --git a/arch/arc/include/asm/cmpxchg.h b/arch/arc/include/asm/cmpxchg.h
index 90de5c528da2..96a3dd8fe4bf 100644
--- a/arch/arc/include/asm/cmpxchg.h
+++ b/arch/arc/include/asm/cmpxchg.h
@@ -10,6 +10,8 @@
 #define __ASM_ARC_CMPXCHG_H
 
 #include <linux/types.h>
+
+#include <asm/barrier.h>
 #include <asm/smp.h>
 
 #ifdef CONFIG_ARC_HAS_LLSC
@@ -19,6 +21,8 @@ __cmpxchg(volatile void *ptr, unsigned long expected, unsigned long new)
 {
 	unsigned long prev;
 
+	smp_mb();
+
 	__asm__ __volatile__(
 	"1:	llock   %0, [%1]	\n"
 	"	brne    %0, %2, 2f	\n"
@@ -31,6 +35,8 @@ __cmpxchg(volatile void *ptr, unsigned long expected, unsigned long new)
 	  "r"(new)	/* can't be "ir". scond can't take LIMM for "b" */
 	: "cc", "memory"); /* so that gcc knows memory is being written here */
 
+	smp_mb();
+
 	return prev;
 }
 
@@ -78,12 +84,16 @@ static inline unsigned long __xchg(unsigned long val, volatile void *ptr,
 
 	switch (size) {
 	case 4:
+		smp_mb();
+
 		__asm__ __volatile__(
 		"	ex  %0, [%1]	\n"
 		: "+r"(val)
 		: "r"(ptr)
 		: "memory");
 
+		smp_mb();
+
 		return val;
 	}
 	return __xchg_bad_pointer();
diff --git a/arch/arc/include/asm/spinlock.h b/arch/arc/include/asm/spinlock.h
index b6a8c2dfbe6e..8af8eaad4999 100644
--- a/arch/arc/include/asm/spinlock.h
+++ b/arch/arc/include/asm/spinlock.h
@@ -22,24 +22,32 @@ static inline void arch_spin_lock(arch_spinlock_t *lock)
 {
 	unsigned int tmp = __ARCH_SPIN_LOCK_LOCKED__;
 
+	smp_mb();
+
 	__asm__ __volatile__(
 	"1:	ex  %0, [%1]		\n"
 	"	breq  %0, %2, 1b	\n"
 	: "+&r" (tmp)
 	: "r"(&(lock->slock)), "ir"(__ARCH_SPIN_LOCK_LOCKED__)
 	: "memory");
+
+	smp_mb();
 }
 
 static inline int arch_spin_trylock(arch_spinlock_t *lock)
 {
 	unsigned int tmp = __ARCH_SPIN_LOCK_LOCKED__;
 
+	smp_mb();
+
 	__asm__ __volatile__(
 	"1:	ex  %0, [%1]		\n"
 	: "+r" (tmp)
 	: "r"(&(lock->slock))
 	: "memory");
 
+	smp_mb();
+
 	return (tmp == __ARCH_SPIN_LOCK_UNLOCKED__);
 }
 
@@ -47,6 +55,8 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
 	unsigned int tmp = __ARCH_SPIN_LOCK_UNLOCKED__;
 
+	smp_mb();
+
 	__asm__ __volatile__(
 	"	ex  %0, [%1]		\n"
 	: "+r" (tmp)
-- 
1.9.1



* [PATCH 19/28] arch: conditionally define smp_{mb,rmb,wmb}
  2015-06-09 11:48 [PATCH 00/28] ARCv2 port to Linux - (B) ISA / Core / platform support Vineet Gupta
                   ` (17 preceding siblings ...)
  2015-06-09 11:48 ` [PATCH 18/28] ARC: add smp barriers around atomics per memory-barrriers.txt Vineet Gupta
@ 2015-06-09 11:48 ` Vineet Gupta
  2015-06-09 12:32   ` Peter Zijlstra
  2015-06-09 11:48 ` [PATCH 20/28] ARCv2: barriers Vineet Gupta
                   ` (8 subsequent siblings)
  27 siblings, 1 reply; 66+ messages in thread
From: Vineet Gupta @ 2015-06-09 11:48 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: arnd, arc-linux-dev, Vineet Gupta, Peter Zijlstra (Intel)

That way arches can define the minimal versions and still #include
asm-generic for defaults (vs. defining defaults in arch code)

See new barrier.h in arc for usage !
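
A minimal single-file sketch of the override pattern, assuming made-up
counters in place of real barrier instructions so the effect is observable in
user space (sketch_arch_mb_calls and sketch_generic_calls are hypothetical,
and barrier() is redefined here purely for counting):

```c
#include <assert.h>

/* --- part 1: what an arch's asm/barrier.h might define (hypothetical) --- */
static int sketch_arch_mb_calls;
#define smp_mb()	(sketch_arch_mb_calls++)	/* arch override */

/* --- part 2: the asm-generic defaults, guarded as in this patch --- */
static int sketch_generic_calls;
#define barrier()	(sketch_generic_calls++)	/* counting stand-in */

#ifndef smp_mb
#define smp_mb()	barrier()
#endif

#ifndef smp_rmb
#define smp_rmb()	barrier()
#endif

#ifndef smp_wmb
#define smp_wmb()	barrier()
#endif
```

Only smp_mb() resolves to the arch version; smp_rmb() and smp_wmb() fall
through to the generic defaults because the arch part left them undefined.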

Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 include/asm-generic/barrier.h | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index f5c40b0fadc2..270b7c989ea1 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -55,17 +55,42 @@
 #endif
 
 #ifdef CONFIG_SMP
+
+#ifndef smp_mb
 #define smp_mb()	mb()
+#endif
+
+#ifndef smp_rmb
 #define smp_rmb()	rmb()
+#endif
+
+#ifndef smp_wmb
 #define smp_wmb()	wmb()
+#endif
+
+#ifndef smp_read_barrier_depends
 #define smp_read_barrier_depends()	read_barrier_depends()
+#endif
+
 #else
+#ifndef smp_mb
 #define smp_mb()	barrier()
+#endif
+
+#ifndef smp_rmb
 #define smp_rmb()	barrier()
+#endif
+
+#ifndef smp_wmb
 #define smp_wmb()	barrier()
+#endif
+
+#ifndef smp_read_barrier_depends
 #define smp_read_barrier_depends()	do { } while (0)
 #endif
 
+#endif
+
 #ifndef set_mb
 #define set_mb(var, value)  do { (var) = (value); mb(); } while (0)
 #endif
-- 
1.9.1



* [PATCH 20/28] ARCv2: barriers
  2015-06-09 11:48 [PATCH 00/28] ARCv2 port to Linux - (B) ISA / Core / platform support Vineet Gupta
                   ` (18 preceding siblings ...)
  2015-06-09 11:48 ` [PATCH 19/28] arch: conditionally define smp_{mb,rmb,wmb} Vineet Gupta
@ 2015-06-09 11:48 ` Vineet Gupta
  2015-06-09 12:40   ` Peter Zijlstra
  2015-06-09 11:48 ` [PATCH 21/28] ARC: Reduce bitops lines of code using macros Vineet Gupta
                   ` (7 subsequent siblings)
  27 siblings, 1 reply; 66+ messages in thread
From: Vineet Gupta @ 2015-06-09 11:48 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: arnd, arc-linux-dev, Vineet Gupta, Peter Zijlstra (Intel)

Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/include/asm/Kbuild    |  1 -
 arch/arc/include/asm/barrier.h | 48 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 48 insertions(+), 1 deletion(-)
 create mode 100644 arch/arc/include/asm/barrier.h

diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild
index be0c39e76f7c..59e2dd1d434f 100644
--- a/arch/arc/include/asm/Kbuild
+++ b/arch/arc/include/asm/Kbuild
@@ -1,5 +1,4 @@
 generic-y += auxvec.h
-generic-y += barrier.h
 generic-y += bitsperlong.h
 generic-y += bugs.h
 generic-y += clkdev.h
diff --git a/arch/arc/include/asm/barrier.h b/arch/arc/include/asm/barrier.h
new file mode 100644
index 000000000000..c6ea2f6af55e
--- /dev/null
+++ b/arch/arc/include/asm/barrier.h
@@ -0,0 +1,48 @@
+/*
+ * Copyright (C) 2014-15 Synopsys, Inc. (www.synopsys.com)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef __ASM_BARRIER_H
+#define __ASM_BARRIER_H
+
+#ifdef CONFIG_SMP
+
+#ifdef CONFIG_ISA_ARCV2
+
+/*
+ * DMB:
+ *   - Ensures that selected memory operations issued before it will complete
+ *     before any subsequent memory operation of the same type
+ */
+#define smp_mb()	asm volatile("dmb 3\n" : : : "memory")
+#define smp_rmb()	asm volatile("dmb 1\n" : : : "memory")
+#define smp_wmb()	asm volatile("dmb 2\n" : : : "memory")
+
+/*
+ * DSYNC:
+ *   - Waits for completion of all outstanding memory operations before any new
+ *     operations can begin
+ *   - Includes implicit memory operations such as cache/TLB/BPU maintenance ops
+ *   - Lighter version of SYNC as it doesn't wait for non-memory operations
+ */
+#define mb()		asm volatile("dsync\n" : : : "memory")
+
+#else	/* CONFIG_ISA_ARCOMPACT */
+
+/* SYNC:
+ *   - Waits for completion of all outstanding memory transactions AND all
+ *     previous instructions to retire
+ */
+#define mb()		asm volatile("sync\n" : : : "memory")
+
+#endif	/* CONFIG_ISA_ARCV2 */
+
+#endif	/* CONFIG_SMP */
+
+#include <asm-generic/barrier.h>
+
+#endif
-- 
1.9.1



* [PATCH 21/28] ARC: Reduce bitops lines of code using macros
  2015-06-09 11:48 [PATCH 00/28] ARCv2 port to Linux - (B) ISA / Core / platform support Vineet Gupta
                   ` (19 preceding siblings ...)
  2015-06-09 11:48 ` [PATCH 20/28] ARCv2: barriers Vineet Gupta
@ 2015-06-09 11:48 ` Vineet Gupta
  2015-06-12 12:20   ` [PATCH v2] " Vineet Gupta
  2015-06-09 11:48 ` [PATCH 22/28] ARCv2: STAR 9000837815 workaround hardware exclusive transactions livelock Vineet Gupta
                   ` (6 subsequent siblings)
  27 siblings, 1 reply; 66+ messages in thread
From: Vineet Gupta @ 2015-06-09 11:48 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: arnd, arc-linux-dev, Vineet Gupta, Peter Zijlstra (Intel)

No semantic changes.
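
The shape of the consolidation can be sketched in plain user-space C. This is
a simplified, non-atomic illustration only: the SKETCH_* macros and sketch_*
functions are hypothetical names, and the kernel versions additionally use
LLOCK/SCOND or bitops_lock() and the ARC-specific "nr >> 5" word indexing.

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)

/* Generates sketch_set_bit(), sketch_clear_bit(), sketch_change_bit() */
#define SKETCH_BIT_OP(op, c_op)						\
static inline void sketch_##op##_bit(unsigned long nr, unsigned long *m)\
{									\
	m += nr / SKETCH_BITS_PER_LONG;					\
	*m = *m c_op (1UL << (nr % SKETCH_BITS_PER_LONG));		\
}

/* Generates sketch_test_and_{set,clear,change}_bit() */
#define SKETCH_TEST_N_BIT_OP(op, c_op)					\
static inline int sketch_test_and_##op##_bit(unsigned long nr,		\
					     unsigned long *m)		\
{									\
	unsigned long old, mask = 1UL << (nr % SKETCH_BITS_PER_LONG);	\
									\
	m += nr / SKETCH_BITS_PER_LONG;					\
	old = *m;							\
	*m = old c_op mask;						\
	return (old & mask) != 0;					\
}

/* One invocation per operation emits both variants */
#define SKETCH_BIT_OPS(op, c_op)					\
	SKETCH_BIT_OP(op, c_op)						\
	SKETCH_TEST_N_BIT_OP(op, c_op)

SKETCH_BIT_OPS(set, |)
SKETCH_BIT_OPS(clear, & ~)
SKETCH_BIT_OPS(change, ^)
```

Each operation differs only in the C operator (and, in the kernel, the asm
mnemonic), which is why a single macro parameter per variant is enough.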

Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/include/asm/bitops.h | 463 +++++++++++++-----------------------------
 1 file changed, 137 insertions(+), 326 deletions(-)

diff --git a/arch/arc/include/asm/bitops.h b/arch/arc/include/asm/bitops.h
index 47878d85e3a3..873df7be5dab 100644
--- a/arch/arc/include/asm/bitops.h
+++ b/arch/arc/include/asm/bitops.h
@@ -18,83 +18,50 @@
 #include <linux/types.h>
 #include <linux/compiler.h>
 #include <asm/barrier.h>
+#ifndef CONFIG_ARC_HAS_LLSC
+#include <asm/smp.h>
+#endif
 
-/*
- * Hardware assisted read-modify-write using ARC700 LLOCK/SCOND insns.
- * The Kconfig glue ensures that in SMP, this is only set if the container
- * SoC/platform has cross-core coherent LLOCK/SCOND
- */
 #if defined(CONFIG_ARC_HAS_LLSC)
 
-static inline void set_bit(unsigned long nr, volatile unsigned long *m)
-{
-	unsigned int temp;
-
-	m += nr >> 5;
-
-	/*
-	 * ARC ISA micro-optimization:
-	 *
-	 * Instructions dealing with bitpos only consider lower 5 bits (0-31)
-	 * e.g (x << 33) is handled like (x << 1) by ASL instruction
-	 *  (mem pointer still needs adjustment to point to next word)
-	 *
-	 * Hence the masking to clamp @nr arg can be elided in general.
-	 *
-	 * However if @nr is a constant (above assumed it in a register),
-	 * and greater than 31, gcc can optimize away (x << 33) to 0,
-	 * as overflow, given the 32-bit ISA. Thus masking needs to be done
-	 * for constant @nr, but no code is generated due to const prop.
-	 */
-	if (__builtin_constant_p(nr))
-		nr &= 0x1f;
-
-	__asm__ __volatile__(
-	"1:	llock   %0, [%1]	\n"
-	"	bset    %0, %0, %2	\n"
-	"	scond   %0, [%1]	\n"
-	"	bnz     1b	\n"
-	: "=&r"(temp)
-	: "r"(m), "ir"(nr)
-	: "cc");
-}
-
-static inline void clear_bit(unsigned long nr, volatile unsigned long *m)
-{
-	unsigned int temp;
-
-	m += nr >> 5;
-
-	if (__builtin_constant_p(nr))
-		nr &= 0x1f;
-
-	__asm__ __volatile__(
-	"1:	llock   %0, [%1]	\n"
-	"	bclr    %0, %0, %2	\n"
-	"	scond   %0, [%1]	\n"
-	"	bnz     1b	\n"
-	: "=&r"(temp)
-	: "r"(m), "ir"(nr)
-	: "cc");
-}
-
-static inline void change_bit(unsigned long nr, volatile unsigned long *m)
-{
-	unsigned int temp;
-
-	m += nr >> 5;
-
-	if (__builtin_constant_p(nr))
-		nr &= 0x1f;
+/*
+ * Hardware assisted Atomic-R-M-W
+ */
 
-	__asm__ __volatile__(
-	"1:	llock   %0, [%1]	\n"
-	"	bxor    %0, %0, %2	\n"
-	"	scond   %0, [%1]	\n"
-	"	bnz     1b		\n"
-	: "=&r"(temp)
-	: "r"(m), "ir"(nr)
-	: "cc");
+#define BIT_OP(op, c_op, asm_op)					\
+static inline void op##_bit(unsigned long nr, volatile unsigned long *m)\
+{									\
+	unsigned int temp;						\
+									\
+	m += nr >> 5;							\
+									\
+	/*								\
+	 * ARC ISA micro-optimization:					\
+	 *								\
+	 * Instructions dealing with bitpos only consider lower 5 bits	\
+	 * e.g (x << 33) is handled like (x << 1) by ASL instruction	\
+	 *  (mem pointer still needs adjustment to point to next word)	\
+	 *								\
+	 * Hence the masking to clamp @nr arg can be elided in general.	\
+	 *								\
+	 * However if @nr is a constant (above assumed in a register),	\
+	 * and greater than 31, gcc can optimize away (x << 33) to 0,	\
+	 * as overflow, given the 32-bit ISA. Thus masking needs to be	\
+	 * done for const @nr, but no code is generated due to gcc	\
+	 * const prop.							\
+	 */								\
+	if (__builtin_constant_p(nr))					\
+		nr &= 0x1f;						\
+									\
+	__asm__ __volatile__(						\
+	"1:	llock       %0, [%1]		\n"			\
+	"	" #asm_op " %0, %0, %2	\n"				\
+	"	scond       %0, [%1]		\n"			\
+	"	bnz         1b			\n"			\
+	: "=&r"(temp)	/* Early clobber, to prevent reg reuse */	\
+	: "r"(m),	/* Not "m": llock only supports reg direct addr mode */	\
+	  "ir"(nr)							\
+	: "cc");							\
 }
 
 /*
@@ -108,87 +75,34 @@ static inline void change_bit(unsigned long nr, volatile unsigned long *m)
  * Since ARC lacks a equivalent h/w primitive, the bit is set unconditionally
  * and the old value of bit is returned
  */
-static inline int test_and_set_bit(unsigned long nr, volatile unsigned long *m)
-{
-	unsigned long old, temp;
-
-	m += nr >> 5;
-
-	if (__builtin_constant_p(nr))
-		nr &= 0x1f;
-
-	smp_mb();
-
-	__asm__ __volatile__(
-	"1:	llock   %0, [%2]	\n"
-	"	bset    %1, %0, %3	\n"
-	"	scond   %1, [%2]	\n"
-	"	bnz     1b		\n"
-	: "=&r"(old), "=&r"(temp)
-	: "r"(m), "ir"(nr)
-	: "cc");
-
-	smp_mb();
-
-	return (old & (1 << nr)) != 0;
-}
-
-static inline int
-test_and_clear_bit(unsigned long nr, volatile unsigned long *m)
-{
-	unsigned int old, temp;
-
-	m += nr >> 5;
-
-	if (__builtin_constant_p(nr))
-		nr &= 0x1f;
-
-	smp_mb();
-
-	__asm__ __volatile__(
-	"1:	llock   %0, [%2]	\n"
-	"	bclr    %1, %0, %3	\n"
-	"	scond   %1, [%2]	\n"
-	"	bnz     1b		\n"
-	: "=&r"(old), "=&r"(temp)
-	: "r"(m), "ir"(nr)
-	: "cc");
-
-	smp_mb();
-
-	return (old & (1 << nr)) != 0;
-}
-
-static inline int
-test_and_change_bit(unsigned long nr, volatile unsigned long *m)
-{
-	unsigned int old, temp;
-
-	m += nr >> 5;
-
-	if (__builtin_constant_p(nr))
-		nr &= 0x1f;
-
-	smp_mb();
-
-	__asm__ __volatile__(
-	"1:	llock   %0, [%2]	\n"
-	"	bxor    %1, %0, %3	\n"
-	"	scond   %1, [%2]	\n"
-	"	bnz     1b		\n"
-	: "=&r"(old), "=&r"(temp)
-	: "r"(m), "ir"(nr)
-	: "cc");
-
-	smp_mb();
-
-	return (old & (1 << nr)) != 0;
+#define TEST_N_BIT_OP(op, c_op, asm_op)					\
+static inline int test_and_##op##_bit(unsigned long nr, volatile unsigned long *m)\
+{									\
+	unsigned long old, temp;					\
+									\
+	m += nr >> 5;							\
+									\
+	if (__builtin_constant_p(nr))					\
+		nr &= 0x1f;						\
+									\
+	smp_mb();							\
+									\
+	__asm__ __volatile__(						\
+	"1:	llock       %0, [%2]	\n"				\
+	"	" #asm_op " %1, %0, %3	\n"				\
+	"	scond       %1, [%2]	\n"				\
+	"	bnz         1b		\n"				\
+	: "=&r"(old), "=&r"(temp)					\
+	: "r"(m), "ir"(nr)						\
+	: "cc");							\
+									\
+	smp_mb();							\
+									\
+	return (old & (1 << nr)) != 0;					\
 }
 
 #else	/* !CONFIG_ARC_HAS_LLSC */
 
-#include <asm/smp.h>
-
 /*
  * Non hardware assisted Atomic-R-M-W
  * Locking would change to irq-disabling only (UP) and spinlocks (SMP)
@@ -205,108 +119,40 @@ test_and_change_bit(unsigned long nr, volatile unsigned long *m)
  *             at compile time)
  */
 
-static inline void set_bit(unsigned long nr, volatile unsigned long *m)
-{
-	unsigned long temp, flags;
-	m += nr >> 5;
-
-	if (__builtin_constant_p(nr))
-		nr &= 0x1f;
-
-	bitops_lock(flags);
-
-	temp = *m;
-	*m = temp | (1UL << nr);
-
-	bitops_unlock(flags);
-}
-
-static inline void clear_bit(unsigned long nr, volatile unsigned long *m)
-{
-	unsigned long temp, flags;
-	m += nr >> 5;
-
-	if (__builtin_constant_p(nr))
-		nr &= 0x1f;
-
-	bitops_lock(flags);
-
-	temp = *m;
-	*m = temp & ~(1UL << nr);
-
-	bitops_unlock(flags);
-}
-
-static inline void change_bit(unsigned long nr, volatile unsigned long *m)
-{
-	unsigned long temp, flags;
-	m += nr >> 5;
-
-	if (__builtin_constant_p(nr))
-		nr &= 0x1f;
-
-	bitops_lock(flags);
-
-	temp = *m;
-	*m = temp ^ (1UL << nr);
-
-	bitops_unlock(flags);
+#define BIT_OP(op, c_op, asm_op)					\
+static inline void op##_bit(unsigned long nr, volatile unsigned long *m)\
+{									\
+	unsigned long temp, flags;					\
+	m += nr >> 5;							\
+									\
+	if (__builtin_constant_p(nr))					\
+		nr &= 0x1f;						\
+									\
+	bitops_lock(flags);						\
+									\
+	temp = *m;							\
+	*m = temp c_op (1UL << nr);					\
+									\
+	bitops_unlock(flags);						\
 }
 
-static inline int test_and_set_bit(unsigned long nr, volatile unsigned long *m)
-{
-	unsigned long old, flags;
-	m += nr >> 5;
-
-	if (__builtin_constant_p(nr))
-		nr &= 0x1f;
-
-	bitops_lock(flags);
-
-	old = *m;
-	*m = old | (1 << nr);
-
-	bitops_unlock(flags);
-
-	return (old & (1 << nr)) != 0;
-}
-
-static inline int
-test_and_clear_bit(unsigned long nr, volatile unsigned long *m)
-{
-	unsigned long old, flags;
-	m += nr >> 5;
-
-	if (__builtin_constant_p(nr))
-		nr &= 0x1f;
-
-	bitops_lock(flags);
-
-	old = *m;
-	*m = old & ~(1 << nr);
-
-	bitops_unlock(flags);
-
-	return (old & (1 << nr)) != 0;
-}
-
-static inline int
-test_and_change_bit(unsigned long nr, volatile unsigned long *m)
-{
-	unsigned long old, flags;
-	m += nr >> 5;
-
-	if (__builtin_constant_p(nr))
-		nr &= 0x1f;
-
-	bitops_lock(flags);
-
-	old = *m;
-	*m = old ^ (1 << nr);
-
-	bitops_unlock(flags);
-
-	return (old & (1 << nr)) != 0;
+#define TEST_N_BIT_OP(op, c_op, asm_op)					\
+static inline int test_and_##op##_bit(unsigned long nr, volatile unsigned long *m)\
+{									\
+	unsigned long old, flags;					\
+	m += nr >> 5;							\
+									\
+	if (__builtin_constant_p(nr))					\
+		nr &= 0x1f;						\
+									\
+	bitops_lock(flags);						\
+									\
+	old = *m;							\
+	*m = old c_op (1 << nr);					\
+									\
+	bitops_unlock(flags);						\
+									\
+	return (old & (1 << nr)) != 0;					\
 }
 
 #endif /* CONFIG_ARC_HAS_LLSC */
@@ -315,86 +161,51 @@ test_and_change_bit(unsigned long nr, volatile unsigned long *m)
  * Non atomic variants
  **************************************/
 
-static inline void __set_bit(unsigned long nr, volatile unsigned long *m)
-{
-	unsigned long temp;
-	m += nr >> 5;
-
-	if (__builtin_constant_p(nr))
-		nr &= 0x1f;
-
-	temp = *m;
-	*m = temp | (1UL << nr);
-}
-
-static inline void __clear_bit(unsigned long nr, volatile unsigned long *m)
-{
-	unsigned long temp;
-	m += nr >> 5;
-
-	if (__builtin_constant_p(nr))
-		nr &= 0x1f;
-
-	temp = *m;
-	*m = temp & ~(1UL << nr);
-}
-
-static inline void __change_bit(unsigned long nr, volatile unsigned long *m)
-{
-	unsigned long temp;
-	m += nr >> 5;
-
-	if (__builtin_constant_p(nr))
-		nr &= 0x1f;
-
-	temp = *m;
-	*m = temp ^ (1UL << nr);
-}
-
-static inline int
-__test_and_set_bit(unsigned long nr, volatile unsigned long *m)
-{
-	unsigned long old;
-	m += nr >> 5;
-
-	if (__builtin_constant_p(nr))
-		nr &= 0x1f;
-
-	old = *m;
-	*m = old | (1 << nr);
-
-	return (old & (1 << nr)) != 0;
+#define __BIT_OP(op, c_op, asm_op)					\
+static inline void __##op##_bit(unsigned long nr, volatile unsigned long *m)	\
+{									\
+	unsigned long temp;						\
+	m += nr >> 5;							\
+									\
+	if (__builtin_constant_p(nr))					\
+		nr &= 0x1f;						\
+									\
+	temp = *m;							\
+	*m = temp c_op (1UL << nr);					\
 }
 
-static inline int
-__test_and_clear_bit(unsigned long nr, volatile unsigned long *m)
-{
-	unsigned long old;
-	m += nr >> 5;
-
-	if (__builtin_constant_p(nr))
-		nr &= 0x1f;
-
-	old = *m;
-	*m = old & ~(1 << nr);
-
-	return (old & (1 << nr)) != 0;
+#define __TEST_N_BIT_OP(op, c_op, asm_op)				\
+static inline int __test_and_##op##_bit(unsigned long nr, volatile unsigned long *m)\
+{									\
+	unsigned long old;						\
+	m += nr >> 5;							\
+									\
+	if (__builtin_constant_p(nr))					\
+		nr &= 0x1f;						\
+									\
+	old = *m;							\
+	*m = old c_op (1 << nr);					\
+									\
+	return (old & (1 << nr)) != 0;					\
 }
 
-static inline int
-__test_and_change_bit(unsigned long nr, volatile unsigned long *m)
-{
-	unsigned long old;
-	m += nr >> 5;
-
-	if (__builtin_constant_p(nr))
-		nr &= 0x1f;
-
-	old = *m;
-	*m = old ^ (1 << nr);
-
-	return (old & (1 << nr)) != 0;
-}
+#define BIT_OPS(op, c_op, asm_op)					\
+									\
+	/* set_bit(), clear_bit(), change_bit() */			\
+	BIT_OP(op, c_op, asm_op)					\
+									\
+	/* test_and_set_bit(), test_and_clear_bit(), test_and_change_bit() */\
+	TEST_N_BIT_OP(op, c_op, asm_op)					\
+									\
+	/* __set_bit(), __clear_bit(), __change_bit() */		\
+	__BIT_OP(op, c_op, asm_op)					\
+									\
+	/* __test_and_set_bit(), __test_and_clear_bit(), __test_and_change_bit() */\
+	__TEST_N_BIT_OP(op, c_op, asm_op)
+
+BIT_OPS(set, |, bset)
+BIT_OPS(clear, & ~, bclr)
+BIT_OPS(change, ^, bxor)
 
 /*
  * This routine doesn't need to be atomic.
-- 
1.9.1



* [PATCH 22/28] ARCv2: STAR 9000837815 workaround hardware exclusive transactions livelock
  2015-06-09 11:48 [PATCH 00/28] ARCv2 port to Linux - (B) ISA / Core / platform support Vineet Gupta
                   ` (20 preceding siblings ...)
  2015-06-09 11:48 ` [PATCH 21/28] ARC: Reduce bitops lines of code using macros Vineet Gupta
@ 2015-06-09 11:48 ` Vineet Gupta
  2015-06-09 12:35   ` Peter Zijlstra
  2015-06-09 11:48 ` [PATCH 23/28] ARCv2: SLC: Handle explcit flush for DMA ops (w/o IO-coherency) Vineet Gupta
                   ` (5 subsequent siblings)
  27 siblings, 1 reply; 66+ messages in thread
From: Vineet Gupta @ 2015-06-09 11:48 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: arnd, arc-linux-dev, Vineet Gupta, Peter Zijlstra (Intel)

Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/include/asm/atomic.h | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/arc/include/asm/atomic.h b/arch/arc/include/asm/atomic.h
index 6fc968f78500..abaf222665e6 100644
--- a/arch/arc/include/asm/atomic.h
+++ b/arch/arc/include/asm/atomic.h
@@ -23,13 +23,21 @@
 
 #define atomic_set(v, i) (((v)->counter) = (i))
 
+#ifdef CONFIG_ISA_ARCV2
+#define PREFETCHW	"	prefetchw   [%1]	\n"
+#else
+#define PREFETCHW
+#endif
+
 #define ATOMIC_OP(op, c_op, asm_op)					\
 static inline void atomic_##op(int i, atomic_t *v)			\
 {									\
 	unsigned int temp;						\
 									\
 	__asm__ __volatile__(						\
-	"1:	llock   %0, [%1]	\n"				\
+	"1:				\n"				\
+	PREFETCHW							\
+	"	llock   %0, [%1]	\n"				\
 	"	" #asm_op " %0, %0, %2	\n"				\
 	"	scond   %0, [%1]	\n"				\
 	"	bnz     1b		\n"				\
@@ -46,7 +54,9 @@ static inline int atomic_##op##_return(int i, atomic_t *v)		\
 	smp_mb();							\
 									\
 	__asm__ __volatile__(						\
-	"1:	llock   %0, [%1]	\n"				\
+	"1:				\n"				\
+	PREFETCHW							\
+	"	llock   %0, [%1]	\n"				\
 	"	" #asm_op " %0, %0, %2	\n"				\
 	"	scond   %0, [%1]	\n"				\
 	"	bnz     1b		\n"				\
-- 
1.9.1



* [PATCH 23/28] ARCv2: SLC: Handle explcit flush for DMA ops (w/o IO-coherency)
  2015-06-09 11:48 [PATCH 00/28] ARCv2 port to Linux - (B) ISA / Core / platform support Vineet Gupta
                   ` (21 preceding siblings ...)
  2015-06-09 11:48 ` [PATCH 22/28] ARCv2: STAR 9000837815 workaround hardware exclusive transactions livelock Vineet Gupta
@ 2015-06-09 11:48 ` Vineet Gupta
  2015-06-09 11:48 ` [PATCH 24/28] ARCv2: All bits in place, allow ARCv2 builds Vineet Gupta
                   ` (4 subsequent siblings)
  27 siblings, 0 replies; 66+ messages in thread
From: Vineet Gupta @ 2015-06-09 11:48 UTC (permalink / raw)
  To: linux-arch, linux-kernel; +Cc: arnd, arc-linux-dev, Vineet Gupta

The L2 cache on ARCHS processors is called the SLC (System Level Cache).
For DMA to work in the absence of hardware-assisted IO coherency, the
SLC must be managed explicitly whenever buffers transition between the
CPU and device controllers.
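
As a rough illustration of what "managing the SLC explicitly" amounts to,
the sketch below mirrors the control-bit selection and region-end
computation done by slc_op() in this patch. It is not the kernel code
itself: slc_ctrl_for_op() and slc_region_end() are hypothetical helper
names, and the OP_* values are assumed to match the existing flags in
arch/arc/mm/cache.c, which this patch does not show.

```c
#include <assert.h>

/* Bit values copied from the cache.h hunk of this patch. */
#define SLC_CTRL_IM		0x040
#define SLC_CTRL_RGN_OP_INV	0x200

/* Assumption: same encoding as the OP_* flags already in arch/arc/mm/cache.c. */
#define OP_INV		0x1
#define OP_FLUSH	0x2
#define OP_FLUSH_N_INV	0x3

/*
 * Hypothetical helper: given the current SLC_CTRL value, return the value
 * slc_op() would write back for the requested operation.
 *  - IM is cleared for a pure invalidate, set otherwise
 *  - RGN_OP_INV is set for invalidate and flush-n-invalidate
 */
static unsigned int slc_ctrl_for_op(unsigned int ctrl, int op)
{
	if (!(op & OP_FLUSH))		/* i.e. OP_INV */
		ctrl &= ~SLC_CTRL_IM;
	else
		ctrl |= SLC_CTRL_IM;

	if (op & OP_INV)
		ctrl |= SLC_CTRL_RGN_OP_INV;
	else
		ctrl &= ~SLC_CTRL_RGN_OP_INV;

	return ctrl;
}

/*
 * Hypothetical helper: value written to SLC_RGN_END. END may not equal
 * START, so the patch pads the size by (line size - 1).
 */
static unsigned long slc_region_end(unsigned long paddr, unsigned long sz,
				    unsigned long line_sz)
{
	/* The patch only ever derives 64 or 128 from SLC_CFG. */
	assert(line_sz == 64 || line_sz == 128);
	return paddr + sz + line_sz - 1;
}
```

In the patch, END is written before START because the START write is what
triggers the operation; the helpers above only model the values, not the
ordering or the busy-wait on SLC_CTRL_BUSY.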

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/include/asm/cache.h | 11 ++++++++
 arch/arc/mm/cache.c          | 64 ++++++++++++++++++++++++++++++++++++++++++--
 arch/arc/mm/dma.c            | 12 +++++++++
 3 files changed, 85 insertions(+), 2 deletions(-)

diff --git a/arch/arc/include/asm/cache.h b/arch/arc/include/asm/cache.h
index d21c76d6b054..d67345d3e2d4 100644
--- a/arch/arc/include/asm/cache.h
+++ b/arch/arc/include/asm/cache.h
@@ -82,5 +82,16 @@ extern void read_decode_cache_bcr(void);
 
 /*System-level cache (L2 cache) related Auxiliary registers */
 #define ARC_REG_SLC_CFG		0x901
+#define ARC_REG_SLC_CTRL	0x903
+#define ARC_REG_SLC_FLUSH	0x904
+#define ARC_REG_SLC_INVALIDATE	0x905
+#define ARC_REG_SLC_RGN_START	0x914
+#define ARC_REG_SLC_RGN_END	0x916
+
+/* Bit val in SLC_CONTROL */
+#define SLC_CTRL_IM		0x040
+#define SLC_CTRL_DISABLE	0x001
+#define SLC_CTRL_BUSY		0x100
+#define SLC_CTRL_RGN_OP_INV	0x200
 
 #endif /* _ASM_CACHE_H */
diff --git a/arch/arc/mm/cache.c b/arch/arc/mm/cache.c
index 0eaaee60fd0b..b29d62ed4f7e 100644
--- a/arch/arc/mm/cache.c
+++ b/arch/arc/mm/cache.c
@@ -21,6 +21,8 @@
 #include <asm/cachectl.h>
 #include <asm/setup.h>
 
+static int l2_line_sz;
+
 void (*_cache_line_loop_ic_fn)(unsigned long paddr, unsigned long vaddr,
 			       unsigned long sz, const int cacheop);
 
@@ -120,13 +122,16 @@ dc_chk:
 	p_dc->ver = dbcr.ver;
 
 slc_chk:
+	if (!is_isa_arcv2())
+		return;
+
 	p_slc = &cpuinfo_arc700[cpu].slc;
 	READ_BCR(ARC_REG_SLC_BCR, sbcr);
 	if (sbcr.ver) {
 		READ_BCR(ARC_REG_SLC_CFG, slc_cfg);
 		p_slc->ver = sbcr.ver;
 		p_slc->sz_k = 128 << slc_cfg.sz;
-		p_slc->line_len = (slc_cfg.lsz == 0) ? 128 : 64;
+		l2_line_sz = p_slc->line_len = (slc_cfg.lsz == 0) ? 128 : 64;
 	}
 }
 
@@ -460,6 +465,53 @@ static void __ic_line_inv_vaddr(unsigned long paddr, unsigned long vaddr,
 
 #endif /* CONFIG_ARC_HAS_ICACHE */
 
+noinline void slc_op(unsigned long paddr, unsigned long sz, const int op)
+{
+#ifdef CONFIG_ISA_ARCV2
+	unsigned long flags;
+	unsigned int ctrl;
+
+	local_irq_save(flags);
+
+	/*
+	 * The Region Flush operation is specified by CTRL.RGN_OP[11..9]
+	 *  - b'000 (default) is Flush,
+	 *  - b'001 is Invalidate if CTRL.IM == 0
+	 *  - b'001 is Flush-n-Invalidate if CTRL.IM == 1
+	 */
+	ctrl = read_aux_reg(ARC_REG_SLC_CTRL);
+
+	/* Don't rely on default value of IM bit */
+	if (!(op & OP_FLUSH))		/* i.e. OP_INV */
+		ctrl &= ~SLC_CTRL_IM;	/* clear IM: Disable flush before Inv */
+	else
+		ctrl |= SLC_CTRL_IM;
+
+	if (op & OP_INV)
+		ctrl |= SLC_CTRL_RGN_OP_INV;	/* Inv or flush-n-inv */
+	else
+		ctrl &= ~SLC_CTRL_RGN_OP_INV;
+
+	write_aux_reg(ARC_REG_SLC_CTRL, ctrl);
+
+	/*
+	 * Lower bits are ignored, no need to clip
+	 * END needs to be setup before START (latter triggers the operation)
+	 * END can't be same as START, so add (l2_line_sz - 1) to sz
+	 */
+	write_aux_reg(ARC_REG_SLC_RGN_END, (paddr + sz + l2_line_sz - 1));
+	write_aux_reg(ARC_REG_SLC_RGN_START, paddr);
+
+	while (read_aux_reg(ARC_REG_SLC_CTRL) & SLC_CTRL_BUSY);
+
+	local_irq_restore(flags);
+#endif
+}
+
+static inline int need_slc_flush(void)
+{
+	return is_isa_arcv2() && l2_line_sz;
+}
 
 /***********************************************************
  * Exported APIs
@@ -509,22 +561,30 @@ void flush_dcache_page(struct page *page)
 }
 EXPORT_SYMBOL(flush_dcache_page);
 
-
 void dma_cache_wback_inv(unsigned long start, unsigned long sz)
 {
 	__dc_line_op_k(start, sz, OP_FLUSH_N_INV);
+
+	if (need_slc_flush())
+		slc_op(start, sz, OP_FLUSH_N_INV);
 }
 EXPORT_SYMBOL(dma_cache_wback_inv);
 
 void dma_cache_inv(unsigned long start, unsigned long sz)
 {
 	__dc_line_op_k(start, sz, OP_INV);
+
+	if (need_slc_flush())
+		slc_op(start, sz, OP_INV);
 }
 EXPORT_SYMBOL(dma_cache_inv);
 
 void dma_cache_wback(unsigned long start, unsigned long sz)
 {
 	__dc_line_op_k(start, sz, OP_FLUSH);
+
+	if (need_slc_flush())
+		slc_op(start, sz, OP_FLUSH);
 }
 EXPORT_SYMBOL(dma_cache_wback);
 
diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c
index 2cfe81dca92a..74a637a1cfc4 100644
--- a/arch/arc/mm/dma.c
+++ b/arch/arc/mm/dma.c
@@ -66,6 +66,18 @@ void *dma_alloc_coherent(struct device *dev, size_t size,
 	/* This is bus address, platform dependent */
 	*dma_handle = (dma_addr_t)paddr;
 
+	/*
+	 * Evict any existing L1 and/or L2 lines for the backing page
+	 * in case it was used earlier as a normal "cached" page.
+	 * Yeah this bit us - STAR 9000898266
+	 *
+	 * Although core does call flush_cache_vmap(), it gets kvaddr hence
+	 * can't be used to efficiently flush L1 and/or L2 which need paddr
+	 * Currently flush_cache_vmap nukes the L1 cache completely which
+	 * will be optimized as a separate commit
+	 */
+	dma_cache_wback_inv((unsigned long)paddr, size);
+
 	return kvaddr;
 }
 EXPORT_SYMBOL(dma_alloc_coherent);
-- 
1.9.1



* [PATCH 24/28] ARCv2: All bits in place, allow ARCv2 builds
  2015-06-09 11:48 [PATCH 00/28] ARCv2 port to Linux - (B) ISA / Core / platform support Vineet Gupta
                   ` (22 preceding siblings ...)
  2015-06-09 11:48 ` [PATCH 23/28] ARCv2: SLC: Handle explcit flush for DMA ops (w/o IO-coherency) Vineet Gupta
@ 2015-06-09 11:48 ` Vineet Gupta
  2015-06-09 11:48 ` [PATCH 25/28] ARCv2: [nsim*hs*] Support simulation platforms for HS38x cores Vineet Gupta
                   ` (3 subsequent siblings)
  27 siblings, 0 replies; 66+ messages in thread
From: Vineet Gupta @ 2015-06-09 11:48 UTC (permalink / raw)
  To: linux-arch, linux-kernel; +Cc: arnd, arc-linux-dev, Vineet Gupta

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/Kconfig | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index 1b684595e258..e7cee0a5c56d 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -97,11 +97,10 @@ config ISA_ARCOMPACT
 	help
 	  The original ARC ISA of ARC600/700 cores
 
-### For bisectability, disable ARCv2 support until we have all the bits in place
-#config ISA_ARCV2
-#	bool "ARC ISA v2"
-#	help
-#	  ISA for the Next Generation ARC-HS cores
+config ISA_ARCV2
+	bool "ARC ISA v2"
+	help
+	  ISA for the Next Generation ARC-HS cores
 
 endchoice
 
-- 
1.9.1



* [PATCH 25/28] ARCv2: [nsim*hs*] Support simulation platforms for HS38x cores
  2015-06-09 11:48 [PATCH 00/28] ARCv2 port to Linux - (B) ISA / Core / platform support Vineet Gupta
                   ` (23 preceding siblings ...)
  2015-06-09 11:48 ` [PATCH 24/28] ARCv2: All bits in place, allow ARCv2 builds Vineet Gupta
@ 2015-06-09 11:48 ` Vineet Gupta
  2015-06-09 11:48 ` [PATCH 26/28] ARC: [axs101] Prepare for AXS103 Vineet Gupta
                   ` (2 subsequent siblings)
  27 siblings, 0 replies; 66+ messages in thread
From: Vineet Gupta @ 2015-06-09 11:48 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: arnd, arc-linux-dev, Vineet Gupta, Grant Likely, Rob Herring, devicetree

Cc: Grant Likely <grant.likely@linaro.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: devicetree@vger.kernel.org
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/boot/dts/nsim_hs.dts              |  53 +++++++++++++++
 arch/arc/boot/dts/nsim_hs_idu.dts          |  72 ++++++++++++++++++++
 arch/arc/boot/dts/nsimosci_hs.dts          |  80 +++++++++++++++++++++++
 arch/arc/boot/dts/nsimosci_hs_idu.dts      | 101 +++++++++++++++++++++++++++++
 arch/arc/configs/nsim_hs_defconfig         |  64 ++++++++++++++++++
 arch/arc/configs/nsim_hs_smp_defconfig     |  63 ++++++++++++++++++
 arch/arc/configs/nsimosci_hs_defconfig     |  73 +++++++++++++++++++++
 arch/arc/configs/nsimosci_hs_smp_defconfig |  93 ++++++++++++++++++++++++++
 arch/arc/plat-sim/platform.c               |   2 +
 9 files changed, 601 insertions(+)
 create mode 100644 arch/arc/boot/dts/nsim_hs.dts
 create mode 100644 arch/arc/boot/dts/nsim_hs_idu.dts
 create mode 100644 arch/arc/boot/dts/nsimosci_hs.dts
 create mode 100644 arch/arc/boot/dts/nsimosci_hs_idu.dts
 create mode 100644 arch/arc/configs/nsim_hs_defconfig
 create mode 100644 arch/arc/configs/nsim_hs_smp_defconfig
 create mode 100644 arch/arc/configs/nsimosci_hs_defconfig
 create mode 100644 arch/arc/configs/nsimosci_hs_smp_defconfig

diff --git a/arch/arc/boot/dts/nsim_hs.dts b/arch/arc/boot/dts/nsim_hs.dts
new file mode 100644
index 000000000000..911f069e0540
--- /dev/null
+++ b/arch/arc/boot/dts/nsim_hs.dts
@@ -0,0 +1,53 @@
+/*
+ * Copyright (C) 2014-15 Synopsys, Inc. (www.synopsys.com)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+/dts-v1/;
+
+/include/ "skeleton.dtsi"
+
+/ {
+	compatible = "snps,nsim_hs";
+	interrupt-parent = <&core_intc>;
+
+	chosen {
+		bootargs = "earlycon=arc_uart,mmio32,0xc0fc1000,115200n8 console=ttyARC0,115200n8";
+	};
+
+	aliases {
+		serial0 = &arcuart0;
+	};
+
+	fpga {
+		compatible = "simple-bus";
+		#address-cells = <1>;
+		#size-cells = <1>;
+
+		/* child and parent address space 1:1 mapped */
+		ranges;
+
+		core_intc: core-interrupt-controller {
+			compatible = "snps,archs-intc";
+			interrupt-controller;
+			#interrupt-cells = <1>;
+		};
+
+		arcuart0: serial@c0fc1000 {
+			compatible = "snps,arc-uart";
+			reg = <0xc0fc1000 0x100>;
+			interrupts = <24>;
+			clock-frequency = <80000000>;
+			current-speed = <115200>;
+			status = "okay";
+		};
+
+		arcpct0: pct {
+			compatible = "snps,archs-pct";
+			#interrupt-cells = <1>;
+			interrupts = <20>;
+		};
+	};
+};
diff --git a/arch/arc/boot/dts/nsim_hs_idu.dts b/arch/arc/boot/dts/nsim_hs_idu.dts
new file mode 100644
index 000000000000..46ab31975612
--- /dev/null
+++ b/arch/arc/boot/dts/nsim_hs_idu.dts
@@ -0,0 +1,72 @@
+/*
+ * Copyright (C) 2014-15 Synopsys, Inc. (www.synopsys.com)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+/dts-v1/;
+
+/include/ "skeleton.dtsi"
+
+/ {
+	compatible = "snps,nsim_hs";
+	interrupt-parent = <&core_intc>;
+
+	chosen {
+		bootargs = "earlycon=arc_uart,mmio32,0xc0fc1000,115200n8 console=ttyARC0,115200n8";
+	};
+
+	aliases {
+		serial0 = &arcuart0;
+	};
+
+	fpga {
+		compatible = "simple-bus";
+		#address-cells = <1>;
+		#size-cells = <1>;
+
+		/* child and parent address space 1:1 mapped */
+		ranges;
+
+		core_intc: core-interrupt-controller {
+			compatible = "snps,archs-intc";
+			interrupt-controller;
+			#interrupt-cells = <1>;
+		};
+
+		idu_intc: idu-interrupt-controller {
+			compatible = "snps,archs-idu-intc";
+			interrupt-controller;
+			interrupt-parent = <&core_intc>;
+
+			/*
+			 * <hwirq  distribution>
+			 * distribution: 0=RR; 1=cpu0, 2=cpu1, 4=cpu2, 8=cpu3
+			 */
+			#interrupt-cells = <2>;
+
+			/*
+			 * upstream irqs to core intc - downstream these are
+			 * "COMMON" irq 0,1..
+			 */
+			interrupts = <24 25 26 27 28 29 30 31>;
+		};
+
+		arcuart0: serial@c0fc1000 {
+			compatible = "snps,arc-uart";
+			reg = <0xc0fc1000 0x100>;
+			interrupt-parent = <&idu_intc>;
+			interrupts = <0 0>;
+			clock-frequency = <80000000>;
+			current-speed = <115200>;
+			status = "okay";
+		};
+
+		arcpct0: pct {
+			compatible = "snps,archs-pct";
+			#interrupt-cells = <1>;
+			interrupts = <20>;
+		};
+	};
+};
diff --git a/arch/arc/boot/dts/nsimosci_hs.dts b/arch/arc/boot/dts/nsimosci_hs.dts
new file mode 100644
index 000000000000..d64a96f8515a
--- /dev/null
+++ b/arch/arc/boot/dts/nsimosci_hs.dts
@@ -0,0 +1,80 @@
+/*
+ * Copyright (C) 2014-15 Synopsys, Inc. (www.synopsys.com)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+/dts-v1/;
+
+/include/ "skeleton.dtsi"
+
+/ {
+	compatible = "snps,nsimosci_hs";
+	clock-frequency = <20000000>;	/* 20 MHZ */
+	#address-cells = <1>;
+	#size-cells = <1>;
+	interrupt-parent = <&core_intc>;
+
+	chosen {
+		/* this is for console on PGU */
+		/* bootargs = "console=tty0 consoleblank=0"; */
+		/* this is for console on serial */
+		bootargs = "earlycon=uart8250,mmio32,0xf0000000,115200n8 console=tty0 console=ttyS0,115200n8 consoleblank=0 debug";
+	};
+
+	aliases {
+		serial0 = &uart0;
+	};
+
+	fpga {
+		compatible = "simple-bus";
+		#address-cells = <1>;
+		#size-cells = <1>;
+
+		/* child and parent address space 1:1 mapped */
+		ranges;
+
+		core_intc: core-interrupt-controller {
+			compatible = "snps,archs-intc";
+			interrupt-controller;
+			#interrupt-cells = <1>;
+		};
+
+		uart0: serial@f0000000 {
+			compatible = "ns8250";
+			reg = <0xf0000000 0x2000>;
+			interrupts = <24>;
+			clock-frequency = <3686400>;
+			baud = <115200>;
+			reg-shift = <2>;
+			reg-io-width = <4>;
+			no-loopback-test = <1>;
+		};
+
+		pgu0: pgu@f9000000 {
+			compatible = "snps,arcpgufb";
+			reg = <0xf9000000 0x400>;
+		};
+
+		ps2: ps2@f9001000 {
+			compatible = "snps,arc_ps2";
+			reg = <0xf9000400 0x14>;
+			interrupts = <27>;
+			interrupt-names = "arc_ps2_irq";
+		};
+
+		eth0: ethernet@f0003000 {
+			compatible = "snps,oscilan";
+			reg = <0xf0003000 0x44>;
+			interrupts = <25>, <26>;
+			interrupt-names = "rx", "tx";
+		};
+
+		arcpct0: pct {
+			compatible = "snps,archs-pct";
+			#interrupt-cells = <1>;
+			interrupts = <20>;
+		};
+	};
+};
diff --git a/arch/arc/boot/dts/nsimosci_hs_idu.dts b/arch/arc/boot/dts/nsimosci_hs_idu.dts
new file mode 100644
index 000000000000..f6bf0ca95a57
--- /dev/null
+++ b/arch/arc/boot/dts/nsimosci_hs_idu.dts
@@ -0,0 +1,101 @@
+/*
+ * Copyright (C) 2014-15 Synopsys, Inc. (www.synopsys.com)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+/dts-v1/;
+
+/include/ "skeleton.dtsi"
+
+/ {
+	compatible = "snps,nsimosci_hs";
+	clock-frequency = <5000000>;	/* 5 MHZ */
+	#address-cells = <1>;
+	#size-cells = <1>;
+	interrupt-parent = <&core_intc>;
+
+	chosen {
+		/* this is for console on serial */
+		bootargs = "earlycon=uart8250,mmio32,0xf0000000,115200n8 console=tty0 console=ttyS0,115200n8 consoleblank=0 debug";
+	};
+
+	aliases {
+		serial0 = &uart0;
+	};
+
+	fpga {
+		compatible = "simple-bus";
+		#address-cells = <1>;
+		#size-cells = <1>;
+
+		/* child and parent address space 1:1 mapped */
+		ranges;
+
+		core_intc: core-interrupt-controller {
+			compatible = "snps,archs-intc";
+			interrupt-controller;
+			#interrupt-cells = <1>;
+/*			interrupts = <16 17 18 19 20 21 22 23 24 25>; */
+		};
+
+		idu_intc: idu-interrupt-controller {
+			compatible = "snps,archs-idu-intc";
+			interrupt-controller;
+			interrupt-parent = <&core_intc>;
+
+			/*
+			 * <hwirq  distribution>
+			 * distribution: 0=RR; 1=cpu0, 2=cpu1, 4=cpu2, 8=cpu3
+			 */
+			#interrupt-cells = <2>;
+
+			/*
+			 * upstream irqs to core intc - downstream these are
+			 * "COMMON" irq 0,1..
+			 */
+			interrupts = <24 25 26 27 28 29 30 31>;
+		};
+
+		uart0: serial@f0000000 {
+			compatible = "ns8250";
+			reg = <0xf0000000 0x2000>;
+			interrupt-parent = <&idu_intc>;
+			interrupts = <0 0>; /* cmn irq 0 -> cpu irq 24
+						RR distribute to all cpus */
+			clock-frequency = <3686400>;
+			baud = <115200>;
+			reg-shift = <2>;
+			reg-io-width = <4>;
+			no-loopback-test = <1>;
+		};
+
+		pgu0: pgu@f9000000 {
+			compatible = "snps,arcpgufb";
+			reg = <0xf9000000 0x400>;
+		};
+
+		ps2: ps2@f9001000 {
+			compatible = "snps,arc_ps2";
+			reg = <0xf9000400 0x14>;
+			interrupts = <3 0>;
+			interrupt-parent = <&idu_intc>;
+			interrupt-names = "arc_ps2_irq";
+		};
+
+		eth0: ethernet@f0003000 {
+			compatible = "snps,oscilan";
+			reg = <0xf0003000 0x44>;
+			interrupt-parent = <&idu_intc>;
+			interrupts = <1 2>, <2 2>;
+			interrupt-names = "rx", "tx";
+		};
+
+		arcpct0: pct {
+			compatible = "snps,archs-pct";
+			#interrupt-cells = <1>;
+			interrupts = <20>;
+		};
+	};
+};
diff --git a/arch/arc/configs/nsim_hs_defconfig b/arch/arc/configs/nsim_hs_defconfig
new file mode 100644
index 000000000000..f761a7c70761
--- /dev/null
+++ b/arch/arc/configs/nsim_hs_defconfig
@@ -0,0 +1,64 @@
+CONFIG_CROSS_COMPILE="arc-linux-uclibc-"
+# CONFIG_LOCALVERSION_AUTO is not set
+CONFIG_DEFAULT_HOSTNAME="ARCLinux"
+# CONFIG_SWAP is not set
+CONFIG_SYSVIPC=y
+CONFIG_POSIX_MQUEUE=y
+# CONFIG_CROSS_MEMORY_ATTACH is not set
+CONFIG_HIGH_RES_TIMERS=y
+CONFIG_IKCONFIG=y
+CONFIG_IKCONFIG_PROC=y
+CONFIG_NAMESPACES=y
+# CONFIG_UTS_NS is not set
+# CONFIG_PID_NS is not set
+CONFIG_BLK_DEV_INITRD=y
+CONFIG_INITRAMFS_SOURCE="../arc_initramfs_hs/"
+CONFIG_KALLSYMS_ALL=y
+CONFIG_EMBEDDED=y
+# CONFIG_SLUB_DEBUG is not set
+# CONFIG_COMPAT_BRK is not set
+CONFIG_KPROBES=y
+CONFIG_MODULES=y
+# CONFIG_LBDAF is not set
+# CONFIG_BLK_DEV_BSG is not set
+# CONFIG_IOSCHED_DEADLINE is not set
+# CONFIG_IOSCHED_CFQ is not set
+CONFIG_ARC_PLAT_SIM=y
+CONFIG_ISA_ARCV2=y
+CONFIG_ARC_BUILTIN_DTB_NAME="nsim_hs"
+CONFIG_PREEMPT=y
+# CONFIG_COMPACTION is not set
+CONFIG_NET=y
+CONFIG_PACKET=y
+CONFIG_UNIX=y
+CONFIG_UNIX_DIAG=y
+CONFIG_NET_KEY=y
+CONFIG_INET=y
+# CONFIG_IPV6 is not set
+# CONFIG_STANDALONE is not set
+# CONFIG_PREVENT_FIRMWARE_BUILD is not set
+# CONFIG_FIRMWARE_IN_KERNEL is not set
+# CONFIG_BLK_DEV is not set
+# CONFIG_INPUT_MOUSEDEV_PSAUX is not set
+# CONFIG_INPUT_KEYBOARD is not set
+# CONFIG_INPUT_MOUSE is not set
+# CONFIG_SERIO is not set
+# CONFIG_LEGACY_PTYS is not set
+# CONFIG_DEVKMEM is not set
+CONFIG_SERIAL_ARC=y
+CONFIG_SERIAL_ARC_CONSOLE=y
+# CONFIG_HW_RANDOM is not set
+# CONFIG_HWMON is not set
+# CONFIG_VGA_CONSOLE is not set
+# CONFIG_HID is not set
+# CONFIG_USB_SUPPORT is not set
+# CONFIG_IOMMU_SUPPORT is not set
+CONFIG_EXT2_FS=y
+CONFIG_EXT2_FS_XATTR=y
+CONFIG_TMPFS=y
+# CONFIG_MISC_FILESYSTEMS is not set
+CONFIG_NFS_FS=y
+# CONFIG_ENABLE_WARN_DEPRECATED is not set
+# CONFIG_ENABLE_MUST_CHECK is not set
+# CONFIG_DEBUG_PREEMPT is not set
+CONFIG_XZ_DEC=y
diff --git a/arch/arc/configs/nsim_hs_smp_defconfig b/arch/arc/configs/nsim_hs_smp_defconfig
new file mode 100644
index 000000000000..dc6f74f41283
--- /dev/null
+++ b/arch/arc/configs/nsim_hs_smp_defconfig
@@ -0,0 +1,63 @@
+CONFIG_CROSS_COMPILE="arc-linux-uclibc-"
+# CONFIG_LOCALVERSION_AUTO is not set
+CONFIG_DEFAULT_HOSTNAME="ARCLinux"
+# CONFIG_SWAP is not set
+CONFIG_HIGH_RES_TIMERS=y
+CONFIG_IKCONFIG=y
+CONFIG_IKCONFIG_PROC=y
+CONFIG_NAMESPACES=y
+# CONFIG_UTS_NS is not set
+# CONFIG_PID_NS is not set
+CONFIG_BLK_DEV_INITRD=y
+CONFIG_INITRAMFS_SOURCE="../arc_initramfs_hs/"
+CONFIG_KALLSYMS_ALL=y
+CONFIG_EMBEDDED=y
+# CONFIG_SLUB_DEBUG is not set
+# CONFIG_COMPAT_BRK is not set
+CONFIG_KPROBES=y
+CONFIG_MODULES=y
+# CONFIG_LBDAF is not set
+# CONFIG_BLK_DEV_BSG is not set
+# CONFIG_IOSCHED_DEADLINE is not set
+# CONFIG_IOSCHED_CFQ is not set
+CONFIG_ARC_PLAT_SIM=y
+CONFIG_ARC_BOARD_ML509=y
+CONFIG_ISA_ARCV2=y
+CONFIG_SMP=y
+CONFIG_ARC_BUILTIN_DTB_NAME="nsim_hs_idu"
+CONFIG_PREEMPT=y
+# CONFIG_COMPACTION is not set
+# CONFIG_CROSS_MEMORY_ATTACH is not set
+CONFIG_NET=y
+CONFIG_PACKET=y
+CONFIG_UNIX=y
+CONFIG_UNIX_DIAG=y
+CONFIG_NET_KEY=y
+CONFIG_INET=y
+# CONFIG_IPV6 is not set
+# CONFIG_STANDALONE is not set
+# CONFIG_PREVENT_FIRMWARE_BUILD is not set
+# CONFIG_FIRMWARE_IN_KERNEL is not set
+# CONFIG_BLK_DEV is not set
+# CONFIG_INPUT_MOUSEDEV_PSAUX is not set
+# CONFIG_INPUT_KEYBOARD is not set
+# CONFIG_INPUT_MOUSE is not set
+# CONFIG_SERIO is not set
+# CONFIG_LEGACY_PTYS is not set
+# CONFIG_DEVKMEM is not set
+CONFIG_SERIAL_ARC=y
+CONFIG_SERIAL_ARC_CONSOLE=y
+# CONFIG_HW_RANDOM is not set
+# CONFIG_HWMON is not set
+# CONFIG_VGA_CONSOLE is not set
+# CONFIG_HID is not set
+# CONFIG_USB_SUPPORT is not set
+# CONFIG_IOMMU_SUPPORT is not set
+CONFIG_EXT2_FS=y
+CONFIG_EXT2_FS_XATTR=y
+CONFIG_TMPFS=y
+# CONFIG_MISC_FILESYSTEMS is not set
+CONFIG_NFS_FS=y
+# CONFIG_ENABLE_WARN_DEPRECATED is not set
+# CONFIG_ENABLE_MUST_CHECK is not set
+CONFIG_XZ_DEC=y
diff --git a/arch/arc/configs/nsimosci_hs_defconfig b/arch/arc/configs/nsimosci_hs_defconfig
new file mode 100644
index 000000000000..3fef0a210c56
--- /dev/null
+++ b/arch/arc/configs/nsimosci_hs_defconfig
@@ -0,0 +1,73 @@
+CONFIG_CROSS_COMPILE="arc-linux-uclibc-"
+# CONFIG_LOCALVERSION_AUTO is not set
+CONFIG_DEFAULT_HOSTNAME="ARCLinux"
+# CONFIG_SWAP is not set
+CONFIG_SYSVIPC=y
+# CONFIG_CROSS_MEMORY_ATTACH is not set
+CONFIG_NO_HZ=y
+CONFIG_HIGH_RES_TIMERS=y
+CONFIG_IKCONFIG=y
+CONFIG_IKCONFIG_PROC=y
+CONFIG_NAMESPACES=y
+# CONFIG_UTS_NS is not set
+# CONFIG_PID_NS is not set
+CONFIG_BLK_DEV_INITRD=y
+CONFIG_INITRAMFS_SOURCE="../arc_initramfs_hs/"
+CONFIG_KALLSYMS_ALL=y
+CONFIG_EMBEDDED=y
+# CONFIG_SLUB_DEBUG is not set
+# CONFIG_COMPAT_BRK is not set
+CONFIG_KPROBES=y
+CONFIG_MODULES=y
+# CONFIG_LBDAF is not set
+# CONFIG_BLK_DEV_BSG is not set
+# CONFIG_IOSCHED_DEADLINE is not set
+# CONFIG_IOSCHED_CFQ is not set
+CONFIG_ARC_PLAT_SIM=y
+CONFIG_ISA_ARCV2=y
+CONFIG_ARC_BUILTIN_DTB_NAME="nsimosci_hs"
+# CONFIG_COMPACTION is not set
+CONFIG_NET=y
+CONFIG_PACKET=y
+CONFIG_UNIX=y
+CONFIG_UNIX_DIAG=y
+CONFIG_NET_KEY=y
+CONFIG_INET=y
+# CONFIG_IPV6 is not set
+# CONFIG_STANDALONE is not set
+# CONFIG_PREVENT_FIRMWARE_BUILD is not set
+# CONFIG_FIRMWARE_IN_KERNEL is not set
+# CONFIG_BLK_DEV is not set
+CONFIG_NETDEVICES=y
+CONFIG_NET_OSCI_LAN=y
+CONFIG_INPUT_EVDEV=y
+# CONFIG_MOUSE_PS2_ALPS is not set
+# CONFIG_MOUSE_PS2_LOGIPS2PP is not set
+# CONFIG_MOUSE_PS2_SYNAPTICS is not set
+# CONFIG_MOUSE_PS2_TRACKPOINT is not set
+CONFIG_MOUSE_PS2_TOUCHKIT=y
+# CONFIG_SERIO_SERPORT is not set
+CONFIG_SERIO_ARC_PS2=y
+# CONFIG_LEGACY_PTYS is not set
+# CONFIG_DEVKMEM is not set
+CONFIG_SERIAL_8250=y
+CONFIG_SERIAL_8250_CONSOLE=y
+CONFIG_SERIAL_8250_NR_UARTS=1
+CONFIG_SERIAL_8250_RUNTIME_UARTS=1
+CONFIG_SERIAL_OF_PLATFORM=y
+# CONFIG_HW_RANDOM is not set
+# CONFIG_HWMON is not set
+CONFIG_FB=y
+# CONFIG_VGA_CONSOLE is not set
+CONFIG_FRAMEBUFFER_CONSOLE=y
+CONFIG_LOGO=y
+# CONFIG_HID is not set
+# CONFIG_USB_SUPPORT is not set
+# CONFIG_IOMMU_SUPPORT is not set
+CONFIG_EXT2_FS=y
+CONFIG_EXT2_FS_XATTR=y
+CONFIG_TMPFS=y
+# CONFIG_MISC_FILESYSTEMS is not set
+CONFIG_NFS_FS=y
+# CONFIG_ENABLE_WARN_DEPRECATED is not set
+# CONFIG_ENABLE_MUST_CHECK is not set
diff --git a/arch/arc/configs/nsimosci_hs_smp_defconfig b/arch/arc/configs/nsimosci_hs_smp_defconfig
new file mode 100644
index 000000000000..51784837daae
--- /dev/null
+++ b/arch/arc/configs/nsimosci_hs_smp_defconfig
@@ -0,0 +1,93 @@
+CONFIG_CROSS_COMPILE="arc-linux-uclibc-"
+CONFIG_DEFAULT_HOSTNAME="ARCLinux"
+# CONFIG_SWAP is not set
+CONFIG_SYSVIPC=y
+CONFIG_NO_HZ=y
+CONFIG_HIGH_RES_TIMERS=y
+CONFIG_IKCONFIG=y
+CONFIG_IKCONFIG_PROC=y
+# CONFIG_UTS_NS is not set
+# CONFIG_PID_NS is not set
+CONFIG_BLK_DEV_INITRD=y
+CONFIG_INITRAMFS_SOURCE="../arc_initramfs_hs/"
+# CONFIG_COMPAT_BRK is not set
+CONFIG_KPROBES=y
+CONFIG_MODULES=y
+# CONFIG_LBDAF is not set
+# CONFIG_BLK_DEV_BSG is not set
+# CONFIG_IOSCHED_DEADLINE is not set
+# CONFIG_IOSCHED_CFQ is not set
+CONFIG_ARC_PLAT_SIM=y
+CONFIG_ARC_BOARD_ML509=y
+CONFIG_ISA_ARCV2=y
+CONFIG_SMP=y
+CONFIG_ARC_HAS_LL64=y
+# CONFIG_ARC_HAS_RTSC is not set
+CONFIG_ARC_BUILTIN_DTB_NAME="nsimosci_hs_idu"
+CONFIG_PREEMPT=y
+# CONFIG_COMPACTION is not set
+# CONFIG_CROSS_MEMORY_ATTACH is not set
+CONFIG_NET=y
+CONFIG_PACKET=y
+CONFIG_PACKET_DIAG=y
+CONFIG_UNIX=y
+CONFIG_UNIX_DIAG=y
+CONFIG_NET_KEY=y
+CONFIG_INET=y
+# CONFIG_INET_XFRM_MODE_TRANSPORT is not set
+# CONFIG_INET_XFRM_MODE_TUNNEL is not set
+# CONFIG_INET_XFRM_MODE_BEET is not set
+# CONFIG_INET_LRO is not set
+# CONFIG_IPV6 is not set
+# CONFIG_WIRELESS is not set
+# CONFIG_STANDALONE is not set
+# CONFIG_PREVENT_FIRMWARE_BUILD is not set
+# CONFIG_FIRMWARE_IN_KERNEL is not set
+# CONFIG_BLK_DEV is not set
+CONFIG_NETDEVICES=y
+# CONFIG_NET_VENDOR_ARC is not set
+# CONFIG_NET_CADENCE is not set
+# CONFIG_NET_VENDOR_BROADCOM is not set
+# CONFIG_NET_VENDOR_INTEL is not set
+# CONFIG_NET_VENDOR_MARVELL is not set
+# CONFIG_NET_VENDOR_MICREL is not set
+# CONFIG_NET_VENDOR_NATSEMI is not set
+# CONFIG_NET_VENDOR_SEEQ is not set
+# CONFIG_NET_VENDOR_STMICRO is not set
+# CONFIG_NET_VENDOR_VIA is not set
+# CONFIG_NET_VENDOR_WIZNET is not set
+CONFIG_NET_OSCI_LAN=y
+# CONFIG_WLAN is not set
+CONFIG_INPUT_EVDEV=y
+CONFIG_MOUSE_PS2_TOUCHKIT=y
+# CONFIG_SERIO_SERPORT is not set
+CONFIG_SERIO_LIBPS2=y
+CONFIG_SERIO_ARC_PS2=y
+CONFIG_VT_HW_CONSOLE_BINDING=y
+# CONFIG_LEGACY_PTYS is not set
+# CONFIG_DEVKMEM is not set
+CONFIG_SERIAL_8250=y
+CONFIG_SERIAL_8250_CONSOLE=y
+CONFIG_SERIAL_8250_NR_UARTS=1
+CONFIG_SERIAL_8250_RUNTIME_UARTS=1
+CONFIG_SERIAL_8250_DW=y
+CONFIG_SERIAL_OF_PLATFORM=y
+# CONFIG_HW_RANDOM is not set
+# CONFIG_HWMON is not set
+CONFIG_FB=y
+CONFIG_ARCPGU_RGB888=y
+CONFIG_ARCPGU_DISPTYPE=0
+# CONFIG_VGA_CONSOLE is not set
+CONFIG_FRAMEBUFFER_CONSOLE=y
+CONFIG_LOGO=y
+# CONFIG_HID is not set
+# CONFIG_USB_SUPPORT is not set
+# CONFIG_IOMMU_SUPPORT is not set
+CONFIG_EXT2_FS=y
+CONFIG_EXT2_FS_XATTR=y
+CONFIG_TMPFS=y
+# CONFIG_MISC_FILESYSTEMS is not set
+CONFIG_NFS_FS=y
+# CONFIG_ENABLE_WARN_DEPRECATED is not set
+# CONFIG_ENABLE_MUST_CHECK is not set
+CONFIG_FTRACE=y
diff --git a/arch/arc/plat-sim/platform.c b/arch/arc/plat-sim/platform.c
index 8795ae2ef48a..d9e35b4a2f08 100644
--- a/arch/arc/plat-sim/platform.c
+++ b/arch/arc/plat-sim/platform.c
@@ -22,7 +22,9 @@
 
 static const char *simulation_compat[] __initconst = {
 	"snps,nsim",
+	"snps,nsim_hs",
 	"snps,nsimosci",
+	"snps,nsimosci_hs",
 	NULL,
 };
 
-- 
1.9.1



* [PATCH 26/28] ARC: [axs101] Prepare for AXS103
  2015-06-09 11:48 [PATCH 00/28] ARCv2 port to Linux - (B) ISA / Core / platform support Vineet Gupta
                   ` (24 preceding siblings ...)
  2015-06-09 11:48 ` [PATCH 25/28] ARCv2: [nsim*hs*] Support simulation platforms for HS38x cores Vineet Gupta
@ 2015-06-09 11:48 ` Vineet Gupta
  2015-06-09 11:48 ` [PATCH 27/28] ARCv2: [axs103] Support ARC SDP FPGA platform for HS38x cores Vineet Gupta
  2015-06-09 11:48 ` [PATCH 28/28] ARCv2: [vdk] dts files and defconfig for HS38 VDK Vineet Gupta
  27 siblings, 0 replies; 66+ messages in thread
From: Vineet Gupta @ 2015-06-09 11:48 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: arnd, arc-linux-dev, Alexey Brodkin, Grant Likely, Rob Herring,
	devicetree, Vineet Gupta

From: Alexey Brodkin <abrodkin@synopsys.com>

To avoid duplicating the MB DTS file, move the MB intc entry into the
CPU-card-specific file, since the IRQ from this intc to the CPU intc
differs between axs101 and axs103.

Cc: Grant Likely <grant.likely@linaro.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: devicetree@vger.kernel.org
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/boot/dts/axc001.dtsi    | 21 +++++++++++++++++++++
 arch/arc/boot/dts/axs10x_mb.dtsi | 17 -----------------
 2 files changed, 21 insertions(+), 17 deletions(-)

diff --git a/arch/arc/boot/dts/axc001.dtsi b/arch/arc/boot/dts/axc001.dtsi
index 6990ca45fc7b..a5e2726a067e 100644
--- a/arch/arc/boot/dts/axc001.dtsi
+++ b/arch/arc/boot/dts/axc001.dtsi
@@ -69,6 +69,27 @@
 		};
 	};
 
+	/*
+	 * This INTC is actually connected to DW APB GPIO
+	 * which acts as a wire between MB INTC and CPU INTC.
+	 * GPIO INTC is configured in platform init code
+	 * and here we mimic direct connection from MB INTC to
+	 * CPU INTC, thus we set "interrupts = <7>" instead of
+	 * "interrupts = <12>"
+	 *
+	 * This intc actually resides on MB, but we move it here to
+	 * avoid duplicating the MB dtsi file given that IRQ from
+	 * this intc to cpu intc are different for axs101 and axs103
+	 */
+	mb_intc: dw-apb-ictl@0xe0012000 {
+		#interrupt-cells = <1>;
+		compatible = "snps,dw-apb-ictl";
+		reg = < 0xe0012000 0x200 >;
+		interrupt-controller;
+		interrupt-parent = <&cpu_intc>;
+		interrupts = < 7 >;
+	};
+
 	memory {
 		#address-cells = <1>;
 		#size-cells = <1>;
diff --git a/arch/arc/boot/dts/axs10x_mb.dtsi b/arch/arc/boot/dts/axs10x_mb.dtsi
index 5d06f1fb4266..f3db32154973 100644
--- a/arch/arc/boot/dts/axs10x_mb.dtsi
+++ b/arch/arc/boot/dts/axs10x_mb.dtsi
@@ -36,23 +36,6 @@
 			};
 		};
 
-		/*
-		 * This INTC is actually connected to DW APB GPIO
-		 * which acts as a wire between MB INTC and CPU INTC.
-		 * GPIO INTC is configured in platform init code
-		 * and here we mimic direct connection from MB INTC to
-		 * CPU INTC, thus we set "interrupts = <7>" instead of
-		 * "interrupts = <12>"
-		 */
-		mb_intc: dw-apb-ictl@0x12000 {
-			#interrupt-cells = <1>;
-			compatible = "snps,dw-apb-ictl";
-			reg = < 0x12000 0x200 >;
-			interrupt-controller;
-			interrupt-parent = <&cpu_intc>;
-			interrupts = < 7 >;
-		};
-
 		ethernet@0x18000 {
 			#interrupt-cells = <1>;
 			compatible = "snps,dwmac";
-- 
1.9.1



* [PATCH 27/28] ARCv2: [axs103] Support ARC SDP FPGA platform for HS38x cores
  2015-06-09 11:48 [PATCH 00/28] ARCv2 port to Linux - (B) ISA / Core / platform support Vineet Gupta
                   ` (25 preceding siblings ...)
  2015-06-09 11:48 ` [PATCH 26/28] ARC: [axs101] Prepare for AXS103 Vineet Gupta
@ 2015-06-09 11:48 ` Vineet Gupta
  2015-06-09 11:48 ` [PATCH 28/28] ARCv2: [vdk] dts files and defconfig for HS38 VDK Vineet Gupta
  27 siblings, 0 replies; 66+ messages in thread
From: Vineet Gupta @ 2015-06-09 11:48 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: arnd, arc-linux-dev, Vineet Gupta, Grant Likely, Rob Herring, devicetree

Cc: Grant Likely <grant.likely@linaro.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: devicetree@vger.kernel.org
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
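
Note on the PLL programming in axs10x.c: the (id, fd, od) triples passed to axs103_set_freq() are the input, feedback and output dividers applied to the 33.33 MHz reference. The sketch below restates encode_div() and the arithmetic of axs103_get_freq() with explicit shifts instead of bitfields, assuming the little-endian field order of union pll_reg. decode_div() and pll_mhz() are hypothetical helper names introduced here for illustration; they are not part of the patch.

```c
#include <stdint.h>

#define AXS103_PLL_REF_HZ 33333333u	/* reference used by axs103_get_freq() */

/* Same encoding as encode_div() in the patch, little-endian bit order. */
static uint32_t encode_div(unsigned int div, int upd)
{
	uint32_t high = div >> 1;		/* bits 11:6 */
	uint32_t low = div - high;		/* bits 5:0, one more than high for odd div */
	uint32_t edge = div & 1u;		/* bit 12, 0 = rising */
	uint32_t bypass = (div == 1);		/* bit 13 */
	uint32_t noupd = !upd;			/* bit 14 */

	return low | (high << 6) | (edge << 12) | (bypass << 13) | (noupd << 14);
}

/* Effective ratio of one divider register value; 1 when bypassed. */
static unsigned int decode_div(uint32_t reg)
{
	if (reg & (1u << 13))
		return 1;

	return (reg & 0x3fu) + ((reg >> 6) & 0x3fu);
}

/* Core clock in MHz for a divider triple, rounded as in the patch. */
static unsigned int pll_mhz(unsigned int id, unsigned int fd, unsigned int od)
{
	uint32_t f = AXS103_PLL_REF_HZ;

	f /= decode_div(encode_div(id, 0));
	f *= decode_div(encode_div(fd, 0));
	f /= decode_div(encode_div(od, 1));

	return (f + 500000u) / 1000000u;
}
```

For the 75 MHz case this gives 33333333 / 2 * 45 / 10, which rounds to 75, matching the clock-frequency in axc003.dtsi. The intermediate products stay below 2^32 for every triple in the switch statement, so 32-bit arithmetic is sufficient.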
 Documentation/devicetree/bindings/arc/axs103.txt |   8 +
 arch/arc/boot/dts/axc003.dtsi                    | 102 ++++++++++++
 arch/arc/boot/dts/axc003_idu.dtsi                | 126 +++++++++++++++
 arch/arc/boot/dts/axs103.dts                     |  24 +++
 arch/arc/boot/dts/axs103_idu.dts                 |  24 +++
 arch/arc/configs/axs103_defconfig                | 117 ++++++++++++++
 arch/arc/configs/axs103_smp_defconfig            | 118 ++++++++++++++
 arch/arc/kernel/devtree.c                        |   2 +-
 arch/arc/plat-axs10x/Kconfig                     |  13 +-
 arch/arc/plat-axs10x/axs10x.c                    | 198 +++++++++++++++++++++--
 10 files changed, 720 insertions(+), 12 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/arc/axs103.txt
 create mode 100644 arch/arc/boot/dts/axc003.dtsi
 create mode 100644 arch/arc/boot/dts/axc003_idu.dtsi
 create mode 100644 arch/arc/boot/dts/axs103.dts
 create mode 100644 arch/arc/boot/dts/axs103_idu.dts
 create mode 100644 arch/arc/configs/axs103_defconfig
 create mode 100644 arch/arc/configs/axs103_smp_defconfig

diff --git a/Documentation/devicetree/bindings/arc/axs103.txt b/Documentation/devicetree/bindings/arc/axs103.txt
new file mode 100644
index 000000000000..6eea862e72b9
--- /dev/null
+++ b/Documentation/devicetree/bindings/arc/axs103.txt
@@ -0,0 +1,8 @@
+Synopsys DesignWare ARC Software Development Platforms Device Tree Bindings
+---------------------------------------------------------------------------
+
+SDP Main Board with an AXC003 FPGA Card which can contain various flavours of
+HS38x cores.
+
+Required root node properties:
+    - compatible = "snps,axs103", "snps,arc-sdp";
diff --git a/arch/arc/boot/dts/axc003.dtsi b/arch/arc/boot/dts/axc003.dtsi
new file mode 100644
index 000000000000..15c8d6226c9d
--- /dev/null
+++ b/arch/arc/boot/dts/axc003.dtsi
@@ -0,0 +1,102 @@
+/*
+ * Copyright (C) 2014-15 Synopsys, Inc. (www.synopsys.com)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+/*
+ * Device tree for AXC003 CPU card: HS38x UP configuration
+ */
+
+/ {
+	compatible = "snps,arc";
+	clock-frequency = <75000000>;
+	#address-cells = <1>;
+	#size-cells = <1>;
+
+	cpu_card {
+		compatible = "simple-bus";
+		#address-cells = <1>;
+		#size-cells = <1>;
+
+		ranges = <0x00000000 0xf0000000 0x10000000>;
+
+		cpu_intc: archs-intc@cpu {
+			compatible = "snps,archs-intc";
+			interrupt-controller;
+			#interrupt-cells = <1>;
+		};
+
+		/*
+		 * this GPIO block ORs all interrupts on CPU card (creg,..)
+		 * to uplink only 1 IRQ to ARC core intc
+		 */
+		dw-apb-gpio@0x2000 {
+			compatible = "snps,dw-apb-gpio";
+			reg = < 0x2000 0x80 >;
+			#address-cells = <1>;
+			#size-cells = <0>;
+
+			ictl_intc: gpio-controller@0 {
+				compatible = "snps,dw-apb-gpio-port";
+				gpio-controller;
+				#gpio-cells = <2>;
+				snps,nr-gpios = <30>;
+				reg = <0>;
+				interrupt-controller;
+				#interrupt-cells = <2>;
+				interrupt-parent = <&cpu_intc>;
+				interrupts = <25>;
+			};
+		};
+
+		debug_uart: dw-apb-uart@0x5000 {
+			compatible = "snps,dw-apb-uart";
+			reg = <0x5000 0x100>;
+			clock-frequency = <33333000>;
+			interrupt-parent = <&ictl_intc>;
+			interrupts = <2 4>;
+			baud = <115200>;
+			reg-shift = <2>;
+			reg-io-width = <4>;
+		};
+
+		arcpct0: pct {
+			compatible = "snps,archs-pct";
+			#interrupt-cells = <1>;
+			interrupt-parent = <&cpu_intc>;
+			interrupts = <20>;
+		};
+	};
+
+	/*
+	 * This INTC is actually connected to DW APB GPIO
+	 * which acts as a wire between MB INTC and CPU INTC.
+	 * GPIO INTC is configured in platform init code
+	 * and here we mimic direct connection from MB INTC to
+	 * CPU INTC, thus we set "interrupts = <24>" instead of
+	 * "interrupts = <12>"
+	 *
+	 * This intc actually resides on MB, but we move it here to
+	 * avoid duplicating the MB dtsi file, given that the IRQs from
+	 * this intc to the cpu intc differ between axs101 and axs103
+	 */
+	mb_intc: dw-apb-ictl@0xe0012000 {
+		#interrupt-cells = <1>;
+		compatible = "snps,dw-apb-ictl";
+		reg = < 0xe0012000 0x200 >;
+		interrupt-controller;
+		interrupt-parent = <&cpu_intc>;
+		interrupts = < 24 >;
+	};
+
+	memory {
+		#address-cells = <1>;
+		#size-cells = <1>;
+		ranges = <0x00000000 0x80000000 0x40000000>;
+		device_type = "memory";
+		reg = <0x00000000 0x20000000>;	/* 512MiB */
+	};
+};
diff --git a/arch/arc/boot/dts/axc003_idu.dtsi b/arch/arc/boot/dts/axc003_idu.dtsi
new file mode 100644
index 000000000000..199d42820eca
--- /dev/null
+++ b/arch/arc/boot/dts/axc003_idu.dtsi
@@ -0,0 +1,126 @@
+/*
+ * Copyright (C) 2014, 2015 Synopsys, Inc. (www.synopsys.com)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+/*
+ * Device tree for AXC003 CPU card: HS38x2 (Dual Core) with IDU intc
+ */
+
+/ {
+	compatible = "snps,arc";
+	clock-frequency = <75000000>;
+	#address-cells = <1>;
+	#size-cells = <1>;
+
+	cpu_card {
+		compatible = "simple-bus";
+		#address-cells = <1>;
+		#size-cells = <1>;
+
+		ranges = <0x00000000 0xf0000000 0x10000000>;
+
+		cpu_intc: archs-intc@cpu {
+			compatible = "snps,archs-intc";
+			interrupt-controller;
+			#interrupt-cells = <1>;
+		};
+
+		idu_intc: idu-interrupt-controller {
+			compatible = "snps,archs-idu-intc";
+			interrupt-controller;
+			interrupt-parent = <&cpu_intc>;
+
+			/*
+			 * <hwirq  distribution>
+			 * distribution: 0=RR; 1=cpu0, 2=cpu1, 4=cpu2, 8=cpu3
+			 */
+			#interrupt-cells = <2>;
+
+			/*
+			 * upstream irqs to core intc - downstream these are
+			 * "COMMON" irq 0,1..
+			 */
+			interrupts = <24 25>;
+		};
+
+		/*
+		 * this GPIO block ORs all interrupts on CPU card (creg,..)
+		 * to uplink only 1 IRQ to ARC core intc
+		 */
+		dw-apb-gpio@0x2000 {
+			compatible = "snps,dw-apb-gpio";
+			reg = < 0x2000 0x80 >;
+			#address-cells = <1>;
+			#size-cells = <0>;
+
+			ictl_intc: gpio-controller@0 {
+				compatible = "snps,dw-apb-gpio-port";
+				gpio-controller;
+				#gpio-cells = <2>;
+				snps,nr-gpios = <30>;
+				reg = <0>;
+				interrupt-controller;
+				#interrupt-cells = <2>;
+				interrupt-parent = <&idu_intc>;
+
+				/*
+				 * cmn irq 1 -> cpu irq 25
+				 * Distribute to cpu0 only
+				 */
+				interrupts = <1 1>;
+			};
+		};
+
+		debug_uart: dw-apb-uart@0x5000 {
+			compatible = "snps,dw-apb-uart";
+			reg = <0x5000 0x100>;
+			clock-frequency = <33333000>;
+			interrupt-parent = <&ictl_intc>;
+			interrupts = <2 4>;
+			baud = <115200>;
+			reg-shift = <2>;
+			reg-io-width = <4>;
+		};
+
+		arcpct0: pct {
+			compatible = "snps,archs-pct";
+			#interrupt-cells = <1>;
+			interrupt-parent = <&cpu_intc>;
+			interrupts = <20>;
+		};
+	};
+
+	/*
+	 * This INTC is actually connected to DW APB GPIO
+	 * which acts as a wire between MB INTC and CPU INTC.
+	 * GPIO INTC is configured in platform init code
+	 * and here we mimic direct connection from MB INTC to
+	 * CPU INTC, thus we set "interrupts = <0 1>" instead of
+	 * "interrupts = <12>"
+	 *
+	 * This intc actually resides on MB, but we move it here to
+	 * avoid duplicating the MB dtsi file, given that the IRQs from
+	 * this intc to the cpu intc differ between axs101 and axs103
+	 */
+	mb_intc: dw-apb-ictl@0xe0012000 {
+		#interrupt-cells = <1>;
+		compatible = "snps,dw-apb-ictl";
+		reg = < 0xe0012000 0x200 >;
+		interrupt-controller;
+		interrupt-parent = <&idu_intc>;
+		interrupts = <0 1>;	/* cmn irq 0 -> cpu irq 24
+					   distribute to cpu0 only */
+	};
+
+	memory {
+		#address-cells = <1>;
+		#size-cells = <1>;
+		ranges = <0x00000000 0x80000000 0x40000000>;
+		device_type = "memory";
+		reg = <0x00000000 0x20000000>;	/* 512MiB */
+	};
+};
diff --git a/arch/arc/boot/dts/axs103.dts b/arch/arc/boot/dts/axs103.dts
new file mode 100644
index 000000000000..e6d0e31ea299
--- /dev/null
+++ b/arch/arc/boot/dts/axs103.dts
@@ -0,0 +1,24 @@
+/*
+ * Copyright (C) 2014-15 Synopsys, Inc. (www.synopsys.com)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+/*
+ * Device Tree for AXS103 SDP with AXS10X Main Board and
+ * AXC003 FPGA Card (with UP bitfile)
+ */
+/dts-v1/;
+
+/include/ "axc003.dtsi"
+/include/ "axs10x_mb.dtsi"
+
+/ {
+	compatible = "snps,axs103", "snps,arc-sdp";
+
+	chosen {
+		bootargs = "earlycon=uart8250,mmio32,0xe0022000,115200n8 console=ttyS3,115200n8 debug print-fatal-signals=1";
+	};
+};
diff --git a/arch/arc/boot/dts/axs103_idu.dts b/arch/arc/boot/dts/axs103_idu.dts
new file mode 100644
index 000000000000..f999fef5a60a
--- /dev/null
+++ b/arch/arc/boot/dts/axs103_idu.dts
@@ -0,0 +1,24 @@
+/*
+ * Copyright (C) 2014-15 Synopsys, Inc. (www.synopsys.com)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+/*
+ * Device Tree for AXS103 SDP with AXS10X Main Board and
+ * AXC003 FPGA Card (with SMP bitfile)
+ */
+/dts-v1/;
+
+/include/ "axc003_idu.dtsi"
+/include/ "axs10x_mb.dtsi"
+
+/ {
+	compatible = "snps,axs103", "snps,arc-sdp";
+
+	chosen {
+		bootargs = "earlycon=uart8250,mmio32,0xe0022000,115200n8 console=ttyS3,115200n8 debug print-fatal-signals=1";
+	};
+};
diff --git a/arch/arc/configs/axs103_defconfig b/arch/arc/configs/axs103_defconfig
new file mode 100644
index 000000000000..83a6d8d5cc58
--- /dev/null
+++ b/arch/arc/configs/axs103_defconfig
@@ -0,0 +1,117 @@
+CONFIG_CROSS_COMPILE="arc-linux-uclibc-"
+CONFIG_DEFAULT_HOSTNAME="ARCLinux"
+# CONFIG_SWAP is not set
+CONFIG_SYSVIPC=y
+CONFIG_POSIX_MQUEUE=y
+# CONFIG_CROSS_MEMORY_ATTACH is not set
+CONFIG_NO_HZ_IDLE=y
+CONFIG_HIGH_RES_TIMERS=y
+CONFIG_IKCONFIG=y
+CONFIG_IKCONFIG_PROC=y
+CONFIG_NAMESPACES=y
+# CONFIG_UTS_NS is not set
+# CONFIG_PID_NS is not set
+CONFIG_BLK_DEV_INITRD=y
+CONFIG_INITRAMFS_SOURCE="../../arc_initramfs_hs/"
+CONFIG_EMBEDDED=y
+CONFIG_PERF_EVENTS=y
+# CONFIG_VM_EVENT_COUNTERS is not set
+# CONFIG_SLUB_DEBUG is not set
+# CONFIG_COMPAT_BRK is not set
+CONFIG_MODULES=y
+CONFIG_PARTITION_ADVANCED=y
+CONFIG_ARC_PLAT_AXS10X=y
+CONFIG_AXS103=y
+CONFIG_ISA_ARCV2=y
+CONFIG_ARC_BUILTIN_DTB_NAME="axs103"
+CONFIG_PREEMPT=y
+# CONFIG_COMPACTION is not set
+CONFIG_NET=y
+CONFIG_PACKET=y
+CONFIG_UNIX=y
+CONFIG_NET_KEY=y
+CONFIG_INET=y
+CONFIG_IP_PNP=y
+CONFIG_IP_PNP_DHCP=y
+CONFIG_IP_PNP_BOOTP=y
+CONFIG_IP_PNP_RARP=y
+# CONFIG_INET_XFRM_MODE_TRANSPORT is not set
+# CONFIG_INET_XFRM_MODE_TUNNEL is not set
+# CONFIG_INET_XFRM_MODE_BEET is not set
+# CONFIG_IPV6 is not set
+# CONFIG_STANDALONE is not set
+# CONFIG_PREVENT_FIRMWARE_BUILD is not set
+# CONFIG_FIRMWARE_IN_KERNEL is not set
+CONFIG_MTD=y
+CONFIG_MTD_CMDLINE_PARTS=y
+CONFIG_MTD_BLOCK=y
+CONFIG_MTD_NAND=y
+CONFIG_MTD_NAND_AXS=y
+CONFIG_SCSI=y
+CONFIG_BLK_DEV_SD=y
+CONFIG_NETDEVICES=y
+# CONFIG_NET_VENDOR_ARC is not set
+# CONFIG_NET_VENDOR_BROADCOM is not set
+# CONFIG_NET_VENDOR_INTEL is not set
+# CONFIG_NET_VENDOR_MARVELL is not set
+# CONFIG_NET_VENDOR_MICREL is not set
+# CONFIG_NET_VENDOR_NATSEMI is not set
+# CONFIG_NET_VENDOR_SEEQ is not set
+CONFIG_STMMAC_ETH=y
+# CONFIG_NET_VENDOR_VIA is not set
+# CONFIG_NET_VENDOR_WIZNET is not set
+CONFIG_NATIONAL_PHY=y
+# CONFIG_USB_NET_DRIVERS is not set
+CONFIG_INPUT_EVDEV=y
+CONFIG_MOUSE_PS2_TOUCHKIT=y
+CONFIG_MOUSE_SERIAL=y
+CONFIG_MOUSE_SYNAPTICS_USB=y
+# CONFIG_LEGACY_PTYS is not set
+# CONFIG_DEVKMEM is not set
+CONFIG_SERIAL_8250=y
+CONFIG_SERIAL_8250_CONSOLE=y
+CONFIG_SERIAL_8250_DW=y
+CONFIG_SERIAL_OF_PLATFORM=y
+# CONFIG_HW_RANDOM is not set
+CONFIG_I2C=y
+CONFIG_I2C_CHARDEV=y
+CONFIG_I2C_DESIGNWARE_PLATFORM=y
+# CONFIG_HWMON is not set
+CONFIG_FB=y
+# CONFIG_VGA_CONSOLE is not set
+CONFIG_FRAMEBUFFER_CONSOLE=y
+CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY=y
+CONFIG_LOGO=y
+# CONFIG_LOGO_LINUX_MONO is not set
+# CONFIG_LOGO_LINUX_VGA16 is not set
+# CONFIG_LOGO_LINUX_CLUT224 is not set
+CONFIG_USB=y
+CONFIG_USB_EHCI_HCD=y
+CONFIG_USB_EHCI_HCD_PLATFORM=y
+CONFIG_USB_OHCI_HCD=y
+CONFIG_USB_OHCI_HCD_PLATFORM=y
+CONFIG_USB_STORAGE=y
+CONFIG_MMC=y
+CONFIG_MMC_SDHCI=y
+CONFIG_MMC_SDHCI_PLTFM=y
+CONFIG_MMC_DW=y
+CONFIG_MMC_DW_IDMAC=y
+# CONFIG_IOMMU_SUPPORT is not set
+CONFIG_EXT3_FS=y
+CONFIG_EXT4_FS=y
+CONFIG_MSDOS_FS=y
+CONFIG_VFAT_FS=y
+CONFIG_NTFS_FS=y
+CONFIG_TMPFS=y
+CONFIG_JFFS2_FS=y
+CONFIG_NFS_FS=y
+CONFIG_NLS_CODEPAGE_437=y
+CONFIG_NLS_ISO8859_1=y
+# CONFIG_ENABLE_WARN_DEPRECATED is not set
+# CONFIG_ENABLE_MUST_CHECK is not set
+CONFIG_STRIP_ASM_SYMS=y
+CONFIG_LOCKUP_DETECTOR=y
+CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
+# CONFIG_SCHED_DEBUG is not set
+# CONFIG_DEBUG_PREEMPT is not set
+# CONFIG_FTRACE is not set
diff --git a/arch/arc/configs/axs103_smp_defconfig b/arch/arc/configs/axs103_smp_defconfig
new file mode 100644
index 000000000000..f1e1c84e0dda
--- /dev/null
+++ b/arch/arc/configs/axs103_smp_defconfig
@@ -0,0 +1,118 @@
+CONFIG_CROSS_COMPILE="arc-linux-uclibc-"
+CONFIG_DEFAULT_HOSTNAME="ARCLinux"
+# CONFIG_SWAP is not set
+CONFIG_SYSVIPC=y
+CONFIG_POSIX_MQUEUE=y
+# CONFIG_CROSS_MEMORY_ATTACH is not set
+CONFIG_NO_HZ_IDLE=y
+CONFIG_HIGH_RES_TIMERS=y
+CONFIG_IKCONFIG=y
+CONFIG_IKCONFIG_PROC=y
+CONFIG_NAMESPACES=y
+# CONFIG_UTS_NS is not set
+# CONFIG_PID_NS is not set
+CONFIG_BLK_DEV_INITRD=y
+CONFIG_INITRAMFS_SOURCE="../../arc_initramfs_hs/"
+CONFIG_EMBEDDED=y
+CONFIG_PERF_EVENTS=y
+# CONFIG_VM_EVENT_COUNTERS is not set
+# CONFIG_COMPAT_BRK is not set
+CONFIG_SLAB=y
+CONFIG_MODULES=y
+CONFIG_PARTITION_ADVANCED=y
+CONFIG_ARC_PLAT_AXS10X=y
+CONFIG_AXS103=y
+CONFIG_ISA_ARCV2=y
+CONFIG_SMP=y
+CONFIG_ARC_BUILTIN_DTB_NAME="axs103_idu"
+CONFIG_PREEMPT=y
+# CONFIG_COMPACTION is not set
+CONFIG_NET=y
+CONFIG_PACKET=y
+CONFIG_UNIX=y
+CONFIG_NET_KEY=y
+CONFIG_INET=y
+CONFIG_IP_PNP=y
+CONFIG_IP_PNP_DHCP=y
+CONFIG_IP_PNP_BOOTP=y
+CONFIG_IP_PNP_RARP=y
+# CONFIG_INET_XFRM_MODE_TRANSPORT is not set
+# CONFIG_INET_XFRM_MODE_TUNNEL is not set
+# CONFIG_INET_XFRM_MODE_BEET is not set
+# CONFIG_IPV6 is not set
+# CONFIG_STANDALONE is not set
+# CONFIG_PREVENT_FIRMWARE_BUILD is not set
+# CONFIG_FIRMWARE_IN_KERNEL is not set
+CONFIG_MTD=y
+CONFIG_MTD_CMDLINE_PARTS=y
+CONFIG_MTD_BLOCK=y
+CONFIG_MTD_NAND=y
+CONFIG_MTD_NAND_AXS=y
+CONFIG_SCSI=y
+CONFIG_BLK_DEV_SD=y
+CONFIG_NETDEVICES=y
+# CONFIG_NET_VENDOR_ARC is not set
+# CONFIG_NET_VENDOR_BROADCOM is not set
+# CONFIG_NET_VENDOR_INTEL is not set
+# CONFIG_NET_VENDOR_MARVELL is not set
+# CONFIG_NET_VENDOR_MICREL is not set
+# CONFIG_NET_VENDOR_NATSEMI is not set
+# CONFIG_NET_VENDOR_SEEQ is not set
+CONFIG_STMMAC_ETH=y
+# CONFIG_NET_VENDOR_VIA is not set
+# CONFIG_NET_VENDOR_WIZNET is not set
+CONFIG_NATIONAL_PHY=y
+# CONFIG_USB_NET_DRIVERS is not set
+CONFIG_INPUT_EVDEV=y
+CONFIG_MOUSE_PS2_TOUCHKIT=y
+CONFIG_MOUSE_SERIAL=y
+CONFIG_MOUSE_SYNAPTICS_USB=y
+# CONFIG_LEGACY_PTYS is not set
+# CONFIG_DEVKMEM is not set
+CONFIG_SERIAL_8250=y
+CONFIG_SERIAL_8250_CONSOLE=y
+CONFIG_SERIAL_8250_DW=y
+CONFIG_SERIAL_OF_PLATFORM=y
+# CONFIG_HW_RANDOM is not set
+CONFIG_I2C=y
+CONFIG_I2C_CHARDEV=y
+CONFIG_I2C_DESIGNWARE_PLATFORM=y
+# CONFIG_HWMON is not set
+CONFIG_FB=y
+# CONFIG_VGA_CONSOLE is not set
+CONFIG_FRAMEBUFFER_CONSOLE=y
+CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY=y
+CONFIG_LOGO=y
+# CONFIG_LOGO_LINUX_MONO is not set
+# CONFIG_LOGO_LINUX_VGA16 is not set
+# CONFIG_LOGO_LINUX_CLUT224 is not set
+CONFIG_USB=y
+CONFIG_USB_EHCI_HCD=y
+CONFIG_USB_EHCI_HCD_PLATFORM=y
+CONFIG_USB_OHCI_HCD=y
+CONFIG_USB_OHCI_HCD_PLATFORM=y
+CONFIG_USB_STORAGE=y
+CONFIG_MMC=y
+CONFIG_MMC_SDHCI=y
+CONFIG_MMC_SDHCI_PLTFM=y
+CONFIG_MMC_DW=y
+CONFIG_MMC_DW_IDMAC=y
+# CONFIG_IOMMU_SUPPORT is not set
+CONFIG_EXT3_FS=y
+CONFIG_EXT4_FS=y
+CONFIG_MSDOS_FS=y
+CONFIG_VFAT_FS=y
+CONFIG_NTFS_FS=y
+CONFIG_TMPFS=y
+CONFIG_JFFS2_FS=y
+CONFIG_NFS_FS=y
+CONFIG_NLS_CODEPAGE_437=y
+CONFIG_NLS_ISO8859_1=y
+# CONFIG_ENABLE_WARN_DEPRECATED is not set
+# CONFIG_ENABLE_MUST_CHECK is not set
+CONFIG_STRIP_ASM_SYMS=y
+CONFIG_LOCKUP_DETECTOR=y
+CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
+# CONFIG_SCHED_DEBUG is not set
+# CONFIG_DEBUG_PREEMPT is not set
+# CONFIG_FTRACE is not set
diff --git a/arch/arc/kernel/devtree.c b/arch/arc/kernel/devtree.c
index f801d46dc087..7e844fd8213f 100644
--- a/arch/arc/kernel/devtree.c
+++ b/arch/arc/kernel/devtree.c
@@ -33,7 +33,7 @@ static void __init arc_set_early_base_baud(unsigned long dt_root)
 	if (of_flat_dt_is_compatible(dt_root, "abilis,arc-tb10x"))
 		arc_base_baud = core_clk/3;
 	else if (of_flat_dt_is_compatible(dt_root, "snps,arc-sdp"))
-		arc_base_baud = 33333333;	/* Fixed 33MHz clk */
+		arc_base_baud = 33333333;	/* Fixed 33MHz clk (AXS10x) */
 	else
 		arc_base_baud = core_clk;
 }
diff --git a/arch/arc/plat-axs10x/Kconfig b/arch/arc/plat-axs10x/Kconfig
index 45641ca8aba8..d475f9d4847c 100644
--- a/arch/arc/plat-axs10x/Kconfig
+++ b/arch/arc/plat-axs10x/Kconfig
@@ -6,7 +6,7 @@
 # published by the Free Software Foundation.
 #
 
-config ARC_PLAT_AXS10X
+menuconfig ARC_PLAT_AXS10X
 	bool "Synopsys ARC AXS10x Software Development Platforms"
 	select DW_APB_ICTL
 	select GPIO_DWAPB
@@ -23,6 +23,7 @@ config ARC_PLAT_AXS10X
 if ARC_PLAT_AXS10X
 
 config AXS101
+	depends on ISA_ARCOMPACT
 	bool "AXS101 with AXC001 CPU Card (ARC 770D/EM6/AS221)"
 	help
 	  This adds support for the 770D/EM6/AS221 CPU Card. Only the ARC
@@ -32,4 +33,14 @@ config AXS101
 	  this daughtercard. Please use the axs101.dts device tree
 	  with this configuration.
 
+config AXS103
+	bool "AXS103 with AXC003 CPU Card (ARC HS38x)"
+	depends on ISA_ARCV2
+	help
+	  This adds support for the HS38x CPU Card.
+
+	  The AXS103 Platform consists of an AXS10x mainboard with
+	  this daughtercard. Please use the axs103.dts device tree
+	  with this configuration.
+
 endif
diff --git a/arch/arc/plat-axs10x/axs10x.c b/arch/arc/plat-axs10x/axs10x.c
index a1cecdaf9dca..ad0a7ef84660 100644
--- a/arch/arc/plat-axs10x/axs10x.c
+++ b/arch/arc/plat-axs10x/axs10x.c
@@ -1,5 +1,5 @@
 /*
- * AXS101 Software Development Platform
+ * AXS101/AXS103 Software Development Platform
  *
  * Copyright (C) 2013-15 Synopsys, Inc. (www.synopsys.com)
  *
@@ -15,8 +15,10 @@
  */
 
 #include <linux/of_platform.h>
-#include <asm/mach_desc.h>
+#include <asm/clk.h>
 #include <asm/io.h>
+#include <asm/mach_desc.h>
+#include <asm/mcip.h>
 
 #define AXS_MB_CGU		0xE0010000
 #define AXS_MB_CREG		0xE0011000
@@ -29,14 +31,6 @@
 #define AXC001_CREG		0xF0001000
 #define AXC001_GPIO_INTC	0xF0003000
 
-#define CREG_CPU_ADDR_770	(AXC001_CREG + 0x20)
-#define CREG_CPU_ADDR_TUNN	(AXC001_CREG + 0x60)
-#define CREG_CPU_ADDR_770_UPD	(AXC001_CREG + 0x34)
-#define CREG_CPU_ADDR_TUNN_UPD	(AXC001_CREG + 0x74)
-
-#define CREG_CPU_ARC770_IRQ_MUX	(AXC001_CREG + 0x114)
-#define CREG_CPU_GPIO_UART_MUX	(AXC001_CREG + 0x120)
-
 static void __init axs10x_enable_gpio_intc_wire(void)
 {
 	/*
@@ -83,6 +77,22 @@ static void __init axs10x_enable_gpio_intc_wire(void)
 	iowrite32(1 << MB_TO_GPIO_IRQ, (void __iomem *) GPIO_INTEN);
 }
 
+static inline void __init
+write_cgu_reg(uint32_t value, void __iomem *reg, void __iomem *lock_reg)
+{
+	unsigned int loops = 128 * 1024, ctr;
+
+	iowrite32(value, reg);
+
+	ctr = loops;
+	while (((ioread32(lock_reg) & 1) == 1) && ctr--) /* wait for unlock */
+		cpu_relax();
+
+	ctr = loops;
+	while (((ioread32(lock_reg) & 1) == 0) && ctr--) /* wait for re-lock */
+		cpu_relax();
+}
+
 static void __init axs10x_print_board_ver(unsigned int creg, const char *str)
 {
 	union ver {
@@ -118,6 +128,16 @@ static void __init axs10x_early_init(void)
 	axs10x_print_board_ver(CREG_MB_VER, mb);
 }
 
+#ifdef CONFIG_AXS101
+
+#define CREG_CPU_ADDR_770	(AXC001_CREG + 0x20)
+#define CREG_CPU_ADDR_TUNN	(AXC001_CREG + 0x60)
+#define CREG_CPU_ADDR_770_UPD	(AXC001_CREG + 0x34)
+#define CREG_CPU_ADDR_TUNN_UPD	(AXC001_CREG + 0x74)
+
+#define CREG_CPU_ARC770_IRQ_MUX	(AXC001_CREG + 0x114)
+#define CREG_CPU_GPIO_UART_MUX	(AXC001_CREG + 0x120)
+
 /*
  * Set up System Memory Map for ARC cpu / peripherals controllers
  *
@@ -287,6 +307,145 @@ static void __init axs101_early_init(void)
 	axs10x_early_init();
 }
 
+#endif	/* CONFIG_AXS101 */
+
+#ifdef CONFIG_AXS103
+
+#define AXC003_CGU	0xF0000000
+#define AXC003_CREG	0xF0001000
+#define AXC003_MST_AXI_TUNNEL	0
+#define AXC003_MST_HS38		1
+
+#define CREG_CPU_AXI_M0_IRQ_MUX	(AXC003_CREG + 0x440)
+#define CREG_CPU_GPIO_UART_MUX	(AXC003_CREG + 0x480)
+#define CREG_CPU_TUN_IO_CTRL	(AXC003_CREG + 0x494)
+
+
+union pll_reg {
+	struct {
+#ifdef CONFIG_CPU_BIG_ENDIAN
+		unsigned int pad:17, noupd:1, bypass:1, edge:1, high:6, low:6;
+#else
+		unsigned int low:6, high:6, edge:1, bypass:1, noupd:1, pad:17;
+#endif
+	};
+	unsigned int val;
+};
+
+static unsigned int __init axs103_get_freq(void)
+{
+	union pll_reg idiv, fbdiv, odiv;
+	unsigned int f = 33333333;
+
+	idiv.val = ioread32((void __iomem *)AXC003_CGU + 0x80 + 0);
+	fbdiv.val = ioread32((void __iomem *)AXC003_CGU + 0x80 + 4);
+	odiv.val = ioread32((void __iomem *)AXC003_CGU + 0x80 + 8);
+
+	if (idiv.bypass != 1)
+		f = f / (idiv.low + idiv.high);
+
+	if (fbdiv.bypass != 1)
+		f = f * (fbdiv.low + fbdiv.high);
+
+	if (odiv.bypass != 1)
+		f = f / (odiv.low + odiv.high);
+
+	f = (f + 500000) / 1000000; /* Rounding */
+	return f;
+}
+
+static inline unsigned int __init encode_div(unsigned int id, int upd)
+{
+	union pll_reg div;
+
+	div.val = 0;
+
+	div.noupd = !upd;
+	div.bypass = id == 1 ? 1 : 0;
+	div.edge = (id%2 == 0) ? 0 : 1;  /* 0 = rising */
+	div.low = (id%2 == 0) ? id >> 1 : (id >> 1)+1;
+	div.high = id >> 1;
+
+	return div.val;
+}
+
+noinline static void __init
+axs103_set_freq(unsigned int id, unsigned int fd, unsigned int od)
+{
+	write_cgu_reg(encode_div(id, 0),
+		      (void __iomem *)AXC003_CGU + 0x80 + 0,
+		      (void __iomem *)AXC003_CGU + 0x110);
+
+	write_cgu_reg(encode_div(fd, 0),
+		      (void __iomem *)AXC003_CGU + 0x80 + 4,
+		      (void __iomem *)AXC003_CGU + 0x110);
+
+	write_cgu_reg(encode_div(od, 1),
+		      (void __iomem *)AXC003_CGU + 0x80 + 8,
+		      (void __iomem *)AXC003_CGU + 0x110);
+}
+
+static void __init axs103_early_init(void)
+{
+	switch (arc_get_core_freq()/1000000) {
+	case 33:
+		axs103_set_freq(1, 1, 1);
+		break;
+	case 50:
+		axs103_set_freq(1, 30, 20);
+		break;
+	case 75:
+		axs103_set_freq(2, 45, 10);
+		break;
+	case 90:
+		axs103_set_freq(2, 54, 10);
+		break;
+	case 100:
+		axs103_set_freq(1, 30, 10);
+		break;
+	case 125:
+		axs103_set_freq(2, 45,  6);
+		break;
+	default:
+		/*
+		 * In this case, core_frequency derived from
+		 * DT "clock-frequency" might not match with board value.
+		 * Hence update it to match the board value.
+		 */
+		arc_set_core_freq(axs103_get_freq() * 1000000);
+		break;
+	}
+
+	pr_info("Freq is %uMHz\n", axs103_get_freq());
+
+	/* Memory maps are already configured by the pre-bootloader */
+
+	/* set GPIO mux to UART */
+	iowrite32(0x01, (void __iomem *) CREG_CPU_GPIO_UART_MUX);
+
+	iowrite32((0x00100000U | 0x000C0000U | 0x00003322U),
+		  (void __iomem *) CREG_CPU_TUN_IO_CTRL);
+
+	/* Set up the AXS_MB interrupt system.*/
+	iowrite32(12, (void __iomem *) (CREG_CPU_AXI_M0_IRQ_MUX
+					 + (AXC003_MST_HS38 << 2)));
+
+	/* connect ICTL - Main Board with GPIO line */
+	iowrite32(0x01, (void __iomem *) CREG_MB_IRQ_MUX);
+
+	axs10x_print_board_ver(AXC003_CREG + 4088, "AXC003 CPU Card");
+
+	axs10x_early_init();
+
+#ifdef CONFIG_ARC_MCIP
+	/* No Hardware init, but filling the smp ops callbacks */
+	mcip_init_early_smp();
+#endif
+}
+#endif
+
+#ifdef CONFIG_AXS101
+
 static const char *axs101_compat[] __initconst = {
 	"snps,axs101",
 	NULL,
@@ -296,3 +455,22 @@ MACHINE_START(AXS101, "axs101")
 	.dt_compat	= axs101_compat,
 	.init_early	= axs101_early_init,
 MACHINE_END
+
+#endif	/* CONFIG_AXS101 */
+
+#ifdef CONFIG_AXS103
+
+static const char *axs103_compat[] __initconst = {
+	"snps,axs103",
+	NULL,
+};
+
+MACHINE_START(AXS103, "axs103")
+	.dt_compat	= axs103_compat,
+	.init_early	= axs103_early_init,
+#ifdef CONFIG_ARC_MCIP
+	.init_smp	= mcip_init_smp,
+#endif
+MACHINE_END
+
+#endif	/* CONFIG_AXS103 */
-- 
1.9.1



* [PATCH 28/28] ARCv2: [vdk] dts files and defconfig for HS38 VDK
  2015-06-09 11:48 [PATCH 00/28] ARCv2 port to Linux - (B) ISA / Core / platform support Vineet Gupta
                   ` (26 preceding siblings ...)
  2015-06-09 11:48 ` [PATCH 27/28] ARCv2: [axs103] Support ARC SDP FPGA platform for HS38x cores Vineet Gupta
@ 2015-06-09 11:48 ` Vineet Gupta
  27 siblings, 0 replies; 66+ messages in thread
From: Vineet Gupta @ 2015-06-09 11:48 UTC (permalink / raw)
  To: linux-arch, linux-kernel
  Cc: arnd, arc-linux-dev, Ruud Derwig, Grant Likely, Rob Herring,
	devicetree, Vineet Gupta

From: Ruud Derwig <rderwig@synopsys.com>

 - CONFIG_ARC_UBOOT_SUPPORT to handle arguments passed in r0, r1, r2
 - CONFIG_DEVTMPFS_MOUNT for mounting rootfs since it uses an external cpio
   for rootfs

Cc: Grant Likely <grant.likely@linaro.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: devicetree@vger.kernel.org
Signed-off-by: Ruud Derwig <rderwig@synopsys.com>
[vgupta: folded the Main board DT files for smp/up into one]
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
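
Note on the IDU interrupt specifiers used in the *_idu.dtsi files: each specifier has two cells, <hwirq distribution>, where distribution 0 means round robin and a non-zero value is a bitmask of target cpus (1=cpu0, 2=cpu1, 4=cpu2, 8=cpu3). The sketch below decodes such a specifier. It assumes common irq N is wired to core irq 24 + N, which is inferred from "interrupts = <24 25 ...>" and the "cmn irq 0 -> cpu irq 24" comments in this series. All names are illustrative and do not correspond to the driver's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

#define IDU_FIRST_CORE_IRQ	24u	/* assumption: from "interrupts = <24 25 ...>" */
#define IDU_DIST_RR		0u	/* distribution 0: round robin */

/* The two cells of an IDU interrupt specifier. */
struct idu_irq {
	uint32_t cmn_irq;	/* common irq number, 0 based */
	uint32_t dist;		/* 0 = round robin, else bitmask of cpus */
};

/* Core intc line that this common irq is uplinked to. */
static uint32_t idu_core_irq(const struct idu_irq *s)
{
	return IDU_FIRST_CORE_IRQ + s->cmn_irq;
}

/* True if the irq may be delivered to the given cpu. */
static bool idu_targets_cpu(const struct idu_irq *s, unsigned int cpu)
{
	if (s->dist == IDU_DIST_RR)
		return true;	/* round robin: any cpu may be picked */

	return (s->dist >> cpu) & 1u;
}
```

Under these assumptions the VDK debug UART with "interrupts = <2 0>" arrives on core irq 26 and may go to either cpu, while "interrupts = <0 1>" arrives on core irq 24 and goes to cpu0 only.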
 arch/arc/boot/dts/vdk_axc003.dtsi       |  61 +++++++++++++++++++
 arch/arc/boot/dts/vdk_axc003_idu.dtsi   |  76 +++++++++++++++++++++++
 arch/arc/boot/dts/vdk_axs10x_mb.dtsi    |  93 ++++++++++++++++++++++++++++
 arch/arc/boot/dts/vdk_hs38.dts          |  21 +++++++
 arch/arc/boot/dts/vdk_hs38_smp.dts      |  21 +++++++
 arch/arc/configs/vdk_hs38_defconfig     | 102 +++++++++++++++++++++++++++++++
 arch/arc/configs/vdk_hs38_smp_defconfig | 104 ++++++++++++++++++++++++++++++++
 arch/arc/kernel/asm-offsets.c           |   2 +
 arch/arc/mm/tlbex.S                     |   2 +
 arch/arc/plat-axs10x/axs10x.c           |   8 +++
 10 files changed, 490 insertions(+)
 create mode 100644 arch/arc/boot/dts/vdk_axc003.dtsi
 create mode 100644 arch/arc/boot/dts/vdk_axc003_idu.dtsi
 create mode 100644 arch/arc/boot/dts/vdk_axs10x_mb.dtsi
 create mode 100644 arch/arc/boot/dts/vdk_hs38.dts
 create mode 100644 arch/arc/boot/dts/vdk_hs38_smp.dts
 create mode 100644 arch/arc/configs/vdk_hs38_defconfig
 create mode 100644 arch/arc/configs/vdk_hs38_smp_defconfig

diff --git a/arch/arc/boot/dts/vdk_axc003.dtsi b/arch/arc/boot/dts/vdk_axc003.dtsi
new file mode 100644
index 000000000000..9393fd902f0d
--- /dev/null
+++ b/arch/arc/boot/dts/vdk_axc003.dtsi
@@ -0,0 +1,61 @@
+/*
+ * Copyright (C) 2013, 2014 Synopsys, Inc. (www.synopsys.com)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+/*
+ * Device tree for AXC003 CPU card: HS38x UP configuration (VDK version)
+ */
+
+/ {
+	compatible = "snps,arc";
+	clock-frequency = <50000000>;
+	#address-cells = <1>;
+	#size-cells = <1>;
+
+	cpu_card {
+		compatible = "simple-bus";
+		#address-cells = <1>;
+		#size-cells = <1>;
+
+		ranges = <0x00000000 0xf0000000 0x10000000>;
+
+		cpu_intc: archs-intc@cpu {
+			compatible = "snps,archs-intc";
+			interrupt-controller;
+			#interrupt-cells = <1>;
+		};
+
+		debug_uart: dw-apb-uart@0x5000 {
+			compatible = "snps,dw-apb-uart";
+			reg = <0x5000 0x100>;
+			clock-frequency = <2403200>;
+			interrupt-parent = <&cpu_intc>;
+			interrupts = <19>;
+			baud = <115200>;
+			reg-shift = <2>;
+			reg-io-width = <4>;
+		};
+
+	};
+
+	mb_intc: dw-apb-ictl@0xe0012000 {
+		#interrupt-cells = <1>;
+		compatible = "snps,dw-apb-ictl";
+		reg = < 0xe0012000 0x200 >;
+		interrupt-controller;
+		interrupt-parent = <&cpu_intc>;
+		interrupts = < 18 >;
+	};
+
+	memory {
+		#address-cells = <1>;
+		#size-cells = <1>;
+		ranges = <0x00000000 0x80000000 0x40000000>;
+		device_type = "memory";
+		reg = <0x00000000 0x20000000>;	/* 512MiB */
+	};
+};
diff --git a/arch/arc/boot/dts/vdk_axc003_idu.dtsi b/arch/arc/boot/dts/vdk_axc003_idu.dtsi
new file mode 100644
index 000000000000..9bee8ed09eb0
--- /dev/null
+++ b/arch/arc/boot/dts/vdk_axc003_idu.dtsi
@@ -0,0 +1,76 @@
+/*
+ * Copyright (C) 2014, 2015 Synopsys, Inc. (www.synopsys.com)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+/*
+ * Device tree for AXC003 CPU card:
+ * HS38x2 (Dual Core) with IDU intc (VDK version)
+ */
+
+/ {
+	compatible = "snps,arc";
+	clock-frequency = <50000000>;
+	#address-cells = <1>;
+	#size-cells = <1>;
+
+	cpu_card {
+		compatible = "simple-bus";
+		#address-cells = <1>;
+		#size-cells = <1>;
+
+		ranges = <0x00000000 0xf0000000 0x10000000>;
+
+		cpu_intc: archs-intc@cpu {
+			compatible = "snps,archs-intc";
+			interrupt-controller;
+			#interrupt-cells = <1>;
+		};
+
+		idu_intc: idu-interrupt-controller {
+			compatible = "snps,archs-idu-intc";
+			interrupt-controller;
+			interrupt-parent = <&cpu_intc>;
+
+			/*
+			 * <hwirq  distribution>
+			 * distribution: 0=RR; 1=cpu0, 2=cpu1, 4=cpu2, 8=cpu3
+			 */
+			#interrupt-cells = <2>;
+
+			interrupts = <24 25 26 27>;
+		};
+
+		debug_uart: dw-apb-uart@0x5000 {
+			compatible = "snps,dw-apb-uart";
+			reg = <0x5000 0x100>;
+			clock-frequency = <2403200>;
+			interrupt-parent = <&idu_intc>;
+			interrupts = <2 0>;
+			baud = <115200>;
+			reg-shift = <2>;
+			reg-io-width = <4>;
+		};
+
+	};
+
+	mb_intc: dw-apb-ictl@0xe0012000 {
+		#interrupt-cells = <1>;
+		compatible = "snps,dw-apb-ictl";
+		reg = < 0xe0012000 0x200 >;
+		interrupt-controller;
+		interrupt-parent = <&idu_intc>;
+		interrupts = < 0 0 >;
+	};
+
+	memory {
+		#address-cells = <1>;
+		#size-cells = <1>;
+		ranges = <0x00000000 0x80000000 0x40000000>;
+		device_type = "memory";
+		reg = <0x00000000 0x20000000>;	/* 512MiB */
+	};
+};
diff --git a/arch/arc/boot/dts/vdk_axs10x_mb.dtsi b/arch/arc/boot/dts/vdk_axs10x_mb.dtsi
new file mode 100644
index 000000000000..45cd665fca23
--- /dev/null
+++ b/arch/arc/boot/dts/vdk_axs10x_mb.dtsi
@@ -0,0 +1,93 @@
+/*
+ * Support for peripherals on the AXS10x mainboard (VDK version)
+ *
+ * Copyright (C) 2013-15 Synopsys, Inc. (www.synopsys.com)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+/ {
+	axs10x_mb_vdk {
+		compatible = "simple-bus";
+		#address-cells = <1>;
+		#size-cells = <1>;
+		ranges = <0x00000000 0xe0000000 0x10000000>;
+		interrupt-parent = <&mb_intc>;
+
+		clocks {
+			apbclk: apbclk {
+				compatible = "fixed-clock";
+				clock-frequency = <50000000>;
+				#clock-cells = <0>;
+			};
+
+		};
+
+		ethernet@0x18000 {
+			#interrupt-cells = <1>;
+			compatible = "snps,dwmac";
+			reg = < 0x18000 0x2000 >;
+			interrupts = < 4 >;
+			interrupt-names = "macirq";
+			phy-mode = "rgmii";
+			snps,phy-addr = < 0 >;  // VDK model phy address is 0
+			snps,pbl = < 32 >;
+			clocks = <&apbclk>;
+			clock-names = "stmmaceth";
+		};
+
+		ehci@0x40000 {
+			compatible = "generic-ehci";
+			reg = < 0x40000 0x100 >;
+			interrupts = < 8 >;
+		};
+
+		uart@0x20000 {
+			compatible = "snps,dw-apb-uart";
+			reg = <0x20000 0x100>;
+			clock-frequency = <2403200>;
+			interrupts = <17>;
+			baud = <115200>;
+			reg-shift = <2>;
+			reg-io-width = <4>;
+		};
+
+		uart@0x21000 {
+			compatible = "snps,dw-apb-uart";
+			reg = <0x21000 0x100>;
+			clock-frequency = <2403200>;
+			interrupts = <18>;
+			baud = <115200>;
+			reg-shift = <2>;
+			reg-io-width = <4>;
+		};
+
+		uart@0x22000 {
+			compatible = "snps,dw-apb-uart";
+			reg = <0x22000 0x100>;
+			clock-frequency = <2403200>;
+			interrupts = <19>;
+			baud = <115200>;
+			reg-shift = <2>;
+			reg-io-width = <4>;
+		};
+
+/* PGU output directly sent to virtual LCD screen; hdmi controller not modelled */
+		pgu@0x17000 {
+			compatible = "snps,arcpgufb";
+			reg = <0x17000 0x400>;
+			clock-frequency = <51000000>; /* PGU clock is initiated in init function */
+			/* interrupts = <5>;   PGU interrupts not used, this vector is used for ps2 below */
+		};
+
+/* VDK has additional ps2 keyboard/mouse interface integrated in LCD screen model */
+		ps2: ps2@e0017400 {
+			compatible = "snps,arc_ps2";
+			reg = <0x17400 0x14>;
+			interrupts = <5>;
+			interrupt-names = "arc_ps2_irq";
+		};
+	};
+};
diff --git a/arch/arc/boot/dts/vdk_hs38.dts b/arch/arc/boot/dts/vdk_hs38.dts
new file mode 100644
index 000000000000..5d803dd2de59
--- /dev/null
+++ b/arch/arc/boot/dts/vdk_hs38.dts
@@ -0,0 +1,21 @@
+/*
+ * Copyright (C) 2013 Synopsys, Inc. (www.synopsys.com)
+ *
+ * ARC HS38 Virtual Development Kit (VDK)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+/dts-v1/;
+
+/include/ "vdk_axc003.dtsi"
+/include/ "vdk_axs10x_mb.dtsi"
+
+/ {
+	compatible = "snps,axs103";
+
+	chosen {
+		bootargs = "earlycon=uart8250,mmio32,0xe0022000,115200n8 console=tty0 console=ttyS3,115200n8 consoleblank=0";
+	};
+};
diff --git a/arch/arc/boot/dts/vdk_hs38_smp.dts b/arch/arc/boot/dts/vdk_hs38_smp.dts
new file mode 100644
index 000000000000..031a5bc79b3e
--- /dev/null
+++ b/arch/arc/boot/dts/vdk_hs38_smp.dts
@@ -0,0 +1,21 @@
+/*
+ * Copyright (C) 2013 Synopsys, Inc. (www.synopsys.com)
+ *
+ * ARC HS38 Virtual Development Kit, SMP version (VDK)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+/dts-v1/;
+
+/include/ "vdk_axc003_idu.dtsi"
+/include/ "vdk_axs10x_mb.dtsi"
+
+/ {
+	compatible = "snps,axs103";
+
+	chosen {
+		bootargs = "earlycon=uart8250,mmio32,0xe0022000,115200n8 console=tty0 console=ttyS3,115200n8 consoleblank=0";
+	};
+};
diff --git a/arch/arc/configs/vdk_hs38_defconfig b/arch/arc/configs/vdk_hs38_defconfig
new file mode 100644
index 000000000000..ef35ef3923dd
--- /dev/null
+++ b/arch/arc/configs/vdk_hs38_defconfig
@@ -0,0 +1,102 @@
+CONFIG_CROSS_COMPILE="arc-linux-uclibc-"
+# CONFIG_LOCALVERSION_AUTO is not set
+CONFIG_DEFAULT_HOSTNAME="ARCLinux"
+# CONFIG_CROSS_MEMORY_ATTACH is not set
+CONFIG_HIGH_RES_TIMERS=y
+CONFIG_IKCONFIG=y
+CONFIG_IKCONFIG_PROC=y
+CONFIG_BLK_DEV_INITRD=y
+CONFIG_EMBEDDED=y
+CONFIG_PERF_EVENTS=y
+# CONFIG_VM_EVENT_COUNTERS is not set
+# CONFIG_SLUB_DEBUG is not set
+# CONFIG_COMPAT_BRK is not set
+CONFIG_PARTITION_ADVANCED=y
+CONFIG_ARC_PLAT_AXS10X=y
+CONFIG_AXS103=y
+CONFIG_ISA_ARCV2=y
+CONFIG_ARC_UBOOT_SUPPORT=y
+CONFIG_ARC_BUILTIN_DTB_NAME="vdk_hs38"
+CONFIG_PREEMPT=y
+CONFIG_NET=y
+CONFIG_PACKET=y
+CONFIG_UNIX=y
+CONFIG_NET_KEY=y
+CONFIG_INET=y
+CONFIG_IP_PNP=y
+CONFIG_IP_PNP_DHCP=y
+CONFIG_IP_PNP_BOOTP=y
+CONFIG_IP_PNP_RARP=y
+# CONFIG_IPV6 is not set
+CONFIG_DEVTMPFS=y
+CONFIG_DEVTMPFS_MOUNT=y
+# CONFIG_STANDALONE is not set
+# CONFIG_PREVENT_FIRMWARE_BUILD is not set
+# CONFIG_FIRMWARE_IN_KERNEL is not set
+CONFIG_MTD=y
+CONFIG_MTD_CMDLINE_PARTS=y
+CONFIG_MTD_BLOCK=y
+CONFIG_MTD_SLRAM=y
+CONFIG_BLK_DEV_RAM=y
+CONFIG_SCSI=y
+CONFIG_BLK_DEV_SD=y
+CONFIG_NETDEVICES=y
+# CONFIG_NET_VENDOR_ARC is not set
+# CONFIG_NET_VENDOR_BROADCOM is not set
+# CONFIG_NET_VENDOR_INTEL is not set
+# CONFIG_NET_VENDOR_MARVELL is not set
+# CONFIG_NET_VENDOR_MICREL is not set
+# CONFIG_NET_VENDOR_NATSEMI is not set
+# CONFIG_NET_VENDOR_SEEQ is not set
+CONFIG_STMMAC_ETH=y
+# CONFIG_NET_VENDOR_VIA is not set
+# CONFIG_NET_VENDOR_WIZNET is not set
+CONFIG_NATIONAL_PHY=y
+CONFIG_MOUSE_PS2_TOUCHKIT=y
+CONFIG_SERIO_ARC_PS2=y
+# CONFIG_LEGACY_PTYS is not set
+# CONFIG_DEVKMEM is not set
+CONFIG_SERIAL_8250=y
+CONFIG_SERIAL_8250_CONSOLE=y
+CONFIG_SERIAL_8250_DW=y
+CONFIG_SERIAL_OF_PLATFORM=y
+# CONFIG_HW_RANDOM is not set
+# CONFIG_HWMON is not set
+CONFIG_FB=y
+CONFIG_ARCPGU_RGB888=y
+CONFIG_ARCPGU_DISPTYPE=0
+# CONFIG_VGA_CONSOLE is not set
+CONFIG_FRAMEBUFFER_CONSOLE=y
+CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY=y
+CONFIG_LOGO=y
+# CONFIG_LOGO_LINUX_MONO is not set
+# CONFIG_LOGO_LINUX_VGA16 is not set
+# CONFIG_LOGO_LINUX_CLUT224 is not set
+CONFIG_USB=y
+CONFIG_USB_EHCI_HCD=y
+# CONFIG_USB_EHCI_TT_NEWSCHED is not set
+CONFIG_USB_EHCI_HCD_PLATFORM=y
+CONFIG_USB_OHCI_HCD=y
+CONFIG_USB_OHCI_HCD_PLATFORM=y
+CONFIG_USB_STORAGE=y
+CONFIG_USB_SERIAL=y
+# CONFIG_IOMMU_SUPPORT is not set
+CONFIG_EXT3_FS=y
+CONFIG_EXT4_FS=y
+CONFIG_MSDOS_FS=y
+CONFIG_VFAT_FS=y
+CONFIG_NTFS_FS=y
+CONFIG_TMPFS=y
+CONFIG_JFFS2_FS=y
+CONFIG_NFS_FS=y
+CONFIG_NLS_CODEPAGE_437=y
+CONFIG_NLS_ISO8859_1=y
+# CONFIG_ENABLE_WARN_DEPRECATED is not set
+# CONFIG_ENABLE_MUST_CHECK is not set
+CONFIG_STRIP_ASM_SYMS=y
+CONFIG_DEBUG_SHIRQ=y
+CONFIG_LOCKUP_DETECTOR=y
+CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
+# CONFIG_SCHED_DEBUG is not set
+# CONFIG_DEBUG_PREEMPT is not set
+# CONFIG_FTRACE is not set
diff --git a/arch/arc/configs/vdk_hs38_smp_defconfig b/arch/arc/configs/vdk_hs38_smp_defconfig
new file mode 100644
index 000000000000..634509e5e572
--- /dev/null
+++ b/arch/arc/configs/vdk_hs38_smp_defconfig
@@ -0,0 +1,104 @@
+CONFIG_CROSS_COMPILE="arc-linux-uclibc-"
+# CONFIG_LOCALVERSION_AUTO is not set
+CONFIG_DEFAULT_HOSTNAME="ARCLinux"
+# CONFIG_CROSS_MEMORY_ATTACH is not set
+CONFIG_HIGH_RES_TIMERS=y
+CONFIG_IKCONFIG=y
+CONFIG_IKCONFIG_PROC=y
+CONFIG_BLK_DEV_INITRD=y
+CONFIG_EMBEDDED=y
+CONFIG_PERF_EVENTS=y
+# CONFIG_VM_EVENT_COUNTERS is not set
+# CONFIG_SLUB_DEBUG is not set
+# CONFIG_COMPAT_BRK is not set
+CONFIG_PARTITION_ADVANCED=y
+CONFIG_ARC_PLAT_AXS10X=y
+CONFIG_AXS103=y
+CONFIG_ISA_ARCV2=y
+CONFIG_SMP=y
+# CONFIG_ARC_HAS_GRTC is not set
+CONFIG_ARC_UBOOT_SUPPORT=y
+CONFIG_ARC_BUILTIN_DTB_NAME="vdk_hs38_smp"
+CONFIG_PREEMPT=y
+CONFIG_NET=y
+CONFIG_PACKET=y
+CONFIG_UNIX=y
+CONFIG_NET_KEY=y
+CONFIG_INET=y
+CONFIG_IP_PNP=y
+CONFIG_IP_PNP_DHCP=y
+CONFIG_IP_PNP_BOOTP=y
+CONFIG_IP_PNP_RARP=y
+# CONFIG_IPV6 is not set
+CONFIG_DEVTMPFS=y
+CONFIG_DEVTMPFS_MOUNT=y
+# CONFIG_STANDALONE is not set
+# CONFIG_PREVENT_FIRMWARE_BUILD is not set
+# CONFIG_FIRMWARE_IN_KERNEL is not set
+CONFIG_MTD=y
+CONFIG_MTD_CMDLINE_PARTS=y
+CONFIG_MTD_BLOCK=y
+CONFIG_MTD_SLRAM=y
+CONFIG_BLK_DEV_RAM=y
+CONFIG_SCSI=y
+CONFIG_BLK_DEV_SD=y
+CONFIG_NETDEVICES=y
+# CONFIG_NET_VENDOR_ARC is not set
+# CONFIG_NET_VENDOR_BROADCOM is not set
+# CONFIG_NET_VENDOR_INTEL is not set
+# CONFIG_NET_VENDOR_MARVELL is not set
+# CONFIG_NET_VENDOR_MICREL is not set
+# CONFIG_NET_VENDOR_NATSEMI is not set
+# CONFIG_NET_VENDOR_SEEQ is not set
+CONFIG_STMMAC_ETH=y
+# CONFIG_NET_VENDOR_VIA is not set
+# CONFIG_NET_VENDOR_WIZNET is not set
+CONFIG_NATIONAL_PHY=y
+CONFIG_MOUSE_PS2_TOUCHKIT=y
+CONFIG_SERIO_ARC_PS2=y
+# CONFIG_LEGACY_PTYS is not set
+# CONFIG_DEVKMEM is not set
+CONFIG_SERIAL_8250=y
+CONFIG_SERIAL_8250_CONSOLE=y
+CONFIG_SERIAL_8250_DW=y
+CONFIG_SERIAL_OF_PLATFORM=y
+# CONFIG_HW_RANDOM is not set
+# CONFIG_HWMON is not set
+CONFIG_FB=y
+CONFIG_ARCPGU_RGB888=y
+CONFIG_ARCPGU_DISPTYPE=0
+# CONFIG_VGA_CONSOLE is not set
+CONFIG_FRAMEBUFFER_CONSOLE=y
+CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY=y
+CONFIG_LOGO=y
+# CONFIG_LOGO_LINUX_MONO is not set
+# CONFIG_LOGO_LINUX_VGA16 is not set
+# CONFIG_LOGO_LINUX_CLUT224 is not set
+CONFIG_USB=y
+CONFIG_USB_EHCI_HCD=y
+# CONFIG_USB_EHCI_TT_NEWSCHED is not set
+CONFIG_USB_EHCI_HCD_PLATFORM=y
+CONFIG_USB_OHCI_HCD=y
+CONFIG_USB_OHCI_HCD_PLATFORM=y
+CONFIG_USB_STORAGE=y
+CONFIG_USB_SERIAL=y
+# CONFIG_IOMMU_SUPPORT is not set
+CONFIG_EXT3_FS=y
+CONFIG_EXT4_FS=y
+CONFIG_MSDOS_FS=y
+CONFIG_VFAT_FS=y
+CONFIG_NTFS_FS=y
+CONFIG_TMPFS=y
+CONFIG_JFFS2_FS=y
+CONFIG_NFS_FS=y
+CONFIG_NLS_CODEPAGE_437=y
+CONFIG_NLS_ISO8859_1=y
+# CONFIG_ENABLE_WARN_DEPRECATED is not set
+# CONFIG_ENABLE_MUST_CHECK is not set
+CONFIG_STRIP_ASM_SYMS=y
+CONFIG_DEBUG_SHIRQ=y
+CONFIG_LOCKUP_DETECTOR=y
+CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
+# CONFIG_SCHED_DEBUG is not set
+# CONFIG_DEBUG_PREEMPT is not set
+# CONFIG_FTRACE is not set
diff --git a/arch/arc/kernel/asm-offsets.c b/arch/arc/kernel/asm-offsets.c
index 605281f5b301..ecaf34e9235c 100644
--- a/arch/arc/kernel/asm-offsets.c
+++ b/arch/arc/kernel/asm-offsets.c
@@ -37,6 +37,8 @@ int main(void)
 
 	DEFINE(TASK_ACT_MM, offsetof(struct task_struct, active_mm));
 	DEFINE(TASK_TGID, offsetof(struct task_struct, tgid));
+	DEFINE(TASK_PID, offsetof(struct task_struct, pid));
+	DEFINE(TASK_COMM, offsetof(struct task_struct, comm));
 
 	DEFINE(MM_CTXT, offsetof(struct mm_struct, context));
 	DEFINE(MM_PGD, offsetof(struct mm_struct, pgd));
diff --git a/arch/arc/mm/tlbex.S b/arch/arc/mm/tlbex.S
index 8624ebd7114e..f6f4c3cb505d 100644
--- a/arch/arc/mm/tlbex.S
+++ b/arch/arc/mm/tlbex.S
@@ -313,6 +313,7 @@ ENTRY(EV_TLBMissI)
 	CONV_PTE_TO_TLB
 	COMMIT_ENTRY_TO_MMU
 	TLBMISS_RESTORE_REGS
+EV_TLBMissI_fast_ret:	; additional label for VDK OS-kit instrumentation
 	rtie
 
 END(EV_TLBMissI)
@@ -378,6 +379,7 @@ ENTRY(EV_TLBMissD)
 
 	COMMIT_ENTRY_TO_MMU
 	TLBMISS_RESTORE_REGS
+EV_TLBMissD_fast_ret:	; additional label for VDK OS-kit instrumentation
 	rtie
 
 ;-------- Common routine to call Linux Page Fault Handler -----------
diff --git a/arch/arc/plat-axs10x/axs10x.c b/arch/arc/plat-axs10x/axs10x.c
index ad0a7ef84660..99f7da513a48 100644
--- a/arch/arc/plat-axs10x/axs10x.c
+++ b/arch/arc/plat-axs10x/axs10x.c
@@ -15,6 +15,8 @@
  */
 
 #include <linux/of_platform.h>
+
+#include <asm/asm-offsets.h>
 #include <asm/clk.h>
 #include <asm/io.h>
 #include <asm/mach_desc.h>
@@ -473,4 +475,10 @@ MACHINE_START(AXS103, "axs103")
 #endif
 MACHINE_END
 
+/*
+ * For the VDK OS-kit, to get the offset to pid and command fields
+ */
+char coware_swa_pid_offset[TASK_PID];
+char coware_swa_comm_offset[TASK_COMM];
+
 #endif	/* CONFIG_AXS103 */
-- 
1.9.1



* Re: [PATCH 17/28] ARC: add compiler barrier to LLSC based cmpxchg
  2015-06-09 11:48 ` [PATCH 17/28] ARC: add compiler barrier to LLSC based cmpxchg Vineet Gupta
@ 2015-06-09 12:23   ` Peter Zijlstra
  0 siblings, 0 replies; 66+ messages in thread
From: Peter Zijlstra @ 2015-06-09 12:23 UTC (permalink / raw)
  To: Vineet Gupta; +Cc: linux-arch, linux-kernel, arnd, arc-linux-dev

On Tue, Jun 09, 2015 at 05:18:17PM +0530, Vineet Gupta wrote:
> When auditing cmpxchg call sites, Chuck noted that gcc was optimizing
> away some of the desired LDs.
> 
> |	do {
> |		new = old = *ipi_data_ptr;
> |		new |= 1U << msg;
> |	} while (cmpxchg(ipi_data_ptr, old, new) != old);
> 
> was generating to below
> 
> | 8015cef8:	ld         r2,[r4,0]  <-- First LD
> | 8015cefc:	bset       r1,r2,r1
> |
> | 8015cf00:	llock      r3,[r4]  <-- atomic op
> | 8015cf04:	brne       r3,r2,8015cf10
> | 8015cf08:	scond      r1,[r4]
> | 8015cf0c:	bnz        8015cf00
> |
> | 8015cf10:	brne       r3,r2,8015cf00  <-- Branch doesn't go to orig LD
> 
> Although this was fixed by adding a ACCESS_ONCE in this call site, it
> seems safer (for now at least) to add compiler barrier to LLSC based
> cmpxchg

This is required even. cmpxchg() should include a full memory barrier
_before_ and _after_ the op. Both imply a compiler barrier.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
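
To illustrate the point about barriers, here is a minimal userspace sketch in C11, not the ARC kernel code. `cmpxchg_sketch()` and `ipi_set_msg()` are hypothetical names. The sequentially consistent compare-and-swap stands in for a cmpxchg() that is fully ordered on both sides, and the loop redoes the load on every retry.

```c
#include <stdatomic.h>

/*
 * Hypothetical stand-in for the kernel's cmpxchg(): returns the value
 * that was found at *ptr, which equals 'old' only if the swap happened.
 */
static unsigned int cmpxchg_sketch(atomic_uint *ptr, unsigned int old,
				   unsigned int new_val)
{
	unsigned int expected = old;

	/* seq_cst on success and failure: ordered before and after the op */
	atomic_compare_exchange_strong_explicit(ptr, &expected, new_val,
						memory_order_seq_cst,
						memory_order_seq_cst);
	return expected;
}

/* The retry loop from the quoted changelog, in sketch form */
static void ipi_set_msg(atomic_uint *ipi_data_ptr, unsigned int msg)
{
	unsigned int old, new_val;

	do {
		/* this load has to be repeated on every iteration */
		old = atomic_load_explicit(ipi_data_ptr, memory_order_relaxed);
		new_val = old | (1U << msg);
	} while (cmpxchg_sketch(ipi_data_ptr, old, new_val) != old);
}
```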


* Re: [PATCH 18/28] ARC: add smp barriers around atomics per memory-barrriers.txt
  2015-06-09 11:48 ` [PATCH 18/28] ARC: add smp barriers around atomics per memory-barrriers.txt Vineet Gupta
@ 2015-06-09 12:30   ` Peter Zijlstra
  2015-06-10  9:17     ` Vineet Gupta
  2015-06-12 12:15   ` [PATCH v2] ARC: add smp barriers around atomics per Documentation/atomic_ops.txt Vineet Gupta
  1 sibling, 1 reply; 66+ messages in thread
From: Peter Zijlstra @ 2015-06-09 12:30 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: linux-arch, linux-kernel, arnd, arc-linux-dev, Paul E. McKenney

On Tue, Jun 09, 2015 at 05:18:18PM +0530, Vineet Gupta wrote:

Please try and provide at least _some_ Changelog body.

<snip all atomic ops that return values>

> diff --git a/arch/arc/include/asm/spinlock.h b/arch/arc/include/asm/spinlock.h
> index b6a8c2dfbe6e..8af8eaad4999 100644
> --- a/arch/arc/include/asm/spinlock.h
> +++ b/arch/arc/include/asm/spinlock.h
> @@ -22,24 +22,32 @@ static inline void arch_spin_lock(arch_spinlock_t *lock)
>  {
>  	unsigned int tmp = __ARCH_SPIN_LOCK_LOCKED__;
>  
> +	smp_mb();
> +
>  	__asm__ __volatile__(
>  	"1:	ex  %0, [%1]		\n"
>  	"	breq  %0, %2, 1b	\n"
>  	: "+&r" (tmp)
>  	: "r"(&(lock->slock)), "ir"(__ARCH_SPIN_LOCK_LOCKED__)
>  	: "memory");
> +
> +	smp_mb();
>  }
>  
>  static inline int arch_spin_trylock(arch_spinlock_t *lock)
>  {
>  	unsigned int tmp = __ARCH_SPIN_LOCK_LOCKED__;
>  
> +	smp_mb();
> +
>  	__asm__ __volatile__(
>  	"1:	ex  %0, [%1]		\n"
>  	: "+r" (tmp)
>  	: "r"(&(lock->slock))
>  	: "memory");
>  
> +	smp_mb();
> +
>  	return (tmp == __ARCH_SPIN_LOCK_UNLOCKED__);
>  }
>  

Both these are only required to provide an ACQUIRE barrier, if all you
have is smp_mb(), the second is sufficient.

Also note that a failed trylock is not required to provide _any_ barrier
at all.
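
As a rough userspace illustration of both remarks (C11 atomics, not the ARC implementation): the sketch below issues a single full fence, and only on the path where the lock was actually taken. `sketch_spinlock_t`, `sketch_spin_trylock()` and the lock values are hypothetical. The seq_cst fence stands in for smp_mb() and is stronger than ACQUIRE strictly requires.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_UNLOCKED	0U	/* assumed values for this sketch */
#define SKETCH_LOCKED	1U

typedef struct {
	atomic_uint slock;
} sketch_spinlock_t;

static bool sketch_spin_trylock(sketch_spinlock_t *lock)
{
	/* atomic swap, playing the role of the "ex" instruction */
	unsigned int prev = atomic_exchange_explicit(&lock->slock,
						     SKETCH_LOCKED,
						     memory_order_relaxed);

	if (prev != SKETCH_UNLOCKED)
		return false;	/* failed trylock: no barrier needed */

	/* one fence after the successful swap gives ACQUIRE ordering */
	atomic_thread_fence(memory_order_seq_cst);
	return true;
}
```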

> @@ -47,6 +55,8 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)
>  {
>  	unsigned int tmp = __ARCH_SPIN_LOCK_UNLOCKED__;
>  
> +	smp_mb();
> +
>  	__asm__ __volatile__(
>  	"	ex  %0, [%1]		\n"
>  	: "+r" (tmp)

This requires a RELEASE barrier, again, if all you have is smp_mb(),
this is indeed correct.

Describing some of this would make for a fine Changelog body :-)


* Re: [PATCH 19/28] arch: conditionally define smp_{mb,rmb,wmb}
  2015-06-09 11:48 ` [PATCH 19/28] arch: conditionally define smp_{mb,rmb,wmb} Vineet Gupta
@ 2015-06-09 12:32   ` Peter Zijlstra
  0 siblings, 0 replies; 66+ messages in thread
From: Peter Zijlstra @ 2015-06-09 12:32 UTC (permalink / raw)
  To: Vineet Gupta; +Cc: linux-arch, linux-kernel, arnd, arc-linux-dev

On Tue, Jun 09, 2015 at 05:18:19PM +0530, Vineet Gupta wrote:
> That way arches can define the minimal versions and still #include
> asm-generic for defaults (vs. defining defaults in arch code)
> 
> See new barrier.h in arc for usage !
> 

Fair enough,

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>


* Re: [PATCH 22/28] ARCv2: STAR 9000837815 workaround hardware exclusive transactions livelock
  2015-06-09 11:48 ` [PATCH 22/28] ARCv2: STAR 9000837815 workaround hardware exclusive transactions livelock Vineet Gupta
@ 2015-06-09 12:35   ` Peter Zijlstra
  2015-06-10 10:01     ` Vineet Gupta
  0 siblings, 1 reply; 66+ messages in thread
From: Peter Zijlstra @ 2015-06-09 12:35 UTC (permalink / raw)
  To: Vineet Gupta; +Cc: linux-arch, linux-kernel, arnd, arc-linux-dev

On Tue, Jun 09, 2015 at 05:18:22PM +0530, Vineet Gupta wrote:

This really really wants a Changelog describing the actual hardware fail
and why this workaround is sufficient.


* Re: [PATCH 20/28] ARCv2: barriers
  2015-06-09 11:48 ` [PATCH 20/28] ARCv2: barriers Vineet Gupta
@ 2015-06-09 12:40   ` Peter Zijlstra
  2015-06-10  9:34     ` Vineet Gupta
  0 siblings, 1 reply; 66+ messages in thread
From: Peter Zijlstra @ 2015-06-09 12:40 UTC (permalink / raw)
  To: Vineet Gupta; +Cc: linux-arch, linux-kernel, arnd, arc-linux-dev

On Tue, Jun 09, 2015 at 05:18:20PM +0530, Vineet Gupta wrote:

A description of how your hardware works; or a reference to the platform
documentation would not go amiss.

> +++ b/arch/arc/include/asm/barrier.h
> @@ -0,0 +1,48 @@
> +/*
> + * Copyright (C) 2014-15 Synopsys, Inc. (www.synopsys.com)
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#ifndef __ASM_BARRIER_H
> +#define __ASM_BARRIER_H
> +
> +#ifdef CONFIG_SMP
> +
> +#ifdef CONFIG_ISA_ARCV2
> +
> +/*
> + * DMB:
> + *   - Ensures that selected memory operation issued before it will complete
> + *     before any subsequent memory operation of same type
> + */
> +#define smp_mb()	asm volatile("dmb 3\n" : : : "memory")
> +#define smp_rmb()	asm volatile("dmb 1\n" : : : "memory")
> +#define smp_wmb()	asm volatile("dmb 2\n" : : : "memory")
> +
> +/*
> + * DSYNC:
> + *   - Waits for completion of all outstanding memory operations before any new
> + *     operations can begin
> + *   - Includes implicit memory operations such as cache/TLB/BPU maintenance ops
> + *   - Lighter version of SYNC as it doesn't wait for non-memory operations
> + */
> +#define mb()		asm volatile("dsync\n" : : : "memory")

So mb() is supposed to order against things like DMA memory ops, is DMA
part of point 1 or 3, if 3, this is not a suitable instruction.

> +#else	/* CONFIG_ISA_ARCOMPACT */
> +
> +/* SYNC:
> + *   - Waits for completion of all outstanding memory transactions AND all
> + *     previous instructions to retire
> + */
> +#define mb()		asm volatile("sync\n" : : : "memory")
> +
> +#endif	/* CONFIG_ISA_ARCV2 */




* Re: [PATCH 18/28] ARC: add smp barriers around atomics per memory-barrriers.txt
  2015-06-09 12:30   ` Peter Zijlstra
@ 2015-06-10  9:17     ` Vineet Gupta
  2015-06-10 10:53       ` Peter Zijlstra
  0 siblings, 1 reply; 66+ messages in thread
From: Vineet Gupta @ 2015-06-10  9:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-arch, linux-kernel, arnd, arc-linux-dev, Paul E. McKenney

On Tuesday 09 June 2015 06:00 PM, Peter Zijlstra wrote:
> On Tue, Jun 09, 2015 at 05:18:18PM +0530, Vineet Gupta wrote:
>
> Please try and provide at least _some_ Changelog body.
>
> <snip all atomic ops that return values>

Will do as comments in source as well as commit log in v2.

>> diff --git a/arch/arc/include/asm/spinlock.h b/arch/arc/include/asm/spinlock.h
>> index b6a8c2dfbe6e..8af8eaad4999 100644
>> --- a/arch/arc/include/asm/spinlock.h
>> +++ b/arch/arc/include/asm/spinlock.h
>> @@ -22,24 +22,32 @@ static inline void arch_spin_lock(arch_spinlock_t *lock)
>>  {
>>  	unsigned int tmp = __ARCH_SPIN_LOCK_LOCKED__;
>>  
>> +	smp_mb();
>> +
>>  	__asm__ __volatile__(
>>  	"1:	ex  %0, [%1]		\n"
>>  	"	breq  %0, %2, 1b	\n"
>>  	: "+&r" (tmp)
>>  	: "r"(&(lock->slock)), "ir"(__ARCH_SPIN_LOCK_LOCKED__)
>>  	: "memory");
>> +
>> +	smp_mb();
>>  }
>>  
>>  static inline int arch_spin_trylock(arch_spinlock_t *lock)
>>  {
>>  	unsigned int tmp = __ARCH_SPIN_LOCK_LOCKED__;
>>  
>> +	smp_mb();
>> +
>>  	__asm__ __volatile__(
>>  	"1:	ex  %0, [%1]		\n"
>>  	: "+r" (tmp)
>>  	: "r"(&(lock->slock))
>>  	: "memory");
>>  
>> +	smp_mb();
>> +
>>  	return (tmp == __ARCH_SPIN_LOCK_UNLOCKED__);
>>  }
>>  
> Both these are only required to provide an ACQUIRE barrier, if all you
> have is smp_mb(), the second is sufficient.

Essentially ARCv2 is weakly ordered with explicit ordering provided by DMB
instructions with semantics load/load, store/store, all/all.
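
For readers less familiar with the three flavours, a minimal message-passing sketch in userspace C11 shows where a store/store and a load/load barrier would sit. This is not ARC code. `producer()`, `consumer()` and the two variables are hypothetical, and the release/acquire fences are conservative stand-ins for smp_wmb() and smp_rmb(), since they order slightly more than those do.

```c
#include <stdatomic.h>

static int payload;		/* plain data */
static atomic_int ready;	/* flag that publishes the data */

static void producer(int value)
{
	payload = value;
	/* store/store ordering: payload must be visible before the flag */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Returns 1 and fills *out if data was published, 0 otherwise */
static int consumer(int *out)
{
	if (!atomic_load_explicit(&ready, memory_order_relaxed))
		return 0;
	/* load/load ordering: flag must be read before the payload */
	atomic_thread_fence(memory_order_acquire);
	*out = payload;
	return 1;
}
```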

I wanted to clarify a couple of things
(1) ACQUIRE barrier implies store/{store,load} while RELEASE implies
{load,store}/store and given what DMB provides for ARCv2, smp_mb() is the only fit ?
(2) Do we need smp_mb() on both sides of spin lock/unlock - doesn't ACQUIRE imply
we have a smp_mb() after lock but before any subsequent critical section - so the
top hunk is not necessarily needed. Similarly RELEASE requires a smp_mb() before
the memory operation for lock, but not after.

> Also note that a failed trylock is not required to provide _any_ barrier
> at all.

But that means wrapping the barrier in a branch etc, I'd rather keep them uniform
for now - unless we see performance hits due to that. I suppose all of that is
more relevant for heavy metal 4k cpu stuff ?

>
>> @@ -47,6 +55,8 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)
>>  {
>>  	unsigned int tmp = __ARCH_SPIN_LOCK_UNLOCKED__;
>>  
>> +	smp_mb();
>> +
>>  	__asm__ __volatile__(
>>  	"	ex  %0, [%1]		\n"
>>  	: "+r" (tmp)
> This requires a RELEASE barrier, again, if all you have is smp_mb(),
> this is indeed correct.

Ok, actually we already had a smp_mb() at the end of this function - but depending
on what your reply is to #2 above we can remove that (as a separate commit)

>
> Describing some of this would make for a fine Changelog body :-)

I will spin a v2 after your response, with more informative changelog.

Thx for the review.

-Vineet



* Re: [PATCH 20/28] ARCv2: barriers
  2015-06-09 12:40   ` Peter Zijlstra
@ 2015-06-10  9:34     ` Vineet Gupta
  2015-06-10 10:58       ` Peter Zijlstra
  0 siblings, 1 reply; 66+ messages in thread
From: Vineet Gupta @ 2015-06-10  9:34 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-arch, linux-kernel, arnd, arc-linux-dev

On Tuesday 09 June 2015 06:10 PM, Peter Zijlstra wrote:
> On Tue, Jun 09, 2015 at 05:18:20PM +0530, Vineet Gupta wrote:
>
> A description of how your hardware works; or a reference to the platform
> documentation would not go amiss.

Honestly the docs group is working on a publicly sharable version of PRM
(Programmer's Reference Manual) but it might take some more time. I'm sure kernel
developers including you don't like to sign an NDA.... The information I have in
comments is pretty much what we have in there w.r.t. the barrier instructions. But
I will capture the weak memory ordering and other details as part of changelog
here too.

>> [snip ....]
>> +/*
>> + * DMB:
>> + *   - Ensures that selected memory operation issued before it will complete
>> + *     before any subsequent memory operation of same type
>> + */
>> +#define smp_mb()	asm volatile("dmb 3\n" : : : "memory")
>> +#define smp_rmb()	asm volatile("dmb 1\n" : : : "memory")
>> +#define smp_wmb()	asm volatile("dmb 2\n" : : : "memory")
>> +
>> +/*
>> + * DSYNC:
>> + *   - Waits for completion of all outstanding memory operations before any new
>> + *     operations can begin
>> + *   - Includes implicit memory operations such as cache/TLB/BPU maintenance ops
>> + *   - Lighter version of SYNC as it doesn't wait for non-memory operations
>> + */
>> +#define mb()		asm volatile("dsync\n" : : : "memory")
> So mb() is supposed to order against things like DMA memory ops, is DMA
> part of point 1 or 3, if 3, this is not a suitable instruction.

Can you please explain the DMA case a bit more ? From what I understood and used in
say ethernet driver, it is more of a line drawn between say cpu updating a shared
buffer descriptor and kicking a MMIO register (which in turn could initiate a DMA)
but I'm not sure how mb() can possibly order with DMA per se (unless there's some
advanced form of IO-coherency)

-Vineet

>
>> +#else	/* CONFIG_ISA_ARCOMPACT */
>> +
>> +/* SYNC:
>> + *   - Waits for completion of all outstanding memory transactions AND all
>> + *     previous instructions to retire
>> + */
>> +#define mb()		asm volatile("sync\n" : : : "memory")
>> +
>> +#endif	/* CONFIG_ISA_ARCV2 */
>
>



* Re: [PATCH 22/28] ARCv2: STAR 9000837815 workaround hardware exclusive transactions livelock
  2015-06-09 12:35   ` Peter Zijlstra
@ 2015-06-10 10:01     ` Vineet Gupta
  2015-06-10 11:02       ` Peter Zijlstra
  0 siblings, 1 reply; 66+ messages in thread
From: Vineet Gupta @ 2015-06-10 10:01 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-arch, linux-kernel, arnd, arc-linux-dev

On Tuesday 09 June 2015 06:05 PM, Peter Zijlstra wrote:

> On Tue, Jun 09, 2015 at 05:18:22PM +0530, Vineet Gupta wrote:
>
> This really really wants a Changelog describing the actual hardware fail
> and why this workaround is sufficient.



OK - I need some more time to rehash the exact details with our hardware
folks. But AFAIKR, this was a hardware livelock in llock/scond when 2 cores
were doing r-m-w to two different words in the same cache line - adding
prefetchw (prefetch with a write intent) would get the line in exclusive
state and break the livelock.
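
Roughly, the shape of the workaround is a write-intent prefetch issued before the exclusive sequence. The following is a userspace sketch only, not the patch itself. `atomic_add_return_sketch()` is a hypothetical name, `__builtin_prefetch()` is the GCC/Clang builtin (second argument 1 means prefetch for write), and the C11 compare-and-swap loop stands in for llock/scond.

```c
#include <stdatomic.h>

static unsigned int atomic_add_return_sketch(atomic_uint *v, unsigned int i)
{
	unsigned int old, new_val;

	/* hint: bring the cache line in with write intent up front */
	__builtin_prefetch((const void *)v, 1);

	old = atomic_load_explicit(v, memory_order_relaxed);
	do {
		new_val = old + i;
		/* on failure 'old' is refreshed with the current value */
	} while (!atomic_compare_exchange_weak(v, &old, new_val));

	return new_val;
}
```

The prefetch is only a hint and does not change the result of the operation.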

The test itself was one from EEMBC Multibench but I'll have to look it up.

Wasn't there something similar in ARM world too - they have some sort of
snoop-delayed exclusive handling in hardware to mitigate something similar
although as Will later remarked it involved llock/scond with vanilla ld/st
to same line/word ?
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-May/254142.html

Thx,
-Vineet



* Re: [PATCH 18/28] ARC: add smp barriers around atomics per memory-barrriers.txt
  2015-06-10  9:17     ` Vineet Gupta
@ 2015-06-10 10:53       ` Peter Zijlstra
  2015-06-11 13:03         ` Vineet Gupta
  0 siblings, 1 reply; 66+ messages in thread
From: Peter Zijlstra @ 2015-06-10 10:53 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: linux-arch, linux-kernel, arnd, arc-linux-dev, Paul E. McKenney

On Wed, Jun 10, 2015 at 09:17:16AM +0000, Vineet Gupta wrote:
> >> --- a/arch/arc/include/asm/spinlock.h
> >> +++ b/arch/arc/include/asm/spinlock.h
> >> @@ -22,24 +22,32 @@ static inline void arch_spin_lock(arch_spinlock_t *lock)
> >>  {
> >>  	unsigned int tmp = __ARCH_SPIN_LOCK_LOCKED__;
> >>  
> >> +	smp_mb();
> >> +
> >>  	__asm__ __volatile__(
> >>  	"1:	ex  %0, [%1]		\n"
> >>  	"	breq  %0, %2, 1b	\n"
> >>  	: "+&r" (tmp)
> >>  	: "r"(&(lock->slock)), "ir"(__ARCH_SPIN_LOCK_LOCKED__)
> >>  	: "memory");
> >> +
> >> +	smp_mb();
> >>  }

> > Both these are only required to provide an ACQUIRE barrier, if all you
> > have is smp_mb(), the second is sufficient.
> 
> Essentially ARCv2 is weakly ordered with explicit ordering provided by DMB
> instructions with semantics load/load, store/store, all/all.
> 
> I wanted to clarify a couple of things
> (1) ACQUIRE barrier implies store/{store,load} while RELEASE implies
> {load,store}/store and given what DMB provides for ARCv2, smp_mb() is the only fit ?

Please see Documentation/memory-barriers.txt, but a quick recap:

 - ACQUIRE: both loads and stores before the barrier are allowed to
   be observed after it.  Neither loads nor stores after the barrier are
   allowed to be observed before it.

 - RELEASE: both loads and stores before it must be observed before the
   barrier. However, any load or store after it may be observed before
   it.

Therefore:

 X = Y = 0;

	[S] X = 1
	    ACQUIRE

	    RELEASE
	[S] Y = 1

is in fact fully unordered, because both stores are allowed to cross in,
and could cross one another on the inside, like:

	    ACQUIRE
	[S] Y = 1
	[S] X = 1
	    RELEASE

> (2) Do we need smp_mb() on both sides of spin lock/unlock - doesn't ACQUIRE imply
> we have a smp_mb() after lock but before any subsequent critical section - so the
> top hunk is not necessarily needed. Similarly RELEASE requires a smp_mb() before
> the memory operation for lock, but not after.

You do not need an smp_mb() on both sides, as you say, after lock and
before unlock is sufficient. The main point being that things can not
escape out of the critical section. It's fine for them to leak in.
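
In sketch form (userspace C11, not the ARC code), that placement looks as follows. `sketch_lock_t`, `sketch_lock()` and `sketch_unlock()` are hypothetical names, and the seq_cst fence is a stand-in for smp_mb().

```c
#include <stdatomic.h>

typedef struct {
	atomic_uint slock;	/* 0 = unlocked, 1 = locked (assumed) */
} sketch_lock_t;

static void sketch_lock(sketch_lock_t *lock)
{
	/* spin until the swap returns "unlocked" */
	while (atomic_exchange_explicit(&lock->slock, 1U,
					memory_order_relaxed) != 0U)
		;
	/* barrier AFTER lock: nothing inside may move above this point */
	atomic_thread_fence(memory_order_seq_cst);
}

static void sketch_unlock(sketch_lock_t *lock)
{
	/* barrier BEFORE unlock: nothing inside may move below this point */
	atomic_thread_fence(memory_order_seq_cst);
	atomic_store_explicit(&lock->slock, 0U, memory_order_relaxed);
}
```

Accesses from outside the critical section can still drift in across either fence, which matches the "leak in" behaviour described above.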


* Re: [PATCH 20/28] ARCv2: barriers
  2015-06-10  9:34     ` Vineet Gupta
@ 2015-06-10 10:58       ` Peter Zijlstra
  2015-06-10 13:01         ` Will Deacon
  0 siblings, 1 reply; 66+ messages in thread
From: Peter Zijlstra @ 2015-06-10 10:58 UTC (permalink / raw)
  To: Vineet Gupta; +Cc: linux-arch, linux-kernel, arnd, arc-linux-dev, Will Deacon

On Wed, Jun 10, 2015 at 09:34:18AM +0000, Vineet Gupta wrote:
> On Tuesday 09 June 2015 06:10 PM, Peter Zijlstra wrote:
> > On Tue, Jun 09, 2015 at 05:18:20PM +0530, Vineet Gupta wrote:
> >
> > A description of how your hardware works; or a reference to the platform
> > documentation would not go amiss.
> 
> Honestly the docs group is working on a publicly sharable version of PRM
> (Programmer's Reference Manual) but it might take some more time. 

Good news that. I appreciate these things can take some time.

> I'm sure kernel
> developers including you don't like to sign an NDA.... 

It might also be a question on your company vs my company. But yes, I
generally prefer not to do NDAs.

> The information I have in
> comments is pretty much what we have in there w.r.t. the barrier instructions. But
> I will capture the weak memory ordering and other details as part of changelog
> here too.

Right, so I think we all understand weak (ARM, PPC etc..) and we all
understand load/load, store/store and load-store/load-store barriers.

Although explicitly mentioning it never hurt anybody ;-)

I think the most interesting part is the device side.

> >> +/*
> >> + * DSYNC:
> >> + *   - Waits for completion of all outstanding memory operations before any new
> >> + *     operations can begin
> >> + *   - Includes implicit memory operations such as cache/TLB/BPU maintenance ops
> >> + *   - Lighter version of SYNC as it doesn't wait for non-memory operations
> >> + */
> >> +#define mb()		asm volatile("dsync\n" : : : "memory")
> > So mb() is supposed to order against things like DMA memory ops, is DMA
> > part of point 1 or 3, if 3, this is not a suitable instruction.
> 
> Can you please explain the DMA case a bit more ? From what I understood and used in
> say ethernet driver, it is more of a line drawn between say cpu updating a shared
> buffer descriptor and kicking a MMIO register (which in turn could initiate a DMA)
> but I'm not sure how mb() can possibly order with DMA per se (unless there's some
> advanced form of IO-coherency)

I'm afraid I might not be the best of sources here, I tend to stay away
from actual device stuff like that. I've Cc'ed Will Deacon who might be
able to shed a bit more light on this aspect.


* Re: [PATCH 22/28] ARCv2: STAR 9000837815 workaround hardware exclusive transactions livelock
  2015-06-10 10:01     ` Vineet Gupta
@ 2015-06-10 11:02       ` Peter Zijlstra
  2015-06-19  9:55         ` [PATCH v2 " Vineet Gupta
  0 siblings, 1 reply; 66+ messages in thread
From: Peter Zijlstra @ 2015-06-10 11:02 UTC (permalink / raw)
  To: Vineet Gupta; +Cc: linux-arch, linux-kernel, arnd, arc-linux-dev

On Wed, Jun 10, 2015 at 10:01:01AM +0000, Vineet Gupta wrote:
> OK - I need some more time to rehash the exact details with our
> hardware folks. But AFAIKR, this was hardware livelock in llock/scond
> when 2 cores were doing r-m-w to two different words in the same cache
> line - adding prefetchw (prefetch with a write intent) would get the
> line in exclusive state and break the livelock.
> 
> The test itself was one from EEMBC Multibench but I'll have to look it
> up.
> 
> Wasn't there something similar in ARM world too - they have some sort
> of snoop-delayed exclusive handling in hardware to mitigate something
> similar although as Will later remarked it involved llock/scond with
> vanilla ld/st to same line/word ?

> http://lists.infradead.org/pipermail/linux-arm-kernel/2014-May/254142.html

Cute, I was not aware of that.

Sounds reasonable (unfortunate but understandable). Speaking as someone
who at times does full arch sweeps on code like this it really helps
understanding if such things are explained a wee bit better than not at
all :-)


* Re: [PATCH 20/28] ARCv2: barriers
  2015-06-10 10:58       ` Peter Zijlstra
@ 2015-06-10 13:01         ` Will Deacon
  2015-06-11 12:13           ` Vineet Gupta
  0 siblings, 1 reply; 66+ messages in thread
From: Will Deacon @ 2015-06-10 13:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vineet Gupta, linux-arch, linux-kernel, arnd, arc-linux-dev

On Wed, Jun 10, 2015 at 11:58:40AM +0100, Peter Zijlstra wrote:
> On Wed, Jun 10, 2015 at 09:34:18AM +0000, Vineet Gupta wrote:
> > On Tuesday 09 June 2015 06:10 PM, Peter Zijlstra wrote:
> I think the most interesting part is the device side.
> 
> > >> +/*
> > >> + * DSYNC:
> > >> + *   - Waits for completion of all outstanding memory operations before any new
> > >> + *     operations can begin
> > >> + *   - Includes implicit memory operations such as cache/TLB/BPU maintenance ops
> > >> + *   - Lighter version of SYNC as it doesn't wait for non-memory operations
> > >> + */
> > >> +#define mb()		asm volatile("dsync\n" : : : "memory")
> > > So mb() is supposed to order against things like DMA memory ops, is DMA
> > > part of point 1 or 3, if 3, this is not a suitable instruction.
> > 
> > Can u please explain the DMA case a bit more ? From what I understood and used in
> > say ethernet driver, it is more of a line drawn between say cpu updating a shared
> > buffer descriptor and kicking a MMIO register (which in turn could initiate a DMA)
> > but I'm not sure how mb() can possibly order with DMA per se (unless there's some
> > advanced form of IO-coherency)
> 
> I'm afraid I might not be the best of sources here, I tend to stay away
> from actual device stuff like that. I've Cc'ed Will Deacon who might be
> able to shed a bit more light on this aspect.

I'd definitely expect mb() to order arbitrary memory accesses against each
other (i.e. regardless of whether or not they're to RAM or MMIO devices).
Some drivers use it to "flush the writebuffer" but I don't think that makes
a whole lot of sense. Certainly, on ARM, if we want to know that something
reached an MMIO endpoint then we'll need a read-back as well as the barrier
for the general case.

You also need that guarantee in your readl/writel family of macros. It's
extremely heavy and rarely needed, which is why I added the _relaxed
versions to all architectures.

The "ordering against DMA" is something like reading an MMIO register to
determine whether the DMA has completed, then going off to read the contents
out of the DMA buffer. The comment you have about DSYNC makes it sound like
it's not sufficient for this case.
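
As an illustration only (it is not part of the posted patches), the polling
pattern described above can be modelled in portable C11. All `toy_` names are
hypothetical, the "device" is just a struct in ordinary memory, and the C11
acquire fence merely stands in for whatever barrier the architecture would
need between the status read and the buffer reads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_DMA_DONE	0x1u
#define TOY_BUF_LEN	16

/* Hypothetical device model: a status "register" plus a DMA target buffer. */
struct toy_dev {
	_Atomic uint32_t status;	/* stands in for an MMIO status register */
	uint8_t dma_buf[TOY_BUF_LEN];	/* stands in for the DMA buffer in RAM */
};

/* Device side of the model: write the payload, then signal completion. */
static void toy_dev_complete(struct toy_dev *dev, const uint8_t *src, size_t len)
{
	if (len > TOY_BUF_LEN)
		len = TOY_BUF_LEN;
	memcpy(dev->dma_buf, src, len);
	atomic_store_explicit(&dev->status, TOY_DMA_DONE, memory_order_release);
}

/* CPU side: poll the status, then read the buffer. Returns bytes copied. */
static size_t toy_cpu_read(struct toy_dev *dev, uint8_t *dst, size_t len)
{
	uint32_t st = atomic_load_explicit(&dev->status, memory_order_relaxed);

	if (!(st & TOY_DMA_DONE))
		return 0;

	/* Stand-in for the barrier ordering the status read before buffer reads. */
	atomic_thread_fence(memory_order_acquire);

	if (len > TOY_BUF_LEN)
		len = TOY_BUF_LEN;
	memcpy(dst, dev->dma_buf, len);
	return len;
}

/* Small self-check exercising both the "not done" and "done" paths. */
int toy_dma_selftest(void)
{
	struct toy_dev dev;
	const uint8_t in[4] = { 1, 2, 3, 4 };
	uint8_t out[4] = { 0 };

	atomic_init(&dev.status, 0);
	memset(dev.dma_buf, 0, sizeof(dev.dma_buf));

	assert(toy_cpu_read(&dev, out, sizeof(out)) == 0);
	toy_dev_complete(&dev, in, sizeof(in));
	assert(toy_cpu_read(&dev, out, sizeof(out)) == sizeof(out));
	assert(memcmp(in, out, sizeof(in)) == 0);
	return 0;
}
```

Without the fence in toy_cpu_read(), nothing in the language model stops the
buffer reads from being satisfied before the status read, which is the race
the paragraph above is concerned with.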

Will


* Re: [PATCH 20/28] ARCv2: barriers
  2015-06-10 13:01         ` Will Deacon
@ 2015-06-11 12:13           ` Vineet Gupta
  2015-06-11 13:39             ` Will Deacon
  0 siblings, 1 reply; 66+ messages in thread
From: Vineet Gupta @ 2015-06-11 12:13 UTC (permalink / raw)
  To: Will Deacon, Peter Zijlstra; +Cc: linux-arch, linux-kernel, arnd, arc-linux-dev

On Wednesday 10 June 2015 06:31 PM, Will Deacon wrote:
> On Wed, Jun 10, 2015 at 11:58:40AM +0100, Peter Zijlstra wrote:
>> On Wed, Jun 10, 2015 at 09:34:18AM +0000, Vineet Gupta wrote:
>>> On Tuesday 09 June 2015 06:10 PM, Peter Zijlstra wrote:
>> I think the most interesting part is the device side.
>>
>>>>> +/*
>>>>> + * DSYNC:
>>>>> + *   - Waits for completion of all outstanding memory operations before any new
>>>>> + *     operations can begin
>>>>> + *   - Includes implicit memory operations such as cache/TLB/BPU maintenance ops
>>>>> + *   - Lighter version of SYNC as it doesn't wait for non-memory operations
>>>>> + */
>>>>> +#define mb()		asm volatile("dsync\n" : : : "memory")
>>>> So mb() is supposed to order against things like DMA memory ops, is DMA
>>>> part of point 1 or 3, if 3, this is not a suitable instruction.
>>> Can u please explain the DMA case a bit more ? From what I understood and used in
>>> say ethernet driver, it is more of a line drawn between say cpu updating a shared
>>> buffer descriptor and kicking a MMIO register (which in turn could initiate a DMA)
>>> but I'm not sure how mb() can possibly order with DMA per se (unless there's some
>>> advanced form of IO-coherency)
>> I'm afraid I might not be the best of sources here, I tend to stay away
>> from actual device stuff like that. I've Cc'ed Will Deacon who might be
>> able to shed a bit more light on this aspect.
> I'd definitely expect mb() to order arbitrary memory accesses against each
> other (i.e. regardless of whether or not they're to RAM or MMIO devices).
> Some drivers use it to "flush the writebuffer" but I don't think that makes
> a whole lot of sense. Certainly, on ARM, if we want to know that something
> reached an MMIO endpoint then we'll need a read-back as well as the barrier
> for the general case.
>
> You also need that guarantee in your readl/writel family of macros. It's
> extremely heavy and rarely needed, which is why I added the _relaxed
> versions to all architectures.

Wow - adding that to these accessors will really be heavy, given that a whole
bunch of drivers still use the stock API (or perhaps don't know / care whether
they need readl or the relaxed API). And it is practically impossible to switch
them over - after all, if it ain't broken, how can u fix it? So far we've been
testing this implementation (readl/writel without any explicit barrier) on
slower FPGA builds, including a whole bunch of DesignWare IP - mmc, eth,
gpio... - and don't see any ill effects. Do you reckon we still need to add it?


> The "ordering against DMA" is something like reading an MMIO register to
> determine whether the DMA has completed, then going off to read the contents
> out of the DMA buffer. The comment you have about DSYNC makes it sound like
> it's not sufficient for this case.

IMHO this use case is slightly pedantic, since DMA completion will typically
be followed by an interrupt (I understand it's still possible to poll a DMA
status reg). At any rate, when it comes to drawing a line between memory
accesses, regular or MMIO, DSYNC is all we have in the ISA, so the ARCv2 mb()
has to use it - there's no better option.

-Vineet

> Will
>



* Re: [PATCH 18/28] ARC: add smp barriers around atomics per memory-barrriers.txt
  2015-06-10 10:53       ` Peter Zijlstra
@ 2015-06-11 13:03         ` Vineet Gupta
  0 siblings, 0 replies; 66+ messages in thread
From: Vineet Gupta @ 2015-06-11 13:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-arch, linux-kernel, arnd, arc-linux-dev, Paul E. McKenney

On Wednesday 10 June 2015 04:23 PM, Peter Zijlstra wrote:
> On Wed, Jun 10, 2015 at 09:17:16AM +0000, Vineet Gupta wrote:
>> I wanted to clarify a couple of things
>> (1) ACQUIRE barrier implies store/{store,load} while RELEASE implies
>> {load,store}/store and given what DMB provides for ARCv2, smp_mb() is the only fit ?
> Please see Documentation/memory-barriers.txt, but a quick recap:
>
>  - ACQUIRE: both loads and stores before the barrier are allowed to
>    be observed after it.  Neither loads nor stores after the barrier are
>    allowed to be observed before it.
>
>  - RELEASE: both loads and stores before it must be observed before the
>    barrier. However, any load or store after it may be observed before
>    it.
>
> Therefore:
>
>  X = Y = 0;
>
> 	[S] X = 1
> 	    ACQUIRE
>
> 	    RELEASE
> 	[S] Y = 1
>
> is in fact fully unordered, because both stores are allowed to cross in,
> and could cross one another on the inside, like:
>
> 	    ACQUIRE
> 	[S] Y = 1
> 	[S] X = 1
> 	    RELEASE

Thx for that. I think I was mixing up smp_load_acquire() / smp_store_release()
with the spin lock ACQUIRE/RELEASE. As Paul put it in an LWN article, after
re-reading memory-barriers.txt I've indeed felt a hit on my already meager
brain power :-)

>> (2) Do we need smp_mb() on both sides of spin lock/unlock - doesn't ACQUIRE imply
>> we have a smp_mb() after lock but before any subsequent critical section - so the
>> top hunk is not necessarily needed. Similarly RELEASE requires a smp_mb() before
>> the memory operation for lock, but not after.
> You do not need an smp_mb() on both sides, as you say, after lock and
> before unlock is sufficient. The main point being that things can not
> escape out of the critical section. It's fine for them to leak in.

Ok - nevertheless I will probably keep the extraneous barriers around for now,
since I see some weird hackbench regression on a dual core SMP build when
removing those 3 barriers (and/or replacing them with a nop so as to keep the
icache / bpu micro-arch profile exactly the same as before).
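
A minimal userspace sketch of the barrier placement agreed on above (one
barrier after taking the lock, one before releasing it), written with C11
atomics rather than ARC assembly. The `toy_` names are hypothetical, the
relaxed exchange plays the role of the EX instruction, and the C11 fences are
only an approximation of the kernel's smp_mb().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_UNLOCKED	0u
#define TOY_LOCKED	1u

/* Hypothetical lock type, loosely mirroring arch_spinlock_t's single word. */
typedef struct {
	atomic_uint slock;
} toy_spinlock_t;

static inline void toy_spin_lock(toy_spinlock_t *lock)
{
	/* Swap LOCKED in until the previous value was UNLOCKED (like EX + BREQ). */
	while (atomic_exchange_explicit(&lock->slock, TOY_LOCKED,
					memory_order_relaxed) == TOY_LOCKED)
		;

	/* ACQUIRE: later accesses must not move up, out of the critical section. */
	atomic_thread_fence(memory_order_acquire);
}

static inline bool toy_spin_trylock(toy_spinlock_t *lock)
{
	unsigned int old = atomic_exchange_explicit(&lock->slock, TOY_LOCKED,
						    memory_order_relaxed);

	if (old == TOY_LOCKED)
		return false;

	atomic_thread_fence(memory_order_acquire);
	return true;
}

static inline void toy_spin_unlock(toy_spinlock_t *lock)
{
	/* RELEASE: earlier accesses must not move down, past the unlock store. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&lock->slock, TOY_UNLOCKED, memory_order_relaxed);
}

/* Single-threaded self-check of the lock state machine. */
int toy_spinlock_selftest(void)
{
	toy_spinlock_t lock;

	atomic_init(&lock.slock, TOY_UNLOCKED);

	toy_spin_lock(&lock);
	assert(!toy_spin_trylock(&lock));	/* already held */
	toy_spin_unlock(&lock);
	assert(toy_spin_trylock(&lock));	/* free again */
	toy_spin_unlock(&lock);
	return 0;
}
```

There is deliberately no fence before the exchange in toy_spin_lock() or after
the store in toy_spin_unlock(): accesses leaking into the critical section are
allowed, only leaking out is not.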

-Vineet


* Re: [PATCH 20/28] ARCv2: barriers
  2015-06-11 12:13           ` Vineet Gupta
@ 2015-06-11 13:39             ` Will Deacon
  2015-06-19 13:13               ` Vineet Gupta
  0 siblings, 1 reply; 66+ messages in thread
From: Will Deacon @ 2015-06-11 13:39 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: Peter Zijlstra, linux-arch, linux-kernel, arnd, arc-linux-dev

On Thu, Jun 11, 2015 at 01:13:28PM +0100, Vineet Gupta wrote:
> On Wednesday 10 June 2015 06:31 PM, Will Deacon wrote:
> > On Wed, Jun 10, 2015 at 11:58:40AM +0100, Peter Zijlstra wrote:
> >> On Wed, Jun 10, 2015 at 09:34:18AM +0000, Vineet Gupta wrote:
> >>> On Tuesday 09 June 2015 06:10 PM, Peter Zijlstra wrote:
> >> I think the most interesting part is the device side.
> >>
> >>>>> +/*
> >>>>> + * DSYNC:
> >>>>> + *   - Waits for completion of all outstanding memory operations before any new
> >>>>> + *     operations can begin
> >>>>> + *   - Includes implicit memory operations such as cache/TLB/BPU maintenance ops
> >>>>> + *   - Lighter version of SYNC as it doesn't wait for non-memory operations
> >>>>> + */
> >>>>> +#define mb()		asm volatile("dsync\n" : : : "memory")
> >>>> So mb() is supposed to order against things like DMA memory ops, is DMA
> >>>> part of point 1 or 3, if 3, this is not a suitable instruction.
> >>> Can u please explain the DMA case a bit more ? From what I understood and used in
> >>> say ethernet driver, it is more of a line drawn between say cpu updating a shared
> >>> buffer descriptor and kicking a MMIO register (which in turn could initiate a DMA)
> >>> but I'm not sure how mb() can possibly order with DMA per se (unless there's some
> >>> advanced form of IO-coherency)
> >> I'm afraid I might not be the best of sources here, I tend to stay away
> >> from actual device stuff like that. I've Cc'ed Will Deacon who might be
> >> able to shed a bit more light on this aspect.
> > I'd definitely expect mb() to order arbitrary memory accesses against each
> > other (i.e. regardless of whether or not they're to RAM or MMIO devices).
> > Some drivers use it to "flush the writebuffer" but I don't think that makes
> > a whole lot of sense. Certainly, on ARM, if we want to know that something
> > reached an MMIO endpoint then we'll need a read-back as well as the barrier
> > for the general case.
> >
> > You also need that guarantee in your readl/writel family of macros. It's
> > extremely heavy and rarely needed, which is why I added the _relaxed
> > versions to all architectures.
> 
> Wow - adding that to these accessors will really be heavy, given that a whole
> bunch of drivers still use the stock API (or perhaps don't know / care whether
> they need readl or the relaxed API). And it is practically impossible to switch
> them over - after all, if it ain't broken, how can u fix it? So far we've been
> testing this implementation (readl/writel without any explicit barrier) on
> slower FPGA builds, including a whole bunch of DesignWare IP - mmc, eth,
> gpio... - and don't see any ill effects. Do you reckon we still need to add it?

Unfortunately, yes, as that's effectively what the kernel requires:

  http://marc.info/?l=linux-kernel&m=121192394430581&w=2
  http://thread.gmane.org/gmane.linux.ide/46414

The conclusion is that x86 *does* provide this ordering in its accessors
and drivers are written to assume that, so either you go round fixing all
the drivers by adding the missing barriers or you implement it in your
accessors (like we have done on ARM). Subtle I/O ordering issues are no
fun to debug.

That's also the reason I added the _relaxed versions, so you can port
drivers one-by-one to the weaker semantics whilst having the potentially
broken drivers continue to work.
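
To make the split concrete, here is a rough userspace model of ordered versus
relaxed accessors. It is a sketch under assumptions, not the ARC or ARM
implementation: the `toy_` names are hypothetical, the "register" is a plain
volatile variable, and C11 sequentially consistent fences stand in for the
architecture barriers a real readl()/writel() would use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Relaxed accessors: just the access, no ordering against other memory ops. */
static inline uint32_t toy_readl_relaxed(const volatile uint32_t *addr)
{
	return *addr;
}

static inline void toy_writel_relaxed(uint32_t val, volatile uint32_t *addr)
{
	*addr = val;
}

/* Ordered read: barrier after the access, so later reads (for example of a
 * DMA buffer) cannot be satisfied before the register read. */
static inline uint32_t toy_readl(const volatile uint32_t *addr)
{
	uint32_t val = toy_readl_relaxed(addr);

	atomic_thread_fence(memory_order_seq_cst);
	return val;
}

/* Ordered write: barrier before the access, so earlier writes (for example
 * to a descriptor in RAM) are visible before the register write. */
static inline void toy_writel(uint32_t val, volatile uint32_t *addr)
{
	atomic_thread_fence(memory_order_seq_cst);
	toy_writel_relaxed(val, addr);
}

/* Usage: fill a descriptor, then "kick" a doorbell with the ordered write. */
int toy_mmio_selftest(void)
{
	uint32_t desc = 0;
	volatile uint32_t doorbell = 0;

	desc = 0xabcd;			/* plain store to "RAM" */
	toy_writel(1, &doorbell);	/* ordered after the descriptor store */

	assert(toy_readl(&doorbell) == 1);
	assert(desc == 0xabcd);
	return 0;
}
```

A driver ported to the weaker semantics would call the relaxed variants and
place an explicit barrier only where it actually hands data to the device.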

> > The "ordering against DMA" is something like reading an MMIO register to
> > determine whether the DMA has completed, then going off to read the contents
> > out of the DMA buffer. The comment you have about DSYNC makes it sound like
> > it's not sufficient for this case.
> 
> IMHO this use case is slightly pedantic, since DMA completion will typically
> be followed by an interrupt (I understand it's still possible to poll a DMA
> status reg). At any rate, when it comes to drawing a line between memory
> accesses, regular or MMIO, DSYNC is all we have in the ISA, so the ARCv2 mb()
> has to use it - there's no better option.

Does taking an interrupt ensure visibility of the data on your
architecture? Most non-pci device architectures allow that to race, so
you end up relying on the readX in the irq handler to order the buffer
access.

If you don't have an instruction for this, then I don't understand how
you can perform DMA to/from regions of memory that are mapped as weakly
ordered by the CPU (e.g. how would you write a data buffer then tell the
device to go read from it?).

Will


* [PATCH v2] ARC: add smp barriers around atomics per Documentation/atomic_ops.txt
  2015-06-09 11:48 ` [PATCH 18/28] ARC: add smp barriers around atomics per memory-barrriers.txt Vineet Gupta
  2015-06-09 12:30   ` Peter Zijlstra
@ 2015-06-12 12:15   ` Vineet Gupta
  2015-06-12 13:04     ` Peter Zijlstra
  1 sibling, 1 reply; 66+ messages in thread
From: Vineet Gupta @ 2015-06-12 12:15 UTC (permalink / raw)
  To: peterz
  Cc: linux-arch, linux-kernel, arc-linux-dev, Vineet Gupta,
	Paul E. McKenney, stable

 - arch_spin_lock/unlock were lacking the ACQUIRE/RELEASE barriers.
   Since ARCv2 only provides load/load, store/store and all/all, we need
   the full barrier.

 - LLOCK/SCOND based atomics, bitops and cmpxchg, which return modified
   values, were lacking the explicit smp barriers.

 - Non LLOCK/SCOND variants don't need the explicit barriers since that
   is implicitly provided by the spin locks used to implement the
   critical section (the spin lock barriers in turn are also fixed in
   this commit, as explained above).
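
For readers without ARC hardware, the second point can be approximated in
portable C11. This is a sketch only: `toy_atomic_add_return` is a hypothetical
name, the relaxed compare-exchange loop plays the role of the LLOCK/SCOND
retry loop, and the sequentially consistent fences stand in for smp_mb().

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Value-returning atomic op: the read-modify-write itself carries no
 * ordering (relaxed), so full fences are placed before and after it.
 */
static inline int toy_atomic_add_return(int i, atomic_int *v)
{
	int old, val;

	atomic_thread_fence(memory_order_seq_cst);	/* like smp_mb() before */

	old = atomic_load_explicit(v, memory_order_relaxed);
	do {
		val = old + i;
		/* On failure, 'old' is refreshed with the current value. */
	} while (!atomic_compare_exchange_weak_explicit(v, &old, val,
							memory_order_relaxed,
							memory_order_relaxed));

	atomic_thread_fence(memory_order_seq_cst);	/* like smp_mb() after */

	return val;
}

/* Single-threaded self-check of the arithmetic and the stored value. */
int toy_atomic_selftest(void)
{
	atomic_int counter;

	atomic_init(&counter, 5);
	assert(toy_atomic_add_return(3, &counter) == 8);
	assert(toy_atomic_add_return(-8, &counter) == 0);
	assert(atomic_load(&counter) == 0);
	return 0;
}
```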

Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/include/asm/atomic.h   | 21 +++++++++++++++++++++
 arch/arc/include/asm/bitops.h   | 19 +++++++++++++++++++
 arch/arc/include/asm/cmpxchg.h  | 17 +++++++++++++++++
 arch/arc/include/asm/spinlock.h | 32 ++++++++++++++++++++++++++++++++
 4 files changed, 89 insertions(+)

diff --git a/arch/arc/include/asm/atomic.h b/arch/arc/include/asm/atomic.h
index 9917a45fc430..20b7dc17979e 100644
--- a/arch/arc/include/asm/atomic.h
+++ b/arch/arc/include/asm/atomic.h
@@ -43,6 +43,12 @@ static inline int atomic_##op##_return(int i, atomic_t *v)		\
 {									\
 	unsigned int temp;						\
 									\
+	/*								\
+	 * Explicit full memory barrier needed before/after as		\
+	 * LLOCK/SCOND themselves don't provide any such semantics	\
+	 */								\
+	smp_mb();							\
+									\
 	__asm__ __volatile__(						\
 	"1:	llock   %0, [%1]	\n"				\
 	"	" #asm_op " %0, %0, %2	\n"				\
@@ -52,6 +58,8 @@ static inline int atomic_##op##_return(int i, atomic_t *v)		\
 	: "r"(&v->counter), "ir"(i)					\
 	: "cc");							\
 									\
+	smp_mb();							\
+									\
 	return temp;							\
 }
 
@@ -105,6 +113,9 @@ static inline int atomic_##op##_return(int i, atomic_t *v)		\
 	unsigned long flags;						\
 	unsigned long temp;						\
 									\
+	/*								\
+	 * spin lock/unlock provides the needed smp_mb() before/after	\
+	 */								\
 	atomic_ops_lock(flags);						\
 	temp = v->counter;						\
 	temp c_op i;							\
@@ -142,9 +153,19 @@ ATOMIC_OP(and, &=, and)
 #define __atomic_add_unless(v, a, u)					\
 ({									\
 	int c, old;							\
+									\
+	/*								\
+	 * Explicit full memory barrier needed before/after as		\
+	 * LLOCK/SCOND themselves don't provide any such semantics	\
+	 */								\
+	smp_mb();							\
+									\
 	c = atomic_read(v);						\
 	while (c != (u) && (old = atomic_cmpxchg((v), c, c + (a))) != c)\
 		c = old;						\
+									\
+	smp_mb();							\
+									\
 	c;								\
 })
 
diff --git a/arch/arc/include/asm/bitops.h b/arch/arc/include/asm/bitops.h
index 829a8a2e9704..dd03fd931bb7 100644
--- a/arch/arc/include/asm/bitops.h
+++ b/arch/arc/include/asm/bitops.h
@@ -117,6 +117,12 @@ static inline int test_and_set_bit(unsigned long nr, volatile unsigned long *m)
 	if (__builtin_constant_p(nr))
 		nr &= 0x1f;
 
+	/*
+	 * Explicit full memory barrier needed before/after as
+	 * LLOCK/SCOND themselves don't provide any such semantics
+	 */
+	smp_mb();
+
 	__asm__ __volatile__(
 	"1:	llock   %0, [%2]	\n"
 	"	bset    %1, %0, %3	\n"
@@ -126,6 +132,8 @@ static inline int test_and_set_bit(unsigned long nr, volatile unsigned long *m)
 	: "r"(m), "ir"(nr)
 	: "cc");
 
+	smp_mb();
+
 	return (old & (1 << nr)) != 0;
 }
 
@@ -139,6 +147,8 @@ test_and_clear_bit(unsigned long nr, volatile unsigned long *m)
 	if (__builtin_constant_p(nr))
 		nr &= 0x1f;
 
+	smp_mb();
+
 	__asm__ __volatile__(
 	"1:	llock   %0, [%2]	\n"
 	"	bclr    %1, %0, %3	\n"
@@ -148,6 +158,8 @@ test_and_clear_bit(unsigned long nr, volatile unsigned long *m)
 	: "r"(m), "ir"(nr)
 	: "cc");
 
+	smp_mb();
+
 	return (old & (1 << nr)) != 0;
 }
 
@@ -161,6 +173,8 @@ test_and_change_bit(unsigned long nr, volatile unsigned long *m)
 	if (__builtin_constant_p(nr))
 		nr &= 0x1f;
 
+	smp_mb();
+
 	__asm__ __volatile__(
 	"1:	llock   %0, [%2]	\n"
 	"	bxor    %1, %0, %3	\n"
@@ -170,6 +184,8 @@ test_and_change_bit(unsigned long nr, volatile unsigned long *m)
 	: "r"(m), "ir"(nr)
 	: "cc");
 
+	smp_mb();
+
 	return (old & (1 << nr)) != 0;
 }
 
@@ -249,6 +265,9 @@ static inline int test_and_set_bit(unsigned long nr, volatile unsigned long *m)
 	if (__builtin_constant_p(nr))
 		nr &= 0x1f;
 
+	/*
+	 * spin lock/unlock provide the needed smp_mb() before/after
+	 */
 	bitops_lock(flags);
 
 	old = *m;
diff --git a/arch/arc/include/asm/cmpxchg.h b/arch/arc/include/asm/cmpxchg.h
index 90de5c528da2..44fd531f4d7b 100644
--- a/arch/arc/include/asm/cmpxchg.h
+++ b/arch/arc/include/asm/cmpxchg.h
@@ -10,6 +10,8 @@
 #define __ASM_ARC_CMPXCHG_H
 
 #include <linux/types.h>
+
+#include <asm/barrier.h>
 #include <asm/smp.h>
 
 #ifdef CONFIG_ARC_HAS_LLSC
@@ -19,6 +21,12 @@ __cmpxchg(volatile void *ptr, unsigned long expected, unsigned long new)
 {
 	unsigned long prev;
 
+	/*
+	 * Explicit full memory barrier needed before/after as
+	 * LLOCK/SCOND themselves don't provide any such semantics
+	 */
+	smp_mb();
+
 	__asm__ __volatile__(
 	"1:	llock   %0, [%1]	\n"
 	"	brne    %0, %2, 2f	\n"
@@ -31,6 +39,8 @@ __cmpxchg(volatile void *ptr, unsigned long expected, unsigned long new)
 	  "r"(new)	/* can't be "ir". scond can't take LIMM for "b" */
 	: "cc", "memory"); /* so that gcc knows memory is being written here */
 
+	smp_mb();
+
 	return prev;
 }
 
@@ -43,6 +53,9 @@ __cmpxchg(volatile void *ptr, unsigned long expected, unsigned long new)
 	int prev;
 	volatile unsigned long *p = ptr;
 
+	/*
+	 * spin lock/unlock provide the needed smp_mb() before/after
+	 */
 	atomic_ops_lock(flags);
 	prev = *p;
 	if (prev == expected)
@@ -78,12 +91,16 @@ static inline unsigned long __xchg(unsigned long val, volatile void *ptr,
 
 	switch (size) {
 	case 4:
+		smp_mb();
+
 		__asm__ __volatile__(
 		"	ex  %0, [%1]	\n"
 		: "+r"(val)
 		: "r"(ptr)
 		: "memory");
 
+		smp_mb();
+
 		return val;
 	}
 	return __xchg_bad_pointer();
diff --git a/arch/arc/include/asm/spinlock.h b/arch/arc/include/asm/spinlock.h
index b6a8c2dfbe6e..e1651df6a93d 100644
--- a/arch/arc/include/asm/spinlock.h
+++ b/arch/arc/include/asm/spinlock.h
@@ -22,24 +22,46 @@ static inline void arch_spin_lock(arch_spinlock_t *lock)
 {
 	unsigned int tmp = __ARCH_SPIN_LOCK_LOCKED__;
 
+	/*
+	 * This smp_mb() is technically superfluous, we only need the one
+	 * after the lock for providing the ACQUIRE semantics.
+	 * However doing the "right" thing was regressing hackbench
+	 * so keeping this, pending further investigation
+	 */
+	smp_mb();
+
 	__asm__ __volatile__(
 	"1:	ex  %0, [%1]		\n"
 	"	breq  %0, %2, 1b	\n"
 	: "+&r" (tmp)
 	: "r"(&(lock->slock)), "ir"(__ARCH_SPIN_LOCK_LOCKED__)
 	: "memory");
+
+	/*
+	 * ACQUIRE barrier to ensure load/store after taking the lock
+	 * don't "bleed-up" out of the critical section (leak-in is allowed)
+	 * http://www.spinics.net/lists/kernel/msg2010409.html
+	 *
+	 * ARCv2 only has load-load, store-store and all-all barrier
+	 * thus need the full all-all barrier
+	 */
+	smp_mb();
 }
 
 static inline int arch_spin_trylock(arch_spinlock_t *lock)
 {
 	unsigned int tmp = __ARCH_SPIN_LOCK_LOCKED__;
 
+	smp_mb();
+
 	__asm__ __volatile__(
 	"1:	ex  %0, [%1]		\n"
 	: "+r" (tmp)
 	: "r"(&(lock->slock))
 	: "memory");
 
+	smp_mb();
+
 	return (tmp == __ARCH_SPIN_LOCK_UNLOCKED__);
 }
 
@@ -47,12 +69,22 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
 	unsigned int tmp = __ARCH_SPIN_LOCK_UNLOCKED__;
 
+	/*
+	 * RELEASE barrier: given the instructions avail on ARCv2, full barrier
+	 * is the only option
+	 */
+	smp_mb();
+
 	__asm__ __volatile__(
 	"	ex  %0, [%1]		\n"
 	: "+r" (tmp)
 	: "r"(&(lock->slock))
 	: "memory");
 
+	/*
+	 * superfluous, but keeping for now - see pairing version in
+	 * arch_spin_lock above
+	 */
 	smp_mb();
 }
 
-- 
1.9.1



* [PATCH v2] ARC: Reduce bitops lines of code using macros
  2015-06-09 11:48 ` [PATCH 21/28] ARC: Reduce bitops lines of code using macros Vineet Gupta
@ 2015-06-12 12:20   ` Vineet Gupta
  2015-06-12 13:05     ` Peter Zijlstra
  0 siblings, 1 reply; 66+ messages in thread
From: Vineet Gupta @ 2015-06-12 12:20 UTC (permalink / raw)
  To: peterz; +Cc: linux-arch, linux-kernel, arc-linux-dev, Vineet Gupta

No semantic changes!
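
The refactoring technique itself, generating each family of near-identical
bit operations from one macro template, can be sketched in plain C. This is
an illustration with hypothetical `toy_`/`TOY_` names covering only the
non-atomic case on 32-bit words; it is not the code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Template for set/clear/change: pick the word, then apply c_op to one bit. */
#define TOY_BIT_OP(op, c_op)						\
static inline void toy_##op##_bit(unsigned int nr, uint32_t *m)		\
{									\
	m += nr >> 5;		/* index of the 32-bit word */		\
	nr &= 0x1f;		/* bit position inside that word */	\
	*m = *m c_op (UINT32_C(1) << nr);				\
}

/* Template for test_and_*: same update, but report the old bit value. */
#define TOY_TEST_N_BIT_OP(op, c_op)					\
static inline int toy_test_and_##op##_bit(unsigned int nr, uint32_t *m)	\
{									\
	uint32_t old;							\
									\
	m += nr >> 5;							\
	nr &= 0x1f;							\
	old = *m;							\
	*m = old c_op (UINT32_C(1) << nr);				\
	return (old & (UINT32_C(1) << nr)) != 0;			\
}

/* One line per operation instead of one hand-written function each. */
TOY_BIT_OP(set, |)
TOY_BIT_OP(clear, & ~)
TOY_BIT_OP(change, ^)
TOY_TEST_N_BIT_OP(set, |)
TOY_TEST_N_BIT_OP(clear, & ~)
TOY_TEST_N_BIT_OP(change, ^)

/* Self-check: bit 33 lives in word 1 at position 1. */
int toy_bitops_selftest(void)
{
	uint32_t map[2] = { 0, 0 };

	toy_set_bit(33, map);
	assert(map[0] == 0 && map[1] == 2);
	assert(toy_test_and_clear_bit(33, map) == 1);
	assert(map[1] == 0);
	assert(toy_test_and_set_bit(0, map) == 0);
	toy_change_bit(0, map);
	assert(toy_test_and_change_bit(0, map) == 0);
	assert(map[0] == 1);
	toy_clear_bit(0, map);
	assert(map[0] == 0);
	return 0;
}
```

Passing the C operator as a macro argument (including the two-token `& ~`)
is what lets a single template cover set, clear and change.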

Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/include/asm/bitops.h | 477 +++++++++++++-----------------------------
 1 file changed, 144 insertions(+), 333 deletions(-)

diff --git a/arch/arc/include/asm/bitops.h b/arch/arc/include/asm/bitops.h
index dd03fd931bb7..99fe118d3730 100644
--- a/arch/arc/include/asm/bitops.h
+++ b/arch/arc/include/asm/bitops.h
@@ -18,83 +18,50 @@
 #include <linux/types.h>
 #include <linux/compiler.h>
 #include <asm/barrier.h>
+#ifndef CONFIG_ARC_HAS_LLSC
+#include <asm/smp.h>
+#endif
 
-/*
- * Hardware assisted read-modify-write using ARC700 LLOCK/SCOND insns.
- * The Kconfig glue ensures that in SMP, this is only set if the container
- * SoC/platform has cross-core coherent LLOCK/SCOND
- */
 #if defined(CONFIG_ARC_HAS_LLSC)
 
-static inline void set_bit(unsigned long nr, volatile unsigned long *m)
-{
-	unsigned int temp;
-
-	m += nr >> 5;
-
-	/*
-	 * ARC ISA micro-optimization:
-	 *
-	 * Instructions dealing with bitpos only consider lower 5 bits (0-31)
-	 * e.g (x << 33) is handled like (x << 1) by ASL instruction
-	 *  (mem pointer still needs adjustment to point to next word)
-	 *
-	 * Hence the masking to clamp @nr arg can be elided in general.
-	 *
-	 * However if @nr is a constant (above assumed it in a register),
-	 * and greater than 31, gcc can optimize away (x << 33) to 0,
-	 * as overflow, given the 32-bit ISA. Thus masking needs to be done
-	 * for constant @nr, but no code is generated due to const prop.
-	 */
-	if (__builtin_constant_p(nr))
-		nr &= 0x1f;
-
-	__asm__ __volatile__(
-	"1:	llock   %0, [%1]	\n"
-	"	bset    %0, %0, %2	\n"
-	"	scond   %0, [%1]	\n"
-	"	bnz     1b	\n"
-	: "=&r"(temp)
-	: "r"(m), "ir"(nr)
-	: "cc");
-}
-
-static inline void clear_bit(unsigned long nr, volatile unsigned long *m)
-{
-	unsigned int temp;
-
-	m += nr >> 5;
-
-	if (__builtin_constant_p(nr))
-		nr &= 0x1f;
-
-	__asm__ __volatile__(
-	"1:	llock   %0, [%1]	\n"
-	"	bclr    %0, %0, %2	\n"
-	"	scond   %0, [%1]	\n"
-	"	bnz     1b	\n"
-	: "=&r"(temp)
-	: "r"(m), "ir"(nr)
-	: "cc");
-}
-
-static inline void change_bit(unsigned long nr, volatile unsigned long *m)
-{
-	unsigned int temp;
-
-	m += nr >> 5;
-
-	if (__builtin_constant_p(nr))
-		nr &= 0x1f;
+/*
+ * Hardware assisted Atomic-R-M-W
+ */
 
-	__asm__ __volatile__(
-	"1:	llock   %0, [%1]	\n"
-	"	bxor    %0, %0, %2	\n"
-	"	scond   %0, [%1]	\n"
-	"	bnz     1b		\n"
-	: "=&r"(temp)
-	: "r"(m), "ir"(nr)
-	: "cc");
+#define BIT_OP(op, c_op, asm_op)					\
+static inline void op##_bit(unsigned long nr, volatile unsigned long *m)\
+{									\
+	unsigned int temp;						\
+									\
+	m += nr >> 5;							\
+									\
+	/*								\
+	 * ARC ISA micro-optimization:					\
+	 *								\
+	 * Instructions dealing with bitpos only consider lower 5 bits	\
+	 * e.g (x << 33) is handled like (x << 1) by ASL instruction	\
+	 *  (mem pointer still needs adjustment to point to next word)	\
+	 *								\
+	 * Hence the masking to clamp @nr arg can be elided in general.	\
+	 *								\
+	 * However if @nr is a constant (above assumed in a register),	\
+	 * and greater than 31, gcc can optimize away (x << 33) to 0,	\
+	 * as overflow, given the 32-bit ISA. Thus masking needs to be	\
+	 * done for const @nr, but no code is generated due to gcc	\
+	 * const prop.							\
+	 */								\
+	if (__builtin_constant_p(nr))					\
+		nr &= 0x1f;						\
+									\
+	__asm__ __volatile__(						\
+	"1:	llock       %0, [%1]		\n"			\
+	"	" #asm_op " %0, %0, %2	\n"				\
+	"	scond       %0, [%1]		\n"			\
+	"	bnz         1b			\n"			\
+	: "=&r"(temp)	/* Early clobber, to prevent reg reuse */	\
+	: "r"(m),	/* Not "m": llock only supports reg direct addr mode */	\
+	  "ir"(nr)							\
+	: "cc");							\
 }
 
 /*
@@ -108,91 +75,38 @@ static inline void change_bit(unsigned long nr, volatile unsigned long *m)
  * Since ARC lacks a equivalent h/w primitive, the bit is set unconditionally
  * and the old value of bit is returned
  */
-static inline int test_and_set_bit(unsigned long nr, volatile unsigned long *m)
-{
-	unsigned long old, temp;
-
-	m += nr >> 5;
-
-	if (__builtin_constant_p(nr))
-		nr &= 0x1f;
-
-	/*
-	 * Explicit full memory barrier needed before/after as
-	 * LLOCK/SCOND themselves don't provide any such semantics
-	 */
-	smp_mb();
-
-	__asm__ __volatile__(
-	"1:	llock   %0, [%2]	\n"
-	"	bset    %1, %0, %3	\n"
-	"	scond   %1, [%2]	\n"
-	"	bnz     1b		\n"
-	: "=&r"(old), "=&r"(temp)
-	: "r"(m), "ir"(nr)
-	: "cc");
-
-	smp_mb();
-
-	return (old & (1 << nr)) != 0;
-}
-
-static inline int
-test_and_clear_bit(unsigned long nr, volatile unsigned long *m)
-{
-	unsigned int old, temp;
-
-	m += nr >> 5;
-
-	if (__builtin_constant_p(nr))
-		nr &= 0x1f;
-
-	smp_mb();
-
-	__asm__ __volatile__(
-	"1:	llock   %0, [%2]	\n"
-	"	bclr    %1, %0, %3	\n"
-	"	scond   %1, [%2]	\n"
-	"	bnz     1b		\n"
-	: "=&r"(old), "=&r"(temp)
-	: "r"(m), "ir"(nr)
-	: "cc");
-
-	smp_mb();
-
-	return (old & (1 << nr)) != 0;
-}
-
-static inline int
-test_and_change_bit(unsigned long nr, volatile unsigned long *m)
-{
-	unsigned int old, temp;
-
-	m += nr >> 5;
-
-	if (__builtin_constant_p(nr))
-		nr &= 0x1f;
-
-	smp_mb();
-
-	__asm__ __volatile__(
-	"1:	llock   %0, [%2]	\n"
-	"	bxor    %1, %0, %3	\n"
-	"	scond   %1, [%2]	\n"
-	"	bnz     1b		\n"
-	: "=&r"(old), "=&r"(temp)
-	: "r"(m), "ir"(nr)
-	: "cc");
-
-	smp_mb();
-
-	return (old & (1 << nr)) != 0;
+#define TEST_N_BIT_OP(op, c_op, asm_op)					\
+static inline int test_and_##op##_bit(unsigned long nr, volatile unsigned long *m)\
+{									\
+	unsigned long old, temp;					\
+									\
+	m += nr >> 5;							\
+									\
+	if (__builtin_constant_p(nr))					\
+		nr &= 0x1f;						\
+									\
+	/*								\
+	 * Explicit full memory barrier needed before/after as		\
+	 * LLOCK/SCOND themselves don't provide any such semantics	\
+	 */								\
+	smp_mb();							\
+									\
+	__asm__ __volatile__(						\
+	"1:	llock       %0, [%2]	\n"				\
+	"	" #asm_op " %1, %0, %3	\n"				\
+	"	scond       %1, [%2]	\n"				\
+	"	bnz         1b		\n"				\
+	: "=&r"(old), "=&r"(temp)					\
+	: "r"(m), "ir"(nr)						\
+	: "cc");							\
+									\
+	smp_mb();							\
+									\
+	return (old & (1 << nr)) != 0;					\
 }
 
 #else	/* !CONFIG_ARC_HAS_LLSC */
 
-#include <asm/smp.h>
-
 /*
  * Non hardware assisted Atomic-R-M-W
  * Locking would change to irq-disabling only (UP) and spinlocks (SMP)
@@ -209,111 +123,43 @@ test_and_change_bit(unsigned long nr, volatile unsigned long *m)
  *             at compile time)
  */
 
-static inline void set_bit(unsigned long nr, volatile unsigned long *m)
-{
-	unsigned long temp, flags;
-	m += nr >> 5;
-
-	if (__builtin_constant_p(nr))
-		nr &= 0x1f;
-
-	bitops_lock(flags);
-
-	temp = *m;
-	*m = temp | (1UL << nr);
-
-	bitops_unlock(flags);
-}
-
-static inline void clear_bit(unsigned long nr, volatile unsigned long *m)
-{
-	unsigned long temp, flags;
-	m += nr >> 5;
-
-	if (__builtin_constant_p(nr))
-		nr &= 0x1f;
-
-	bitops_lock(flags);
-
-	temp = *m;
-	*m = temp & ~(1UL << nr);
-
-	bitops_unlock(flags);
-}
-
-static inline void change_bit(unsigned long nr, volatile unsigned long *m)
-{
-	unsigned long temp, flags;
-	m += nr >> 5;
-
-	if (__builtin_constant_p(nr))
-		nr &= 0x1f;
-
-	bitops_lock(flags);
-
-	temp = *m;
-	*m = temp ^ (1UL << nr);
-
-	bitops_unlock(flags);
+#define BIT_OP(op, c_op, asm_op)					\
+static inline void op##_bit(unsigned long nr, volatile unsigned long *m)\
+{									\
+	unsigned long temp, flags;					\
+	m += nr >> 5;							\
+									\
+	if (__builtin_constant_p(nr))					\
+		nr &= 0x1f;						\
+									\
+	/*								\
+	 * spin lock/unlock provide the needed smp_mb() before/after	\
+	 */								\
+	bitops_lock(flags);						\
+									\
+	temp = *m;							\
+	*m = temp c_op (1UL << nr);					\
+									\
+	bitops_unlock(flags);						\
 }
 
-static inline int test_and_set_bit(unsigned long nr, volatile unsigned long *m)
-{
-	unsigned long old, flags;
-	m += nr >> 5;
-
-	if (__builtin_constant_p(nr))
-		nr &= 0x1f;
-
-	/*
-	 * spin lock/unlock provide the needed smp_mb() before/after
-	 */
-	bitops_lock(flags);
-
-	old = *m;
-	*m = old | (1 << nr);
-
-	bitops_unlock(flags);
-
-	return (old & (1 << nr)) != 0;
-}
-
-static inline int
-test_and_clear_bit(unsigned long nr, volatile unsigned long *m)
-{
-	unsigned long old, flags;
-	m += nr >> 5;
-
-	if (__builtin_constant_p(nr))
-		nr &= 0x1f;
-
-	bitops_lock(flags);
-
-	old = *m;
-	*m = old & ~(1 << nr);
-
-	bitops_unlock(flags);
-
-	return (old & (1 << nr)) != 0;
-}
-
-static inline int
-test_and_change_bit(unsigned long nr, volatile unsigned long *m)
-{
-	unsigned long old, flags;
-	m += nr >> 5;
-
-	if (__builtin_constant_p(nr))
-		nr &= 0x1f;
-
-	bitops_lock(flags);
-
-	old = *m;
-	*m = old ^ (1 << nr);
-
-	bitops_unlock(flags);
-
-	return (old & (1 << nr)) != 0;
+#define TEST_N_BIT_OP(op, c_op, asm_op)					\
+static inline int test_and_##op##_bit(unsigned long nr, volatile unsigned long *m)\
+{									\
+	unsigned long old, flags;					\
+	m += nr >> 5;							\
+									\
+	if (__builtin_constant_p(nr))					\
+		nr &= 0x1f;						\
+									\
+	bitops_lock(flags);						\
+									\
+	old = *m;							\
+	*m = old c_op (1 << nr);					\
+									\
+	bitops_unlock(flags);						\
+									\
+	return (old & (1 << nr)) != 0;					\
 }
 
 #endif /* CONFIG_ARC_HAS_LLSC */
@@ -322,86 +168,51 @@ test_and_change_bit(unsigned long nr, volatile unsigned long *m)
  * Non atomic variants
  **************************************/
 
-static inline void __set_bit(unsigned long nr, volatile unsigned long *m)
-{
-	unsigned long temp;
-	m += nr >> 5;
-
-	if (__builtin_constant_p(nr))
-		nr &= 0x1f;
-
-	temp = *m;
-	*m = temp | (1UL << nr);
-}
-
-static inline void __clear_bit(unsigned long nr, volatile unsigned long *m)
-{
-	unsigned long temp;
-	m += nr >> 5;
-
-	if (__builtin_constant_p(nr))
-		nr &= 0x1f;
-
-	temp = *m;
-	*m = temp & ~(1UL << nr);
-}
-
-static inline void __change_bit(unsigned long nr, volatile unsigned long *m)
-{
-	unsigned long temp;
-	m += nr >> 5;
-
-	if (__builtin_constant_p(nr))
-		nr &= 0x1f;
-
-	temp = *m;
-	*m = temp ^ (1UL << nr);
-}
-
-static inline int
-__test_and_set_bit(unsigned long nr, volatile unsigned long *m)
-{
-	unsigned long old;
-	m += nr >> 5;
-
-	if (__builtin_constant_p(nr))
-		nr &= 0x1f;
-
-	old = *m;
-	*m = old | (1 << nr);
-
-	return (old & (1 << nr)) != 0;
+#define __BIT_OP(op, c_op, asm_op)					\
+static inline void __##op##_bit(unsigned long nr, volatile unsigned long *m)	\
+{									\
+	unsigned long temp;						\
+	m += nr >> 5;							\
+									\
+	if (__builtin_constant_p(nr))					\
+		nr &= 0x1f;						\
+									\
+	temp = *m;							\
+	*m = temp c_op (1UL << nr);					\
 }
 
-static inline int
-__test_and_clear_bit(unsigned long nr, volatile unsigned long *m)
-{
-	unsigned long old;
-	m += nr >> 5;
-
-	if (__builtin_constant_p(nr))
-		nr &= 0x1f;
-
-	old = *m;
-	*m = old & ~(1 << nr);
-
-	return (old & (1 << nr)) != 0;
+#define __TEST_N_BIT_OP(op, c_op, asm_op)				\
+static inline int __test_and_##op##_bit(unsigned long nr, volatile unsigned long *m)\
+{									\
+	unsigned long old;						\
+	m += nr >> 5;							\
+									\
+	if (__builtin_constant_p(nr))					\
+		nr &= 0x1f;						\
+									\
+	old = *m;							\
+	*m = old c_op (1 << nr);					\
+									\
+	return (old & (1 << nr)) != 0;					\
 }
 
-static inline int
-__test_and_change_bit(unsigned long nr, volatile unsigned long *m)
-{
-	unsigned long old;
-	m += nr >> 5;
-
-	if (__builtin_constant_p(nr))
-		nr &= 0x1f;
-
-	old = *m;
-	*m = old ^ (1 << nr);
-
-	return (old & (1 << nr)) != 0;
-}
+#define BIT_OPS(op, c_op, asm_op)					\
+									\
+	/* set_bit(), clear_bit(), change_bit() */			\
+	BIT_OP(op, c_op, asm_op)					\
+									\
+	/* test_and_set_bit(), test_and_clear_bit(), test_and_change_bit() */\
+	TEST_N_BIT_OP(op, c_op, asm_op)					\
+									\
+	/* __set_bit(), __clear_bit(), __change_bit() */		\
+	__BIT_OP(op, c_op, asm_op)					\
+									\
+	/* __test_and_set_bit(), __test_and_clear_bit(), __test_and_change_bit() */\
+	__TEST_N_BIT_OP(op, c_op, asm_op)
+
+BIT_OPS(set, |, bset)
+BIT_OPS(clear, & ~, bclr)
+BIT_OPS(change, ^, bxor)
 
 /*
  * This routine doesn't need to be atomic.
-- 
1.9.1
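
For readers following the consolidation above, the token-pasting pattern can
be tried in a standalone userspace program. This is only a sketch under
stated assumptions: the sketch_ prefixed names and BITS_PER_WORD are
hypothetical and not kernel identifiers, only the non-atomic flavours are
shown, and the word index is derived from the host word size instead of the
32-bit specific "nr >> 5" / "nr & 0x1f" used in the patch.

```c
#include <limits.h>

/* Host word size in bits (hypothetical helper, not a kernel macro). */
#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)

/* Generates sketch_set_bit(), sketch_clear_bit(), sketch_change_bit(). */
#define SKETCH_BIT_OP(op, c_op)						\
static inline void sketch_##op##_bit(unsigned long nr, unsigned long *m)\
{									\
	m += nr / BITS_PER_WORD;					\
	nr %= BITS_PER_WORD;						\
	*m = *m c_op (1UL << nr);					\
}

/* Generates sketch_test_and_{set,clear,change}_bit(). */
#define SKETCH_TEST_N_BIT_OP(op, c_op)					\
static inline int sketch_test_and_##op##_bit(unsigned long nr,		\
					     unsigned long *m)		\
{									\
	unsigned long old;						\
	m += nr / BITS_PER_WORD;					\
	nr %= BITS_PER_WORD;						\
	old = *m;							\
	*m = old c_op (1UL << nr);					\
	return (old & (1UL << nr)) != 0;				\
}

#define SKETCH_BIT_OPS(op, c_op)					\
	SKETCH_BIT_OP(op, c_op)						\
	SKETCH_TEST_N_BIT_OP(op, c_op)

/* One line per operation, mirroring BIT_OPS(set, |, bset) etc. */
SKETCH_BIT_OPS(set, |)
SKETCH_BIT_OPS(clear, & ~)
SKETCH_BIT_OPS(change, ^)

/* Returns 1 when the generated helpers behave as expected. */
static int sketch_bitops_demo(void)
{
	unsigned long map[2] = { 0, 0 };
	int was_set;

	sketch_set_bit(3, map);
	was_set = sketch_test_and_clear_bit(3, map);	/* bit 3 was set */
	sketch_change_bit(BITS_PER_WORD + 1, map);	/* bit 1 of map[1] */

	return was_set == 1 && map[0] == 0 && map[1] == 2UL;
}
```

The sketch uses 1UL for every shift; the patch itself keeps the original mix
of (1 << nr) and (1UL << nr).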


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH v2] ARC: add smp barriers around atomics per Documentation/atomic_ops.txt
  2015-06-12 12:15   ` [PATCH v2] ARC: add smp barriers around atomics per Documentation/atomic_ops.txt Vineet Gupta
@ 2015-06-12 13:04     ` Peter Zijlstra
  2015-06-12 13:16       ` Vineet Gupta
  0 siblings, 1 reply; 66+ messages in thread
From: Peter Zijlstra @ 2015-06-12 13:04 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: linux-arch, linux-kernel, arc-linux-dev, Paul E. McKenney, stable

On Fri, Jun 12, 2015 at 05:45:59PM +0530, Vineet Gupta wrote:
>  - arch_spin_lock/unlock were lacking the ACQUIRE/RELEASE barriers
>    Since ARCv2 only provides load/load, store/store and all/all, we need
>    the full barrier
> 
>  - LLOCK/SCOND based atomics, bitops, cmpxchg, which return modified
>    values were lacking the explicit smp barriers.
>
>  - Non LLOCK/SCOND variants don't need the explicit barriers since that
>    is implicitly provided by the spin locks used to implement the
>    critical section (the spin lock barriers in turn are also fixed in
>    this commit as explained above)

And iirc you're relying on asm-generic/barrier.h to issue
smp_mb__{before,after}_atomic() as smp_mb(), right?

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Although I'd love to know why you need those extra barriers in your
spinlocks...
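
As a point of comparison for the ACQUIRE/RELEASE discussion, here is a
minimal userspace spinlock written with C11 atomics. It is a sketch, not the
ARC implementation: the sketch_ names are hypothetical, and C11 lets the lock
ask for acquire-only and release-only ordering, whereas the quoted changelog
says ARCv2 only offers load/load, store/store and all/all barriers and
therefore uses the full one.

```c
#include <stdatomic.h>

/* Hypothetical lock type; the kernel's arch_spinlock_t is different. */
struct sketch_spinlock {
	atomic_flag locked;
};

static inline void sketch_spin_lock(struct sketch_spinlock *l)
{
	/* ACQUIRE: later accesses cannot be reordered before the lock. */
	while (atomic_flag_test_and_set_explicit(&l->locked,
						 memory_order_acquire))
		;	/* spin until the previous owner clears the flag */
}

static inline void sketch_spin_unlock(struct sketch_spinlock *l)
{
	/* RELEASE: earlier accesses cannot be reordered after the unlock. */
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Single-threaded smoke test: lock/unlock twice around a counter. */
static int sketch_spin_demo(void)
{
	struct sketch_spinlock l = { ATOMIC_FLAG_INIT };
	int counter = 0;

	sketch_spin_lock(&l);
	counter++;
	sketch_spin_unlock(&l);

	sketch_spin_lock(&l);	/* would spin forever if unlock failed */
	counter++;
	sketch_spin_unlock(&l);

	return counter;
}
```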

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2] ARC: Reduce bitops lines of code using macros
  2015-06-12 12:20   ` [PATCH v2] " Vineet Gupta
@ 2015-06-12 13:05     ` Peter Zijlstra
  0 siblings, 0 replies; 66+ messages in thread
From: Peter Zijlstra @ 2015-06-12 13:05 UTC (permalink / raw)
  To: Vineet Gupta; +Cc: linux-arch, linux-kernel, arc-linux-dev

On Fri, Jun 12, 2015 at 05:50:34PM +0530, Vineet Gupta wrote:
> No semantical changes !
 

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2] ARC: add smp barriers around atomics per Documentation/atomic_ops.txt
  2015-06-12 13:04     ` Peter Zijlstra
@ 2015-06-12 13:16       ` Vineet Gupta
  0 siblings, 0 replies; 66+ messages in thread
From: Vineet Gupta @ 2015-06-12 13:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-arch, linux-kernel, arc-linux-dev, Paul E. McKenney, stable

On Friday 12 June 2015 06:35 PM, Peter Zijlstra wrote:
> On Fri, Jun 12, 2015 at 05:45:59PM +0530, Vineet Gupta wrote:
>>  - arch_spin_lock/unlock were lacking the ACQUIRE/RELEASE barriers
>>    Since ARCv2 only provides load/load, store/store and all/all, we need
>>    the full barrier
>>
>>  - LLOCK/SCOND based atomics, bitops, cmpxchg, which return modified
>>    values were lacking the explicit smp barriers.
>>
>>  - Non LLOCK/SCOND variants don't need the explicit barriers since that
>>    is implicitly provided by the spin locks used to implement the
>>    critical section (the spin lock barriers in turn are also fixed in
>>    this commit as explained above)
>
> And iirc you're relying on asm-generic/barrier.h to issue
> smp_mb__{before,after}_atomic() as smp_mb(), right?

Yep !

> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Thx !

> Although I'd love to know why you need those extra barriers in your
> spinlocks...

I'll keep you posted as I'd like to get rid of them too. But there's a bunch
of stuff going on ATM so can't really jump into investigating that. Will need
some wrestling with perf... which makes me think that I'd posted a bunch of
perf patches for ARC/ARCv2 as well - can u please take a look at them
sometime soon !

Thx,
-Vineet

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCH v2 22/28] ARCv2: STAR 9000837815 workaround hardware exclusive transactions livelock
  2015-06-10 11:02       ` Peter Zijlstra
@ 2015-06-19  9:55         ` Vineet Gupta
  2015-06-19  9:59           ` Will Deacon
  0 siblings, 1 reply; 66+ messages in thread
From: Vineet Gupta @ 2015-06-19  9:55 UTC (permalink / raw)
  To: Peter Zijlstra (Intel); +Cc: lkml, linux-arch, arc-linux-dev, Vineet Gupta

A quad core SMP build could get into hardware livelock with concurrent
LLOCK/SCOND. Work around that by adding a PREFETCHW which is serialized by
the SCU (System Coherency Unit). It brings the cache line into Exclusive
state and makes others invalidate their lines. This gives enough time for
the winner to complete the LLOCK/SCOND before others can get the line back.

Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/include/asm/atomic.h | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/arc/include/asm/atomic.h b/arch/arc/include/asm/atomic.h
index 20b7dc17979e..03484cb4d16d 100644
--- a/arch/arc/include/asm/atomic.h
+++ b/arch/arc/include/asm/atomic.h
@@ -23,13 +23,21 @@
 
 #define atomic_set(v, i) (((v)->counter) = (i))
 
+#ifdef CONFIG_ISA_ARCV2
+#define PREFETCHW	"	prefetchw   [%1]	\n"
+#else
+#define PREFETCHW
+#endif
+
 #define ATOMIC_OP(op, c_op, asm_op)					\
 static inline void atomic_##op(int i, atomic_t *v)			\
 {									\
 	unsigned int temp;						\
 									\
 	__asm__ __volatile__(						\
-	"1:	llock   %0, [%1]	\n"				\
+	"1:				\n"				\
+	PREFETCHW							\
+	"	llock   %0, [%1]	\n"				\
 	"	" #asm_op " %0, %0, %2	\n"				\
 	"	scond   %0, [%1]	\n"				\
 	"	bnz     1b		\n"				\
@@ -50,7 +58,9 @@ static inline int atomic_##op##_return(int i, atomic_t *v)		\
 	smp_mb();							\
 									\
 	__asm__ __volatile__(						\
-	"1:	llock   %0, [%1]	\n"				\
+	"1:				\n"				\
+	PREFETCHW							\
+	"	llock   %0, [%1]	\n"				\
 	"	" #asm_op " %0, %0, %2	\n"				\
 	"	scond   %0, [%1]	\n"				\
 	"	bnz     1b		\n"				\
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 22/28] ARCv2: STAR 9000837815 workaround hardware exclusive transactions livelock
  2015-06-19  9:55         ` [PATCH v2 " Vineet Gupta
@ 2015-06-19  9:59           ` Will Deacon
  2015-06-19 10:09             ` Vineet Gupta
  2015-06-23  7:59             ` Vineet Gupta
  0 siblings, 2 replies; 66+ messages in thread
From: Will Deacon @ 2015-06-19  9:59 UTC (permalink / raw)
  To: Vineet Gupta; +Cc: Peter Zijlstra (Intel), lkml, linux-arch, arc-linux-dev

On Fri, Jun 19, 2015 at 10:55:26AM +0100, Vineet Gupta wrote:
> A quad core SMP build could get into hardware livelock with concurrent
> LLOCK/SCOND. Work around that by adding a PREFETCHW which is serialized by
> the SCU (System Coherency Unit). It brings the cache line into Exclusive
> state and makes others invalidate their lines. This gives enough time for
> the winner to complete the LLOCK/SCOND before others can get the line back.
> 
> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
> ---
>  arch/arc/include/asm/atomic.h | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arc/include/asm/atomic.h b/arch/arc/include/asm/atomic.h
> index 20b7dc17979e..03484cb4d16d 100644
> --- a/arch/arc/include/asm/atomic.h
> +++ b/arch/arc/include/asm/atomic.h
> @@ -23,13 +23,21 @@
>  
>  #define atomic_set(v, i) (((v)->counter) = (i))
>  
> +#ifdef CONFIG_ISA_ARCV2
> +#define PREFETCHW	"	prefetchw   [%1]	\n"
> +#else
> +#define PREFETCHW
> +#endif
> +
>  #define ATOMIC_OP(op, c_op, asm_op)					\
>  static inline void atomic_##op(int i, atomic_t *v)			\
>  {									\
>  	unsigned int temp;						\
>  									\
>  	__asm__ __volatile__(						\
> -	"1:	llock   %0, [%1]	\n"				\
> +	"1:				\n"				\
> +	PREFETCHW							\
> +	"	llock   %0, [%1]	\n"				\
>  	"	" #asm_op " %0, %0, %2	\n"				\
>  	"	scond   %0, [%1]	\n"				\
>  	"	bnz     1b		\n"				\

Curious, but are you *sure* the prefetch should be *inside* the loop?
On most ll/sc architectures, that's a livelock waiting to happen because
you ping-pong the cache-line around in exclusive state.
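
The two placements being debated can be written side by side in portable C.
This is only an analogue under stated assumptions: a C11 compare-exchange
retry loop stands in for the llock/scond pair, __builtin_prefetch() (a
GCC/Clang builtin, compiled out elsewhere) stands in for prefetchw, and the
sketch_ names are hypothetical. It shows where the hint sits relative to the
retry loop; it does not reproduce the hardware livelock or show which
placement is right for a given core.

```c
#include <stdatomic.h>

/* Prefetch-for-write hint; only a hint, and a no-op without the builtin. */
#if defined(__GNUC__)
#define sketch_prefetchw(p)	__builtin_prefetch((const void *)(p), 1)
#else
#define sketch_prefetchw(p)	((void)0)
#endif

/* Variant A: hint re-issued on every retry, as in the patch. */
static inline int sketch_add_prefetch_in_loop(atomic_int *v, int i)
{
	int old = atomic_load_explicit(v, memory_order_relaxed);
	int val;

	do {
		sketch_prefetchw(v);
		val = old + i;	/* old is refreshed when the exchange fails */
	} while (!atomic_compare_exchange_weak_explicit(v, &old, val,
			memory_order_relaxed, memory_order_relaxed));

	return val;
}

/* Variant B: hint issued once, before entering the retry loop. */
static inline int sketch_add_prefetch_before_loop(atomic_int *v, int i)
{
	int old, val;

	sketch_prefetchw(v);
	old = atomic_load_explicit(v, memory_order_relaxed);

	do {
		val = old + i;
	} while (!atomic_compare_exchange_weak_explicit(v, &old, val,
			memory_order_relaxed, memory_order_relaxed));

	return val;
}

/* Uncontended smoke test: both variants compute the same sums. */
static int sketch_llsc_demo(void)
{
	atomic_int v = 5;

	sketch_add_prefetch_in_loop(&v, 3);		/* 5 + 3 = 8 */
	return sketch_add_prefetch_before_loop(&v, 4);	/* 8 + 4 = 12 */
}
```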

Will

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 22/28] ARCv2: STAR 9000837815 workaround hardware exclusive transactions livelock
  2015-06-19  9:59           ` Will Deacon
@ 2015-06-19 10:09             ` Vineet Gupta
  2015-06-23  7:59             ` Vineet Gupta
  1 sibling, 0 replies; 66+ messages in thread
From: Vineet Gupta @ 2015-06-19 10:09 UTC (permalink / raw)
  To: Will Deacon, Carlos Basto
  Cc: Peter Zijlstra (Intel), lkml, linux-arch, arc-linux-dev

On Friday 19 June 2015 03:29 PM, Will Deacon wrote:
> On Fri, Jun 19, 2015 at 10:55:26AM +0100, Vineet Gupta wrote:
>> A quad core SMP build could get into hardware livelock with concurrent
>> LLOCK/SCOND. Work around that by adding a PREFETCHW which is serialized by
>> the SCU (System Coherency Unit). It brings the cache line into Exclusive
>> state and makes others invalidate their lines. This gives enough time for
>> the winner to complete the LLOCK/SCOND before others can get the line back.
>>
>> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
>> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
>> ---
>>  arch/arc/include/asm/atomic.h | 14 ++++++++++++--
>>  1 file changed, 12 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arc/include/asm/atomic.h b/arch/arc/include/asm/atomic.h
>> index 20b7dc17979e..03484cb4d16d 100644
>> --- a/arch/arc/include/asm/atomic.h
>> +++ b/arch/arc/include/asm/atomic.h
>> @@ -23,13 +23,21 @@
>>  
>>  #define atomic_set(v, i) (((v)->counter) = (i))
>>  
>> +#ifdef CONFIG_ISA_ARCV2
>> +#define PREFETCHW	"	prefetchw   [%1]	\n"
>> +#else
>> +#define PREFETCHW
>> +#endif
>> +
>>  #define ATOMIC_OP(op, c_op, asm_op)					\
>>  static inline void atomic_##op(int i, atomic_t *v)			\
>>  {									\
>>  	unsigned int temp;						\
>>  									\
>>  	__asm__ __volatile__(						\
>> -	"1:	llock   %0, [%1]	\n"				\
>> +	"1:				\n"				\
>> +	PREFETCHW							\
>> +	"	llock   %0, [%1]	\n"				\
>>  	"	" #asm_op " %0, %0, %2	\n"				\
>>  	"	scond   %0, [%1]	\n"				\
>>  	"	bnz     1b		\n"				\
> Curious, but are you *sure* the prefetch should be *inside* the loop?
> On most ll/sc architectures, that's a livelock waiting to happen because
> you ping-pong the cache-line around in exclusive state.

Indeed, the prefetchw inside the loop seems dubious, but this is what broke the
h/w livelock when we were playing with multibench last year and what I was told to
do by h/w folks. Let me go check once again !

-Vineet

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 20/28] ARCv2: barriers
  2015-06-11 13:39             ` Will Deacon
@ 2015-06-19 13:13               ` Vineet Gupta
  2015-06-22 13:36                 ` Will Deacon
  0 siblings, 1 reply; 66+ messages in thread
From: Vineet Gupta @ 2015-06-19 13:13 UTC (permalink / raw)
  To: Will Deacon; +Cc: Peter Zijlstra, linux-arch, linux-kernel, arnd, arc-linux-dev

On Thursday 11 June 2015 07:09 PM, Will Deacon wrote:
> On Thu, Jun 11, 2015 at 01:13:28PM +0100, Vineet Gupta wrote:
>> On Wednesday 10 June 2015 06:31 PM, Will Deacon wrote:
>>> On Wed, Jun 10, 2015 at 11:58:40AM +0100, Peter Zijlstra wrote:
>>>> On Wed, Jun 10, 2015 at 09:34:18AM +0000, Vineet Gupta wrote:
>>>>> On Tuesday 09 June 2015 06:10 PM, Peter Zijlstra wrote:
>>>> I think the most interesting part is the device side.
>>>>
>>>>>>> +/*
>>>>>>> + * DSYNC:
>>>>>>> + *   - Waits for completion of all outstanding memory operations before any new
>>>>>>> + *     operations can begin
>>>>>>> + *   - Includes implicit memory operations such as cache/TLB/BPU maintenance ops
>>>>>>> + *   - Lighter version of SYNC as it doesn't wait for non-memory operations
>>>>>>> + */
>>>>>>> +#define mb()		asm volatile("dsync\n" : : : "memory")
>>>>>> So mb() is supposed to order against things like DMA memory ops, is DMA
>>>>>> part of point 1 or 3, if 3, this is not a suitable instruction.
>>>>> Can u please explain the DMA case a bit more ? From what I understood and used in
>>>>> say ethernet driver, it is more of a line drawn between say cpu updating a shared
>>>>> buffer descriptor and kicking a MMIO register (which in turn could initiate a DMA)
>>>>> but I'm not sure how mb() can possibly order with DMA per se (unless there's some
>>>>> advanced form of IO-coherency)
>>>> I'm afraid I might not be the best of sources here, I tend to stay away
>>>> from actual device stuff like that. I've Cc'ed Will Deacon who might be
>>>> able to shed a bit more light on this aspect.
>>> I'd definitely expect mb() to order arbitrary memory accesses against each
>>> other (i.e. regardless of whether or not they're to RAM or MMIO devices).
>>> Some drivers use it to "flush the writebuffer" but I don't think that makes
>>> a whole lot of sense. Certainly, on ARM, if we want to know that something
>>> reached an MMIO endpoint then we'll need a read-back as well as the barrier
>>> for the general case.
>>>
>>> You also need that guarantee in your readl/writel family of macros. It's
>>> extremely heavy and rarely needed, which is why I added the _relaxed
>>> versions to all architectures.
>>
>> Wow - adding that to these accessors will really be heavy - given that a whole
>> bunch of drivers still use the stock API (or perhaps don't know / care whether
>> they need the readl or the relaxed api). And it is practically impossible to switch
>> them over - after all, if it ain't broken how can u fix it. So far we've been testing this
>> implementation (readl/writel - w/o any explicit barrier) on slower FPGA builds and
>> this includes a whole bunch of designware IP - mmc, eth, gpio.... and don't see
>> any ill effects - do you reckon we still need to add it.
> 
> Unfortunately, yes, as that's effectively what the kernel requires:
> 
>   http://marc.info/?l=linux-kernel&m=121192394430581&w=2
>   http://thread.gmane.org/gmane.linux.ide/46414

Oh great - thx for those !


> The conclusion is that x86 *does* provide this ordering in its accessors
> and drivers are written to assume that, so either you go round fixing all
> the drivers by adding the missing barriers or you implement it in your
> accessors (like we have done on ARM). Subtle I/O ordering issues are no
> fun to debug.
> 
> That's also the reason I added the _relaxed versions, so you can port
> drivers one-by-one to the weaker semantics whilst having the potentially
> broken drivers continue to work.
> 

OK, so given that regular/mmio is also weakly ordered, it would seem that we need
full mb() *before* and *after* the IO access in the non relaxed API. ARM code
seems to put a rmb() after the readl and wmb() before the writel. Is that based on
how h/w provides for some ?

In one of the links you posted above, Catalin posed the same question, but I
didn't see response to that.

| If we are to make the writel/readl on ARM fully ordered with both IO
| (enforced by hardware) and uncached memory, do we add barriers on each
| side of the writel/readl etc.? The common cases would require a barrier
| before writel (write buffer flushing) and a barrier after readl (in case
| of polling for a "DMA complete" state).
|
| So if io_wmb() just orders to IO writes (writel_relaxed), does it mean
| that we still need a mighty wmb() that orders any type of accesses (i.e.
| uncached memory vs IO)? Can drivers not use the strict writel() and no
| longer rely on wmb() (wondering whether we could simplify it on ARM with
| fully ordered IO accessors)?

Further readl/writel would be no different than ioread32/iowrite32 ?

FWIW, h/w folks tell me that DMB guarantees local barrier semantics so we don't
need to use DSYNC. The latter only provides full r+w+TLB/BPU stuff while DMB allows
finer grained r/w/r+w. But if we need full mb then using one vs. other becomes a
moot point.

-Vineet


>>> The "ordering against DMA" is something like reading an MMIO register to
>>> determine whether the DMA has completed, then going off to read the contents
>>> out of the DMA buffer. The comment you have about DSYNC makes it sound like
>>> it's not sufficient for this case.
>>
>> IMHO this use case is slightly pedantic - since DMA completion will typically
>> follow up with an interrupt (I understand it's still possible to poll a dma status
>> reg). At any rate when it comes to drawing a line between memory accesses -
>> regular or mmio, DSYNC is all we got in the ISA so ARCv2 mb() has to use it -
>> there's no better option.
> 
> Does taking an interrupt ensure visibility of the data on your
> architecture? Most non-pci device architectures allow that to race, so
> you end up relying on the readX in the irq handler to order the buffer
> access.
> 
> If you don't have an instruction for this, then I don't understand how
> you can perform DMA to/from regions of memory that are mapped as weakly
> ordered by the CPU (e.g. how would you write a data buffer then tell the
> device to go read from it?).
> 
> Will
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 20/28] ARCv2: barriers
  2015-06-19 13:13               ` Vineet Gupta
@ 2015-06-22 13:36                 ` Will Deacon
  2015-06-23  7:58                   ` [PATCH v2 " Vineet Gupta
  2015-06-23  8:02                   ` [PATCH " Vineet Gupta
  0 siblings, 2 replies; 66+ messages in thread
From: Will Deacon @ 2015-06-22 13:36 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: Peter Zijlstra, linux-arch, linux-kernel, arnd, arc-linux-dev

On Fri, Jun 19, 2015 at 02:13:02PM +0100, Vineet Gupta wrote:
> On Thursday 11 June 2015 07:09 PM, Will Deacon wrote:
> > On Thu, Jun 11, 2015 at 01:13:28PM +0100, Vineet Gupta wrote:
> >> On Wednesday 10 June 2015 06:31 PM, Will Deacon wrote:
> >>> You also need that guarantee in your readl/writel family of macros. It's
> >>> extremely heavy and rarely needed, which is why I added the _relaxed
> >>> versions to all architectures.
> >>
> >> Wow - adding that to these accessors will really be heavy - given that a whole
> >> bunch of drivers still use the stock API (or perhaps don't know / care whether
> >> they need the readl or the relaxed api). And it is practically impossible to switch
> >> them over - after all, if it ain't broken how can u fix it. So far we've been testing this
> >> implementation (readl/writel - w/o any explicit barrier) on slower FPGA builds and
> >> this includes a whole bunch of designware IP - mmc, eth, gpio.... and don't see
> >> any ill effects - do you reckon we still need to add it.
> > 
> > Unfortunately, yes, as that's effectively what the kernel requires:
> > 
> >   http://marc.info/?l=linux-kernel&m=121192394430581&w=2
> >   http://thread.gmane.org/gmane.linux.ide/46414
> 
> Oh great - thx for those !
> 
> > The conclusion is that x86 *does* provide this ordering in its accessors
> > and drivers are written to assume that, so either you go round fixing all
> > the drivers by adding the missing barriers or you implement it in your
> > accessors (like we have done on ARM). Subtle I/O ordering issues are no
> > fun to debug.
> > 
> > That's also the reason I added the _relaxed versions, so you can port
> > drivers one-by-one to the weaker semantics whilst having the potentially
> > broken drivers continue to work.
> > 
> 
> OK, so given that regular/mmio is also weakly ordered, it would seem that we need
> full mb() *before* and *after* the IO access in the non relaxed API. ARM code
> seems to put a rmb() after the readl and wmb() before the writel. Is that based on
> how h/w provides for some ?

We figured that you'd likely be doing something like:

<writel_relaxed DMA buffer>
<writel MMIO "go" reg>

or:

<readl MMIO "status" reg>
<readl_relaxed DMA buffer>

so ended up with writel doing {wmb(); writel_relaxed} and readl doing
{readl_relaxed; rmb()}.
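
The barrier placement described above can be mocked up in userspace to make
the two use cases concrete. This is a sketch under loud assumptions: the
"registers" are ordinary variables, C11 fences stand in for wmb()/rmb() and
do not order real MMIO, and every sketch_/fake_ name is hypothetical. Only
the position of the barrier relative to the access is meant to carry over.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins for a device "go"/"status" register and buffer. */
static volatile uint32_t fake_go_reg;
static uint32_t fake_dma_buf[4];

/* Approximations only: real wmb()/rmb() are arch specific instructions. */
#define sketch_wmb()	atomic_thread_fence(memory_order_release)
#define sketch_rmb()	atomic_thread_fence(memory_order_acquire)

static inline void sketch_writel_relaxed(uint32_t v, volatile uint32_t *addr)
{
	*addr = v;
}

static inline uint32_t sketch_readl_relaxed(const volatile uint32_t *addr)
{
	return *addr;
}

/* writel: barrier first, so earlier buffer writes precede the kick. */
static inline void sketch_writel(uint32_t v, volatile uint32_t *addr)
{
	sketch_wmb();
	sketch_writel_relaxed(v, addr);
}

/* readl: barrier last, so later buffer reads follow the status read. */
static inline uint32_t sketch_readl(const volatile uint32_t *addr)
{
	uint32_t v = sketch_readl_relaxed(addr);

	sketch_rmb();
	return v;
}

/* Fill the buffer, kick the "device", poll status, then read the buffer. */
static uint32_t sketch_dma_demo(void)
{
	fake_dma_buf[0] = 0xabcd;		/* plain store to the buffer */
	sketch_writel(1, &fake_go_reg);		/* ordered after that store */

	if (sketch_readl(&fake_go_reg) == 1)	/* ordered before next load */
		return fake_dma_buf[0];

	return 0;
}
```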

> In one of the links you posted above, Catalin posed the same question, but I
> didn't see response to that.
> 
> | If we are to make the writel/readl on ARM fully ordered with both IO
> | (enforced by hardware) and uncached memory, do we add barriers on each
> | side of the writel/readl etc.? The common cases would require a barrier
> | before writel (write buffer flushing) and a barrier after readl (in case
> | of polling for a "DMA complete" state).
> |
> | So if io_wmb() just orders to IO writes (writel_relaxed), does it mean
> | that we still need a mighty wmb() that orders any type of accesses (i.e.
> | uncached memory vs IO)? Can drivers not use the strict writel() and no
> | longer rely on wmb() (wondering whether we could simplify it on ARM with
> | fully ordered IO accessors)?
> 
> Further readl/writel would be no different than ioread32/iowrite32 ?

ioread32/iowrite32 can be used with port addresses and dispatch to the
relevant accessors depending on that. The memory ordering semantics should
be the same as readl/writel.

> FWIW, h/w folks tell me that DMB guarantees local barrier semantics so we don't
> need to use DSYNC. The latter only provides full r+w+TLB/BPU stuff while DMB allows
> finer grained r/w/r+w. But if we need full mb then using one vs. other becomes a
> moot point.

I'd say go with what we do on ARM/arm64, then at least we have consistency
in the use of barriers.

Will

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCH v2 20/28] ARCv2: barriers
  2015-06-22 13:36                 ` Will Deacon
@ 2015-06-23  7:58                   ` Vineet Gupta
  2015-06-23  8:49                     ` Will Deacon
  2015-06-23  9:25                     ` [PATCH v2 20/28] " Peter Zijlstra
  2015-06-23  8:02                   ` [PATCH " Vineet Gupta
  1 sibling, 2 replies; 66+ messages in thread
From: Vineet Gupta @ 2015-06-23  7:58 UTC (permalink / raw)
  To: Peter Zijlstra (Intel)
  Cc: lkml, linux-arch, arc-linux-dev, Vineet Gupta, Will Deacon

ARCv2 based HS38 cores are weakly ordered and thus need explicit barriers
for kernel proper.

The SMP barrier is provided by the DMB instruction, which also guarantees
local barrier semantics, hence it is used as the backend of smp_*mb() as
well as *mb() APIs.

Also hook up barriers into MMIO accessors to avoid ordering issues in IO.

Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
Changes since v1
 * Better changelog and comments
 * local/mandatory barriers to NOT use DSYNC, but DMB
 * define DMB based mandatory barriers even for !SMP
---

 arch/arc/include/asm/Kbuild    |  1 -
 arch/arc/include/asm/barrier.h | 48 ++++++++++++++++++++++++++++++++++++++++++
 arch/arc/include/asm/io.h      | 42 +++++++++++++++++++++++++++++++++---
 3 files changed, 87 insertions(+), 4 deletions(-)
 create mode 100644 arch/arc/include/asm/barrier.h

diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild
index be0c39e76f7c..59e2dd1d434f 100644
--- a/arch/arc/include/asm/Kbuild
+++ b/arch/arc/include/asm/Kbuild
@@ -1,5 +1,4 @@
 generic-y += auxvec.h
-generic-y += barrier.h
 generic-y += bitsperlong.h
 generic-y += bugs.h
 generic-y += clkdev.h
diff --git a/arch/arc/include/asm/barrier.h b/arch/arc/include/asm/barrier.h
new file mode 100644
index 000000000000..a7209983ee64
--- /dev/null
+++ b/arch/arc/include/asm/barrier.h
@@ -0,0 +1,48 @@
+/*
+ * Copyright (C) 2014-15 Synopsys, Inc. (www.synopsys.com)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef __ASM_BARRIER_H
+#define __ASM_BARRIER_H
+
+#ifdef CONFIG_ISA_ARCV2
+
+/*
+ * ARCv2 based HS38 cores are in-order issue, but still weakly ordered
+ * due to micro-arch buffering/queuing of load/store, cache hit vs. miss ...
+ *
+ * Explicit barrier provided by DMB instruction
+ *  - Operand supports fine grained load/store/load+store semantics
+ *  - Ensures that selected memory operation issued before it will complete
+ *    before any subsequent memory operation of same type
+ *  - DMB guarantees SMP as well as local barrier semantics
+ *    (asm-generic/barrier.h ensures sane smp_*mb if not defined here, i.e.
+ *    UP: barrier(), SMP: smp_*mb == *mb)
+ *  - DSYNC provides DMB+completion_of_cache_bpu_maintenance_ops hence not needed
+ *    in the general case. Plus it only provides full barrier.
+ */
+
+#define mb()	asm volatile("dmb 3\n" : : : "memory")
+#define rmb()	asm volatile("dmb 1\n" : : : "memory")
+#define wmb()	asm volatile("dmb 2\n" : : : "memory")
+
+#endif
+
+#ifdef CONFIG_ISA_ARCOMPACT
+
+/*
+ * ARCompact based cores (ARC700) only have SYNC instruction which is super
+ * heavy weight as it flushes the pipeline as well.
+ * There are no real SMP implementations of such cores.
+ */
+
+#define mb()	asm volatile("sync\n" : : : "memory")
+#endif
+
+#include <asm-generic/barrier.h>
+
+#endif
diff --git a/arch/arc/include/asm/io.h b/arch/arc/include/asm/io.h
index cabd518cb253..18c685bf9748 100644
--- a/arch/arc/include/asm/io.h
+++ b/arch/arc/include/asm/io.h
@@ -98,9 +98,45 @@ static inline void __raw_writel(u32 w, volatile void __iomem *addr)
 
 }
 
-#define readb_relaxed readb
-#define readw_relaxed readw
-#define readl_relaxed readl
+#ifdef CONFIG_ISA_ARCV2
+#include <asm/barrier.h>
+#define __iormb()		rmb()
+#define __iowmb()		wmb()
+#else
+#define __iormb()		do { } while (0)
+#define __iowmb()		do { } while (0)
+#endif
+
+/*
+ * MMIO can also get buffered/optimized in micro-arch, so barriers needed
+ * Based on ARM model for the typical use case
+ *
+ *	<writel_relaxed DMA buffer>
+ *	<writel MMIO "go" reg>
+ *   or:
+ *   	<readl MMIO "status" reg>
+ *   	<readl_relaxed DMA buffer>
+ *
+ * http://www.spinics.net/lists/kernel/msg2018397.html
+ */
+#define readb(c)		({ u8  __v = readb_relaxed(c); __iormb(); __v; })
+#define readw(c)		({ u16 __v = readw_relaxed(c); __iormb(); __v; })
+#define readl(c)		({ u32 __v = readl_relaxed(c); __iormb(); __v; })
+
+#define writeb(v,c)		({ __iowmb(); writeb_relaxed(v,c); })
+#define writew(v,c)		({ __iowmb(); writew_relaxed(v,c); })
+#define writel(v,c)		({ __iowmb(); writel_relaxed(v,c); })
+
+/*
+ * Relaxed API for drivers which can handle any ordering themselves
+ */
+#define readb_relaxed(c)	__raw_readb(c)
+#define readw_relaxed(c)	__raw_readw(c)
+#define readl_relaxed(c)	__raw_readl(c)
+
+#define writeb_relaxed(v,c)	__raw_writeb(v,c)
+#define writew_relaxed(v,c)	__raw_writew(v,c)
+#define writel_relaxed(v,c)	__raw_writel(v,c)
 
 #include <asm-generic/io.h>
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v2 22/28] ARCv2: STAR 9000837815 workaround hardware exclusive transactions livelock
  2015-06-19  9:59           ` Will Deacon
  2015-06-19 10:09             ` Vineet Gupta
@ 2015-06-23  7:59             ` Vineet Gupta
  1 sibling, 0 replies; 66+ messages in thread
From: Vineet Gupta @ 2015-06-23  7:59 UTC (permalink / raw)
  To: Peter Zijlstra (Intel)
  Cc: lkml, linux-arch, arc-linux-dev, Vineet Gupta, Will Deacon

A quad core SMP build could get into hardware livelock with concurrent
LLOCK/SCOND. Work around that by adding a PREFETCHW which is serialized by
the SCU (System Coherency Unit). It brings the cache line into Exclusive
state and makes others invalidate their lines. This gives enough time for
the winner to complete the LLOCK/SCOND before others can get the line back.

The prefetchw in the ll/sc loop is not nice but this is the only
software workaround for the current version of the RTL.

Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/include/asm/atomic.h | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/arc/include/asm/atomic.h b/arch/arc/include/asm/atomic.h
index 20b7dc17979e..03484cb4d16d 100644
--- a/arch/arc/include/asm/atomic.h
+++ b/arch/arc/include/asm/atomic.h
@@ -23,13 +23,21 @@
 
 #define atomic_set(v, i) (((v)->counter) = (i))
 
+#ifdef CONFIG_ISA_ARCV2
+#define PREFETCHW	"	prefetchw   [%1]	\n"
+#else
+#define PREFETCHW
+#endif
+
 #define ATOMIC_OP(op, c_op, asm_op)					\
 static inline void atomic_##op(int i, atomic_t *v)			\
 {									\
 	unsigned int temp;						\
 									\
 	__asm__ __volatile__(						\
-	"1:	llock   %0, [%1]	\n"				\
+	"1:				\n"				\
+	PREFETCHW							\
+	"	llock   %0, [%1]	\n"				\
 	"	" #asm_op " %0, %0, %2	\n"				\
 	"	scond   %0, [%1]	\n"				\
 	"	bnz     1b		\n"				\
@@ -50,7 +58,9 @@ static inline int atomic_##op##_return(int i, atomic_t *v)		\
 	smp_mb();							\
 									\
 	__asm__ __volatile__(						\
-	"1:	llock   %0, [%1]	\n"				\
+	"1:				\n"				\
+	PREFETCHW							\
+	"	llock   %0, [%1]	\n"				\
 	"	" #asm_op " %0, %0, %2	\n"				\
 	"	scond   %0, [%1]	\n"				\
 	"	bnz     1b		\n"				\
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH 20/28] ARCv2: barriers
  2015-06-22 13:36                 ` Will Deacon
  2015-06-23  7:58                   ` [PATCH v2 " Vineet Gupta
@ 2015-06-23  8:02                   ` Vineet Gupta
  1 sibling, 0 replies; 66+ messages in thread
From: Vineet Gupta @ 2015-06-23  8:02 UTC (permalink / raw)
  To: Will Deacon; +Cc: Peter Zijlstra, linux-arch, linux-kernel, arnd, arc-linux-dev

On Monday 22 June 2015 07:06 PM, Will Deacon wrote:
>> OK, so given that regular/mmio is also weakly ordered, it would seem that we need
>> > full mb() *before* and *after* the IO access in the non relaxed API. ARM code
>> > seems to put a rmb() after the readl and wmb() before the writel. Is that based on
>> > how h/w provides for some ?
> We figured that you'd likely be doing something like:
> 
> <writel_relaxed DMA buffer>
> <writel MMIO "go" reg>
> 
> or:
> 
> <readl MMIO "status" reg>
> <readl_relaxed DMA buffer>
> 
> so ended up with writel doing {wmb(); writel_relaxed} and readl doing
> {readl_relaxed; rmb()}.
> 
>> > In one of the links you posted above, Catalin posed the same question, but I
>> > didn't see response to that.
>> > 
>> > | If we are to make the writel/readl on ARM fully ordered with both IO
>> > | (enforced by hardware) and uncached memory, do we add barriers on each
>> > | side of the writel/readl etc.? The common cases would require a barrier
>> > | before writel (write buffer flushing) and a barrier after readl (in case
>> > | of polling for a "DMA complete" state).
>> > |
>> > | So if io_wmb() just orders to IO writes (writel_relaxed), does it mean
>> > | that we still need a mighty wmb() that orders any type of accesses (i.e.
>> > | uncached memory vs IO)? Can drivers not use the strict writel() and no
>> > | longer rely on wmb() (wondering whether we could simplify it on ARM with
>> > | fully ordered IO accessors)?
>> > 
>> > Further readl/writel would be no different than ioread32/iowrite32 ?
> ioread32/iowrite32 can be used with port addresses and dispatch to the
> relevant accessors depending on that. The memory ordering semantics should
> be the same as readl/writel.
> 
>> > FWIW, h/w folks tell me that DMB guarantees local barrier semantics so we don't
>> > need to use DSYNC. Latter only provides full r+w+TLB/BPU stuff while DMB allows
>> > finer grained r/w/r+w. But if we need full mb then using one vs. other becomes a
>> > moot point.
> I'd say go with what we do on ARM/arm64, then at least we have consistency
> in the use of barriers.

Thx for the very helpful review/feedback, Will. I've posted a v2!

-Vineet


* Re: [PATCH v2 20/28] ARCv2: barriers
  2015-06-23  7:58                   ` [PATCH v2 " Vineet Gupta
@ 2015-06-23  8:49                     ` Will Deacon
  2015-06-23  9:03                       ` Vineet Gupta
  2015-06-23  9:25                     ` [PATCH v2 20/28] " Peter Zijlstra
  1 sibling, 1 reply; 66+ messages in thread
From: Will Deacon @ 2015-06-23  8:49 UTC (permalink / raw)
  To: Vineet Gupta; +Cc: Peter Zijlstra (Intel), lkml, linux-arch, arc-linux-dev

Hi Vineet,

On Tue, Jun 23, 2015 at 08:58:03AM +0100, Vineet Gupta wrote:
> ARCv2 based HS38 cores are weakly ordered and thus need explicit barriers for
> kernel proper.
> 
> SMP barrier is provided by DMB instruction which also guarantees local
> barrier hence used as backend of smp_*mb() as well as *mb() APIs
> 
> Also hookup barriers into MMIO accessors to avoid ordering issues in IO
> 
> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Will Deacon <will.deacon@arm.com>
> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
> ---
> Changes since v1
>  * Better changelog and comments
>  * local/mandatory barriers to NOT use DSYNC, but DMB
>  * define DMB based mandatory barriers even for !SMP
> ---

Functionally, all looks good to me. However, my comment is completely
misleading ;)

> +/*
> + * MMIO can also get buffered/optimized in micro-arch, so barriers needed
> + * Based on ARM model for the typical use case
> + *
> + *	<writel_relaxed DMA buffer>
> + *	<writel MMIO "go" reg>
> + *   or:
> + *   	<readl MMIO "status" reg>
> + *   	<readl_relaxed DMA buffer>

The writel_relaxed/readl_relaxed parts here would actually just be
bog-standard loads and stores to an in-memory buffer. I was trying too hard
to show the barrier semantics and accidentally turned the DMA buffers into
__iomem regions.

If you fix the comment:

  Reviewed-by: Will Deacon <will.deacon@arm.com>

Sorry for messing you about.

Will


* Re: [PATCH v2 20/28] ARCv2: barriers
  2015-06-23  8:49                     ` Will Deacon
@ 2015-06-23  9:03                       ` Vineet Gupta
  2015-06-23  9:26                         ` Will Deacon
  0 siblings, 1 reply; 66+ messages in thread
From: Vineet Gupta @ 2015-06-23  9:03 UTC (permalink / raw)
  To: Will Deacon, Vineet Gupta
  Cc: Peter Zijlstra (Intel), lkml, linux-arch, arc-linux-dev

Hi Will,

On Tuesday 23 June 2015 02:19 PM, Will Deacon wrote:
>> +/*
>> > + * MMIO can also get buffered/optimized in micro-arch, so barriers needed
>> > + * Based on ARM model for the typical use case
>> > + *
>> > + *	<writel_relaxed DMA buffer>
>> > + *	<writel MMIO "go" reg>
>> > + *   or:
>> > + *   	<readl MMIO "status" reg>
>> > + *   	<readl_relaxed DMA buffer>
> The writel_relaxed/readl_relaxed parts here would actually just be
> bog-standard loads and stores to an in-memory buffer. I was trying too hard
> to show the barrier semantics and accidentally turned the DMA buffers into
> __iomem regions.

Not sure if I follow you completely :-)

IMHO, it doesn't matter if we are dealing with a typical DMA buffer (cached) or a
buffer descriptor (typically uncached unless there's hardware IO-coh or some
such). Both cases assume a vanilla ld/st to the buffer (using relaxed API) with a
surrounding MMIO access.

> 
> If you fix the comment:

Does this look better ?

- *	<writel_relaxed DMA buffer>
+ *	<writel_relaxed DMA buffer (cached or uncached)>

> 
>   Reviewed-by: Will Deacon <will.deacon@arm.com>
> 
> Sorry for messing you about.

NP.

-Vineet


* Re: [PATCH v2 20/28] ARCv2: barriers
  2015-06-23  7:58                   ` [PATCH v2 " Vineet Gupta
  2015-06-23  8:49                     ` Will Deacon
@ 2015-06-23  9:25                     ` Peter Zijlstra
  1 sibling, 0 replies; 66+ messages in thread
From: Peter Zijlstra @ 2015-06-23  9:25 UTC (permalink / raw)
  To: Vineet Gupta; +Cc: lkml, linux-arch, arc-linux-dev, Will Deacon

On Tue, Jun 23, 2015 at 01:28:03PM +0530, Vineet Gupta wrote:
> +/*
> + * MMIO can also get buffered/optimized in micro-arch, so barriers needed
> + * Based on ARM model for the typical use case
> + *
> + *	<writel_relaxed DMA buffer>
> + *	<writel MMIO "go" reg>
> + *   or:
> + *   	<readl MMIO "status" reg>
> + *   	<readl_relaxed DMA buffer>
> + *
> + * http://www.spinics.net/lists/kernel/msg2018397.html

	http://lkml.kernel.org/r/20150622133656.GG1583@arm.com

Might I suggest you use the above link. Since we control kernel.org we
can redirect that URL to an archive of our choosing. This keeps the
links working even if spinics.net or marc.info were to go belly up.


* Re: [PATCH v2 20/28] ARCv2: barriers
  2015-06-23  9:03                       ` Vineet Gupta
@ 2015-06-23  9:26                         ` Will Deacon
  2015-06-23  9:52                           ` [PATCH v3 22/28] " Vineet Gupta
  0 siblings, 1 reply; 66+ messages in thread
From: Will Deacon @ 2015-06-23  9:26 UTC (permalink / raw)
  To: Vineet Gupta; +Cc: Peter Zijlstra (Intel), lkml, linux-arch, arc-linux-dev

On Tue, Jun 23, 2015 at 10:03:25AM +0100, Vineet Gupta wrote:
> On Tuesday 23 June 2015 02:19 PM, Will Deacon wrote:
> >> +/*
> >> > + * MMIO can also get buffered/optimized in micro-arch, so barriers needed
> >> > + * Based on ARM model for the typical use case
> >> > + *
> >> > + *	<writel_relaxed DMA buffer>
> >> > + *	<writel MMIO "go" reg>
> >> > + *   or:
> >> > + *   	<readl MMIO "status" reg>
> >> > + *   	<readl_relaxed DMA buffer>
> > The writel_relaxed/readl_relaxed parts here would actually just be
> > bog-standard loads and stores to an in-memory buffer. I was trying too hard
> > to show the barrier semantics and accidentally turned the DMA buffers into
> > __iomem regions.
> 
> Not sure if I follow you completely :-)

D'oh, sorry.

> IMHO, It doesn't matter if we are dealing with a typical DMA buffer (cached) or a
> buffer descriptor (typically uncached unless there's hardware IO-coh or some
> such). Both the cases assume a vanilla ld/st to buffer (using relaxed API) with a
> surrounding MMIO access.

It's more that you should only pass __iomem pointers (i.e. stuff you got
back from something like ioremap) to readl_relaxed/writel_relaxed and that's
not typically how you would allocate your DMA buffer.

> > If you fix the comment:
> 
> Does this look better ?
> 
> - *	<writel_relaxed DMA buffer>
> + *	<writel_relaxed DMA buffer (cached or uncached)>

I'd just replace 'writel_relaxed' with whatever your store instruction is
(ST)?

Will


* [PATCH v3 22/28] ARCv2: barriers
  2015-06-23  9:26                         ` Will Deacon
@ 2015-06-23  9:52                           ` Vineet Gupta
  2015-06-23 16:28                             ` Will Deacon
  0 siblings, 1 reply; 66+ messages in thread
From: Vineet Gupta @ 2015-06-23  9:52 UTC (permalink / raw)
  To: Peter Zijlstra (Intel)
  Cc: lkml, linux-arch, arc-linux-dev, Vineet Gupta, Will Deacon

ARCv2 based HS38 cores are weakly ordered and thus need explicit barriers for
kernel proper.

The SMP barrier is provided by the DMB instruction, which also guarantees
local barrier semantics, hence it is used as the backend of smp_*mb() as well
as the *mb() APIs.

Also hook up barriers into the MMIO accessors to avoid ordering issues in IO.

Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
Changes since v2
 * lkml discussion link points to lkml redirector (PeterZ)
 * Updated comment about IO ordering to use standard LD/ST (Will Deacon)

Changes since v1
 * Better changelog and comments
 * local/mandatory barriers to NOT use DSYNC, but DMB
 * define DMB based mandatory barriers even for !SMP
---

 arch/arc/include/asm/Kbuild    |  1 -
 arch/arc/include/asm/barrier.h | 48 ++++++++++++++++++++++++++++++++++++++++++
 arch/arc/include/asm/io.h      | 44 +++++++++++++++++++++++++++++++++++---
 3 files changed, 89 insertions(+), 4 deletions(-)
 create mode 100644 arch/arc/include/asm/barrier.h

diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild
index be0c39e76f7c..59e2dd1d434f 100644
--- a/arch/arc/include/asm/Kbuild
+++ b/arch/arc/include/asm/Kbuild
@@ -1,5 +1,4 @@
 generic-y += auxvec.h
-generic-y += barrier.h
 generic-y += bitsperlong.h
 generic-y += bugs.h
 generic-y += clkdev.h
diff --git a/arch/arc/include/asm/barrier.h b/arch/arc/include/asm/barrier.h
new file mode 100644
index 000000000000..a7209983ee64
--- /dev/null
+++ b/arch/arc/include/asm/barrier.h
@@ -0,0 +1,48 @@
+/*
+ * Copyright (C) 2014-15 Synopsys, Inc. (www.synopsys.com)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef __ASM_BARRIER_H
+#define __ASM_BARRIER_H
+
+#ifdef CONFIG_ISA_ARCV2
+
+/*
+ * ARCv2 based HS38 cores are in-order issue, but still weakly ordered
+ * due to micro-arch buffering/queuing of load/store, cache hit vs. miss ...
+ *
+ * Explicit barrier provided by DMB instruction
+ *  - Operand supports fine grained load/store/load+store semantics
+ *  - Ensures that selected memory operation issued before it will complete
+ *    before any subsequent memory operation of same type
+ *  - DMB guarantees SMP as well as local barrier semantics
+ *    (asm-generic/barrier.h ensures sane smp_*mb if not defined here, i.e.
+ *    UP: barrier(), SMP: smp_*mb == *mb)
+ *  - DSYNC provides DMB+completion_of_cache_bpu_maintenance_ops hence not needed
+ *    in the general case. Plus it only provides full barrier.
+ */
+
+#define mb()	asm volatile("dmb 3\n" : : : "memory")
+#define rmb()	asm volatile("dmb 1\n" : : : "memory")
+#define wmb()	asm volatile("dmb 2\n" : : : "memory")
+
+#endif
+
+#ifdef CONFIG_ISA_ARCOMPACT
+
+/*
+ * ARCompact based cores (ARC700) only have SYNC instruction which is super
+ * heavy weight as it flushes the pipeline as well.
+ * There are no real SMP implementations of such cores.
+ */
+
+#define mb()	asm volatile("sync\n" : : : "memory")
+#endif
+
+#include <asm-generic/barrier.h>
+
+#endif
diff --git a/arch/arc/include/asm/io.h b/arch/arc/include/asm/io.h
index cabd518cb253..9749096028dc 100644
--- a/arch/arc/include/asm/io.h
+++ b/arch/arc/include/asm/io.h
@@ -98,9 +98,47 @@ static inline void __raw_writel(u32 w, volatile void __iomem *addr)
 
 }
 
-#define readb_relaxed readb
-#define readw_relaxed readw
-#define readl_relaxed readl
+#ifdef CONFIG_ISA_ARCV2
+#include <asm/barrier.h>
+#define __iormb()		rmb()
+#define __iowmb()		wmb()
+#else
+#define __iormb()		do { } while (0)
+#define __iowmb()		do { } while (0)
+#endif
+
+/*
+ * MMIO can also get buffered/optimized in micro-arch, so barriers needed
+ * Based on ARM model for the typical use case
+ *
+ *	<ST [DMA buffer]>
+ *		wmb()
+ *	<writel MMIO "go" reg>
+ *  or:
+ *	<readl MMIO "status" reg>
+ *		rmb()
+ *	<LD [DMA buffer]>
+ *
+ * http://lkml.kernel.org/r/20150622133656.GG1583@arm.com
+ */
+#define readb(c)		({ u8  __v = readb_relaxed(c); __iormb(); __v; })
+#define readw(c)		({ u16 __v = readw_relaxed(c); __iormb(); __v; })
+#define readl(c)		({ u32 __v = readl_relaxed(c); __iormb(); __v; })
+
+#define writeb(v,c)		({ __iowmb(); writeb_relaxed(v,c); })
+#define writew(v,c)		({ __iowmb(); writew_relaxed(v,c); })
+#define writel(v,c)		({ __iowmb(); writel_relaxed(v,c); })
+
+/*
+ * Relaxed API for drivers which can handle any ordering themselves
+ */
+#define readb_relaxed(c)	__raw_readb(c)
+#define readw_relaxed(c)	__raw_readw(c)
+#define readl_relaxed(c)	__raw_readl(c)
+
+#define writeb_relaxed(v,c)	__raw_writeb(v,c)
+#define writew_relaxed(v,c)	__raw_writew(v,c)
+#define writel_relaxed(v,c)	__raw_writel(v,c)
 
 #include <asm-generic/io.h>
 
-- 
1.9.1



* Re: [PATCH v3 22/28] ARCv2: barriers
  2015-06-23  9:52                           ` [PATCH v3 22/28] " Vineet Gupta
@ 2015-06-23 16:28                             ` Will Deacon
  0 siblings, 0 replies; 66+ messages in thread
From: Will Deacon @ 2015-06-23 16:28 UTC (permalink / raw)
  To: Vineet Gupta; +Cc: Peter Zijlstra (Intel), lkml, linux-arch, arc-linux-dev

On Tue, Jun 23, 2015 at 10:52:06AM +0100, Vineet Gupta wrote:
> ARCv2 based HS38 cores are weakly ordered and thus need explicit barriers for
> kernel proper.
> 
> SMP barrier is provided by DMB instruction which also guarantees local
> barrier hence used as backend of smp_*mb() as well as *mb() APIs
> 
> Also hookup barriers into MMIO accessors to avoid ordering issues in IO
> 
> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Will Deacon <will.deacon@arm.com>
> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
> ---
> Changes since v2
>  * lkml discussion link points to lkml redirector (PeterZ)
>  * Updated comment about IO ordering to use standard LD/ST (Will Deacon)
> 
> Changes since v1
>  * Better changelog and comments
>  * local/mandatory barriers to NOT use DSYNC, but DMB
>  * define DMB based mandatory barriers even for !SMP
> ---

[...]

> +/*
> + * MMIO can also get buffered/optimized in micro-arch, so barriers needed
> + * Based on ARM model for the typical use case
> + *
> + *	<ST [DMA buffer]>
> + *		wmb()
> + *	<writel MMIO "go" reg>
> + *  or:
> + *	<readl MMIO "status" reg>
> + *		rmb()
> + *	<LD [DMA buffer]>
> + *
> + * http://lkml.kernel.org/r/20150622133656.GG1583@arm.com
> + */

If that makes sense to you, then fine, but I find the wmb() and rmb() a bit
odd since they're implied by the writel/readl macros.
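
To make the "implied by the macros" point concrete, here is a small
user-space model of the accessor layering from the patch. It is a sketch
under assumptions, not the kernel implementation: every sketch_* name is
hypothetical, C11 atomic_thread_fence() merely stands in for wmb()/rmb()
(it does not model what DMB does for device accesses), and the "registers"
are ordinary volatile variables rather than __iomem mappings.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins for wmb()/rmb(); real ones live in asm/barrier.h */
#define sketch_wmb()	atomic_thread_fence(memory_order_release)
#define sketch_rmb()	atomic_thread_fence(memory_order_acquire)

/* Relaxed accessors: the bare access, no ordering added */
static inline void sketch_writel_relaxed(uint32_t v, volatile uint32_t *reg)
{
	*reg = v;
}

static inline uint32_t sketch_readl_relaxed(const volatile uint32_t *reg)
{
	return *reg;
}

/* writel = barrier, then store (mirrors { __iowmb(); writel_relaxed(); }) */
static inline void sketch_writel(uint32_t v, volatile uint32_t *reg)
{
	sketch_wmb();
	sketch_writel_relaxed(v, reg);
}

/* readl = load, then barrier (mirrors { readl_relaxed(); __iormb(); }) */
static inline uint32_t sketch_readl(const volatile uint32_t *reg)
{
	uint32_t v = sketch_readl_relaxed(reg);

	sketch_rmb();
	return v;
}

/* <ST [DMA buffer]> followed by <writel MMIO "go" reg>: no explicit wmb() */
static inline void sketch_start_dma(uint32_t *dma_buf, uint32_t data,
				    volatile uint32_t *go_reg)
{
	dma_buf[0] = data;		/* plain store to the in-memory buffer */
	sketch_writel(1, go_reg);	/* the barrier is inside the accessor */
}

/* <readl MMIO "status" reg> followed by <LD [DMA buffer]>: no explicit rmb() */
static inline int sketch_dma_done(const uint32_t *dma_buf,
				  const volatile uint32_t *status_reg,
				  uint32_t *out)
{
	if (!(sketch_readl(status_reg) & 1))
		return 0;		/* device not finished yet */
	*out = dma_buf[0];		/* plain load from the buffer */
	return 1;
}
```

The caller never writes a barrier itself; the buffer accesses are ordinary
loads and stores, and the ordering comes from the non-relaxed accessor on the
MMIO side.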

Regardless, you can keep my reviewed-by from last time.

Will


* Re: [PATCH 11/28] ARCv2: extable: Enable sorting at build time
  2015-06-09 11:48 ` [PATCH 11/28] ARCv2: extable: Enable sorting at build time Vineet Gupta
@ 2015-06-24  5:51   ` Vineet Gupta
  2015-06-29 20:38     ` David Daney
  0 siblings, 1 reply; 66+ messages in thread
From: Vineet Gupta @ 2015-06-24  5:51 UTC (permalink / raw)
  To: David Daney, linux-arch, linux-kernel; +Cc: arnd, arc-linux-dev

Hi David,

On Tuesday 09 June 2015 05:18 PM, Vineet Gupta wrote:
> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
> ---
>  scripts/sortextable.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/scripts/sortextable.c b/scripts/sortextable.c
> index 1052d4834a44..c2423d913b46 100644
> --- a/scripts/sortextable.c
> +++ b/scripts/sortextable.c
> @@ -47,6 +47,10 @@
>  #define EM_MICROBLAZE	189
>  #endif
>  
> +#ifndef EM_ARCV2
> +#define EM_ARCV2	195
> +#endif
> +
>  static int fd_map;	/* File descriptor for file being modified. */
>  static int mmap_failed; /* Boolean flag. */
>  static void *ehdr_curr; /* current ElfXX_Ehdr *  for resource cleanup */
> @@ -281,6 +285,7 @@ do_file(char const *const fname)
>  		custom_sort = sort_relative_table;
>  		break;
>  	case EM_ARCOMPACT:
> +	case EM_ARCV2:
>  	case EM_ARM:
>  	case EM_AARCH64:
>  	case EM_MICROBLAZE:
> 

Sorry for missing you in the CC in the original post of this patch. Can I get
your Ack on this one?

-Vineet


* Re: [PATCH 11/28] ARCv2: extable: Enable sorting at build time
  2015-06-24  5:51   ` Vineet Gupta
@ 2015-06-29 20:38     ` David Daney
  2015-06-30  4:41       ` Vineet Gupta
  0 siblings, 1 reply; 66+ messages in thread
From: David Daney @ 2015-06-29 20:38 UTC (permalink / raw)
  To: Vineet Gupta; +Cc: David Daney, linux-arch, linux-kernel, arnd, arc-linux-dev

On 06/23/2015 10:51 PM, Vineet Gupta wrote:
> Hi David,
>
> On Tuesday 09 June 2015 05:18 PM, Vineet Gupta wrote:
>> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
>> ---
>>   scripts/sortextable.c | 5 +++++
>>   1 file changed, 5 insertions(+)
>>
>> diff --git a/scripts/sortextable.c b/scripts/sortextable.c
>> index 1052d4834a44..c2423d913b46 100644
>> --- a/scripts/sortextable.c
>> +++ b/scripts/sortextable.c
>> @@ -47,6 +47,10 @@
>>   #define EM_MICROBLAZE	189
>>   #endif
>>
>> +#ifndef EM_ARCV2
>> +#define EM_ARCV2	195
>> +#endif
>> +
>>   static int fd_map;	/* File descriptor for file being modified. */
>>   static int mmap_failed; /* Boolean flag. */
>>   static void *ehdr_curr; /* current ElfXX_Ehdr *  for resource cleanup */
>> @@ -281,6 +285,7 @@ do_file(char const *const fname)
>>   		custom_sort = sort_relative_table;
>>   		break;
>>   	case EM_ARCOMPACT:
>> +	case EM_ARCV2:
>>   	case EM_ARM:
>>   	case EM_AARCH64:
>>   	case EM_MICROBLAZE:
>>
>
> Sorry for missing you in the CC in orig post of this patch. Can I get your Ack on
> this one !
>

OK:

Acked-by: David Daney <david.daney@cavium.com>


Really this is obvious, and only affects ARCv2, so the corresponding
maintainers shouldn't need my Ack.
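
For context, the change amounts to one more label in a switch on the ELF
e_machine value. The following is only an illustrative sketch, not the
script's code: the SKETCH_* names and sketch_machine_recognized() are
hypothetical, only the machines visible in the quoted hunk are listed, and
what the real script does for an unlisted machine is not shown in this
thread. The values 189 and 195 come from the patch; 40, 93 and 183 are the
standard ELF numbers for ARM, ARCompact and AArch64.

```c
/* Hypothetical local names so the sketch does not depend on <elf.h> */
enum {
	SKETCH_EM_ARM		= 40,
	SKETCH_EM_ARCOMPACT	= 93,
	SKETCH_EM_AARCH64	= 183,
	SKETCH_EM_MICROBLAZE	= 189,
	SKETCH_EM_ARCV2		= 195,	/* the value added by the patch */
};

/* Returns 1 if e_machine is one of the labels in the quoted hunk, else 0 */
static int sketch_machine_recognized(unsigned int e_machine)
{
	switch (e_machine) {
	case SKETCH_EM_ARCOMPACT:
	case SKETCH_EM_ARCV2:		/* new label shares the existing path */
	case SKETCH_EM_ARM:
	case SKETCH_EM_AARCH64:
	case SKETCH_EM_MICROBLAZE:
		return 1;
	default:
		return 0;
	}
}
```

Because the new label falls through to the same path as its neighbours, no
other architecture's handling changes.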

David Daney


> -Vineet
>



* Re: [PATCH 11/28] ARCv2: extable: Enable sorting at build time
  2015-06-29 20:38     ` David Daney
@ 2015-06-30  4:41       ` Vineet Gupta
  0 siblings, 0 replies; 66+ messages in thread
From: Vineet Gupta @ 2015-06-30  4:41 UTC (permalink / raw)
  To: David Daney; +Cc: David Daney, linux-arch, linux-kernel, arnd, arc-linux-dev

On Tuesday 30 June 2015 02:08 AM, David Daney wrote:
>>
>> Sorry for missing you in the CC in orig post of this patch. Can I get your Ack on
>> this one !
>>
> 
> OK:
> 
> Acked-by: David Daney <david.daney@cavium.com>

Thx David. Although I had put the patch as it is in the end as I need to send a
pull request to Linus soon for that stuff.


> Really this is obvious, and only affects ARCv2, so the corresponding maintainers
> shouldn't need my Ack.

Indeed it was obvious, but the last few commits in that file had ACKs from you so I
thought of retaining the tradition so to speak :-)

-Vineet


Thread overview: 66+ messages
2015-06-09 11:48 [PATCH 00/28] ARCv2 port to Linux - (B) ISA / Core / platform support Vineet Gupta
2015-06-09 11:48 ` [PATCH 01/28] ARCv2: [intc] HS38 core interrupt controller Vineet Gupta
2015-06-09 11:48 ` [PATCH 02/28] ARCv2: Support for ARCv2 ISA and HS38x cores Vineet Gupta
2015-06-09 11:48 ` [PATCH 03/28] ARCv2: STAR 9000793984: Handle return from intr to Delay Slot Vineet Gupta
2015-06-09 11:48 ` [PATCH 04/28] ARCv2: STAR 9000808988: signals involving " Vineet Gupta
2015-06-09 11:48 ` [PATCH 05/28] ARCv2: STAR 9000814690: Really Re-enable interrupts to avoid deadlocks Vineet Gupta
2015-06-09 11:48 ` [PATCH 06/28] ARCv2: MMUv4: TLB programming Model changes Vineet Gupta
2015-06-09 11:48 ` [PATCH 07/28] ARCv2: MMUv4: cache programming model changes Vineet Gupta
2015-06-09 11:48 ` [PATCH 08/28] ARCv2: MMUv4: support aliasing icache config Vineet Gupta
2015-06-09 11:48 ` [PATCH 09/28] ARCv2: optimised string/mem lib routines Vineet Gupta
2015-06-09 11:48 ` [PATCH 10/28] ARCv2: Adhere to Zero Delay loop restriction Vineet Gupta
2015-06-09 11:48 ` [PATCH 11/28] ARCv2: extable: Enable sorting at build time Vineet Gupta
2015-06-24  5:51   ` Vineet Gupta
2015-06-29 20:38     ` David Daney
2015-06-30  4:41       ` Vineet Gupta
2015-06-09 11:48 ` [PATCH 12/28] ARCv2: clocksource: Introduce 64bit local RTC counter Vineet Gupta
2015-06-09 11:48 ` [PATCH 13/28] ARC: make plat_smp_ops weak to allow over-rides Vineet Gupta
2015-06-09 11:48 ` [PATCH 14/28] ARCv2: SMP: ARConnect debug/robustness Vineet Gupta
2015-06-09 11:48 ` [PATCH 15/28] ARCv2: SMP: clocksource: Enable Global Real Time counter Vineet Gupta
2015-06-09 11:48 ` [PATCH 16/28] ARCv2: SMP: intc: IDU 2nd level intc for dynamic IRQ distribution Vineet Gupta
2015-06-09 11:48 ` [PATCH 17/28] ARC: add compiler barrier to LLSC based cmpxchg Vineet Gupta
2015-06-09 12:23   ` Peter Zijlstra
2015-06-09 11:48 ` [PATCH 18/28] ARC: add smp barriers around atomics per memory-barrriers.txt Vineet Gupta
2015-06-09 12:30   ` Peter Zijlstra
2015-06-10  9:17     ` Vineet Gupta
2015-06-10 10:53       ` Peter Zijlstra
2015-06-11 13:03         ` Vineet Gupta
2015-06-12 12:15   ` [PATCH v2] ARC: add smp barriers around atomics per Documentation/atomic_ops.txt Vineet Gupta
2015-06-12 13:04     ` Peter Zijlstra
2015-06-12 13:16       ` Vineet Gupta
2015-06-09 11:48 ` [PATCH 19/28] arch: conditionally define smp_{mb,rmb,wmb} Vineet Gupta
2015-06-09 12:32   ` Peter Zijlstra
2015-06-09 11:48 ` [PATCH 20/28] ARCv2: barriers Vineet Gupta
2015-06-09 12:40   ` Peter Zijlstra
2015-06-10  9:34     ` Vineet Gupta
2015-06-10 10:58       ` Peter Zijlstra
2015-06-10 13:01         ` Will Deacon
2015-06-11 12:13           ` Vineet Gupta
2015-06-11 13:39             ` Will Deacon
2015-06-19 13:13               ` Vineet Gupta
2015-06-22 13:36                 ` Will Deacon
2015-06-23  7:58                   ` [PATCH v2 " Vineet Gupta
2015-06-23  8:49                     ` Will Deacon
2015-06-23  9:03                       ` Vineet Gupta
2015-06-23  9:26                         ` Will Deacon
2015-06-23  9:52                           ` [PATCH v3 22/28] " Vineet Gupta
2015-06-23 16:28                             ` Will Deacon
2015-06-23  9:25                     ` [PATCH v2 20/28] " Peter Zijlstra
2015-06-23  8:02                   ` [PATCH " Vineet Gupta
2015-06-09 11:48 ` [PATCH 21/28] ARC: Reduce bitops lines of code using macros Vineet Gupta
2015-06-12 12:20   ` [PATCH v2] " Vineet Gupta
2015-06-12 13:05     ` Peter Zijlstra
2015-06-09 11:48 ` [PATCH 22/28] ARCv2: STAR 9000837815 workaround hardware exclusive transactions livelock Vineet Gupta
2015-06-09 12:35   ` Peter Zijlstra
2015-06-10 10:01     ` Vineet Gupta
2015-06-10 11:02       ` Peter Zijlstra
2015-06-19  9:55         ` [PATCH v2 " Vineet Gupta
2015-06-19  9:59           ` Will Deacon
2015-06-19 10:09             ` Vineet Gupta
2015-06-23  7:59             ` Vineet Gupta
2015-06-09 11:48 ` [PATCH 23/28] ARCv2: SLC: Handle explcit flush for DMA ops (w/o IO-coherency) Vineet Gupta
2015-06-09 11:48 ` [PATCH 24/28] ARCv2: All bits in place, allow ARCv2 builds Vineet Gupta
2015-06-09 11:48 ` [PATCH 25/28] ARCv2: [nsim*hs*] Support simulation platforms for HS38x cores Vineet Gupta
2015-06-09 11:48 ` [PATCH 26/28] ARC: [axs101] Prepare for AXS103 Vineet Gupta
2015-06-09 11:48 ` [PATCH 27/28] ARCv2: [axs103] Support ARC SDP FPGA platform for HS38x cores Vineet Gupta
2015-06-09 11:48 ` [PATCH 28/28] ARCv2: [vdk] dts files and defconfig for HS38 VDK Vineet Gupta
