* [RFC PATCH 00/10] Add Fujitsu A64FX soc entry/hardware barrier driver
@ 2021-01-08 10:52 ` Misono Tomohiro
  0 siblings, 0 replies; 66+ messages in thread
From: Misono Tomohiro @ 2021-01-08 10:52 UTC (permalink / raw)
  To: linux-arm-kernel, soc; +Cc: will, catalin.marinas, arnd, olof, misono.tomohiro

(Resending because the cover letter title was missing the first time. Sorry for the noise.)

Hello,

This series adds a Fujitsu A64FX SoC entry under drivers/soc and a hardware
barrier driver for it.

[Driver Description]
 The A64FX CPU has several functions for HPC workloads, and the hardware
 barrier is one of them. It is a mechanism for fast synchronization among
 PEs belonging to the same L3 cache domain, using implementation-defined
 hardware registers.
 For more details, see the A64FX HPC extension specification at
 https://github.com/fujitsu/A64FX

 The driver mainly offers a set of ioctls to manipulate the related registers.
 Patches 1-9 implement the driver code, and patch 10 adds the Kconfig,
 Makefile and MAINTAINERS entries for the driver.

 A C library and test program for this driver are available at:
 https://github.com/fujitsu/hardware_barrier

 The driver is based on v5.11-rc2 and was tested in an FX700 environment.

[RFC]
 This is the first time we are upstreaming drivers for our chip, and I want
 to confirm the driver location and the patch submission process.

 Based on my observation, the drivers/soc folder seems to be the right place
 for this driver, so I added a Kconfig entry to the arm64 platform config,
 created the soc/fujitsu folder and updated the MAINTAINERS entry accordingly
 (last patch). Is this right?

 Also, for the final submission I think I need to 1) create a public git
 tree to push the driver code to (GitHub or similar), and 2) send a pull
 request to the SoC team (soc@kernel.org). Is this the correct procedure?

 I would appreciate any help or comments.

Side note: we also plan to post other drivers for the A64FX HPC extension
(prefetch control and cache control) soon.

Misono Tomohiro (10):
  soc: fujitsu: hwb: Add hardware barrier driver init/exit code
  soc: fujitsu: hwb: Add open operation
  soc: fujitsu: hwb: Add IOC_BB_ALLOC ioctl
  soc: fujitsu: hwb: Add IOC_BW_ASSIGN ioctl
  soc: fujitsu: hwb: Add IOC_BW_UNASSIGN ioctl
  soc: fujitsu: hwb: Add IOC_BB_FREE ioctl
  soc: fujitsu: hwb: Add IOC_GET_PE_INFO ioctl
  soc: fujitsu: hwb: Add release operation
  soc: fujitsu: hwb: Add sysfs entry
  soc: fujitsu: hwb: Add Kconfig/Makefile to build fujitsu_hwb driver

 MAINTAINERS                            |    7 +
 arch/arm64/Kconfig.platforms           |    5 +
 drivers/soc/Kconfig                    |    1 +
 drivers/soc/Makefile                   |    1 +
 drivers/soc/fujitsu/Kconfig            |   24 +
 drivers/soc/fujitsu/Makefile           |    2 +
 drivers/soc/fujitsu/fujitsu_hwb.c      | 1253 ++++++++++++++++++++++++
 include/uapi/linux/fujitsu_hpc_ioctl.h |   41 +
 8 files changed, 1334 insertions(+)
 create mode 100644 drivers/soc/fujitsu/Kconfig
 create mode 100644 drivers/soc/fujitsu/Makefile
 create mode 100644 drivers/soc/fujitsu/fujitsu_hwb.c
 create mode 100644 include/uapi/linux/fujitsu_hpc_ioctl.h

-- 
2.26.2


* [PATCH 01/10] soc: fujitsu: hwb: Add hardware barrier driver init/exit code
@ 2021-01-08 10:52   ` Misono Tomohiro
  0 siblings, 0 replies; 66+ messages in thread
From: Misono Tomohiro @ 2021-01-08 10:52 UTC (permalink / raw)
  To: linux-arm-kernel, soc; +Cc: will, catalin.marinas, arnd, olof, misono.tomohiro

This adds the hardware barrier driver's struct definitions and
module init/exit code. We use a miscdevice for the barrier driver's ioctls,
and /dev/fujitsu_hwb will be created upon module load.
The following commits add each ioctl definition.

Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
---
 drivers/soc/fujitsu/fujitsu_hwb.c | 313 ++++++++++++++++++++++++++++++
 1 file changed, 313 insertions(+)
 create mode 100644 drivers/soc/fujitsu/fujitsu_hwb.c

diff --git a/drivers/soc/fujitsu/fujitsu_hwb.c b/drivers/soc/fujitsu/fujitsu_hwb.c
new file mode 100644
index 000000000000..44c32c1683df
--- /dev/null
+++ b/drivers/soc/fujitsu/fujitsu_hwb.c
@@ -0,0 +1,313 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2020 FUJITSU LIMITED
+ *
+ * This hardware barrier (HWB) driver provides a set of ioctls to realize synchronization
+ * by PEs in the same Core Memory Group (CMG) by using implementation defined registers.
+ * On A64FX, CMG is the same as L3 cache domain.
+ *
+ * The main purpose of the driver is setting up registers which cannot be accessed
+ * from EL0. However, after initialization, BST_SYNC/LBSY_SYNC registers which are used
+ * in synchronization main logic can be accessed from EL0 (therefore it is fast).
+ *
+ * Simplified barrier operation flow of user application is as follows:
+ *  (one PE)
+ *    1. Call IOC_BB_ALLOC to setup INIT_SYNC register which is shared in a CMG.
+ *       This specifies which PEs join synchronization
+ *  (on each PE joining synchronization)
+ *    2. Call IOC_BW_ASSIGN to setup ASSIGN_SYNC register per PE
+ *    3. Barrier main logic (all logic runs in EL0)
+ *      a) Write 1 to BST_SYNC register
+ *      b) Read LBSY_SYNC register
+ *      c) If LBSY_SYNC value is 1, sync is finished, otherwise go back to b
+ *         (If all PEs joining synchronization write 1 to BST_SYNC, LBSY_SYNC becomes 1)
+ *    4. Call IOC_BW_UNASSIGN to reset ASSIGN_SYNC register
+ *  (one PE)
+ *    5. Call IOC_BB_FREE to reset INIT_SYNC register
+ */
+
+#include <asm/cputype.h>
+#include <linux/bitfield.h>
+#include <linux/bitops.h>
+#include <linux/cpu.h>
+#include <linux/cpumask.h>
+#include <linux/kernel.h>
+#include <linux/miscdevice.h>
+#include <linux/module.h>
+#include <linux/spinlock.h>
+#include <linux/slab.h>
+#include <linux/wait.h>
+
+#ifdef pr_fmt
+#undef pr_fmt
+#endif
+#define pr_fmt(fmt) "[%s:%s:%d] " fmt, KBUILD_MODNAME, __func__, __LINE__
+
+/* Since miscdevice is used, /dev/fujitsu_hwb will be created when module is loaded */
+#define FHWB_DEV_NAME "fujitsu_hwb"
+
+/* Implementation defined registers for barrier shared in CMG */
+#define FHWB_INIT_SYNC_BB0_EL1  sys_reg(3, 0, 15, 13, 0)
+#define FHWB_INIT_SYNC_BB1_EL1  sys_reg(3, 0, 15, 13, 1)
+#define FHWB_INIT_SYNC_BB2_EL1  sys_reg(3, 0, 15, 13, 2)
+#define FHWB_INIT_SYNC_BB3_EL1  sys_reg(3, 0, 15, 13, 3)
+#define FHWB_INIT_SYNC_BB4_EL1  sys_reg(3, 0, 15, 13, 4)
+#define FHWB_INIT_SYNC_BB5_EL1  sys_reg(3, 0, 15, 13, 5)
+
+/* Implementation defined registers for barrier per PE */
+#define FHWB_CTRL_EL1           sys_reg(3, 0, 11, 12, 0)
+#define FHWB_BST_BIT_EL1        sys_reg(3, 0, 11, 12, 4)
+#define FHWB_ASSIGN_SYNC_W0_EL1 sys_reg(3, 0, 15, 15, 0)
+#define FHWB_ASSIGN_SYNC_W1_EL1 sys_reg(3, 0, 15, 15, 1)
+#define FHWB_ASSIGN_SYNC_W2_EL1 sys_reg(3, 0, 15, 15, 2)
+#define FHWB_ASSIGN_SYNC_W3_EL1 sys_reg(3, 0, 15, 15, 3)
+
+/* Field definitions for above registers */
+#define FHWB_INIT_SYNC_BB_EL1_MASK_FIELD  GENMASK_ULL(44, 32)
+#define FHWB_INIT_SYNC_BB_EL1_BST_FIELD   GENMASK_ULL(12, 0)
+#define FHWB_CTRL_EL1_EL1AE               BIT_ULL(63)
+#define FHWB_CTRL_EL1_EL0AE               BIT_ULL(62)
+#define FHWB_BST_BIT_EL1_CMG_FILED        GENMASK_ULL(5, 4)
+#define FHWB_BST_BIT_EL1_PE_FILED         GENMASK_ULL(3, 0)
+#define FHWB_ASSIGN_SYNC_W_EL1_VALID      BIT_ULL(63)
+
+static enum cpuhp_state _hp_state;
+
+/*
+ * Each PE has its own CMG and Physical PE number (determined by BST_BIT_EL1 register).
+ * Barrier operation can be performed by PEs which belong to the same CMG.
+ */
+struct pe_info {
+	/* CMG number of this PE */
+	u8 cmg;
+	/* Physical PE number of this PE */
+	u8 ppe;
+};
+
+/* Hardware information of running system */
+struct hwb_hwinfo {
+	/* CPU type (part number) */
+	unsigned int type;
+	/* Number of CMG */
+	u8 num_cmg;
+	/* Number of barrier blade(BB) per CMG */
+	u8 num_bb;
+	/* Number of barrier window(BW) per PE */
+	u8 num_bw;
+	/*
+	 * Maximum number of PE per CMG.
+	 * Depending on BIOS configuration, each CMG has up to max_pe_per_cmg PEs
+	 * and each PE has unique physical PE number between 0 ~ (max_pe_per_cmg-1)
+	 */
+	u8 max_pe_per_cmg;
+
+	/* Bitmap for currently allocated BB per CMG */
+	unsigned long *used_bb_bmap;
+	/* Bitmap for currently allocated BW per PE */
+	unsigned long *used_bw_bmap;
+	/* Mapping table of cpuid -> CMG/PE number */
+	struct pe_info *core_map;
+};
+static struct hwb_hwinfo _hwinfo;
+
+/* List for barrier blade currently used per FD */
+struct hwb_private_data {
+	struct list_head bb_list;
+	spinlock_t list_lock;
+};
+
+/* Each barrier blade info */
+#define BB_FREEING 1
+struct bb_info {
+	/* cpumask for PEs which participate in synchronization */
+	cpumask_var_t pemask;
+	/* cpumask for PEs which are currently assigned a BW for this BB */
+	cpumask_var_t assigned_pemask;
+	/* Added to hwb_private_data::bb_list */
+	struct list_head node;
+	/* For indicating if this bb is currently being freed or not */
+	unsigned long flag;
+	/* For waiting ongoing assign/unassign operation to finish before freeing BB */
+	wait_queue_head_t wq;
+	/* Track ongoing assign/unassign operation count */
+	atomic_t ongoing_assign_count;
+	/* CMG  number of this blade */
+	u8 cmg;
+	/* BB number of this blade */
+	u8 bb;
+	/* Hold assigned window number of each PE corresponding to @assigned_pemask */
+	u8 *bw;
+	/* Track usage count as IOC_BB_FREE and IOC_BW_[UN]ASSIGN might be run in parallel */
+	struct kref kref;
+};
+static struct kmem_cache *bb_info_cachep;
+
+static const struct file_operations fujitsu_hwb_dev_fops = {
+	.owner          = THIS_MODULE,
+};
+
+static struct miscdevice bar_miscdev = {
+	.fops  = &fujitsu_hwb_dev_fops,
+	.minor = MISC_DYNAMIC_MINOR,
+	.mode  = 0666,
+	.name  = FHWB_DEV_NAME,
+};
+
+static void destroy_bb_info_cachep(void)
+{
+	kmem_cache_destroy(bb_info_cachep);
+}
+
+static int __init init_bb_info_cachep(void)
+{
+	/*
+	 * Since cpumask value will be copied from userspace to the beginning of
+	 * struct bb_info, use kmem_cache_create_usercopy to mark that region.
+	 * Otherwise CONFIG_HARDENED_USERCOPY gives user_copy_warn.
+	 */
+	bb_info_cachep = kmem_cache_create_usercopy("bb_info_cache", sizeof(struct bb_info),
+			0, SLAB_HWCACHE_ALIGN, 0, sizeof(cpumask_var_t), NULL);
+	if (bb_info_cachep == NULL)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static void free_map(void)
+{
+	kfree(_hwinfo.used_bw_bmap);
+	kfree(_hwinfo.used_bb_bmap);
+	kfree(_hwinfo.core_map);
+}
+
+static int __init alloc_map(void)
+{
+	_hwinfo.core_map = kcalloc(num_possible_cpus(), sizeof(struct pe_info), GFP_KERNEL);
+	_hwinfo.used_bb_bmap = kcalloc(_hwinfo.num_cmg, sizeof(unsigned long), GFP_KERNEL);
+	_hwinfo.used_bw_bmap = kcalloc(num_possible_cpus(), sizeof(unsigned long), GFP_KERNEL);
+	if (!_hwinfo.core_map || !_hwinfo.used_bb_bmap || !_hwinfo.used_bw_bmap)
+		goto fail;
+
+	/* 0 is a valid number for both CMG/PE. Set all bits to 1 to represent uninitialized state */
+	memset(_hwinfo.core_map, 0xFF, sizeof(struct pe_info) * num_possible_cpus());
+
+	return 0;
+
+fail:
+	free_map();
+	return -ENOMEM;
+}
+
+/* Get this system's CPU type (part number). If it is not fujitsu CPU, return -1 */
+static int __init get_cpu_type(void)
+{
+	if (read_cpuid_implementor() != ARM_CPU_IMP_FUJITSU)
+		return -1;
+
+	return read_cpuid_part_number();
+}
+
+static int __init setup_hwinfo(void)
+{
+	int type;
+
+	type = get_cpu_type();
+	if (type < 0)
+		return -ENODEV;
+
+	_hwinfo.type = type;
+	switch (type) {
+	case FUJITSU_CPU_PART_A64FX:
+		_hwinfo.num_cmg = 4;
+		_hwinfo.num_bb = 6;
+		_hwinfo.num_bw = 4;
+		_hwinfo.max_pe_per_cmg = 13;
+		break;
+	default:
+		return -ENODEV;
+	}
+
+	return 0;
+}
+
+static int hwb_cpu_online(unsigned int cpu)
+{
+	u64 val;
+	int i;
+
+	/* Setup core_map by reading BST_BIT_EL1 register of each PE */
+	val = read_sysreg_s(FHWB_BST_BIT_EL1);
+	_hwinfo.core_map[cpu].cmg = FIELD_GET(FHWB_BST_BIT_EL1_CMG_FILED, val);
+	_hwinfo.core_map[cpu].ppe = FIELD_GET(FHWB_BST_BIT_EL1_PE_FILED, val);
+
+	/* Since these registers' values are UNKNOWN on reset, explicitly clear all */
+	for (i = 0; i < _hwinfo.num_bw; i++)
+		write_bw_reg(i, 0);
+
+	write_sysreg_s(0, FHWB_CTRL_EL1);
+
+	return 0;
+}
+
+static int __init hwb_init(void)
+{
+	int ret;
+
+	ret = setup_hwinfo();
+	if (ret < 0) {
+		pr_err("Unsupported CPU type\n");
+		return ret;
+	}
+
+	ret = alloc_map();
+	if (ret < 0)
+		return ret;
+
+	ret = init_bb_info_cachep();
+	if (ret < 0)
+		goto out1;
+
+	/*
+	 * Setup cpuhp callback to ensure each PE's resource will be initialized
+	 * even if some PEs are offline at this point
+	 */
+	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "soc/fujitsu_hwb:online",
+		hwb_cpu_online, NULL);
+	if (ret < 0) {
+		pr_err("cpuhp setup failed: %d\n", ret);
+		goto out2;
+	}
+	_hp_state = ret;
+
+	ret = misc_register(&bar_miscdev);
+	if (ret < 0) {
+		pr_err("misc_register failed: %d\n", ret);
+		goto out3;
+	}
+
+	return 0;
+
+out3:
+	cpuhp_remove_state(_hp_state);
+out2:
+	destroy_bb_info_cachep();
+out1:
+	free_map();
+
+	return ret;
+}
+
+static void __exit hwb_exit(void)
+{
+	misc_deregister(&bar_miscdev);
+	cpuhp_remove_state(_hp_state);
+	destroy_bb_info_cachep();
+	free_map();
+}
+
+module_init(hwb_init);
+module_exit(hwb_exit);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("FUJITSU LIMITED");
+MODULE_DESCRIPTION("FUJITSU HPC Hardware Barrier Driver");
-- 
2.26.2


* [PATCH 02/10] soc: fujitsu: hwb: Add open operation
@ 2021-01-08 10:52   ` Misono Tomohiro
  0 siblings, 0 replies; 66+ messages in thread
From: Misono Tomohiro @ 2021-01-08 10:52 UTC (permalink / raw)
  To: linux-arm-kernel, soc; +Cc: will, catalin.marinas, arnd, olof, misono.tomohiro

Nothing special. Just preparing private_data for this FD.

Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
---
 drivers/soc/fujitsu/fujitsu_hwb.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/soc/fujitsu/fujitsu_hwb.c b/drivers/soc/fujitsu/fujitsu_hwb.c
index 44c32c1683df..1dec3d3c652f 100644
--- a/drivers/soc/fujitsu/fujitsu_hwb.c
+++ b/drivers/soc/fujitsu/fujitsu_hwb.c
@@ -142,8 +142,28 @@ struct bb_info {
 };
 static struct kmem_cache *bb_info_cachep;
 
+static int fujitsu_hwb_dev_open(struct inode *inode, struct file *filp)
+{
+	struct hwb_private_data *pdata;
+
+	pdata = kzalloc(sizeof(*pdata), GFP_KERNEL);
+	if (!pdata)
+		return -ENOMEM;
+	INIT_LIST_HEAD(&pdata->bb_list);
+	spin_lock_init(&pdata->list_lock);
+
+	/*
+	 * misc_open() sets pointer of the miscdevice to filp->private_data.
+	 * Just override it since barrier fops does not use it
+	 */
+	filp->private_data = pdata;
+
+	return 0;
+}
+
 static const struct file_operations fujitsu_hwb_dev_fops = {
 	.owner          = THIS_MODULE,
+	.open           = fujitsu_hwb_dev_open,
 };
 
 static struct miscdevice bar_miscdev = {
-- 
2.26.2


* [PATCH 03/10] soc: fujitsu: hwb: Add IOC_BB_ALLOC ioctl
@ 2021-01-08 10:52   ` Misono Tomohiro
  0 siblings, 0 replies; 66+ messages in thread
From: Misono Tomohiro @ 2021-01-08 10:52 UTC (permalink / raw)
  To: linux-arm-kernel, soc; +Cc: will, catalin.marinas, arnd, olof, misono.tomohiro

The IOC_BB_ALLOC ioctl initializes the INIT_SYNC register, which represents
the PEs in a CMG that join synchronization. Although we get a cpumask of
PEs from userspace, the INIT_SYNC register requires a mask value based
on the physical PE number, which is written in each PE's BST register.
So we convert the cpumask value in validate_and_convert_pemask().

Since the INIT_SYNC register is a shared resource per CMG, we pick
one PE and send an IPI to it to write the register.

Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
---
 drivers/soc/fujitsu/fujitsu_hwb.c      | 223 +++++++++++++++++++++++++
 include/uapi/linux/fujitsu_hpc_ioctl.h |  23 +++
 2 files changed, 246 insertions(+)
 create mode 100644 include/uapi/linux/fujitsu_hpc_ioctl.h

diff --git a/drivers/soc/fujitsu/fujitsu_hwb.c b/drivers/soc/fujitsu/fujitsu_hwb.c
index 1dec3d3c652f..24d1bb00f55c 100644
--- a/drivers/soc/fujitsu/fujitsu_hwb.c
+++ b/drivers/soc/fujitsu/fujitsu_hwb.c
@@ -38,6 +38,8 @@
 #include <linux/slab.h>
 #include <linux/wait.h>
 
+#include <linux/fujitsu_hpc_ioctl.h>
+
 #ifdef pr_fmt
 #undef pr_fmt
 #endif
@@ -142,6 +144,226 @@ struct bb_info {
 };
 static struct kmem_cache *bb_info_cachep;
 
+static void free_bb_info(struct kref *kref)
+{
+	struct bb_info *bb_info = container_of(kref, struct bb_info, kref);
+
+	free_cpumask_var(bb_info->assigned_pemask);
+	free_cpumask_var(bb_info->pemask);
+	kfree(bb_info->bw);
+	kmem_cache_free(bb_info_cachep, bb_info);
+}
+
+static struct bb_info *alloc_bb_info(void)
+{
+	struct bb_info *bb_info;
+
+	bb_info = kmem_cache_zalloc(bb_info_cachep, GFP_KERNEL);
+	if (!bb_info)
+		return NULL;
+
+	bb_info->bw = kcalloc(_hwinfo.max_pe_per_cmg, sizeof(u8), GFP_KERNEL);
+	if (!bb_info->bw) {
+		free_bb_info(&bb_info->kref);
+		return NULL;
+	}
+	if (!zalloc_cpumask_var(&bb_info->pemask, GFP_KERNEL) ||
+		!zalloc_cpumask_var(&bb_info->assigned_pemask, GFP_KERNEL)) {
+		free_bb_info(&bb_info->kref);
+		return NULL;
+	}
+
+	init_waitqueue_head(&bb_info->wq);
+	kref_init(&bb_info->kref);
+
+	return bb_info;
+}
+
+static inline void put_bb_info(struct bb_info *bb_info)
+{
+	kref_put(&bb_info->kref, free_bb_info);
+}
+
+/* Validate pemask's range and convert it to a mask based on physical PE number */
+static int validate_and_convert_pemask(struct bb_info *bb_info, unsigned long *phys_pemask)
+{
+	int cpu;
+	u8 cmg;
+
+	if (cpumask_weight(bb_info->pemask) < 2) {
+		pr_err("pemask needs at least two bits set: %*pbl\n",
+						cpumask_pr_args(bb_info->pemask));
+		return -EINVAL;
+	}
+
+	if (!cpumask_subset(bb_info->pemask, cpu_online_mask)) {
+		pr_err("pemask needs to be subset of online cpu: %*pbl, %*pbl\n",
+			cpumask_pr_args(bb_info->pemask), cpumask_pr_args(cpu_online_mask));
+		return -EINVAL;
+	}
+
+	/*
+	 * INIT_SYNC register requires a mask value based on physical PE number.
+	 * So convert pemask to it while checking that all PEs belong to the same CMG
+	 */
+	cpu = cpumask_first(bb_info->pemask);
+	cmg = _hwinfo.core_map[cpu].cmg;
+	*phys_pemask = 0;
+	for_each_cpu(cpu, bb_info->pemask) {
+		if (_hwinfo.core_map[cpu].cmg != cmg) {
+			pr_err("All PEs must belong to the same CMG: %*pbl\n",
+							cpumask_pr_args(bb_info->pemask));
+			return -EINVAL;
+		}
+		set_bit(_hwinfo.core_map[cpu].ppe, phys_pemask);
+	}
+	bb_info->cmg = cmg;
+
+	pr_debug("pemask: %*pbl, physical_pemask: %lx\n",
+					cpumask_pr_args(bb_info->pemask), *phys_pemask);
+
+	return 0;
+}
+
+/* Search for a free BB in _hwinfo.used_bb_bmap[cmg] */
+static int search_free_bb(u8 cmg)
+{
+	int i;
+
+	for (i = 0; i < _hwinfo.num_bb; i++) {
+		if (!test_and_set_bit(i, &_hwinfo.used_bb_bmap[cmg])) {
+			pr_debug("Use BB %u in CMG %u, bitmap: %lx\n",
+							i, cmg, _hwinfo.used_bb_bmap[cmg]);
+			return i;
+		}
+	}
+
+	pr_err("All barrier blades are currently in use in CMG %u\n", cmg);
+	return -EBUSY;
+}
+
+struct init_sync_args {
+	u64 val;
+	u8 bb;
+};
+
+static void write_init_sync_reg(void *args)
+{
+	struct init_sync_args *sync_args = (struct init_sync_args *)args;
+
+	switch (sync_args->bb) {
+	case 0:
+		write_sysreg_s(sync_args->val, FHWB_INIT_SYNC_BB0_EL1);
+		break;
+	case 1:
+		write_sysreg_s(sync_args->val, FHWB_INIT_SYNC_BB1_EL1);
+		break;
+	case 2:
+		write_sysreg_s(sync_args->val, FHWB_INIT_SYNC_BB2_EL1);
+		break;
+	case 3:
+		write_sysreg_s(sync_args->val, FHWB_INIT_SYNC_BB3_EL1);
+		break;
+	case 4:
+		write_sysreg_s(sync_args->val, FHWB_INIT_SYNC_BB4_EL1);
+		break;
+	case 5:
+		write_sysreg_s(sync_args->val, FHWB_INIT_SYNC_BB5_EL1);
+		break;
+	}
+}
+
+/* Send IPI to initialize INIT_SYNC register */
+static void setup_bb(struct bb_info *bb_info, unsigned long phys_pemask)
+{
+	struct init_sync_args args = {0};
+	int cpu;
+
+	/* INIT_SYNC register is shared resource in CMG. Pick one PE to set it up */
+	cpu = cpumask_any(bb_info->pemask);
+
+	args.bb = bb_info->bb;
+	args.val = FIELD_PREP(FHWB_INIT_SYNC_BB_EL1_MASK_FIELD, phys_pemask);
+	on_each_cpu_mask(cpumask_of(cpu), write_init_sync_reg, &args, 1);
+
+	pr_debug("Setup bb. cpu: %d, CMG: %u, BB: %u, bitmap: %lx\n",
+			cpu, bb_info->cmg, bb_info->bb, _hwinfo.used_bb_bmap[bb_info->cmg]);
+}
+
+static int ioc_bb_alloc(struct file *filp, void __user *argp)
+{
+	struct hwb_private_data *pdata = (struct hwb_private_data *)filp->private_data;
+	struct fujitsu_hwb_ioc_bb_ctl bb_ctl;
+	struct bb_info *bb_info;
+	unsigned long physical_pemask;
+	unsigned int size;
+	int ret;
+
+	if (copy_from_user(&bb_ctl, (struct fujitsu_hwb_ioc_bb_ctl __user *)argp,
+						sizeof(struct fujitsu_hwb_ioc_bb_ctl)))
+		return -EFAULT;
+
+	bb_info = alloc_bb_info();
+	if (!bb_info)
+		return -ENOMEM;
+
+	/* cpumask size may vary in user and kernel space. Use the smaller one */
+	size = min(cpumask_size(), bb_ctl.size);
+	if (copy_from_user(bb_info->pemask, bb_ctl.pemask, size)) {
+		ret = -EFAULT;
+		goto put_bb_info;
+	}
+
+	ret = validate_and_convert_pemask(bb_info, &physical_pemask);
+	if (ret < 0)
+		goto put_bb_info;
+
+	ret = search_free_bb(bb_info->cmg);
+	if (ret < 0)
+		goto put_bb_info;
+	bb_info->bb = ret;
+
+	/* Copy back CMG/BB number to be used to user */
+	bb_ctl.cmg = bb_info->cmg;
+	bb_ctl.bb  = bb_info->bb;
+	if (copy_to_user((struct fujitsu_hwb_ioc_bb_ctl __user *)argp, &bb_ctl,
+						sizeof(struct fujitsu_hwb_ioc_bb_ctl))) {
+		ret = -EFAULT;
+		clear_bit(bb_ctl.bb, &_hwinfo.used_bb_bmap[bb_ctl.cmg]);
+		goto put_bb_info;
+	}
+
+	setup_bb(bb_info, physical_pemask);
+
+	spin_lock(&pdata->list_lock);
+	list_add_tail(&bb_info->node, &pdata->bb_list);
+	spin_unlock(&pdata->list_lock);
+
+	return 0;
+
+put_bb_info:
+	put_bb_info(bb_info);
+
+	return ret;
+}
+
+static long fujitsu_hwb_dev_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+{
+	void __user *argp = (void __user *)arg;
+	int ret;
+
+	switch (cmd) {
+	case FUJITSU_HWB_IOC_BB_ALLOC:
+		ret = ioc_bb_alloc(filp, argp);
+		break;
+	default:
+		ret = -ENOTTY;
+		break;
+	}
+
+	return ret;
+}
+
 static int fujitsu_hwb_dev_open(struct inode *inode, struct file *filp)
 {
 	struct hwb_private_data *pdata;
@@ -164,6 +386,7 @@ static int fujitsu_hwb_dev_open(struct inode *inode, struct file *filp)
 static const struct file_operations fujitsu_hwb_dev_fops = {
 	.owner          = THIS_MODULE,
 	.open           = fujitsu_hwb_dev_open,
+	.unlocked_ioctl = fujitsu_hwb_dev_ioctl,
 };
 
 static struct miscdevice bar_miscdev = {
diff --git a/include/uapi/linux/fujitsu_hpc_ioctl.h b/include/uapi/linux/fujitsu_hpc_ioctl.h
new file mode 100644
index 000000000000..c87a5bad3f59
--- /dev/null
+++ b/include/uapi/linux/fujitsu_hpc_ioctl.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
+/* Copyright 2020 FUJITSU LIMITED */
+#ifndef _UAPI_LINUX_FUJITSU_HPC_IOC_H
+#define _UAPI_LINUX_FUJITSU_HPC_IOC_H
+
+#include <linux/ioctl.h>
+#include <asm/types.h>
+
+#define __FUJITSU_IOCTL_MAGIC 'F'
+
+/* ioctl definitions for hardware barrier driver */
+struct fujitsu_hwb_ioc_bb_ctl {
+	__u8 cmg;
+	__u8 bb;
+	__u8 unused[2];
+	__u32 size;
+	unsigned long __user *pemask;
+};
+
+#define FUJITSU_HWB_IOC_BB_ALLOC _IOWR(__FUJITSU_IOCTL_MAGIC, \
+	0x00, struct fujitsu_hwb_ioc_bb_ctl)
+
+#endif /* _UAPI_LINUX_FUJITSU_HPC_IOC_H */
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 04/10] soc: fujitsu: hwb: Add IOC_BW_ASSIGN ioctl
@ 2021-01-08 10:52   ` Misono Tomohiro
  0 siblings, 0 replies; 66+ messages in thread
From: Misono Tomohiro @ 2021-01-08 10:52 UTC (permalink / raw)
  To: linux-arm-kernel, soc; +Cc: will, catalin.marinas, arnd, olof, misono.tomohiro

The IOC_BW_ASSIGN ioctl sets up the control register and a window
register on each PE. Therefore, this ioctl is called once for each PE
joining synchronization. The calling thread is expected to be bound to
one PE at this point.

Since the barrier window and control registers are per-PE resources and
context switching is not supported at this point, we forbid concurrent
execution of ioc_bw_assign() on the same PE by disabling preemption.

After this ioctl returns successfully, the user program (EL0) can access
the BST_SYNC/LBSY_SYNC registers directly to perform synchronization.
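
For illustration only (not part of this patch), the window selection
performed during assign can be sketched in userspace as follows. A
non-negative request names a specific window, and a negative request
means "use the first free one". pick_window() and NUM_BW are
hypothetical names, and the value 4 is an assumption based on the four
ASSIGN_SYNC_W0..W3 registers written by this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define NUM_BW 4	/* assumption: four barrier windows per PE (W0..W3) */

/*
 * Pick a barrier window on one PE and mark it used in *used_bmap.
 * Returns the window number, -EINVAL for an out-of-range request,
 * or -EBUSY if the requested window (or every window) is taken.
 */
static int pick_window(uint8_t *used_bmap, int requested)
{
	int window;

	assert(used_bmap);

	if (requested >= 0) {
		/* Caller asked for a specific window */
		if (requested >= NUM_BW)
			return -EINVAL;
		if (*used_bmap & (1u << requested))
			return -EBUSY;
		window = requested;
	} else {
		/* Caller has no preference: take the first free window */
		for (window = 0; window < NUM_BW; window++)
			if (!(*used_bmap & (1u << window)))
				break;
		if (window == NUM_BW)
			return -EBUSY;
	}

	*used_bmap |= (uint8_t)(1u << window);
	return window;
}
```

In the driver the bitmap is per-PE state, which is why the check and the
update run with preemption disabled on a thread bound to that PE.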

Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
---
 drivers/soc/fujitsu/fujitsu_hwb.c      | 187 +++++++++++++++++++++++++
 include/uapi/linux/fujitsu_hpc_ioctl.h |   7 +
 2 files changed, 194 insertions(+)

diff --git a/drivers/soc/fujitsu/fujitsu_hwb.c b/drivers/soc/fujitsu/fujitsu_hwb.c
index 24d1bb00f55c..85ffc1642dd9 100644
--- a/drivers/soc/fujitsu/fujitsu_hwb.c
+++ b/drivers/soc/fujitsu/fujitsu_hwb.c
@@ -179,6 +179,34 @@ static struct bb_info *alloc_bb_info(void)
 	return bb_info;
 }
 
+static struct bb_info *get_bb_info(struct hwb_private_data *pdata, u8 cmg, u8 bb)
+{
+	struct bb_info *bb_info;
+
+	if (cmg >= _hwinfo.num_cmg || bb >= _hwinfo.num_bb) {
+		pr_err("CMG/BB number is invalid: %u/%u\n", cmg, bb);
+		return ERR_PTR(-EINVAL);
+	}
+
+	if (!test_bit(bb, &_hwinfo.used_bb_bmap[cmg])) {
+		pr_err("BB is not allocated: %u/%u\n", cmg, bb);
+		return ERR_PTR(-ENOENT);
+	}
+
+	spin_lock(&pdata->list_lock);
+	list_for_each_entry(bb_info, &pdata->bb_list, node) {
+		if (bb_info->cmg == cmg && bb_info->bb == bb) {
+			kref_get(&bb_info->kref);
+			spin_unlock(&pdata->list_lock);
+			return bb_info;
+		}
+	}
+	spin_unlock(&pdata->list_lock);
+
+	pr_err("BB is not allocated by this process: %u/%u\n", cmg, bb);
+	return ERR_PTR(-EPERM);
+}
+
 static inline void put_bb_info(struct bb_info *bb_info)
 {
 	kref_put(&bb_info->kref, free_bb_info);
@@ -347,6 +375,162 @@ static int ioc_bb_alloc(struct file *filp, void __user *argp)
 	return ret;
 }
 
+static bool is_bound_only_one_pe(void)
+{
+	if (current->nr_cpus_allowed == 1)
+		return true;
+
+	pr_err("Thread must be bound to one PE between assign and unassign\n");
+	return false;
+}
+
+/* Check if this PE is assignable and store the window number to use in @bw_ctl->window */
+static int is_bw_assignable(struct bb_info *bb_info, struct fujitsu_hwb_ioc_bw_ctl *bw_ctl, int cpu)
+{
+	int i;
+
+	if (!cpumask_test_cpu(cpu, bb_info->pemask)) {
+		pr_err("This pe is not supposed to join sync, %u/%u/%d\n",
+						bb_info->cmg, bb_info->bb, cpu);
+		return -EINVAL;
+	}
+
+	if (cpumask_test_cpu(cpu, bb_info->assigned_pemask)) {
+		pr_err("This pe is already assigned to window: %u/%u/%d\n",
+						bb_info->cmg, bb_info->bb, cpu);
+		return -EINVAL;
+	}
+
+	if (bw_ctl->window >= 0) {
+		/* User specifies window number to use. Check if available */
+		if (bw_ctl->window >= _hwinfo.num_bw) {
+			pr_err("Window number is invalid: %u/%u/%d/%u\n",
+						bb_info->cmg, bb_info->bb, cpu, bw_ctl->window);
+			return -EINVAL;
+		}
+
+		if (test_bit(bw_ctl->window, &_hwinfo.used_bw_bmap[cpu])) {
+			pr_err("Window is already used: %u/%u/%d/%u\n",
+						bb_info->cmg, bb_info->bb, cpu, bw_ctl->window);
+			return -EBUSY;
+		}
+	} else {
+		/* User does not specify window number. Use free window */
+		i = ffz(_hwinfo.used_bw_bmap[cpu]);
+		if (i == _hwinfo.num_bw) {
+			pr_err("There is no free window: %u/%u/%d\n",
+					bb_info->cmg, bb_info->bb, cpu);
+			return -EBUSY;
+		}
+
+		bw_ctl->window = i;
+	}
+
+	return 0;
+}
+
+static void setup_ctl_reg(struct bb_info *bb_info, int cpu)
+{
+	u64 val;
+
+	if (_hwinfo.used_bw_bmap[cpu] != 0)
+		/* Already set up. Nothing to do */
+		return;
+
+	/*
+	 * This is the first assign on this PE.
+	 * Setup ctrl reg to allow access to BST_SYNC/LBSY_SYNC from EL0
+	 */
+	val = (FHWB_CTRL_EL1_EL1AE | FHWB_CTRL_EL1_EL0AE);
+	write_sysreg_s(val, FHWB_CTRL_EL1);
+
+	pr_debug("Setup ctl reg. cpu: %d\n", cpu);
+}
+
+static void write_bw_reg(u8 window, u64 val)
+{
+	switch (window) {
+	case 0:
+		write_sysreg_s(val, FHWB_ASSIGN_SYNC_W0_EL1);
+		break;
+	case 1:
+		write_sysreg_s(val, FHWB_ASSIGN_SYNC_W1_EL1);
+		break;
+	case 2:
+		write_sysreg_s(val, FHWB_ASSIGN_SYNC_W2_EL1);
+		break;
+	case 3:
+		write_sysreg_s(val, FHWB_ASSIGN_SYNC_W3_EL1);
+		break;
+	}
+}
+
+static void setup_bw(struct bb_info *bb_info, struct fujitsu_hwb_ioc_bw_ctl *bw_ctl, int cpu)
+{
+	u64 val;
+	u8 ppe;
+
+	/* Set valid bit and bb number */
+	val = (FHWB_ASSIGN_SYNC_W_EL1_VALID | bw_ctl->bb);
+	write_bw_reg(bw_ctl->window, val);
+
+	/* Update bitmap info */
+	ppe = _hwinfo.core_map[cpu].ppe;
+	set_bit(bw_ctl->window, &_hwinfo.used_bw_bmap[cpu]);
+	cpumask_set_cpu(cpu, bb_info->assigned_pemask);
+	bb_info->bw[ppe] = bw_ctl->window;
+
+	pr_debug("Setup bw. cpu: %d, window: %u, BB: %u, bw_bmap: %lx, assigned_pemask: %*pbl\n",
+			cpu, bw_ctl->window, bw_ctl->bb,
+			_hwinfo.used_bw_bmap[cpu], cpumask_pr_args(bb_info->assigned_pemask));
+}
+
+static int ioc_bw_assign(struct file *filp, void __user *argp)
+{
+	struct hwb_private_data *pdata = (struct hwb_private_data *)filp->private_data;
+	struct fujitsu_hwb_ioc_bw_ctl bw_ctl;
+	struct bb_info *bb_info;
+	int ret;
+	int cpu;
+	u8 cmg;
+
+	if (!is_bound_only_one_pe())
+		return -EPERM;
+
+	if (copy_from_user(&bw_ctl, (struct fujitsu_hwb_ioc_bw_ctl __user *)argp,
+						sizeof(struct fujitsu_hwb_ioc_bw_ctl)))
+		return -EFAULT;
+
+	cpu = smp_processor_id();
+	cmg = _hwinfo.core_map[cpu].cmg;
+	bb_info = get_bb_info(pdata, cmg, bw_ctl.bb);
+	if (IS_ERR(bb_info))
+		return PTR_ERR(bb_info);
+
+	/*
+	 * Barrier window registers and the control register are per-PE resources.
+	 * Context switching is not supported and mutual exclusion is needed for
+	 * assign and unassign on this PE
+	 */
+	preempt_disable();
+	ret = is_bw_assignable(bb_info, &bw_ctl, cpu);
+	if (!ret) {
+		setup_ctl_reg(bb_info, cpu);
+		setup_bw(bb_info, &bw_ctl, cpu);
+	}
+	preempt_enable();
+
+	put_bb_info(bb_info);
+
+	/* Copy back window number to be used to user */
+	if (!ret && copy_to_user((struct fujitsu_hwb_ioc_bw_ctl __user *)argp, &bw_ctl,
+						sizeof(struct fujitsu_hwb_ioc_bw_ctl)))
+		/* Leave cleanup to f_op->release() */
+		return -EFAULT;
+
+	return ret;
+}
+
 static long fujitsu_hwb_dev_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
@@ -356,6 +540,9 @@ static long fujitsu_hwb_dev_ioctl(struct file *filp, unsigned int cmd, unsigned
 	case FUJITSU_HWB_IOC_BB_ALLOC:
 		ret = ioc_bb_alloc(filp, argp);
 		break;
+	case FUJITSU_HWB_IOC_BW_ASSIGN:
+		ret = ioc_bw_assign(filp, argp);
+		break;
 	default:
 		ret = -ENOTTY;
 		break;
diff --git a/include/uapi/linux/fujitsu_hpc_ioctl.h b/include/uapi/linux/fujitsu_hpc_ioctl.h
index c87a5bad3f59..ad90f8f3ae9a 100644
--- a/include/uapi/linux/fujitsu_hpc_ioctl.h
+++ b/include/uapi/linux/fujitsu_hpc_ioctl.h
@@ -17,7 +17,14 @@ struct fujitsu_hwb_ioc_bb_ctl {
 	unsigned long __user *pemask;
 };
 
+struct fujitsu_hwb_ioc_bw_ctl {
+	__u8 bb;
+	__s8 window;
+};
+
 #define FUJITSU_HWB_IOC_BB_ALLOC _IOWR(__FUJITSU_IOCTL_MAGIC, \
 	0x00, struct fujitsu_hwb_ioc_bb_ctl)
+#define FUJITSU_HWB_IOC_BW_ASSIGN _IOWR(__FUJITSU_IOCTL_MAGIC, \
+	0x01, struct fujitsu_hwb_ioc_bw_ctl)
 
 #endif /* _UAPI_LINUX_FUJITSU_HPC_IOC_H */
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 05/10] soc: fujitsu: hwb: Add IOC_BW_UNASSIGN ioctl
@ 2021-01-08 10:52   ` Misono Tomohiro
  0 siblings, 0 replies; 66+ messages in thread
From: Misono Tomohiro @ 2021-01-08 10:52 UTC (permalink / raw)
  To: linux-arm-kernel, soc; +Cc: will, catalin.marinas, arnd, olof, misono.tomohiro

IOC_BW_UNASSIGN resets what IOC_BW_ASSIGN did on each PE.
This ioctl will also be called as many times as the number of PEs joining
synchronization.
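
For illustration only (not part of this patch), the unassign
bookkeeping can be sketched in userspace as follows. Releasing a window
clears its bit, and EL0 access should be revoked only when the last
window on the PE is released. release_window() and NUM_BW are
hypothetical names, and the value 4 is an assumption based on the four
window registers handled by write_bw_reg().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define NUM_BW 4	/* assumption: four barrier windows per PE (W0..W3) */

/*
 * Release one window on a PE.
 * On success returns 0 and sets *last to 1 if no window remains in use,
 * which is the point where the control register would be cleared.
 * Returns -EINVAL if the window is out of range or was not assigned.
 */
static int release_window(uint8_t *used_bmap, int window, int *last)
{
	assert(used_bmap && last);

	if (window < 0 || window >= NUM_BW)
		return -EINVAL;
	if (!(*used_bmap & (1u << window)))
		return -EINVAL;

	*used_bmap &= (uint8_t)~(1u << window);
	*last = (*used_bmap == 0);
	return 0;
}
```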

Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
---
 drivers/soc/fujitsu/fujitsu_hwb.c      | 93 ++++++++++++++++++++++++++
 include/uapi/linux/fujitsu_hpc_ioctl.h |  2 +
 2 files changed, 95 insertions(+)

diff --git a/drivers/soc/fujitsu/fujitsu_hwb.c b/drivers/soc/fujitsu/fujitsu_hwb.c
index 85ffc1642dd9..8c4cabd60872 100644
--- a/drivers/soc/fujitsu/fujitsu_hwb.c
+++ b/drivers/soc/fujitsu/fujitsu_hwb.c
@@ -531,6 +531,96 @@ static int ioc_bw_assign(struct file *filp, void __user *argp)
 	return ret;
 }
 
+static int is_bw_unassignable(struct bb_info *bb_info, int cpu)
+{
+	u8 ppe;
+
+	if (!cpumask_test_and_clear_cpu(cpu, bb_info->assigned_pemask)) {
+		pr_err("This pe is not assigned: %u/%u/%d\n", bb_info->cmg, bb_info->bb, cpu);
+		return -EINVAL;
+	}
+
+	ppe = _hwinfo.core_map[cpu].ppe;
+	if (!test_bit(bb_info->bw[ppe], &_hwinfo.used_bw_bmap[cpu])) {
+		/* should not happen */
+		pr_crit("Logic error. This window is not assigned: %u/%u/%d\n",
+							bb_info->cmg, bb_info->bb, cpu);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static void teardown_ctl_reg(struct bb_info *bb_info, int cpu)
+{
+	if (_hwinfo.used_bw_bmap[cpu] != 0)
+		/* Other window on this PE is still in use. Nothing to do */
+		return;
+
+	/*
+	 * This is the last unassign on this PE.
+	 * Clear all bits to disallow access to BST_SYNC/LBSY_SYNC from EL0
+	 */
+	write_sysreg_s(0, FHWB_CTRL_EL1);
+
+	pr_debug("Teardown ctl reg. cpu: %d\n", cpu);
+}
+
+static void teardown_bw(struct bb_info *bb_info, int cpu)
+{
+	u8 window;
+	u8 ppe;
+
+	/* Just clear all bits */
+	ppe = _hwinfo.core_map[cpu].ppe;
+	window = bb_info->bw[ppe];
+	write_bw_reg(window, 0);
+
+	/* Update bitmap info */
+	clear_bit(window, &_hwinfo.used_bw_bmap[cpu]);
+	bb_info->bw[ppe] = -1;
+
+	pr_debug("Teardown bw. cpu: %d, window: %u, BB: %u, bw_bmap: %lx, assigned_pemask: %*pbl\n",
+			cpu, window, bb_info->bb,
+			_hwinfo.used_bw_bmap[cpu], cpumask_pr_args(bb_info->assigned_pemask));
+}
+
+static int ioc_bw_unassign(struct file *filp, void __user *argp)
+{
+	struct hwb_private_data *pdata = (struct hwb_private_data *)filp->private_data;
+	struct fujitsu_hwb_ioc_bw_ctl bw_ctl;
+	struct bb_info *bb_info;
+	int cpu;
+	int ret;
+	u8 cmg;
+
+	if (!is_bound_only_one_pe())
+		return -EPERM;
+
+	if (copy_from_user(&bw_ctl, (struct fujitsu_hwb_ioc_bw_ctl __user *)argp,
+						sizeof(struct fujitsu_hwb_ioc_bw_ctl)))
+		return -EFAULT;
+
+	cpu = smp_processor_id();
+	cmg = _hwinfo.core_map[cpu].cmg;
+	bb_info = get_bb_info(pdata, cmg, bw_ctl.bb);
+	if (IS_ERR(bb_info))
+		return PTR_ERR(bb_info);
+
+	/* See comments in ioc_bw_assign() */
+	preempt_disable();
+	ret = is_bw_unassignable(bb_info, cpu);
+	if (!ret) {
+		teardown_bw(bb_info, cpu);
+		teardown_ctl_reg(bb_info, cpu);
+	}
+	preempt_enable();
+
+	put_bb_info(bb_info);
+
+	return ret;
+}
+
 static long fujitsu_hwb_dev_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
@@ -543,6 +633,9 @@ static long fujitsu_hwb_dev_ioctl(struct file *filp, unsigned int cmd, unsigned
 	case FUJITSU_HWB_IOC_BW_ASSIGN:
 		ret = ioc_bw_assign(filp, argp);
 		break;
+	case FUJITSU_HWB_IOC_BW_UNASSIGN:
+		ret = ioc_bw_unassign(filp, argp);
+		break;
 	default:
 		ret = -ENOTTY;
 		break;
diff --git a/include/uapi/linux/fujitsu_hpc_ioctl.h b/include/uapi/linux/fujitsu_hpc_ioctl.h
index ad90f8f3ae9a..396029f2bc0d 100644
--- a/include/uapi/linux/fujitsu_hpc_ioctl.h
+++ b/include/uapi/linux/fujitsu_hpc_ioctl.h
@@ -26,5 +26,7 @@ struct fujitsu_hwb_ioc_bw_ctl {
 	0x00, struct fujitsu_hwb_ioc_bb_ctl)
 #define FUJITSU_HWB_IOC_BW_ASSIGN _IOWR(__FUJITSU_IOCTL_MAGIC, \
 	0x01, struct fujitsu_hwb_ioc_bw_ctl)
+#define FUJITSU_HWB_IOC_BW_UNASSIGN _IOW(__FUJITSU_IOCTL_MAGIC, \
+	0x02, struct fujitsu_hwb_ioc_bw_ctl)
 
 #endif /* _UAPI_LINUX_FUJITSU_HPC_IOC_H */
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 66+ messages in thread
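
The teardown order in the patch above (clear the window first, then clear the control register only if no other window is left on the PE) can be modelled in userspace. The following is a minimal sketch, not the driver code: `MAX_PE`, `MAX_WINDOW`, `model_assign()` and `model_unassign()` are hypothetical names, and the boolean `ctl_enabled[]` merely stands in for the per-PE control register.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PE     4	/* hypothetical; real counts come from the hardware */
#define MAX_WINDOW 4	/* hypothetical */

/* Per-PE bitmap of barrier windows in use (models _hwinfo.used_bw_bmap[]). */
static unsigned long used_bw_bmap[MAX_PE];
/* Stand-in for the per-PE control register: true while EL0 access is on. */
static bool ctl_enabled[MAX_PE];

/* Mark a window as used on this PE and enable the control register. */
int model_assign(int pe, int window)
{
	assert(pe >= 0 && pe < MAX_PE && window >= 0 && window < MAX_WINDOW);
	if (used_bw_bmap[pe] & (1UL << window))
		return -1;	/* window already in use on this PE */
	used_bw_bmap[pe] |= 1UL << window;
	ctl_enabled[pe] = true;
	return 0;
}

/* Release a window; disable the control register on the last release. */
int model_unassign(int pe, int window)
{
	assert(pe >= 0 && pe < MAX_PE && window >= 0 && window < MAX_WINDOW);
	if (!(used_bw_bmap[pe] & (1UL << window)))
		return -1;	/* window was not assigned on this PE */
	used_bw_bmap[pe] &= ~(1UL << window);
	if (used_bw_bmap[pe] == 0)
		ctl_enabled[pe] = false;	/* last window on this PE */
	return 0;
}
```

With two windows assigned on one PE, the first `model_unassign()` leaves `ctl_enabled[]` set and only the second one clears it, which mirrors the early return in teardown_ctl_reg().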

* [PATCH 06/10] soc: fujitsu: hwb: Add IOC_BB_FREE ioctl
@ 2021-01-08 10:52   ` Misono Tomohiro
  0 siblings, 0 replies; 66+ messages in thread
From: Misono Tomohiro @ 2021-01-08 10:52 UTC (permalink / raw)
  To: linux-arm-kernel, soc; +Cc: will, catalin.marinas, arnd, olof, misono.tomohiro

IOC_BB_FREE ioctl resets what IOC_BB_ALLOC ioctl did.

Assign/unassign operations must not run during a free operation, so we
set a flag (BB_FREEING) to indicate that a free is in progress and first
wait for any ongoing assign/unassign to finish.

If IOC_BW_UNASSIGN has not been called on some PEs, we send an IPI to
those PEs to do effectively the same operation as IOC_BW_UNASSIGN.

Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
---
 drivers/soc/fujitsu/fujitsu_hwb.c      | 125 ++++++++++++++++++++++++-
 include/uapi/linux/fujitsu_hpc_ioctl.h |   2 +
 2 files changed, 122 insertions(+), 5 deletions(-)

diff --git a/drivers/soc/fujitsu/fujitsu_hwb.c b/drivers/soc/fujitsu/fujitsu_hwb.c
index 8c4cabd60872..2535942cc0d7 100644
--- a/drivers/soc/fujitsu/fujitsu_hwb.c
+++ b/drivers/soc/fujitsu/fujitsu_hwb.c
@@ -196,6 +196,12 @@ static struct bb_info *get_bb_info(struct hwb_private_data *pdata, u8 cmg, u8 bb
 	spin_lock(&pdata->list_lock);
 	list_for_each_entry(bb_info, &pdata->bb_list, node) {
 		if (bb_info->cmg == cmg && bb_info->bb == bb) {
+			if (test_bit(BB_FREEING,  &bb_info->flag)) {
+				pr_err("BB is currently being freed: %u/%u\n", cmg, bb);
+				spin_unlock(&pdata->list_lock);
+				return ERR_PTR(-EPERM);
+			}
+
 			kref_get(&bb_info->kref);
 			spin_unlock(&pdata->list_lock);
 			return bb_info;
@@ -389,6 +395,11 @@ static int is_bw_assignable(struct bb_info *bb_info, struct fujitsu_hwb_ioc_bw_c
 {
 	int i;
 
+	if (test_bit(BB_FREEING, &bb_info->flag)) {
+		pr_err("BB is currently being freed: %u/%u/%d\n", bb_info->cmg, bb_info->bb, cpu);
+		return -EPERM;
+	}
+
 	if (!cpumask_test_cpu(cpu, bb_info->pemask)) {
 		pr_err("This pe is not supposed to join sync, %u/%u/%d\n",
 						bb_info->cmg, bb_info->bb, cpu);
@@ -490,6 +501,7 @@ static int ioc_bw_assign(struct file *filp, void __user *argp)
 	struct hwb_private_data *pdata = (struct hwb_private_data *)filp->private_data;
 	struct fujitsu_hwb_ioc_bw_ctl bw_ctl;
 	struct bb_info *bb_info;
+	unsigned long flags;
 	int ret;
 	int cpu;
 	u8 cmg;
@@ -507,18 +519,27 @@ static int ioc_bw_assign(struct file *filp, void __user *argp)
 	if (IS_ERR(bb_info))
 		return PTR_ERR(bb_info);
 
+	/* Increment counter to avoid this BB being freed during assign operation */
+	atomic_inc(&bb_info->ongoing_assign_count);
+
 	/*
 	 * Barrier window register and control register is each PE's resource.
 	 * context switch is not supported and mutual exclusion is needed for
-	 * assign and unassign on this PE
+	 * assign and unassign on this PE. As cleanup_bw_func() might be executed
+	 * in interrupt context via on_each_cpu_mask(), disabling irq is needed
 	 */
-	preempt_disable();
+	local_irq_save(flags);
 	ret = is_bw_assignable(bb_info, &bw_ctl, cpu);
 	if (!ret) {
 		setup_ctl_reg(bb_info, cpu);
 		setup_bw(bb_info, &bw_ctl, cpu);
 	}
-	preempt_enable();
+	local_irq_restore(flags);
+
+	/* Wakeup if there is a process waiting in ioc_bb_free() */
+	if (atomic_dec_and_test(&bb_info->ongoing_assign_count) &&
+					test_bit(BB_FREEING, &bb_info->flag))
+		wake_up(&bb_info->wq);
 
 	put_bb_info(bb_info);
 
@@ -535,6 +556,12 @@ static int is_bw_unassignable(struct bb_info *bb_info, int cpu)
 {
 	u8 ppe;
 
+	if (test_bit(BB_FREEING, &bb_info->flag)) {
+		pr_err("This bb is currently being freed: %u/%u/%d\n",
+							bb_info->cmg, bb_info->bb, cpu);
+		return -EPERM;
+	}
+
 	if (!cpumask_test_and_clear_cpu(cpu, bb_info->assigned_pemask)) {
 		pr_err("This pe is not assigned: %u/%u/%d\n", bb_info->cmg, bb_info->bb, cpu);
 		return -EINVAL;
@@ -590,6 +617,7 @@ static int ioc_bw_unassign(struct file *filp, void __user *argp)
 	struct hwb_private_data *pdata = (struct hwb_private_data *)filp->private_data;
 	struct fujitsu_hwb_ioc_bw_ctl bw_ctl;
 	struct bb_info *bb_info;
+	unsigned long flags;
 	int cpu;
 	int ret;
 	u8 cmg;
@@ -608,19 +636,103 @@ static int ioc_bw_unassign(struct file *filp, void __user *argp)
 		return PTR_ERR(bb_info);
 
 	/* See comments in ioc_bw_assign() */
-	preempt_disable();
+	atomic_inc(&bb_info->ongoing_assign_count);
+
+	local_irq_save(flags);
 	ret = is_bw_unassignable(bb_info, cpu);
 	if (!ret) {
 		teardown_bw(bb_info, cpu);
 		teardown_ctl_reg(bb_info, cpu);
 	}
-	preempt_enable();
+	local_irq_restore(flags);
+
+	if (atomic_dec_and_test(&bb_info->ongoing_assign_count) &&
+					test_bit(BB_FREEING, &bb_info->flag))
+		wake_up(&bb_info->wq);
 
 	put_bb_info(bb_info);
 
 	return ret;
 }
 
+static void cleanup_bw_func(void *args)
+{
+	struct bb_info *bb_info = (struct bb_info *)args;
+	int cpu = smp_processor_id();
+
+	teardown_bw(bb_info, cpu);
+	teardown_ctl_reg(bb_info, cpu);
+}
+
+/* Send IPI to reset INIT_SYNC register */
+static void teardown_bb(struct bb_info *bb_info)
+{
+	struct init_sync_args args = {0};
+	int cpu;
+
+	/* Reset BW on each PE if IOC_BW_UNASSIGN is not called properly  */
+	if (cpumask_weight(bb_info->assigned_pemask) != 0) {
+		pr_warn("unassign is not called properly. CMG: %d, BB: %d, unassigned PE: %*pbl\n",
+			bb_info->cmg, bb_info->bb, cpumask_pr_args(bb_info->assigned_pemask));
+		on_each_cpu_mask(bb_info->assigned_pemask, cleanup_bw_func, bb_info, 1);
+	}
+
+	/* INIT_SYNC register is shared resource in CMG. Pick one PE */
+	cpu = cpumask_any(bb_info->pemask);
+
+	args.bb = bb_info->bb;
+	/* Just clear all bits */
+	args.val = 0;
+	on_each_cpu_mask(cpumask_of(cpu), write_init_sync_reg, &args, 1);
+
+	clear_bit(bb_info->bb, &_hwinfo.used_bb_bmap[bb_info->cmg]);
+
+	pr_debug("Teardown bb: cpu: %d, CMG: %u, BB: %u, bitmap: %lx\n",
+			cpu, bb_info->cmg, bb_info->bb, _hwinfo.used_bb_bmap[bb_info->cmg]);
+}
+
+static int ioc_bb_free(struct file *filp,  void __user *argp)
+{
+	struct hwb_private_data *pdata = (struct hwb_private_data *)filp->private_data;
+	struct fujitsu_hwb_ioc_bb_ctl bb_ctl;
+	struct bb_info *bb_info;
+
+	if (copy_from_user(&bb_ctl, (struct fujitsu_hwb_ioc_bb_ctl __user *)argp,
+						sizeof(struct fujitsu_hwb_ioc_bb_ctl)))
+		return -EFAULT;
+
+	bb_info = get_bb_info(pdata, bb_ctl.cmg, bb_ctl.bb);
+	if (IS_ERR(bb_info))
+		return PTR_ERR(bb_info);
+
+	/* Forbid free/assign/unassign operation from now on */
+	if (test_and_set_bit(BB_FREEING, &bb_info->flag)) {
+		pr_err("IOC_BB_FREE is already called. CMG: %u, BB: %u\n", bb_ctl.cmg, bb_ctl.bb);
+		put_bb_info(bb_info);
+		return -EPERM;
+	}
+
+	/* Wait for ongoing assign/unassign operations to finish */
+	if (wait_event_interruptible(bb_info->wq,
+					(atomic_read(&bb_info->ongoing_assign_count) == 0))) {
+		clear_bit(BB_FREEING, &bb_info->flag);
+		put_bb_info(bb_info);
+		pr_debug("IOC_BB_FREE is interrupted. CMG: %u, BB: %u\n", bb_ctl.cmg, bb_ctl.bb);
+		return -EINTR;
+	}
+
+	teardown_bb(bb_info);
+	spin_lock(&pdata->list_lock);
+	list_del_init(&bb_info->node);
+	spin_unlock(&pdata->list_lock);
+
+	/* 1 put for get_bb_info, 1 for alloc_bb_info */
+	put_bb_info(bb_info);
+	put_bb_info(bb_info);
+
+	return 0;
+}
+
 static long fujitsu_hwb_dev_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
@@ -636,6 +748,9 @@ static long fujitsu_hwb_dev_ioctl(struct file *filp, unsigned int cmd, unsigned
 	case FUJITSU_HWB_IOC_BW_UNASSIGN:
 		ret = ioc_bw_unassign(filp, argp);
 		break;
+	case FUJITSU_HWB_IOC_BB_FREE:
+		ret = ioc_bb_free(filp, argp);
+		break;
 	default:
 		ret = -ENOTTY;
 		break;
diff --git a/include/uapi/linux/fujitsu_hpc_ioctl.h b/include/uapi/linux/fujitsu_hpc_ioctl.h
index 396029f2bc0d..7a285d8db0a9 100644
--- a/include/uapi/linux/fujitsu_hpc_ioctl.h
+++ b/include/uapi/linux/fujitsu_hpc_ioctl.h
@@ -28,5 +28,7 @@ struct fujitsu_hwb_ioc_bw_ctl {
 	0x01, struct fujitsu_hwb_ioc_bw_ctl)
 #define FUJITSU_HWB_IOC_BW_UNASSIGN _IOW(__FUJITSU_IOCTL_MAGIC, \
 	0x02, struct fujitsu_hwb_ioc_bw_ctl)
+#define FUJITSU_HWB_IOC_BB_FREE _IOW(__FUJITSU_IOCTL_MAGIC, \
+	0x03, struct fujitsu_hwb_ioc_bb_ctl)
 
 #endif /* _UAPI_LINUX_FUJITSU_HPC_IOC_H */
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 66+ messages in thread
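
The interplay between the BB_FREEING flag and ongoing_assign_count in the patch above can be illustrated with C11 atomics in userspace. This is a sketch under assumptions, not the driver's logic: `struct bb_model` and the four helpers are hypothetical names, there is no real wait queue (the caller is only told when a wake-up would be due), and the sketch increments the counter before checking the flag, which is a choice made here and not necessarily the ordering used in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

struct bb_model {
	atomic_bool freeing;	/* models the BB_FREEING flag bit */
	atomic_int  ongoing;	/* models ongoing_assign_count */
};

/* Start an assign/unassign. Returns 0, or -EPERM once a free has begun. */
int op_begin(struct bb_model *bb)
{
	atomic_fetch_add(&bb->ongoing, 1);
	if (atomic_load(&bb->freeing)) {
		/* Back out: the free path owns the object from now on. */
		atomic_fetch_sub(&bb->ongoing, 1);
		return -EPERM;
	}
	return 0;
}

/* Finish an operation. Returns true when a waiting free should be woken. */
bool op_end(struct bb_model *bb)
{
	bool last = atomic_fetch_sub(&bb->ongoing, 1) == 1;

	return last && atomic_load(&bb->freeing);
}

/* Begin freeing. Returns 0, or -EPERM if a free is already in progress. */
int free_begin(struct bb_model *bb)
{
	if (atomic_exchange(&bb->freeing, true))
		return -EPERM;
	return 0;
}

/* The condition the freeing side waits on before tearing down. */
bool free_may_proceed(struct bb_model *bb)
{
	return atomic_load(&bb->ongoing) == 0;
}
```

A free that starts while one operation is in flight sees `free_may_proceed()` return false until that operation calls `op_end()`, which then reports that the waiter should be woken. New operations and a second free attempt are both rejected with -EPERM in the meantime.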

* [PATCH 07/10] soc: fujitsu: hwb: Add IOC_GET_PE_INFO ioctl
@ 2021-01-08 10:52   ` Misono Tomohiro
  0 siblings, 0 replies; 66+ messages in thread
From: Misono Tomohiro @ 2021-01-08 10:52 UTC (permalink / raw)
  To: linux-arm-kernel, soc; +Cc: will, catalin.marinas, arnd, olof, misono.tomohiro

This is an informational ioctl that tells users the CMG/PE number of the
PE they are currently running on.

Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
---
 drivers/soc/fujitsu/fujitsu_hwb.c      | 18 ++++++++++++++++++
 include/uapi/linux/fujitsu_hpc_ioctl.h |  7 +++++++
 2 files changed, 25 insertions(+)

diff --git a/drivers/soc/fujitsu/fujitsu_hwb.c b/drivers/soc/fujitsu/fujitsu_hwb.c
index 2535942cc0d7..1132cb74b13b 100644
--- a/drivers/soc/fujitsu/fujitsu_hwb.c
+++ b/drivers/soc/fujitsu/fujitsu_hwb.c
@@ -733,6 +733,21 @@ static int ioc_bb_free(struct file *filp,  void __user *argp)
 	return 0;
 }
 
+static int ioc_get_pe_info(struct file *filp, void __user *argp)
+{
+	struct fujitsu_hwb_ioc_pe_info pe_info = {0};
+	int cpu = smp_processor_id();
+
+	pe_info.cmg = _hwinfo.core_map[cpu].cmg;
+	pe_info.ppe = _hwinfo.core_map[cpu].ppe;
+
+	if (copy_to_user((struct fujitsu_hwb_ioc_pe_info __user *)argp, &pe_info,
+						sizeof(struct fujitsu_hwb_ioc_pe_info)))
+		return -EFAULT;
+
+	return 0;
+}
+
 static long fujitsu_hwb_dev_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
@@ -751,6 +766,9 @@ static long fujitsu_hwb_dev_ioctl(struct file *filp, unsigned int cmd, unsigned
 	case FUJITSU_HWB_IOC_BB_FREE:
 		ret = ioc_bb_free(filp, argp);
 		break;
+	case FUJITSU_HWB_IOC_GET_PE_INFO:
+		ret = ioc_get_pe_info(filp, argp);
+		break;
 	default:
 		ret = -ENOTTY;
 		break;
diff --git a/include/uapi/linux/fujitsu_hpc_ioctl.h b/include/uapi/linux/fujitsu_hpc_ioctl.h
index 7a285d8db0a9..1226014d97c4 100644
--- a/include/uapi/linux/fujitsu_hpc_ioctl.h
+++ b/include/uapi/linux/fujitsu_hpc_ioctl.h
@@ -22,6 +22,11 @@ struct fujitsu_hwb_ioc_bw_ctl {
 	__s8 window;
 };
 
+struct fujitsu_hwb_ioc_pe_info {
+	__u8 cmg;
+	__u8 ppe;
+};
+
 #define FUJITSU_HWB_IOC_BB_ALLOC _IOWR(__FUJITSU_IOCTL_MAGIC, \
 	0x00, struct fujitsu_hwb_ioc_bb_ctl)
 #define FUJITSU_HWB_IOC_BW_ASSIGN _IOWR(__FUJITSU_IOCTL_MAGIC, \
@@ -30,5 +35,7 @@ struct fujitsu_hwb_ioc_bw_ctl {
 	0x02, struct fujitsu_hwb_ioc_bw_ctl)
 #define FUJITSU_HWB_IOC_BB_FREE _IOW(__FUJITSU_IOCTL_MAGIC, \
 	0x03, struct fujitsu_hwb_ioc_bb_ctl)
+#define FUJITSU_HWB_IOC_GET_PE_INFO _IOR(__FUJITSU_IOCTL_MAGIC, \
+	0x04, struct fujitsu_hwb_ioc_pe_info)
 
 #endif /* _UAPI_LINUX_FUJITSU_HPC_IOC_H */
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 66+ messages in thread
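
The uapi header in this series defines its commands with _IOWR, _IOW and _IOR. As a rough illustration of what those macros encode, the sketch below packs the direction, size, type and number fields by hand, following the asm-generic ioctl layout used on arm64. `EXAMPLE_MAGIC` is a hypothetical placeholder because the value of __FUJITSU_IOCTL_MAGIC is not shown in these patches, and `ioc_encode()` and the accessor helpers are illustrative names, not kernel APIs. The structs mirror the two-byte layouts of the uapi structures.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout of the asm-generic ioctl encoding. */
#define NR_SHIFT    0u	/* 8 bits: command number */
#define TYPE_SHIFT  8u	/* 8 bits: magic ("type") */
#define SIZE_SHIFT 16u	/* 14 bits: argument size */
#define DIR_SHIFT  30u	/* 2 bits: transfer direction */
#define DIR_WRITE   1u	/* userspace writes, kernel reads (_IOW) */
#define DIR_READ    2u	/* kernel writes, userspace reads (_IOR) */

#define EXAMPLE_MAGIC 'F'	/* hypothetical; real magic not shown here */

/* Same field types as the uapi structs, using <stdint.h> names. */
struct bw_ctl  { uint8_t bb;  int8_t  window; };
struct pe_info { uint8_t cmg; uint8_t ppe; };

/* Pack the four fields into a 32-bit command number. */
uint32_t ioc_encode(uint32_t dir, uint32_t type, uint32_t nr, uint32_t size)
{
	return (dir << DIR_SHIFT) | (size << SIZE_SHIFT) |
	       (type << TYPE_SHIFT) | (nr << NR_SHIFT);
}

uint32_t ioc_dir(uint32_t cmd)  { return cmd >> DIR_SHIFT; }
uint32_t ioc_size(uint32_t cmd) { return (cmd >> SIZE_SHIFT) & 0x3fffu; }
uint32_t ioc_nr(uint32_t cmd)   { return (cmd >> NR_SHIFT) & 0xffu; }
```

Under this layout, an _IOW command such as IOC_BW_UNASSIGN carries only the write direction bit, an _IOR command such as IOC_GET_PE_INFO carries only the read bit, and an _IOWR command such as IOC_BW_ASSIGN carries both.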

* [PATCH 08/10] soc: fujitsu: hwb: Add release operation
@ 2021-01-08 10:52   ` Misono Tomohiro
  0 siblings, 0 replies; 66+ messages in thread
From: Misono Tomohiro @ 2021-01-08 10:52 UTC (permalink / raw)
  To: linux-arm-kernel, soc; +Cc: will, catalin.marinas, arnd, olof, misono.tomohiro

Upon release, we clean up any remaining resources/registers if necessary.
This happens when the user does not call IOC_BB_FREE properly; the
function then does effectively the same operation as IOC_BB_FREE.

Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
---
 drivers/soc/fujitsu/fujitsu_hwb.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/drivers/soc/fujitsu/fujitsu_hwb.c b/drivers/soc/fujitsu/fujitsu_hwb.c
index 1132cb74b13b..46f1f244f93a 100644
--- a/drivers/soc/fujitsu/fujitsu_hwb.c
+++ b/drivers/soc/fujitsu/fujitsu_hwb.c
@@ -796,9 +796,35 @@ static int fujitsu_hwb_dev_open(struct inode *inode, struct file *filp)
 	return 0;
 }
 
+static int fujitsu_hwb_dev_release(struct inode *inode, struct file *filp)
+{
+	struct hwb_private_data *pdata = (struct hwb_private_data *)filp->private_data;
+	struct bb_info *bb_info, *tmp;
+
+	/*
+	 * Cleanup BB if IOC_BB_FREE is not called properly.
+	 * No lock for pdata->bb_list is needed cause there is no one else
+	 */
+	if (!list_empty(&pdata->bb_list)) {
+		pr_warn("free operation is not called properly\n");
+
+		list_for_each_entry_safe(bb_info, tmp, &pdata->bb_list, node) {
+			teardown_bb(bb_info);
+			list_del_init(&bb_info->node);
+			/* 1 put for alloc_bb_info */
+			put_bb_info(bb_info);
+		}
+	}
+
+	kfree(pdata);
+
+	return 0;
+}
+
 static const struct file_operations fujitsu_hwb_dev_fops = {
 	.owner          = THIS_MODULE,
 	.open           = fujitsu_hwb_dev_open,
+	.release        = fujitsu_hwb_dev_release,
 	.unlocked_ioctl = fujitsu_hwb_dev_ioctl,
 };
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 09/10] soc: fujitsu: hwb: Add sysfs entry
@ 2021-01-08 10:52   ` Misono Tomohiro
  0 siblings, 0 replies; 66+ messages in thread
From: Misono Tomohiro @ 2021-01-08 10:52 UTC (permalink / raw)
  To: linux-arm-kernel, soc; +Cc: will, catalin.marinas, arnd, olof, misono.tomohiro

This adds sysfs entries per CMG that show the status of the running
barrier driver, to help debug user applications. The following entries
will be created:

/sys/class/misc/fujitsu_hwb
 |- hwinfo ... number of CMG/BB/BW/pe_per_cmg on running system
 |- CMG0
     |- core_map      ... cpuid belonging to this CMG
     |- used_bb_bmap  ... bitmap of currently allocated BB
     |- used_bw_bmap  ... bitmap of currently allocated BW
     |- init_sync_bb0 ... current value of INIT_SYNC register 0
     |- init_sync_bb1 ... current value of INIT_SYNC register 1
     ...
 |- CMG1
  ...

Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
---
 drivers/soc/fujitsu/fujitsu_hwb.c | 258 ++++++++++++++++++++++++++++++
 1 file changed, 258 insertions(+)
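
As a rough illustration of how a debugging tool might consume these files, the
userspace sketch below parses the hwinfo line and an init_sync_bbN file. The
formats are taken from hwb_hwinfo_show() and hwb_init_sync_bb_show() in this
patch; the struct and function names are hypothetical, and the layout may
still change since this is an RFC.

```c
#include <assert.h>
#include <stdio.h>

/* Mirrors the four integers printed by hwb_hwinfo_show() in this patch. */
struct hwb_hwinfo {
	int num_cmg;
	int num_bb;
	int num_bw;
	int max_pe_per_cmg;
};

/*
 * Parse one "num_cmg num_bb num_bw max_pe_per_cmg" line.
 * Returns 0 on success, -1 on malformed input.
 */
static int parse_hwinfo(const char *line, struct hwb_hwinfo *out)
{
	assert(line && out);
	if (sscanf(line, "%d %d %d %d", &out->num_cmg, &out->num_bb,
		   &out->num_bw, &out->max_pe_per_cmg) != 4)
		return -1;
	return 0;
}

/*
 * Parse the two hex lines printed by hwb_init_sync_bb_show():
 * the MASK field first, then the BST field.
 */
static int parse_init_sync(const char *text, unsigned long long *mask,
			   unsigned long long *bst)
{
	assert(text && mask && bst);
	if (sscanf(text, "%llx %llx", mask, bst) != 2)
		return -1;
	return 0;
}

/* Read and parse the hwinfo attribute; path is supplied by the caller. */
static int read_hwinfo(const char *path, struct hwb_hwinfo *out)
{
	char buf[128];
	FILE *f = fopen(path, "r");
	int ret = -1;

	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f))
		ret = parse_hwinfo(buf, out);
	fclose(f);
	return ret;
}
```

A tool would call read_hwinfo("/sys/class/misc/fujitsu_hwb/hwinfo", &info)
first and then walk CMG0 .. CMG(num_cmg - 1). Note that the init_sync_bbN
files are created with mode 0400, so reading them requires root.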

diff --git a/drivers/soc/fujitsu/fujitsu_hwb.c b/drivers/soc/fujitsu/fujitsu_hwb.c
index 46f1f244f93a..a3a0e314f63a 100644
--- a/drivers/soc/fujitsu/fujitsu_hwb.c
+++ b/drivers/soc/fujitsu/fujitsu_hwb.c
@@ -32,6 +32,7 @@
 #include <linux/cpu.h>
 #include <linux/cpumask.h>
 #include <linux/kernel.h>
+#include <linux/kobject.h>
 #include <linux/miscdevice.h>
 #include <linux/module.h>
 #include <linux/spinlock.h>
@@ -931,6 +932,254 @@ static int hwb_cpu_online(unsigned int cpu)
 	return 0;
 }
 
+static void read_init_sync_reg(void *args)
+{
+	struct init_sync_args *sync_args = (struct init_sync_args *)args;
+	u64 val = 0;
+
+	switch (sync_args->bb) {
+	case 0:
+		val = read_sysreg_s(FHWB_INIT_SYNC_BB0_EL1);
+		break;
+	case 1:
+		val = read_sysreg_s(FHWB_INIT_SYNC_BB1_EL1);
+		break;
+	case 2:
+		val = read_sysreg_s(FHWB_INIT_SYNC_BB2_EL1);
+		break;
+	case 3:
+		val = read_sysreg_s(FHWB_INIT_SYNC_BB3_EL1);
+		break;
+	case 4:
+		val = read_sysreg_s(FHWB_INIT_SYNC_BB4_EL1);
+		break;
+	case 5:
+		val = read_sysreg_s(FHWB_INIT_SYNC_BB5_EL1);
+		break;
+	}
+
+	sync_args->val = val;
+}
+
+struct hwb_attr {
+	struct kobj_attribute attr;
+	u8 bb;
+};
+static struct hwb_attr *battr;
+
+/* kobject for each CMG */
+static struct kobject **cmg_kobj;
+
+/* Get CMG number based on index value of cmg_kobj */
+static int get_cmg_from_kobj(struct kobject *kobj)
+{
+	int i;
+
+	for (i = 0; i < _hwinfo.num_cmg; i++) {
+		if (cmg_kobj[i] == kobj)
+			return i;
+	}
+	/* should not happen */
+	WARN_ONCE(1, "cmg_kobj not found\n");
+	return 0;
+}
+
+static ssize_t hwb_init_sync_bb_show(struct kobject *kobj,
+					 struct kobj_attribute *attr, char *buf)
+{
+	struct hwb_attr *battr = container_of(attr, struct hwb_attr, attr);
+	struct init_sync_args args = {0};
+	ssize_t written = 0;
+	int cpu;
+	int cmg;
+	u64 mask;
+	u64 bst;
+
+	/* Find online cpu in target cmg */
+	cmg = get_cmg_from_kobj(kobj);
+	for_each_online_cpu(cpu) {
+		if (_hwinfo.core_map[cpu].cmg == cmg)
+			break;
+	}
+	if (cpu >= nr_cpu_ids)
+		return 0;
+
+	/* Send IPI to read INIT_SYNC register */
+	args.bb = battr->bb;
+	on_each_cpu_mask(cpumask_of(cpu), read_init_sync_reg, &args, 1);
+
+	mask = FIELD_GET(FHWB_INIT_SYNC_BB_EL1_MASK_FIELD, args.val);
+	bst = FIELD_GET(FHWB_INIT_SYNC_BB_EL1_BST_FIELD, args.val);
+
+	written += scnprintf(buf, PAGE_SIZE, "%04llx\n", mask);
+	written += scnprintf(buf + written, PAGE_SIZE - written, "%04llx\n", bst);
+
+	return written;
+}
+
+#define BARRIER_ATTR(name) \
+static struct kobj_attribute hwb_##name##_attribute = \
+	__ATTR(name, 0444, hwb_##name##_show, NULL)
+
+static ssize_t hwb_hwinfo_show(struct kobject *kobj,
+				   struct kobj_attribute *attr, char *buf)
+{
+	return scnprintf(buf, PAGE_SIZE, "%d %d %d %d\n",
+				_hwinfo.num_cmg, _hwinfo.num_bb,
+				_hwinfo.num_bw, _hwinfo.max_pe_per_cmg);
+}
+BARRIER_ATTR(hwinfo);
+
+static ssize_t hwb_used_bb_bmap_show(struct kobject *kobj,
+					 struct kobj_attribute *attr, char *buf)
+{
+	int cmg;
+
+	cmg = get_cmg_from_kobj(kobj);
+
+	return scnprintf(buf, PAGE_SIZE, "%04lx\n", _hwinfo.used_bb_bmap[cmg]);
+}
+BARRIER_ATTR(used_bb_bmap);
+
+static ssize_t hwb_used_bw_bmap_show(struct kobject *kobj,
+					 struct kobj_attribute *attr, char *buf)
+{
+	ssize_t written = 0;
+	int cmg;
+	int cpu;
+
+	cmg = get_cmg_from_kobj(kobj);
+	for_each_possible_cpu(cpu) {
+		if (_hwinfo.core_map[cpu].cmg == cmg)
+			written += scnprintf(buf + written, PAGE_SIZE - written, "%d %04lx\n",
+						 cpu, _hwinfo.used_bw_bmap[cpu]);
+	}
+
+	return written;
+}
+BARRIER_ATTR(used_bw_bmap);
+
+static ssize_t hwb_core_map_show(struct kobject *kobj,
+				     struct kobj_attribute *attr, char *buf)
+{
+	ssize_t written = 0;
+	int cmg;
+	int cpu;
+
+	cmg = get_cmg_from_kobj(kobj);
+	for_each_possible_cpu(cpu) {
+		if (_hwinfo.core_map[cpu].cmg == cmg)
+			written += scnprintf(buf + written, PAGE_SIZE - written, "%d %d\n",
+				cpu, _hwinfo.core_map[cpu].ppe);
+	}
+
+	return written;
+}
+BARRIER_ATTR(core_map);
+
+static struct attribute *hwb_attrs[] = {
+	&hwb_used_bb_bmap_attribute.attr,
+	&hwb_used_bw_bmap_attribute.attr,
+	&hwb_core_map_attribute.attr,
+	NULL,
+};
+
+static const struct attribute_group hwb_attribute = {
+	.attrs = hwb_attrs,
+};
+
+static void destroy_sysfs(void)
+{
+	int cmg;
+	int bb;
+	int i;
+
+	sysfs_remove_file(&bar_miscdev.this_device->kobj, &hwb_hwinfo_attribute.attr);
+
+	for (cmg = 0; cmg < _hwinfo.num_cmg; cmg++) {
+		for (bb = 0; bb < _hwinfo.num_bb; bb++) {
+			i = (cmg * _hwinfo.num_bb) + bb;
+			if (battr[i].attr.attr.name)
+				sysfs_remove_file(cmg_kobj[cmg], &battr[i].attr.attr);
+		}
+	}
+	kfree(battr);
+
+	for (cmg = 0; cmg < _hwinfo.num_cmg; cmg++) {
+		if (cmg_kobj[cmg]) {
+			sysfs_remove_group(cmg_kobj[cmg], &hwb_attribute);
+			kobject_put(cmg_kobj[cmg]);
+		}
+	}
+	kfree(cmg_kobj);
+}
+
+/* Create sysfs file under /sys/class/misc/fujitsu_hwb */
+#define NAME_LEN 16
+static int __init init_sysfs(void)
+{
+	char name[NAME_LEN];
+	int ret;
+	int cmg;
+	int bb;
+	int i;
+
+	/* Create file to show number of CMG/BB/BW/pe_per_cmg */
+	ret = sysfs_create_file(&bar_miscdev.this_device->kobj, &hwb_hwinfo_attribute.attr);
+	if (ret)
+		return ret;
+
+	cmg_kobj = kcalloc(_hwinfo.num_cmg, sizeof(struct kobject *), GFP_KERNEL);
+	battr = kcalloc(_hwinfo.num_cmg * _hwinfo.num_bb, sizeof(struct hwb_attr), GFP_KERNEL);
+	if (!cmg_kobj || !battr) {
+		kfree(cmg_kobj);
+		kfree(battr);
+		return -ENOMEM;
+	}
+
+	/* Create folder for each CMG and create core_map/bitmap file */
+	for (cmg = 0; cmg < _hwinfo.num_cmg; cmg++) {
+		scnprintf(name, NAME_LEN, "CMG%d", cmg);
+		cmg_kobj[cmg] = kobject_create_and_add(name, &bar_miscdev.this_device->kobj);
+		if (!cmg_kobj[cmg]) {
+			ret = -ENOMEM;
+			goto fail;
+		}
+
+		ret = sysfs_create_group(cmg_kobj[cmg], &hwb_attribute);
+		if (ret)
+			goto fail;
+	}
+
+	/* Create files for INIT_SYNC register */
+	for (cmg = 0; cmg < _hwinfo.num_cmg; cmg++) {
+		for (bb = 0; bb < _hwinfo.num_bb; bb++) {
+			i = (cmg * _hwinfo.num_bb) + bb;
+
+			scnprintf(name, NAME_LEN, "init_sync_bb%d", bb);
+			battr[i].bb = bb;
+			battr[i].attr.attr.name = kstrdup(name, GFP_KERNEL);
+			if (!battr[i].attr.attr.name) {
+				ret = -ENOMEM;
+				goto fail;
+			}
+			battr[i].attr.attr.mode = 0400; /* root only */
+			battr[i].attr.show = hwb_init_sync_bb_show;
+
+			sysfs_attr_init(&battr[i].attr.attr);
+			ret = sysfs_create_file(cmg_kobj[cmg], &battr[i].attr.attr);
+			if (ret < 0)
+				goto fail;
+		}
+	}
+
+	return 0;
+
+fail:
+	destroy_sysfs();
+	return ret;
+}
+
 static int __init hwb_init(void)
 {
 	int ret;
@@ -967,8 +1216,16 @@ static int __init hwb_init(void)
 		goto out3;
 	}
 
+	ret = init_sysfs();
+	if (ret < 0) {
+		pr_err("sysfs creation failed: %d\n", ret);
+		goto out4;
+	}
+
 	return 0;
 
+out4:
+	misc_deregister(&bar_miscdev);
 out3:
 	cpuhp_remove_state(_hp_state);
 out2:
@@ -981,6 +1238,7 @@ static int __init hwb_init(void)
 
 static void __exit hwb_exit(void)
 {
+	destroy_sysfs();
 	misc_deregister(&bar_miscdev);
 	cpuhp_remove_state(_hp_state);
 	destroy_bb_info_cachep();
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 10/10] soc: fujitsu: hwb: Add Kconfig/Makefile to build fujitsu_hwb driver
@ 2021-01-08 10:52   ` Misono Tomohiro
  0 siblings, 0 replies; 66+ messages in thread
From: Misono Tomohiro @ 2021-01-08 10:52 UTC (permalink / raw)
  To: linux-arm-kernel, soc; +Cc: will, catalin.marinas, arnd, olof, misono.tomohiro

This adds the Kconfig/Makefile needed to build the Fujitsu hardware
barrier driver (fujitsu_hwb.ko when built as a module).

Since this is the first A64FX-specific driver, this also adds an A64FX
entry to Kconfig.platforms of the arm64 Kconfig, and a MAINTAINERS entry
for ARM/A64FX accordingly.

Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
---
 MAINTAINERS                  |  7 +++++++
 arch/arm64/Kconfig.platforms |  5 +++++
 drivers/soc/Kconfig          |  1 +
 drivers/soc/Makefile         |  1 +
 drivers/soc/fujitsu/Kconfig  | 24 ++++++++++++++++++++++++
 drivers/soc/fujitsu/Makefile |  2 ++
 6 files changed, 40 insertions(+)
 create mode 100644 drivers/soc/fujitsu/Kconfig
 create mode 100644 drivers/soc/fujitsu/Makefile
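
For reference, a configuration fragment that should build the driver as a
module. It assumes only the symbols and dependencies introduced or named in
this patch, and leaves COMPILE_TEST builds aside:

```
CONFIG_ARCH_A64FX=y
CONFIG_ARM64_VHE=y
CONFIG_SOC_FUJITSU=y
CONFIG_FUJITSU_HARDWARE_BARRIER=m
```

With these set, the resulting module is fujitsu_hwb.ko, as stated in the
commit message above.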

diff --git a/MAINTAINERS b/MAINTAINERS
index 6eff4f720c72..d57ec44ceaed 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1508,6 +1508,13 @@ T:	git git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc.git
 F:	arch/arm/mach-*/
 F:	arch/arm/plat-*/
 
+ARM/A64FX SOC SUPPORT
+M:	Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
+L:	linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
+S:	Maintained
+F:	drivers/soc/fujitsu/
+F:	include/uapi/linux/fujitsu_hpc_ioctl.h
+
 ARM/ACTIONS SEMI ARCHITECTURE
 M:	Andreas Färber <afaerber@suse.de>
 M:	Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
diff --git a/arch/arm64/Kconfig.platforms b/arch/arm64/Kconfig.platforms
index 6eecdef538bd..41fb214adaff 100644
--- a/arch/arm64/Kconfig.platforms
+++ b/arch/arm64/Kconfig.platforms
@@ -1,6 +1,11 @@
 # SPDX-License-Identifier: GPL-2.0-only
 menu "Platform selection"
 
+config ARCH_A64FX
+	bool "Fujitsu A64FX Platforms"
+	help
+	  This enables support for Fujitsu A64FX SoC family.
+
 config ARCH_ACTIONS
 	bool "Actions Semi Platforms"
 	select OWL_TIMER
diff --git a/drivers/soc/Kconfig b/drivers/soc/Kconfig
index d097d070f579..7a52b5dc4c96 100644
--- a/drivers/soc/Kconfig
+++ b/drivers/soc/Kconfig
@@ -7,6 +7,7 @@ source "drivers/soc/aspeed/Kconfig"
 source "drivers/soc/atmel/Kconfig"
 source "drivers/soc/bcm/Kconfig"
 source "drivers/soc/fsl/Kconfig"
+source "drivers/soc/fujitsu/Kconfig"
 source "drivers/soc/imx/Kconfig"
 source "drivers/soc/ixp4xx/Kconfig"
 source "drivers/soc/litex/Kconfig"
diff --git a/drivers/soc/Makefile b/drivers/soc/Makefile
index 699b758d28e4..57c0dddc4d23 100644
--- a/drivers/soc/Makefile
+++ b/drivers/soc/Makefile
@@ -10,6 +10,7 @@ obj-y				+= bcm/
 obj-$(CONFIG_ARCH_DOVE)		+= dove/
 obj-$(CONFIG_MACH_DOVE)		+= dove/
 obj-y				+= fsl/
+obj-y				+= fujitsu/
 obj-$(CONFIG_ARCH_GEMINI)	+= gemini/
 obj-y				+= imx/
 obj-$(CONFIG_ARCH_IXP4XX)	+= ixp4xx/
diff --git a/drivers/soc/fujitsu/Kconfig b/drivers/soc/fujitsu/Kconfig
new file mode 100644
index 000000000000..cbba0c939e62
--- /dev/null
+++ b/drivers/soc/fujitsu/Kconfig
@@ -0,0 +1,24 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# FUJITSU SoC drivers
+#
+menuconfig SOC_FUJITSU
+	bool "FUJITSU SoC drivers"
+	depends on ARCH_A64FX || COMPILE_TEST
+
+if SOC_FUJITSU
+
+config FUJITSU_HARDWARE_BARRIER
+	tristate "FUJITSU HPC Hardware Barrier Driver"
+	depends on ARM64_VHE || COMPILE_TEST
+	help
+	  FUJITSU HPC Hardware Barrier Driver
+
+	  This driver offers hardware barrier functions for A64FX systems,
+	  which synchronize PEs in the same CMG (L3 cache domain) using
+	  implementation defined registers. As the control registers can
+	  only be accessed from EL2 after reset, this driver requires VHE
+	  support.
+	  When built as a module, it will be called "fujitsu_hwb".
+
+endif # SOC_FUJITSU
diff --git a/drivers/soc/fujitsu/Makefile b/drivers/soc/fujitsu/Makefile
new file mode 100644
index 000000000000..1b8e4c947f7f
--- /dev/null
+++ b/drivers/soc/fujitsu/Makefile
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_FUJITSU_HARDWARE_BARRIER) +=	fujitsu_hwb.o
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH 00/10] Add Fujitsu A64FX soc entry/hardware barrier driver
@ 2021-01-08 12:54   ` Mark Rutland
  0 siblings, 0 replies; 66+ messages in thread
From: Mark Rutland @ 2021-01-08 12:54 UTC (permalink / raw)
  To: Misono Tomohiro; +Cc: linux-arm-kernel, soc, olof, catalin.marinas, will, arnd

On Fri, Jan 08, 2021 at 07:52:31PM +0900, Misono Tomohiro wrote:
> (Resend as cover letter title was missing in the first time. Sorry for noise)
> 
> Hello,

Hi,

> This series adds Fujitsu A64FX SoC entry in drivers/soc and hardware
> barrier driver for it.
> 
> [Driver Description]
>  A64FX CPU has several functions for HPC workload and hardware barrier
>  is one of them. It is a mechanism to realize fast synchronization by
>  PEs belonging to the same L3 cache domain by using implementation
>  defined hardware registers.
>  For more details, see A64FX HPC extension specification in
>  https://github.com/fujitsu/A64FX
>  
>  The driver mainly offers a set of ioctls to manipulate related registers.
>  Patch 1-9 implements driver code and patch 10 finally adds kconfig,
>  Makefile and MAINTAINER entry for the driver.  

I have a number of concerns here, and at a high level, I do not think
that this is something Linux can reasonably support in its current form.
Sorry if this comes across as harsh; I appreciate the work that has gone
into this, and the effort to try to upstream support is great -- my
concerns are with the overall picture.

As a general rule, we avoid the use of IMPLEMENTATION DEFINED features
in Linux, as they pose a number of correctness/safety challenges and
come with a potentially significant long term maintenance burden that is
generally not justified by the features themselves. For example, such
features are not usable under virtualization (where a hypervisor may set
HCR_EL2.TIDCP, or fail to context-switch state that it is unaware of).

Secondly, the intended usage model appears to expose this to EL0 for
direct access, and the code seems to depend on threads being pinned, but
AFAICT this is not enforced and there is no provision for
context-switch, thread migration, or interaction with ptrace. I fear
this is going to be very fragile in practice, and that extending that
support in future will require much more complexity than is currently
apparent, with potentially invasive changes to arch code.
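
To make the pinning assumption concrete: a userspace thread would presumably
have to pin itself along the lines of the sketch below before touching the
barrier registers. This is only an illustration using the standard
sched_setaffinity() call; pin_to_cpu() is a hypothetical helper, not something
taken from the series or its library.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/*
 * Pin the calling thread to a single CPU.
 * Returns 0 on success, -1 if cpu is out of range or the kernel
 * rejects the request (errno is then set by sched_setaffinity()).
 */
static int pin_to_cpu(int cpu)
{
	cpu_set_t set;

	if (cpu < 0 || cpu >= CPU_SETSIZE)
		return -1;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	/* A pid of 0 means "the calling thread". */
	return sched_setaffinity(0, sizeof(set), &set);
}
```

Nothing here stops the affinity from being changed again later, either by the
thread itself or by another suitably privileged task, so a successful call
does not guarantee that the thread stays on that PE.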

Thirdly, this requires userspace software to be intimately familiar with
the HW platform that it is running on (both in terms of using IMP-DEF
instructions and needing to know the physical layout), rather than being
generic and portable, which I don't believe is something that we wish to
encourage.  I also think this is unlikely to be supported by generic
software because of the lack of portability, and consequently I struggle
to believe that this will see significant usage.

Further, as an IMP-DEF feature, it's not clear how much of this will
carry forward into future designs, and where things may change. It's
extremely difficult to determine whether any of the ABI decisions (e.g.
the sysfs layout) are sufficient, or what level of changes would be
necessary in userspace code if there are physical topology changes or
changes to the structure of the system register interfaces.

Overall, I think this needs much more justification in terms of
practical usage, safety/correctness, and long term maintenance, and with
that I think the longer term goal would be to use this to justify an
architectural feature along similar lines rather than to support any
IMPLEMENTATION DEFINED variants upstream in Linux.

>  Also, C library and test program for this driver is available on: 
>  https://github.com/fujitsu/hardware_barrier

Hmm... I see some code in that repo which looks suspiciously like code
from the Linux kernel tree, but licensed differently, which is
concerning.

Specifically, the asm block in internal.h (which the SPDX headers says
is licensed as LGPL-3.0-only) looks like a copy of code from
arch/arm64/include/asm/sysreg.h (which is licensed as GPL-2.0-only per
its SPDX header).

If that code was copied, I don't believe that relicensing is permitted.
I would advise that someone with legal training considers the provenance
of that code and what is permitted.

Thanks,
Mark.

>  The driver is based on v5.11-rc2 and tested on FX700 environment.
> 
> [RFC]
>  This is the first time we upstream drivers for our chip and I want to
>  confirm driver location and patch submission process.
> 
>  Based on my observation it seems drivers/soc folder is right place to put
>  this driver, so I added Kconfig entry for arm64 platform config, created
>  soc/fujitsu folder and updated MAINTAINER entry accordingly (last patch).
>  Is it right?
> 
>  Also for final submission I think I need to 1) create some public git
>  tree to push driver code (github or something), 2) make pull request to
>  SOC team (soc@kernel.org). Is it a correct procedure?
> 
>  I will appreciate any help/comments.
> 
> sidenote: We plan to post other drivers for A64FX HPC extension
> (prefetch control and cache control) too anytime soon.
> 
> Misono Tomohiro (10):
>   soc: fujitsu: hwb: Add hardware barrier driver init/exit code
>   soc: fujtisu: hwb: Add open operation
>   soc: fujitsu: hwb: Add IOC_BB_ALLOC ioctl
>   soc: fujitsu: hwb: Add IOC_BW_ASSIGN ioctl
>   soc: fujitsu: hwb: Add IOC_BW_UNASSIGN ioctl
>   soc: fujitsu: hwb: Add IOC_BB_FREE ioctl
>   soc: fujitsu: hwb: Add IOC_GET_PE_INFO ioctl
>   soc: fujitsu: hwb: Add release operation
>   soc: fujitsu: hwb: Add sysfs entry
>   soc: fujitsu: hwb: Add Kconfig/Makefile to build fujitsu_hwb driver
> 
>  MAINTAINERS                            |    7 +
>  arch/arm64/Kconfig.platforms           |    5 +
>  drivers/soc/Kconfig                    |    1 +
>  drivers/soc/Makefile                   |    1 +
>  drivers/soc/fujitsu/Kconfig            |   24 +
>  drivers/soc/fujitsu/Makefile           |    2 +
>  drivers/soc/fujitsu/fujitsu_hwb.c      | 1253 ++++++++++++++++++++++++
>  include/uapi/linux/fujitsu_hpc_ioctl.h |   41 +
>  8 files changed, 1334 insertions(+)
>  create mode 100644 drivers/soc/fujitsu/Kconfig
>  create mode 100644 drivers/soc/fujitsu/Makefile
>  create mode 100644 drivers/soc/fujitsu/fujitsu_hwb.c
>  create mode 100644 include/uapi/linux/fujitsu_hpc_ioctl.h
> 
> -- 
> 2.26.2
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 03/10] soc: fujitsu: hwb: Add IOC_BB_ALLOC ioctl
@ 2021-01-08 13:22     ` Arnd Bergmann
  0 siblings, 0 replies; 66+ messages in thread
From: Arnd Bergmann @ 2021-01-08 13:22 UTC (permalink / raw)
  To: Misono Tomohiro
  Cc: Linux ARM, SoC Team, Will Deacon, Catalin Marinas, Arnd Bergmann,
	Olof Johansson

On Fri, Jan 8, 2021 at 11:52 AM Misono Tomohiro
<misono.tomohiro@jp.fujitsu.com> wrote:

> +static void write_init_sync_reg(void *args)
> +{
> +       struct init_sync_args *sync_args = (struct init_sync_args *)args;
> +
> +       switch (sync_args->bb) {
> +       case 0:
> +               write_sysreg_s(sync_args->val, FHWB_INIT_SYNC_BB0_EL1);
> +               break;
> +       case 1:
> +               write_sysreg_s(sync_args->val, FHWB_INIT_SYNC_BB1_EL1);
> +               break;
> +       case 2:
> +               write_sysreg_s(sync_args->val, FHWB_INIT_SYNC_BB2_EL1);
> +               break;
> +       case 3:
> +               write_sysreg_s(sync_args->val, FHWB_INIT_SYNC_BB3_EL1);
> +               break;
> +       case 4:
> +               write_sysreg_s(sync_args->val, FHWB_INIT_SYNC_BB4_EL1);
> +               break;
> +       case 5:
> +               write_sysreg_s(sync_args->val, FHWB_INIT_SYNC_BB5_EL1);
> +               break;
> +       }
> +}

(minor style comment)

I think this could be simplified into a single write_sysreg_s() with the
register number calculated based on sync_args->bb, rather than duplicating
the same three lines six times.

> +static int ioc_bb_alloc(struct file *filp, void __user *argp)
> +{
> +       struct hwb_private_data *pdata = (struct hwb_private_data *)filp->private_data;

A slightly better way to write the ioctl command specific functions
is to just give the argument the correct type
(struct fujitsu_hwb_ioc_bb_ctl __user *) instead of 'void __user *',
to avoid the cast.

Similarly, as you don't use 'filp' itself, just pass the struct hwb_private_data
pointer as the first argument.

> @@ -164,6 +386,7 @@ static int fujitsu_hwb_dev_open(struct inode *inode, struct file *filp)
>  static const struct file_operations fujitsu_hwb_dev_fops = {
>         .owner          = THIS_MODULE,
>         .open           = fujitsu_hwb_dev_open,
> +       .unlocked_ioctl = fujitsu_hwb_dev_ioctl,
>  };

All drivers with an ioctl interface should work in compat mode as well,
so please add a

       .compat_ioctl = compat_ptr_ioctl,

> +#define __FUJITSU_IOCTL_MAGIC 'F'

It's hard to find a non-conflicting range of ioctl numbers, but
it would be good to note the conflict in

Documentation/userspace-api/ioctl/ioctl-number.rst

The 'F' range is also used in framebuffer drivers.
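
To make the overlap concrete, here is a minimal userspace sketch of how
the magic byte ends up inside a command number. HWB_IOC_EXAMPLE, its
command number 0x00 and the hwb_example_arg payload are hypothetical
names made up for illustration and are not taken from the patch; only
the 'F' magic and the _IOWR()/_IOC_TYPE()/_IOC_NR() macros from
<linux/ioctl.h> are real.

```c
#include <linux/ioctl.h>
#include <stdint.h>

/* Hypothetical payload, for illustration only. */
struct hwb_example_arg {
	uint64_t value;
};

/* Hypothetical command that uses the same magic as the patch ('F'). */
#define HWB_EXAMPLE_MAGIC 'F'
#define HWB_IOC_EXAMPLE   _IOWR(HWB_EXAMPLE_MAGIC, 0x00, struct hwb_example_arg)

/* The "type" (magic) byte encoded in a command number. */
static unsigned int ioctl_magic(unsigned int cmd)
{
	return _IOC_TYPE(cmd);
}

/* The sequence number within that magic's range. */
static unsigned int ioctl_nr(unsigned int cmd)
{
	return _IOC_NR(cmd);
}

/*
 * Two commands land in the same ioctl-number.rst slot when magic and
 * number both match (direction and size may still differ).
 */
static int ioctl_same_slot(unsigned int a, unsigned int b)
{
	return ioctl_magic(a) == ioctl_magic(b) && ioctl_nr(a) == ioctl_nr(b);
}
```

Framebuffer commands such as FBIOGET_VSCREENINFO (0x4600) carry the same
'F' type byte, so ioctl_same_slot(HWB_IOC_EXAMPLE, 0x4600) is true in
this sketch even though the full command values differ.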

> +/* ioctl definitions for hardware barrier driver */
> +struct fujitsu_hwb_ioc_bb_ctl {
> +       __u8 cmg;
> +       __u8 bb;
> +       __u8 unused[2];
> +       __u32 size;
> +       unsigned long __user *pemask;
> +};

However, this structure layout is incompatible with 32-bit user
space because of the indirect pointer. See
Documentation/driver-api/ioctl.rst for how to encode a
pointer in a __u64 member.
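
As a rough illustration of that approach, the sketch below carries the
pointer in a fixed-width 64-bit field so that the layout is the same for
32-bit and 64-bit callers. The struct and helper names are hypothetical
and written with <stdint.h> types so that it builds in userspace; a real
uapi header would use __u8/__u32/__u64, and the kernel side would
typically convert the field back with u64_to_user_ptr().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical reworked layout: the user pointer is stored as a
 * fixed-width integer, so size and offsets do not depend on the
 * pointer width of the calling process.
 */
struct hwb_bb_ctl_example {
	uint8_t  cmg;
	uint8_t  bb;
	uint8_t  unused[2];
	uint32_t size;
	uint64_t pemask;	/* user pointer to the PE mask, as an integer */
};

static_assert(sizeof(struct hwb_bb_ctl_example) == 16, "no hidden padding");
static_assert(offsetof(struct hwb_bb_ctl_example, pemask) == 8, "fixed offset");

/* Userspace side: widen the pointer through uintptr_t before storing it. */
static void hwb_set_pemask(struct hwb_bb_ctl_example *ctl,
			   unsigned long *mask, uint32_t size)
{
	ctl->pemask = (uint64_t)(uintptr_t)mask;
	ctl->size = size;
}

/* Reverse conversion, only meaningful in the process that stored it. */
static unsigned long *hwb_get_pemask(const struct hwb_bb_ctl_example *ctl)
{
	return (unsigned long *)(uintptr_t)ctl->pemask;
}
```

With a layout like this, the same ioctl handler can serve native and
compat callers without a translation step.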

      Arnd

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 08/10] soc: fujitsu: hwb: Add release operation
@ 2021-01-08 13:25     ` Arnd Bergmann
  0 siblings, 0 replies; 66+ messages in thread
From: Arnd Bergmann @ 2021-01-08 13:25 UTC (permalink / raw)
  To: Misono Tomohiro
  Cc: Linux ARM, SoC Team, Will Deacon, Catalin Marinas, Arnd Bergmann,
	Olof Johansson

On Fri, Jan 8, 2021 at 11:52 AM Misono Tomohiro
<misono.tomohiro@jp.fujitsu.com> wrote:
>
> Upon release, we cleanup remaining resources/registers if necessary.
> This happens when user does not call IOC_BB_FREE properly and the
> function will do effectively the same operation as IOC_BB_FREE.
>
> Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>

What is the benefit of calling IOC_BB_FREE instead of always relying
on close() to do this? Would it be easier to just not implement IOC_BB_FREE?

      Arnd

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 09/10] soc: fujitsu: hwb: Add sysfs entry
@ 2021-01-08 13:27     ` Arnd Bergmann
  0 siblings, 0 replies; 66+ messages in thread
From: Arnd Bergmann @ 2021-01-08 13:27 UTC (permalink / raw)
  To: Misono Tomohiro
  Cc: Linux ARM, SoC Team, Will Deacon, Catalin Marinas, Arnd Bergmann,
	Olof Johansson

On Fri, Jan 8, 2021 at 11:52 AM Misono Tomohiro
<misono.tomohiro@jp.fujitsu.com> wrote:
>
> This adds sysfs entry per CMG to show running barrier driver status
> for debugging user application. The following entries will be created:
>
> /sys/class/misc/fujitsu_hwb
>  |- hwinfo ... number of CMG/BB/BW/pe_per_cmg on running system
>  |- CMG0
>      |- core_map      ... cpuid belonging to this CMG
>      |- used_bb_bmap  ... bitmap of currently allocated BB
>      |- used_bw_bmap  ... bitmap of currently allocated BW
>      |- init_sync_bb0 ... current value of INIT_SYNC register 0
>      |- init_sync_bb1 ... current value of INIT_SYNC register 1
>      ...
>  |- CMG1
>   ...

If this is meant as a stable interface, it should be documented in
Documentation/ABI/

However, if it's purely for debugging and may not get maintained
forever, you might want to use debugfs instead.

      Arnd

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH 00/10] Add Fujitsu A64FX soc entry/hardware barrier driver
@ 2021-01-08 14:23     ` Arnd Bergmann
  0 siblings, 0 replies; 66+ messages in thread
From: Arnd Bergmann @ 2021-01-08 14:23 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Misono Tomohiro, Linux ARM, SoC Team, Olof Johansson,
	Catalin Marinas, Will Deacon, Arnd Bergmann

On Fri, Jan 8, 2021 at 1:54 PM Mark Rutland <mark.rutland@arm.com> wrote:
> On Fri, Jan 08, 2021 at 07:52:31PM +0900, Misono Tomohiro wrote:
> > (Resend as cover letter title was missing in the first time. Sorry for noise)
> >
> > This series adds Fujitsu A64FX SoC entry in drivers/soc and hardware
> > barrier driver for it.
> >
> > [Driver Description]
> >  A64FX CPU has several functions for HPC workload and hardware barrier
> >  is one of them. It is a mechanism to realize fast synchronization by
> >  PEs belonging to the same L3 cache domain by using implementation
> >  defined hardware registers.
> >  For more details, see A64FX HPC extension specification in
> >  https://github.com/fujitsu/A64FX
> >
> >  The driver mainly offers a set of ioctls to manipulate related registers.
> >  Patch 1-9 implements driver code and patch 10 finally adds kconfig,
> >  Makefile and MAINTAINER entry for the driver.
>
> I have a number of concerns here, and at a high level, I do not think
> that this is something Linux can reasonably support in its current form.
> Sorry if this comes across as harsh; I appreciate the work that has gone
> into this, and the effort to try to upstream support is great -- my
> concerns are with the overall picture.
>
> As a general rule, we avoid the use of IMPLEMENTATION DEFINED features
> in Linux, as they pose a number of correctness/safety challenges and
> come with a potentially significant long term maintenance burden that is
> generally not justified by the features themselves. For example, such
> features are not usable under virtualization (where a hypervisor may set
> HCR_EL2.TIDCP, or fail to context-switch state that it is unaware of).

I am somewhat less concerned about the feature being implementation
defined than I am about adding a custom user interface for one
platform.

In the end, anything outside of the CPU core that ends up in a SoC
is implementation defined, and this is usually not a problem as long
as we have an abstraction in the kernel that hides the details from
the user, and the system is still functional if the implementation is
turned off for whatever reason.

> Secondly, the intended usage model appears to expose this to EL0 for
> direct access, and the code seems to depend on threads being pinned, but
> AFAICT this is not enforced and there is no provision for
> context-switch, thread migration, or interaction with ptrace. I fear
> this is going to be very fragile in practice, and that extending that
> support in future will require much more complexity than is currently
> apparent, with potentially invasive changes to arch code.

Right, this is the main problem I see, too. I had not even realized
that this will have to tie in with user space threads in some form, but
you are right that once this has to interact with the CPU scheduler,
it all breaks down.

One way I can imagine this working out is to tie into the cpuset
mechanism that is used for isolating threads to CPU cores, and
then provide a cpuset interface that has the desired behavior
but that can fall back to a generic implementation with the same
or stronger (but normally slower) semantics.
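
To give an idea of what such a generic fallback could look like, here is
a minimal sketch of a centralized software barrier built on C11 atomics.
It is not the cpuset interface itself (nothing in this thread defines
one), and the sw_barrier names are hypothetical; it only illustrates a
portable implementation with the same synchronization semantics that a
hardware-assisted path could replace where available.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical portable barrier, usable on any platform with C11 atomics. */
struct sw_barrier {
	atomic_uint remaining;	/* threads still expected in this round */
	atomic_uint generation;	/* bumped once per completed round */
	unsigned int nthreads;	/* number of participants */
};

static void sw_barrier_init(struct sw_barrier *b, unsigned int nthreads)
{
	atomic_init(&b->remaining, nthreads);
	atomic_init(&b->generation, 0);
	b->nthreads = nthreads;
}

/* Blocks until all participants have arrived; true for the last arrival. */
static bool sw_barrier_wait(struct sw_barrier *b)
{
	/* Remember the current round before announcing our arrival. */
	unsigned int gen = atomic_load_explicit(&b->generation,
						memory_order_acquire);

	if (atomic_fetch_sub_explicit(&b->remaining, 1,
				      memory_order_acq_rel) == 1) {
		/* Last arrival: re-arm the counter, then release the others. */
		atomic_store_explicit(&b->remaining, b->nthreads,
				      memory_order_relaxed);
		atomic_fetch_add_explicit(&b->generation, 1,
					  memory_order_release);
		return true;
	}

	/* Everyone else spins until the round number changes. */
	while (atomic_load_explicit(&b->generation,
				    memory_order_acquire) == gen)
		;
	return false;
}
```

The spin loop is the "normally slower" part: every waiter polls shared
memory, which is the kind of traffic a hardware barrier is meant to avoid.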

> Thirdly, this requires userspace software to be intimately familiar with
> the HW platform that it is running on (both in terms of using IMP-DEF
> instructions and needing to know the physical layout), rather than being
> generic and portable, which I don't believe is something that we wish to
> encourage.  I also think this is unlikely to be supported by generic
> software because of the lack of portability, and consequently I struggle
> to believe that this will see significant usage.

Agreed as well.

        Arnd

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH 00/10] Add Fujitsu A64FX soc entry/hardware barrier driver
@ 2021-01-08 15:51       ` Mark Rutland
  0 siblings, 0 replies; 66+ messages in thread
From: Mark Rutland @ 2021-01-08 15:51 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Misono Tomohiro, Linux ARM, SoC Team, Olof Johansson,
	Catalin Marinas, Will Deacon, Arnd Bergmann

On Fri, Jan 08, 2021 at 03:23:23PM +0100, Arnd Bergmann wrote:
> On Fri, Jan 8, 2021 at 1:54 PM Mark Rutland <mark.rutland@arm.com> wrote:
> > As a general rule, we avoid the use of IMPLEMENTATION DEFINED features
> > in Linux, as they pose a number of correctness/safety challenges and
> > come with a potentially significant long term maintenance burden that is
> > generally not justified by the features themselves. For example, such
> > features are not usable under virtualization (where a hypervisor may set
> > HCR_EL2.TIDCP, or fail to context-switch state that it is unaware of).
> 
> I am somewhat less concerned about the feature being implementation
> defined than I am about adding a custom user interface for one
> platform.

I completely agree that adding a custom interface that's platform
specific is undesirable.

> In the end, anything outside of the CPU core that ends up in a SoC
> is implementation defined, and this is usually not a problem as long
> as we have an abstraction in the kernel that hides the details from
> the user, and the system is still functional if the implementation is
> turned off for whatever reason.

I think that peripherals and other bits out in the SoC are quite
different to things built into the CPU, where there are inevitably more
significant and subtle interactions with the architecture, and which can
be so closely coupled as to not have a good point to apply abstraction.
We have common ways of abstracting storage devices, but the same is not
as true for userspace instructions.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* RE: [RFC PATCH 00/10] Add Fujitsu A64FX soc entry/hardware barrier driver
@ 2021-01-12 10:24       ` misono.tomohiro
  0 siblings, 0 replies; 66+ messages in thread
From: misono.tomohiro @ 2021-01-12 10:24 UTC (permalink / raw)
  To: 'Arnd Bergmann', Mark Rutland
  Cc: Linux ARM, SoC Team, Olof Johansson, Catalin Marinas,
	Will Deacon, Arnd Bergmann

Hi, 

First of all, thanks a lot to both of you for all the comments (continued below).

> On Fri, Jan 8, 2021 at 1:54 PM Mark Rutland <mark.rutland@arm.com> wrote:
> > On Fri, Jan 08, 2021 at 07:52:31PM +0900, Misono Tomohiro wrote:
> > > (Resend as cover letter title was missing in the first time. Sorry for noise)
> > >
> > > This series adds Fujitsu A64FX SoC entry in drivers/soc and hardware
> > > barrier driver for it.
> > >
> > > [Driver Description]
> > >  A64FX CPU has several functions for HPC workload and hardware barrier
> > >  is one of them. It is a mechanism to realize fast synchronization by
> > >  PEs belonging to the same L3 cache domain by using implementation
> > >  defined hardware registers.
> > >  For more details, see A64FX HPC extension specification in
> > >  https://github.com/fujitsu/A64FX
> > >
> > >  The driver mainly offers a set of ioctls to manipulate related registers.
> > >  Patch 1-9 implements driver code and patch 10 finally adds kconfig,
> > >  Makefile and MAINTAINER entry for the driver.
> >
> > I have a number of concerns here, and at a high level, I do not think
> > that this is something Linux can reasonably support in its current form.
> > Sorry if this comes across as harsh; I appreciate the work that has gone
> > into this, and the effort to try to upstream support is great -- my
> > concerns are with the overall picture.
> >
> > As a general rule, we avoid the use of IMPLEMENTATION DEFINED features
> > in Linux, as they pose a number of correctness/safety challenges and
> > come with a potentially significant long term maintenance burden that is
> > generally not justified by the features themselves. For example, such
> > features are not usable under virtualization (where a hypervisor may set
> > HCR_EL2.TIDCP, or fail to context-switch state that it is unaware of).
> 
> I am somewhat less concerned about the feature being implementation
> defined than I am about adding a custom user interface for one
> platform.
> 
> In the end, anything outside of the CPU core that ends up in a SoC
> is implementation defined, and this is usually not a problem as long
> as we have an abstraction in the kernel that hides the details from
> the user, and the system is still functional if the implementation is
> turned off for whatever reason.

Understood. However, I don't know of any other processors with similar
features at this point, so it is hard to provide a common abstraction interface.
I would appreciate it if anyone could share any information.

> > Secondly, the intended usage model appears to expose this to EL0 for
> > direct access, and the code seems to depend on threads being pinned, but
> > AFAICT this is not enforced and there is no provision for
> > context-switch, thread migration, or interaction with ptrace. I fear
> > this is going to be very fragile in practice, and that extending that
> > support in future will require much more complexity than is currently
> > apparent, with potentially invasive changes to arch code.
> 
> Right, this is the main problem I see, too. I had not even realized
> that this will have to tie in with user space threads in some form, but
> you are right that once this has to interact with the CPU scheduler,
> it all breaks down.

This observation is right. I thought that adding context-switch etc. support for
implementation defined registers would require core arch code changes, which
would be far less acceptable. So I tried to confine the code changes to a module,
with these restrictions.

Regarding direct access from EL0, it is necessary for realizing fast synchronization,
as it lets the synchronization logic in a user application check whether all threads
have reached the synchronization point without switching to the kernel.
Also, it is common in multi-threaded HPC applications for each running thread to be
bound to one PE.

> One way I can imagine this working out is to tie into the cpuset
> mechanism that is used for isolating threads to CPU cores, and
> then provide a cpuset interface that has the desired behavior
> but that can fall back to a generic implementation with the same
> or stronger (but normally slower) semantics.

I'm not sure if this approach is feasible, but I will try to look into it.
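
For reference, the "generic implementation with the same or stronger (but
normally slower) semantics" mentioned above could be pictured as an ordinary
software barrier. The following is only a minimal userspace sketch using C11
atomics (a sense-reversing barrier). The names sw_barrier, sw_barrier_init
and sw_barrier_wait are made up for illustration; they are not part of this
series, of the hardware_barrier library, or of any existing cpuset interface.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical software fallback barrier (illustration only). */
struct sw_barrier {
	atomic_uint remaining;   /* threads still expected in this round */
	atomic_bool sense;       /* flips once per completed round */
	unsigned int nthreads;   /* number of participating threads */
};

static void sw_barrier_init(struct sw_barrier *b, unsigned int nthreads)
{
	assert(nthreads > 0);
	atomic_init(&b->remaining, nthreads);
	atomic_init(&b->sense, false);
	b->nthreads = nthreads;
}

/*
 * Each thread keeps its own local_sense, initialised to false.
 * The last thread to arrive resets the counter and then flips the
 * shared sense; everyone else spins until they observe the flip.
 */
static void sw_barrier_wait(struct sw_barrier *b, bool *local_sense)
{
	bool my_sense = !*local_sense;

	*local_sense = my_sense;
	if (atomic_fetch_sub_explicit(&b->remaining, 1,
				      memory_order_acq_rel) == 1) {
		/* Last arriver: re-arm the counter before releasing waiters. */
		atomic_store_explicit(&b->remaining, b->nthreads,
				      memory_order_relaxed);
		atomic_store_explicit(&b->sense, my_sense,
				      memory_order_release);
	} else {
		while (atomic_load_explicit(&b->sense,
					    memory_order_acquire) != my_sense)
			; /* busy-wait; a real fallback might yield or sleep */
	}
}
```

Unlike the hardware barrier, this spins on shared memory, so it is expected
to be slower, but it does not depend on threads staying pinned to a PE.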

> > Thirdly, this requires userspace software to be intimately familiar with
> > the HW platform that it is running on (both in terms of using IMP-DEF
> > instructions and needing to know the physical layout), rather than being
> > generic and portable, which I don't believe is something that we wish to
> > encourage.  I also think this is unlikely to be supported by generic
> > software because of the lack of portability, and consequently I struggle
> > to believe that this will see significant usage.
> 
> Agreed as well.

It may be possible to trap accesses to these implementation defined registers
and fall back to some logic in the driver. The problem is that other processors
might use the same IMP-DEF registers for a different purpose.

Regards,
Tomohiro

^ permalink raw reply	[flat|nested] 66+ messages in thread

* RE: [RFC PATCH 00/10] Add Fujitsu A64FX soc entry/hardware barrier driver
@ 2021-01-12 10:32     ` misono.tomohiro
  0 siblings, 0 replies; 66+ messages in thread
From: misono.tomohiro @ 2021-01-12 10:32 UTC (permalink / raw)
  To: 'Mark Rutland'
  Cc: linux-arm-kernel, soc, olof, catalin.marinas, will, arnd

> On Fri, Jan 08, 2021 at 07:52:31PM +0900, Misono Tomohiro wrote:
> > (Resend as cover letter title was missing in the first time. Sorry for noise)
> >
> > Hello,
> 
> Hi,
> 
> > This series adds Fujitsu A64FX SoC entry in drivers/soc and hardware
> > barrier driver for it.
> >
> > [Driver Description]
> >  A64FX CPU has several functions for HPC workload and hardware barrier
> >  is one of them. It is a mechanism to realize fast synchronization by
> >  PEs belonging to the same L3 cache domain by using implementation
> >  defined hardware registers.
> >  For more details, see A64FX HPC extension specification in
> >  https://github.com/fujitsu/A64FX
> >
> >  The driver mainly offers a set of ioctls to manipulate related registers.
> >  Patch 1-9 implements driver code and patch 10 finally adds kconfig,
> >  Makefile and MAINTAINER entry for the driver.
> 
> I have a number of concerns here, and at a high level, I do not think
> that this is something Linux can reasonably support in its current form.
> Sorry if this comes across as harsh; I appreciate the work that has gone
> into this, and the effort to try to upstream support is great -- my
> concerns are with the overall picture.
> 
> As a general rule, we avoid the use of IMPLEMENTATION DEFINED features
> in Linux, as they pose a number of correctness/safety challenges and
> come with a potentially significant long term maintenance burden that is
> generally not justified by the features themselves. For example, such
> features are not usable under virtualization (where a hypervisor may set
> HCR_EL2.TIDCP, or fail to context-switch state that it is unaware of).
> 
> Secondly, the intended usage model appears to expose this to EL0 for
> direct access, and the code seems to depend on threads being pinned, but
> AFAICT this is not enforced and there is no provision for
> context-switch, thread migration, or interaction with ptrace. I fear
> this is going to be very fragile in practice, and that extending that
> support in future will require much more complexity than is currently
> apparent, with potentially invasive changes to arch code.
> 
> Thirdly, this requires userspace software to be intimately familiar with
> the HW platform that it is running on (both in terms of using IMP-DEF
> instructions and needing to know the physical layout), rather than being
> generic and portable, which I don't believe is something that we wish to
> encourage.  I also think this is unlikely to be supported by generic
> software because of the lack of portability, and consequently I struggle
> to believe that this will see significant usage.
> 
> Further, as an IMP-DEF feature, it's not clear how much of this will
> carry forward into future designs, and where things may change. It's
> extremely difficult to determine whether any of the ABI decisions (e.g.
> the sysfs layout) are sufficient, or what level of changes would be
> necessary in userspace code if there are physical topology changes or
> changes to the structure of the system register interfaces.
> 
> Overall, I think this needs much more justification in terms of
> practical usage, safety/correctness, and long term maintenance, and with
> that I think the longer term goal would be to use this to justify an
> architectural feature along similar lines rather than to support any
> IMPLEMENTATION DEFINED variants upstream in Linux.
> 
> >  Also, C library and test program for this driver is available on:
> >  https://github.com/fujitsu/hardware_barrier
> 
> Hmm... I see some code in that repo which looks suspiciously like code
> from the Linux kernel tree, but licensed differently, which is
> concerning.
> 
> Specifically, the asm block in internal.h (which the SPDX header says
> is licensed as LGPL-3.0-only) looks like a copy of code from
> arch/arm64/include/asm/sysreg.h (which is licensed as GPL-2.0-only per
> its SPDX header).
> 
> If that code was copied, I don't believe that relicensing is permitted.
> I would advise that someone with legal training considers the provenance
> of that code and what is permitted.

Sorry, I was not careful enough about where that code came from when I wrote it.
I have removed that part and now write the assembly directly:
 https://github.com/fujitsu/hardware_barrier/blob/develop/src/internal.h
 https://github.com/fujitsu/hardware_barrier/blob/develop/src/hwblib.c#L215

Regards,
Tomohiro

^ permalink raw reply	[flat|nested] 66+ messages in thread

* RE: [PATCH 08/10] soc: fujitsu: hwb: Add release operation
@ 2021-01-12 10:38       ` misono.tomohiro
  0 siblings, 0 replies; 66+ messages in thread
From: misono.tomohiro @ 2021-01-12 10:38 UTC (permalink / raw)
  To: 'Arnd Bergmann'
  Cc: Linux ARM, SoC Team, Will Deacon, Catalin Marinas, Arnd Bergmann,
	Olof Johansson

> On Fri, Jan 8, 2021 at 11:52 AM Misono Tomohiro
> <misono.tomohiro@jp.fujitsu.com> wrote:
> >
> > Upon release, we clean up remaining resources/registers if necessary.
> > This happens when user does not call IOC_BB_FREE properly and the
> > function will do effectively the same operation as IOC_BB_FREE.
> >
> > Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
> 
> What is the benefit of calling IOC_BB_FREE instead of always relying
> on close() to do this? Would it be easier to just not implement IOC_BB_FREE?
> 

This is up to the application.
When an application wants to reuse a barrier driver resource with different
barrier settings (i.e. when changing which PEs join the synchronization),
it can use IOC_BB_FREE rather than closing and reopening the device file.

Regards,
Tomohiro

^ permalink raw reply	[flat|nested] 66+ messages in thread

* RE: [PATCH 09/10] soc: fujitsu: hwb: Add sysfs entry
@ 2021-01-12 10:40       ` misono.tomohiro
  0 siblings, 0 replies; 66+ messages in thread
From: misono.tomohiro @ 2021-01-12 10:40 UTC (permalink / raw)
  To: 'Arnd Bergmann'
  Cc: Linux ARM, SoC Team, Will Deacon, Catalin Marinas, Arnd Bergmann,
	Olof Johansson

> On Fri, Jan 8, 2021 at 11:52 AM Misono Tomohiro
> <misono.tomohiro@jp.fujitsu.com> wrote:
> >
> > This adds sysfs entry per CMG to show running barrier driver status
> > for debugging user application. The following entries will be created:
> >
> > /sys/class/misc/fujitsu_hwb
> >  |- hwinfo ... number of CMG/BB/BW/pe_per_cmg on running system
> >  |- CMG0
> >      |- core_map      ... cpuid belonging to this CMG
> >      |- used_bb_bmap  ... bitmap of currently allocated BB
> >      |- used_bw_bmap  ... bitmap of currently allocated BW
> >      |- init_sync_bb0 ... current value of INIT_SYNC register 0
> >      |- init_sync_bb1 ... current value of INIT_SYNC register 1
> >      ...
> >  |- CMG1
> >   ...
> 
> If this is meant as a stable interface, it should be documented in
> Documentation/ABI/
> 
> However, if it's purely for debugging and may not get maintained
> forever, you might want to use debugfs instead.
> 

As you guessed, this is only for debugging purposes, so I will try debugfs.

Thanks for the review,
Tomohiro

^ permalink raw reply	[flat|nested] 66+ messages in thread

* RE: [PATCH 03/10] soc: fujitsu: hwb: Add IOC_BB_ALLOC ioctl
@ 2021-01-12 11:02       ` misono.tomohiro
  0 siblings, 0 replies; 66+ messages in thread
From: misono.tomohiro @ 2021-01-12 11:02 UTC (permalink / raw)
  To: 'Arnd Bergmann'
  Cc: Linux ARM, SoC Team, Will Deacon, Catalin Marinas, Arnd Bergmann,
	Olof Johansson

> On Fri, Jan 8, 2021 at 11:52 AM Misono Tomohiro
> <misono.tomohiro@jp.fujitsu.com> wrote:
> 
> > +static void write_init_sync_reg(void *args)
> > +{
> > +       struct init_sync_args *sync_args = (struct init_sync_args *)args;
> > +
> > +       switch (sync_args->bb) {
> > +       case 0:
> > +               write_sysreg_s(sync_args->val, FHWB_INIT_SYNC_BB0_EL1);
> > +               break;
> > +       case 1:
> > +               write_sysreg_s(sync_args->val, FHWB_INIT_SYNC_BB1_EL1);
> > +               break;
> > +       case 2:
> > +               write_sysreg_s(sync_args->val, FHWB_INIT_SYNC_BB2_EL1);
> > +               break;
> > +       case 3:
> > +               write_sysreg_s(sync_args->val, FHWB_INIT_SYNC_BB3_EL1);
> > +               break;
> > +       case 4:
> > +               write_sysreg_s(sync_args->val, FHWB_INIT_SYNC_BB4_EL1);
> > +               break;
> > +       case 5:
> > +               write_sysreg_s(sync_args->val, FHWB_INIT_SYNC_BB5_EL1);
> > +               break;
> > +       }
> > +}
> 
> (minor style comment)
> 
> I think this could be simplified into a single write_sysreg_s() with the
> register number calculated based on sync_args->bb, rather than duplicating
> the same three lines six times.

I think the msr/mrs instructions need the register name at compile time, so
it cannot be calculated dynamically. Or am I misunderstanding?
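
To illustrate the point: as far as I understand the A64 encoding, the system
register is selected by the op0/op1/CRn/CRm/op2 fields inside the MSR/MRS
instruction word itself, so a value known only at run time cannot choose the
register within a single instruction. The sketch below is plain userspace C
that only computes instruction words. The helper names (sysreg_field,
encode_mrs, encode_msr) are hypothetical, and it uses well-known
architectural registers rather than the A64FX IMP-DEF ones, whose encodings
are not given in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the five register-selection fields into bits [20:5]. */
static uint32_t sysreg_field(unsigned int op0, unsigned int op1,
			     unsigned int crn, unsigned int crm,
			     unsigned int op2)
{
	assert(op0 >= 2 && op0 <= 3);
	assert(op1 <= 7 && crn <= 15 && crm <= 15 && op2 <= 7);
	return (op0 << 19) | (op1 << 16) | (crn << 12) | (crm << 8) |
	       (op2 << 5);
}

/* MRS Xt, <sysreg>: read a system register into general register rt. */
static uint32_t encode_mrs(uint32_t sysreg, unsigned int rt)
{
	assert(rt <= 31);
	return 0xd5200000u | sysreg | rt;
}

/* MSR <sysreg>, Xt: write general register rt to a system register. */
static uint32_t encode_msr(uint32_t sysreg, unsigned int rt)
{
	assert(rt <= 31);
	return 0xd5000000u | sysreg | rt;
}
```

For example, encode_mrs(sysreg_field(3, 3, 13, 0, 2), 0) yields the word for
"mrs x0, tpidr_el0". Since each register gives a different instruction word,
a switch over compile-time constants (as in write_init_sync_reg()) or a
table of small per-register helpers seems hard to avoid.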

> > +static int ioc_bb_alloc(struct file *filp, void __user *argp)
> > +{
> > +       struct hwb_private_data *pdata = (struct hwb_private_data *)filp->private_data;
> 
> A slightly better way to write the ioctl command specific functions
> is to just give the argument the correct type (struct
> fujitsu_hwb_ioc_bb_ctl __user*)
> instead of 'void __user *', to avoid the cast.
> 
> Similarly, as you don't use 'filp' itself, just pass the struct hwb_private_data
> pointer as the first argument.

Thanks, I will follow this advice.
 
> > @@ -164,6 +386,7 @@ static int fujitsu_hwb_dev_open(struct inode *inode, struct file *filp)
> >  static const struct file_operations fujitsu_hwb_dev_fops = {
> >         .owner          = THIS_MODULE,
> >         .open           = fujitsu_hwb_dev_open,
> > +       .unlocked_ioctl = fujitsu_hwb_dev_ioctl,
> >  };
> 
> All drivers with an ioctl interface should work in compat mode as well,
> so please add a
> 
>        .compat_ioctl = compat_ptr_ioctl,


A64FX does not support 32-bit mode (AArch32 state).
So I think unlocked_ioctl is enough. Or is it better to add compat_ioctl anyway?

> 
> > +#define __FUJITSU_IOCTL_MAGIC 'F'
> 
> It's hard to find a non-conflicting range of ioctl numbers, but
> it would be good to note the conflict in
> 
> Documentation/userspace-api/ioctl/ioctl-number.rst
> 
> The 'F' range is also used in framebuffer drivers.

I didn't notice this, thanks for pointing it out.

Again, thanks for all the reviews/comments.
Tomohiro

> > +/* ioctl definitions for hardware barrier driver */
> > +struct fujitsu_hwb_ioc_bb_ctl {
> > +       __u8 cmg;
> > +       __u8 bb;
> > +       __u8 unused[2];
> > +       __u32 size;
> > +       unsigned long __user *pemask;
> > +};
> 
> However, this structure layout is incompatible with 32-bit user
> space because of the indirect pointer. See
> Documentation/driver-api/ioctl.rst for how to encode a
> pointer in a __u64 member.
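
As a rough illustration of the layout issue, the sketch below carries the
user pointer in a fixed-width 64-bit integer so that the structure has the
same size and offsets for 32-bit and 64-bit user space. It is written as
standalone userspace C with <stdint.h> types standing in for __u8/__u32/__u64.
The structure and helper names are hypothetical and are not the interface
proposed in this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pointer-free variant of the ioctl argument. */
struct hwb_bb_ctl_example {
	uint8_t  cmg;
	uint8_t  bb;
	uint8_t  unused[2];
	uint32_t size;
	uint64_t pemask;	/* user pointer carried as a 64-bit integer */
};

/* No implicit padding, and the same layout regardless of pointer width. */
static_assert(sizeof(struct hwb_bb_ctl_example) == 16,
	      "unexpected structure size");
static_assert(offsetof(struct hwb_bb_ctl_example, pemask) == 8,
	      "unexpected pemask offset");

/* User space stores the pointer through uintptr_t ... */
static uint64_t ptr_to_u64_example(const void *p)
{
	return (uint64_t)(uintptr_t)p;
}

/* ... and the receiving side converts it back. */
static void *u64_to_ptr_example(uint64_t v)
{
	return (void *)(uintptr_t)v;
}
```

On the kernel side the conversion back to a user pointer would normally go
through the existing u64_to_user_ptr() helper rather than an open-coded cast.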

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 03/10] soc: fujitsu: hwb: Add IOC_BB_ALLOC ioctl
@ 2021-01-12 12:34         ` Arnd Bergmann
  0 siblings, 0 replies; 66+ messages in thread
From: Arnd Bergmann @ 2021-01-12 12:34 UTC (permalink / raw)
  To: misono.tomohiro
  Cc: Linux ARM, SoC Team, Will Deacon, Catalin Marinas, Arnd Bergmann,
	Olof Johansson

On Tue, Jan 12, 2021 at 12:02 PM misono.tomohiro@fujitsu.com
<misono.tomohiro@fujitsu.com> wrote:
> > On Fri, Jan 8, 2021 at 11:52 AM Misono Tomohiro
> > <misono.tomohiro@jp.fujitsu.com> wrote:
> >
> > I think this could be simplified into a single write_sysreg_s() with the
> > register number calculated based on sync_args->bb, rather than duplicating
> > the same three lines six times.
>
> I think msr/mrs instructions need register names at compile time, so
> they cannot be calculated dynamically. Or have I misunderstood?

You are correct, I didn't realize it was implemented using a string
concatenation macro and that an inline function version would be
tricky.
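
The constraint can be sketched in user space. The macro below is only a stand-in for write_sysreg_s(): it formats a string instead of emitting an msr instruction, and FAKE_WRITE_SYSREG, BB_CASE and fake_write_init_sync are hypothetical names. It shows why the register has to be a compile-time token, and one way to keep the switch while avoiding the repeated lines; it is not the driver's code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define STRINGIFY(x) #x

/* Stand-in for write_sysreg_s(): the register name is pasted into a string
 * literal at compile time, so it cannot come from a runtime variable. */
#define FAKE_WRITE_SYSREG(buf, len, val, reg) \
	snprintf((buf), (len), "msr " STRINGIFY(reg) ", %lu", (unsigned long)(val))

/* One case per register; token pasting builds the name from a literal index. */
#define BB_CASE(n) \
	case n: return FAKE_WRITE_SYSREG(buf, len, val, FHWB_INIT_SYNC_BB##n##_EL1)

/* Returns the number of characters written, or -1 for an unknown index. */
static int fake_write_init_sync(char *buf, size_t len, unsigned int bb,
				unsigned long val)
{
	assert(buf != NULL && len > 0);

	switch (bb) {
	BB_CASE(0);
	BB_CASE(1);
	BB_CASE(2);
	BB_CASE(3);
	BB_CASE(4);
	BB_CASE(5);
	default:
		return -1;
	}
}
```

The runtime value bb still selects the case, but each case names its register as a literal token, which is what the string concatenation requires.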

> > > @@ -164,6 +386,7 @@ static int fujitsu_hwb_dev_open(struct inode *inode, struct file *filp)
> > >  static const struct file_operations fujitsu_hwb_dev_fops = {
> > >         .owner          = THIS_MODULE,
> > >         .open           = fujitsu_hwb_dev_open,
> > > +       .unlocked_ioctl = fujitsu_hwb_dev_ioctl,
> > >  };
> >
> > All drivers with an ioctl interface should work in compat mode as well,
> > so please add a
> >
> >        .compat_ioctl = compat_ptr_ioctl,
>
>
> A64FX does not support 32-bit mode (AArch32 state).
> So I think unlocked_ioctl is enough, or is it better to add compat_ioctl anyway?

It's a good point that this is not supported, but I would suggest adding
it anyway out of principle. It's better to always write code in a portable
way even if you do not expect to need the portability.
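
As a rough user-space model of what the compat entry does, the sketch below widens a 32-bit argument and forwards it to the native handler. All names (mock_fops, mock_compat_ptr_ioctl, mock_native_ioctl) are hypothetical, and the mock leaves out everything else the kernel helper handles; it only illustrates the idea that one native handler can serve both kinds of callers when the argument structure has a single layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mock_fops;

typedef long (*mock_ioctl_fn)(unsigned int cmd, unsigned long arg);
typedef long (*mock_compat_fn)(const struct mock_fops *ops, unsigned int cmd,
			       uint32_t arg);

/* Mock of the two file_operations members discussed in the thread. */
struct mock_fops {
	mock_ioctl_fn  unlocked_ioctl;
	mock_compat_fn compat_ioctl;
};

/* Widen the 32-bit argument to a native-sized value, then call the
 * native handler. Returns -1 if there is no native handler. */
static long mock_compat_ptr_ioctl(const struct mock_fops *ops,
				  unsigned int cmd, uint32_t arg)
{
	assert(ops != NULL);
	if (!ops->unlocked_ioctl)
		return -1;
	return ops->unlocked_ioctl(cmd, (unsigned long)arg);
}

/* Example native handler: echoes the argument it received. */
static long mock_native_ioctl(unsigned int cmd, unsigned long arg)
{
	(void)cmd;
	return (long)arg;
}

static const struct mock_fops mock_hwb_fops = {
	.unlocked_ioctl = mock_native_ioctl,
	.compat_ioctl   = mock_compat_ptr_ioctl,
};
```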

       Arnd

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH 00/10] Add Fujitsu A64FX soc entry/hardware barrier driver
@ 2021-01-12 14:22         ` Arnd Bergmann
  0 siblings, 0 replies; 66+ messages in thread
From: Arnd Bergmann @ 2021-01-12 14:22 UTC (permalink / raw)
  To: misono.tomohiro
  Cc: Mark Rutland, Arnd Bergmann, Catalin Marinas, SoC Team,
	Olof Johansson, Will Deacon, Linux ARM

On Tue, Jan 12, 2021 at 11:24 AM misono.tomohiro@fujitsu.com
<misono.tomohiro@fujitsu.com> wrote:
> > On Fri, Jan 8, 2021 at 1:54 PM Mark Rutland <mark.rutland@arm.com> wrote:
> However, I don't know of any other processors with similar features
> at this point, so it is hard to provide a common abstraction interface.
> I would appreciate it if anyone has any information.

The specification you pointed to mentions the SPARC64 XIfx, so
at a minimum, a user interface should be designed to also work on
whatever register-level interface that provides.

> > > Secondly, the intended usage model appears to expose this to EL0 for
> > > direct access, and the code seems to depend on threads being pinned, but
> > > AFAICT this is not enforced and there is no provision for
> > > context-switch, thread migration, or interaction with ptrace. I fear
> > > this is going to be very fragile in practice, and that extending that
> > > support in future will require much more complexity than is currently
> > > apparent, with potentially invasive changes to arch code.
> >
> > Right, this is the main problem I see, too. I had not even realized
> > that this will have to tie in with user space threads in some form, but
> > you are right that once this has to interact with the CPU scheduler,
> > it all breaks down.
>
> This observation is right. I thought adding context switch etc. support for
> implementation defined registers would require core arch code changes and
> would be far less acceptable. So, I tried to confine the code changes to a
> module with these restrictions.

My feeling is that having the code separate from where it would belong
in an operating system that was designed specifically for this feature
ends up being no better than rewriting the core scheduling code.

As Mark said, it may well be that neither approach would be sufficient
for an upstream merge. On the other hand, keeping the code in a
separate loadable module does make most sense if we end up
not merging it at all, in which case this is the easiest to port
between kernel versions.

> Regarding direct access from EL0, it is necessary for realizing fast synchronization,
> as it lets the synchronization logic in a user application check whether all threads
> have reached the synchronization point without switching to the kernel.

Ok, I see.

> Also, it is common for each running thread to be bound to one PE in
> multi-threaded HPC applications.

I think the expectation that all threads are bound to a physical CPU
makes sense for using this feature, but I think it would be necessary
to enforce that, e.g. by allowing only threads to enable it after they
are isolated to a non-shared CPU, and automatically disabling it
if the CPU isolation is changed.
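
For reference, the user-space half of that precondition can be sketched with the glibc affinity calls. This only shows how a thread binds itself to one CPU and checks the binding; it does not implement the kernel-side enforcement suggested above, and the helper names are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* CPU_* macros and sched_*affinity() in glibc */
#endif
#include <assert.h>
#include <sched.h>

/* Return the lowest CPU the calling thread may run on, or -1 on error. */
static int first_allowed_cpu(void)
{
	cpu_set_t set;

	if (sched_getaffinity(0, sizeof(set), &set) != 0)
		return -1;
	for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
		if (CPU_ISSET(cpu, &set))
			return cpu;
	return -1;
}

/* Restrict the calling thread to exactly one CPU. Returns 0 on success. */
static int pin_to_cpu(int cpu)
{
	cpu_set_t set;

	assert(cpu >= 0 && cpu < CPU_SETSIZE);
	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return sched_setaffinity(0, sizeof(set), &set);
}

/* 1 if the calling thread is bound to a single CPU, 0 if not, -1 on error. */
static int is_pinned_to_single_cpu(void)
{
	cpu_set_t set;

	if (sched_getaffinity(0, sizeof(set), &set) != 0)
		return -1;
	return CPU_COUNT(&set) == 1;
}
```

Nothing here stops another thread from being scheduled on the same CPU or the mask from being changed later, which is why the enforcement would need kernel support.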

For the user space interface, something based on process IDs
seems to make more sense to me than something based on CPU
numbers. All of the above does require some level of integration
with the core kernel of course.

I think the next step would be to try to come up with a high-level
user interface design that has a chance to get merged, rather than
addressing the review comments for the current implementation.

Aside from the user interface question, it would be good to
understand the performance impact of the feature.
As I understand it, the entire purpose is to make things faster, so
to put it in perspective compared to the burden of adding an
interface, there should be some numbers: What are the kinds of
applications that would use it in practice, and how much faster are
they compared to not having it?

       Arnd

^ permalink raw reply	[flat|nested] 66+ messages in thread

* RE: [RFC PATCH 00/10] Add Fujitsu A64FX soc entry/hardware barrier driver
@ 2021-01-15 11:10           ` misono.tomohiro
  0 siblings, 0 replies; 66+ messages in thread
From: misono.tomohiro @ 2021-01-15 11:10 UTC (permalink / raw)
  To: 'Arnd Bergmann'
  Cc: Mark Rutland, Arnd Bergmann, Catalin Marinas, SoC Team,
	Olof Johansson, Will Deacon, Linux ARM

> On Tue, Jan 12, 2021 at 11:24 AM misono.tomohiro@fujitsu.com
> <misono.tomohiro@fujitsu.com> wrote:
> > > On Fri, Jan 8, 2021 at 1:54 PM Mark Rutland <mark.rutland@arm.com> wrote:
> > However, I don't know of any other processors with similar features
> > at this point, so it is hard to provide a common abstraction interface.
> > I would appreciate it if anyone has any information.
> 
> The specification you pointed to mentions the SPARC64 XIfx, so
> at a minimum, a user interface should be designed to also work on
> whatever register-level interface that provides.

Those previous CPUs of ours have a hardware barrier function too, but they are
not currently in use (I believe the hardware designs share a common idea and
this driver logic/ioctl interface could be applicable to both).

> > > > Secondly, the intended usage model appears to expose this to EL0 for
> > > > direct access, and the code seems to depend on threads being pinned, but
> > > > AFAICT this is not enforced and there is no provision for
> > > > context-switch, thread migration, or interaction with ptrace. I fear
> > > > this is going to be very fragile in practice, and that extending that
> > > > support in future will require much more complexity than is currently
> > > > apparent, with potentially invasive changes to arch code.
> > >
> > > Right, this is the main problem I see, too. I had not even realized
> > > that this will have to tie in with user space threads in some form, but
> > > you are right that once this has to interact with the CPU scheduler,
> > > it all breaks down.
> >
> > This observation is right. I thought adding context switch etc. support for
> > implementation defined registers would require core arch code changes and
> > would be far less acceptable. So, I tried to confine the code changes to a
> > module with these restrictions.
> 
> My feeling is that having the code separate from where it would belong
> in an operating system that was designed specifically for this feature
> ends up being no better than rewriting the core scheduling code.
> 
> As Mark said, it may well be that neither approach would be sufficient
> for an upstream merge. On the other hand, keeping the code in a
> separate loadable module does make most sense if we end up
> not merging it at all, in which case this is the easiest to port
> between kernel versions.
> 
> > Regarding direct access from EL0, it is necessary for realizing fast synchronization,
> > as it lets the synchronization logic in a user application check whether all threads
> > have reached the synchronization point without switching to the kernel.
> 
> Ok, I see.
> 
> > Also, it is common for each running thread to be bound to one PE in
> > multi-threaded HPC applications.
> 
> I think the expectation that all threads are bound to a physical CPU
> makes sense for using this feature, but I think it would be necessary
> to enforce that, e.g. by allowing only threads to enable it after they
> are isolated to a non-shared CPU, and automatically disabling it
> if the CPU isolation is changed.
> 
> For the user space interface, something based on process IDs
> seems to make more sense to me than something based on CPU
> numbers. All of the above does require some level of integration
> with the core kernel of course.
> 
> I think the next step would be to try to come up with a high-level
> user interface design that has a chance to get merged, rather than
> addressing the review comments for the current implementation.

Understood. One question: although a high-level interface such as process
based control could solve several problems (i.e. access control/force binding),
I cannot eliminate access to IMP-DEF registers from EL0, as I explained
above. Is that acceptable in your view?

> Aside from the user interface question, it would be good to
> understand the performance impact of the feature.
> As I understand it, the entire purpose is to make things faster, so
> to put it in perspective compared to the burden of adding an
> interface, there should be some numbers: What are the kinds of
> applications that would use it in practice, and how much faster are
> they compared to not having it?

A microbenchmark shows that one synchronization across 12 PEs takes around
250ns with the hardware barrier, which is multiple times faster than a software
barrier (measuring only the core synchronization logic and excluding setup time).
I don't have application results at this point and will share them when I get some.

Regards,
Tomohiro

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH 00/10] Add Fujitsu A64FX soc entry/hardware barrier driver
@ 2021-01-15 12:24             ` Arnd Bergmann
  0 siblings, 0 replies; 66+ messages in thread
From: Arnd Bergmann @ 2021-01-15 12:24 UTC (permalink / raw)
  To: misono.tomohiro
  Cc: Mark Rutland, Arnd Bergmann, Catalin Marinas, SoC Team,
	Olof Johansson, Will Deacon, Linux ARM

On Fri, Jan 15, 2021 at 12:10 PM misono.tomohiro@fujitsu.com
<misono.tomohiro@fujitsu.com> wrote:
> > On Tue, Jan 12, 2021 at 11:24 AM misono.tomohiro@fujitsu.com <misono.tomohiro@fujitsu.com> wrote:

> > > Also, it is common for each running thread to be bound to one PE in
> > > multi-threaded HPC applications.
> >
> > I think the expectation that all threads are bound to a physical CPU
> > makes sense for using this feature, but I think it would be necessary
> > to enforce that, e.g. by allowing only threads to enable it after they
> > are isolated to a non-shared CPU, and automatically disabling it
> > if the CPU isolation is changed.
> >
> > For the user space interface, something based on process IDs
> > seems to make more sense to me than something based on CPU
> > numbers. All of the above does require some level of integration
> > with the core kernel of course.
> >
> > I think the next step would be to try to come up with a high-level
> > user interface design that has a chance to get merged, rather than
> > addressing the review comments for the current implementation.
>
> Understood. One question: although a high-level interface such as process
> based control could solve several problems (i.e. access control/force binding),
> I cannot eliminate access to IMP-DEF registers from EL0, as I explained
> above. Is that acceptable in your view?

I think you will get different answers for that depending on who you ask ;-)

I'm generally ok with it, given that it will only affect a very small
number of specialized applications that are already built for
a specific microarchitecture for performance reasons. E.g. when
using an arm64 BLAS library, you would use different versions
of the same functions depending on CPU support for NEON,
SVE, SVE2, Apple AMX (which also uses imp-def instructions),
ARMv8.6 GEMM extensions, and likely a hand-optimized
version for the A64FX pipeline. Having a version for A64FX with
hardware barriers adds (at most) one more code path but hopefully
does not add complexity to the common code.
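
The kind of per-CPU selection described here can be sketched as a small table lookup. The feature bits, variant names and select_variant() are hypothetical; a real library would derive the feature mask from the platform (for example from hwcaps) and would store function pointers rather than names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature bits, for illustration only. */
enum cpu_feature {
	FEAT_NEON      = 1u << 0,
	FEAT_SVE       = 1u << 1,
	FEAT_A64FX_HWB = 1u << 2,	/* hardware barrier available and set up */
};

struct kernel_variant {
	const char  *name;
	unsigned int required;	/* all of these bits must be present */
};

/* Ordered from most to least specialized; the last entry always matches. */
static const struct kernel_variant variants[] = {
	{ "a64fx-hwb", FEAT_SVE | FEAT_A64FX_HWB },
	{ "sve",       FEAT_SVE },
	{ "neon",      FEAT_NEON },
	{ "generic",   0 },
};

/* Pick the first variant whose requirements are all satisfied. */
static const struct kernel_variant *select_variant(unsigned int features)
{
	for (size_t i = 0; i < sizeof(variants) / sizeof(variants[0]); i++)
		if ((features & variants[i].required) == variants[i].required)
			return &variants[i];
	return NULL;	/* not reached: "generic" requires nothing */
}
```

Adding the hardware-barrier path costs one more table entry, and the common lookup code stays unchanged.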

> > Aside from the user interface question, it would be good to
> > understand the performance impact of the feature.
> > As I understand it, the entire purpose is to make things faster, so
> > to put it in perspective compared to the burden of adding an
> > interface, there should be some numbers: What are the kinds of
> > applications that would use it in practice, and how much faster are
> > they compared to not having it?
>
> A microbenchmark shows that one synchronization across 12 PEs takes around
> 250ns with the hardware barrier, which is multiple times faster than a software
> barrier (measuring only the core synchronization logic and excluding setup time).
> I don't have application results at this point and will share them when I get some.

Thanks. That will be helpful indeed. Please also include information
about what you are comparing against for the software barrier, e.g.
is it based on a futex() system call, or completely implemented
in user space?
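
For context, one common barrier that lives entirely in user space is a sense-reversing spin barrier built on C11 atomics, sketched below. This is only an illustration of that category; the thread does not say which software barrier the microbenchmark used, and the names here are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

struct spin_barrier {
	atomic_uint  count;	/* threads still expected in this round */
	atomic_uint  sense;	/* flips each time a round completes */
	unsigned int nthreads;
};

static void spin_barrier_init(struct spin_barrier *b, unsigned int nthreads)
{
	assert(nthreads > 0);
	atomic_init(&b->count, nthreads);
	atomic_init(&b->sense, 0);
	b->nthreads = nthreads;
}

/* Each thread keeps its own local_sense, initialized to 0. */
static void spin_barrier_wait(struct spin_barrier *b, unsigned int *local_sense)
{
	*local_sense = !*local_sense;

	if (atomic_fetch_sub_explicit(&b->count, 1, memory_order_acq_rel) == 1) {
		/* Last arrival: re-arm the counter, then release the others. */
		atomic_store_explicit(&b->count, b->nthreads, memory_order_relaxed);
		atomic_store_explicit(&b->sense, *local_sense, memory_order_release);
	} else {
		/* Spin in user space until the last arrival flips the sense. */
		while (atomic_load_explicit(&b->sense, memory_order_acquire) != *local_sense)
			;
	}
}
```

A futex-based barrier would instead put waiters to sleep in the kernel, which trades wake-up latency for not burning CPU time, so the two baselines can give quite different numbers.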

      Arnd

^ permalink raw reply	[flat|nested] 66+ messages in thread

* RE: [RFC PATCH 00/10] Add Fujitsu A64FX soc entry/hardware barrier driver
@ 2021-01-19  5:30               ` misono.tomohiro
  0 siblings, 0 replies; 66+ messages in thread
From: misono.tomohiro @ 2021-01-19  5:30 UTC (permalink / raw)
  To: 'Arnd Bergmann'
  Cc: Mark Rutland, Arnd Bergmann, Catalin Marinas, SoC Team,
	Olof Johansson, Will Deacon, Linux ARM

> > > > Also, It is common usage that each running thread is bound to one PE in
> > > > multi-threaded HPC applications.
> > >
> > > I think the expectation that all threads are bound to a physical CPU
> > > makes sense for using this feature, but I think it would be necessary
> > > to enforce that, e.g. by allowing only threads to enable it after they
> > > are isolated to a non-shared CPU, and automatically disabling it
> > > if the CPU isolation is changed.
> > >
> > > For the user space interface, something based on process IDs
> > > seems to make more sense to me than something based on CPU
> > > numbers. All of the above does require some level of integration
> > > with the core kernel of course.
> > >
> > > I think the next step would be to try to come up with a high-level
> > > user interface design that has a chance to get merged, rather than
> > > addressing the review comments for the current implementation.
> >
> > Understood. One question is that high-level interface such as process
> > based control could solve several problems (i.e. access control/force binding),
> > I cannot eliminate access to IMP-DEF registers from EL0 as I explained
> > above. Is it acceptable in your sense?
> 
> I think you will get different answers for that depending on who you ask ;-)
> 
> I'm generally ok with it, given that it will only affect a very small
> number of specialized applications that are already built for
> a specific microarchitecture for performance reasons. E.g. when
> using an arm64 BLAS library, you would use different versions
> of the same functions depending on CPU support for NEON,
> SVE, SVE2, Apple AMX (which also uses imp-def instructions),
> ARMv8.6 GEMM extensions, and likely a hand-optimized
> version for the A64FX pipeline. Having a version for A64FX with
> hardware barriers adds (at most) one more code path but hopefully
> does not add complexity to the common code.

Thanks. Btw, to be precise, A64FX doesn't use imp-def instructions.
It provides imp-def registers which can be accessed by system
register access instructions (msr/mrs).

> > > Aside from the user interface question, it would be good to
> > > understand the performance impact of the feature.
> > > As I understand it, the entire purpose is to make things faster, so
> > > to put it in perspective compared to the burden of adding an
> > > interface, there should be some numbers: What are the kinds of
> > > applications that would use it in practice, and how much faster are
> > > they compared to not having it?
> >
> > Microbenchmark shows it takes around 250ns for 1 synchronization for
> > 12 PEs with hardware barrier and it is multiple times faster than software
> > barrier (only measuring core synchronization logic and excluding setup time).
> > I don't have application results at this point and will share when I could get some.
> 
> Thanks. That will be helpful indeed. Please also include information
> about what you are comparing against for the software barrier. E.g.
> Is that based on a futex() system call, or completely implemented
> in user space?

It is completely implemented in user space using shared variables,
without any system call.
(Since all PEs being synchronized share the L3 cache, the barrier
should result in accesses to L3.)
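
For reference, a software barrier of this kind can be sketched as
below. This is an assumption about the general technique (a
sense-reversing spin barrier built on C11 atomics), not the code used
in the microbenchmark; sw_barrier and sw_barrier_selftest are
hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Sense-reversing spin barrier kept entirely in shared memory. */
struct sw_barrier {
    atomic_int  remaining;   /* threads still to arrive in this round */
    atomic_bool sense;       /* flips once per completed round */
    int         nthreads;
};

static void sw_barrier_init(struct sw_barrier *b, int nthreads)
{
    assert(nthreads > 0);
    atomic_init(&b->remaining, nthreads);
    atomic_init(&b->sense, false);
    b->nthreads = nthreads;
}

/* local_sense is per-thread state and must start as false. */
static void sw_barrier_wait(struct sw_barrier *b, bool *local_sense)
{
    *local_sense = !*local_sense;
    if (atomic_fetch_sub_explicit(&b->remaining, 1,
                                  memory_order_acq_rel) == 1) {
        /* Last arrival: re-arm the counter, then release the waiters. */
        atomic_store_explicit(&b->remaining, b->nthreads,
                              memory_order_relaxed);
        atomic_store_explicit(&b->sense, *local_sense,
                              memory_order_release);
    } else {
        /* Spin on the shared flag; no system call is involved. */
        while (atomic_load_explicit(&b->sense, memory_order_acquire)
               != *local_sense)
            ;
    }
}

struct selftest_arg {
    struct sw_barrier *barrier;
    atomic_int *counter;
    atomic_int *errors;
    int nthreads;
    int rounds;
};

static void *selftest_worker(void *p)
{
    struct selftest_arg *a = p;
    bool local_sense = false;

    for (int r = 0; r < a->rounds; r++) {
        atomic_fetch_add(a->counter, 1);
        sw_barrier_wait(a->barrier, &local_sense);
        /* Every thread must have incremented the counter by now. */
        if (atomic_load(a->counter) != a->nthreads * (r + 1))
            atomic_fetch_add(a->errors, 1);
        /* Second barrier keeps fast threads out of the next round. */
        sw_barrier_wait(a->barrier, &local_sense);
    }
    return NULL;
}

/* Returns the number of ordering violations observed. */
int sw_barrier_selftest(int nthreads, int rounds)
{
    enum { MAX_THREADS = 64 };
    pthread_t tid[MAX_THREADS];
    struct sw_barrier barrier;
    atomic_int counter, errors;
    struct selftest_arg arg = { &barrier, &counter, &errors,
                                nthreads, rounds };

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    sw_barrier_init(&barrier, nthreads);
    atomic_init(&counter, 0);
    atomic_init(&errors, 0);

    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tid[i], NULL, selftest_worker, &arg);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    return atomic_load(&errors);
}
```

Each waiting thread polls one shared flag, so every round costs at
least one cache-line transfer per thread through the shared cache
level.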

Regards,
Tomohiro

^ permalink raw reply	[flat|nested] 66+ messages in thread

* RE: [RFC PATCH 00/10] Add Fujitsu A64FX soc entry/hardware barrier driver
@ 2021-02-18  9:49               ` misono.tomohiro
  0 siblings, 0 replies; 66+ messages in thread
From: misono.tomohiro @ 2021-02-18  9:49 UTC (permalink / raw)
  To: 'Arnd Bergmann'
  Cc: Mark Rutland, Arnd Bergmann, Catalin Marinas, SoC Team,
	Olof Johansson, Will Deacon, Linux ARM

> > > > Also, It is common usage that each running thread is bound to one PE in
> > > > multi-threaded HPC applications.
> > >
> > > I think the expectation that all threads are bound to a physical CPU
> > > makes sense for using this feature, but I think it would be necessary
> > > to enforce that, e.g. by allowing only threads to enable it after they
> > > are isolated to a non-shared CPU, and automatically disabling it
> > > if the CPU isolation is changed.
> > >
> > > For the user space interface, something based on process IDs
> > > seems to make more sense to me than something based on CPU
> > > numbers. All of the above does require some level of integration
> > > with the core kernel of course.
> > >
> > > I think the next step would be to try to come up with a high-level
> > > user interface design that has a chance to get merged, rather than
> > > addressing the review comments for the current implementation.

Hello,

Sorry for the late response. While thinking about new approaches, I came
up with a different idea and would like to hear your opinions. How about
offloading all control to user space, while the driver just offers
read/write access to the needed registers? Let me explain in detail.

I searched for similar functions in other products but could not find
any. Also, this hardware barrier performs intra-NUMA synchronization and
is hard to use as a general inter-process barrier. So I think
generalizing this feature in the kernel would not go well.

As I said, this is mainly for HPC applications. Usually, the user has
full control of the PC nodes when running an HPC application, and thus
has full responsibility for the processes running on the machine.
Offloading all control of these registers to the user is acceptable in
that case (i.e. the driver just offers access to the registers and does
not control them). This is safe for kernel operation, as manipulating
barrier-related registers only affects the user application.

With this approach we could remove the ioctls and control logic from the
driver, but we still need some way to access the needed registers. I
first considered an approach like x86's MSR driver, but I know that idea
was recently rejected over security concerns:
 https://lore.kernel.org/linux-arm-kernel/20201130174833.41315-1-rongwei.wang@linux.alibaba.com/ 

Based on these observations, I currently have two ideas:
 1) make the driver expose only a sysfs interface for reading/writing
   A64FX's barrier registers
or
 2) generalize (1) in some way, i.e. create a mechanism to expose
   CPU-defined registers which can be safely accessed from user space
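
To make idea (1) concrete, user space could access such a file roughly
as sketched below. This is only an illustration under assumptions: the
helper names are hypothetical, no sysfs path or file format has been
defined by the posted patches, and the value is assumed to be exchanged
as a hexadecimal text string.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Write a 64-bit value as hex text to a (hypothetical) register file. */
int reg_file_write_u64(const char *path, uint64_t value)
{
    assert(path != NULL);

    FILE *f = fopen(path, "w");
    if (!f)
        return -errno;

    int ok = fprintf(f, "0x%" PRIx64 "\n", value) > 0;
    if (fclose(f) != 0)      /* buffered write errors surface here */
        ok = 0;
    return ok ? 0 : -EIO;
}

/* Read a 64-bit hex value back from a (hypothetical) register file. */
int reg_file_read_u64(const char *path, uint64_t *value)
{
    assert(path != NULL && value != NULL);

    FILE *f = fopen(path, "r");
    if (!f)
        return -errno;

    int ok = fscanf(f, "%" SCNx64, value) == 1;
    fclose(f);
    return ok ? 0 : -EIO;
}
```

Every such access goes through open/read/write system calls, so an
interface like this would only suit setup and teardown of the barrier
registers, not the synchronization itself.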

Are these ideas acceptable directions to explore for getting merged
upstream? I'd appreciate any criticism/comments.

Regards, 
Tomohiro

^ permalink raw reply	[flat|nested] 66+ messages in thread

* RE: [RFC PATCH 00/10] Add Fujitsu A64FX soc entry/hardware barrier driver
@ 2021-03-01  7:53                 ` misono.tomohiro
  0 siblings, 0 replies; 66+ messages in thread
From: misono.tomohiro @ 2021-03-01  7:53 UTC (permalink / raw)
  To: 'Arnd Bergmann'
  Cc: Mark Rutland, Arnd Bergmann, Catalin Marinas, SoC Team,
	Olof Johansson, Will Deacon, Linux ARM

Hi,

Gentle Ping?
Tomohiro

> Subject: RE: [RFC PATCH 00/10] Add Fujitsu A64FX soc entry/hardware barrier driver
> 
> > > > > Also, It is common usage that each running thread is bound to one PE in
> > > > > multi-threaded HPC applications.
> > > >
> > > > I think the expectation that all threads are bound to a physical CPU
> > > > makes sense for using this feature, but I think it would be necessary
> > > > to enforce that, e.g. by allowing only threads to enable it after they
> > > > are isolated to a non-shared CPU, and automatically disabling it
> > > > if the CPU isolation is changed.
> > > >
> > > > For the user space interface, something based on process IDs
> > > > seems to make more sense to me than something based on CPU
> > > > numbers. All of the above does require some level of integration
> > > > with the core kernel of course.
> > > >
> > > > I think the next step would be to try to come up with a high-level
> > > > user interface design that has a chance to get merged, rather than
> > > > addressing the review comments for the current implementation.
> 
> Hello,
> 
> Sorry for late response but while thinking new approaches, I come up with
> some different idea and want to hear your opinions. How about offload
> all control to user space while the driver just offers read/write access
> to the needed registers? Let me explain in detail.
> 
> Although I searched similar functions in other products, I could not find
> it. Also, this hardware barrier performs intra-numa synchronization and
> it is hard to be used for general inter-process barrier. So I think
> generalizing this feature in kernel does not go well.
> 
> As I said this is mainly for HPC application. In the usual situations, the
> user has full control of the PC nodes when running HPC application and
> thus the user has full responsibility of running processes on the machine.
> Offloading all controls to these registers to the user is acceptable in that
> case (i.e. the driver just offers access to the registers and does not control it).
> This is the safe for the kernel operation as manipulating barrier related
> registers just affects user application.
> 
> In this approach we could remove ioctls or control logic in the driver but
> we need some way to access the needed registers. I firstly think if I can
> use x86's MSR driver like approach but I know the idea is rejected
> recently for security concerns:
>  https://lore.kernel.org/linux-arm-kernel/20201130174833.41315-1-rongwei.wang@linux.alibaba.com/
> 
> Based on these observations, I have two ideas currently:
>  1) make the driver to only expose sysfs interface for reading/writing
>    A64FX's barrier registers
> or
>  2) generalizing (1) in some way; To make some mechanism to expose
>    CPU defined registers which can be safely accessed from user space
> 
> Are these idea acceptable ways to explore to get merged in upstream?
> I'd appreciate any criticism/comments.
> 
> Regards,
> Tomohiro


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH 00/10] Add Fujitsu A64FX soc entry/hardware barrier driver
@ 2021-03-02 11:06                 ` Arnd Bergmann
  0 siblings, 0 replies; 66+ messages in thread
From: Arnd Bergmann @ 2021-03-02 11:06 UTC (permalink / raw)
  To: misono.tomohiro
  Cc: Mark Rutland, Catalin Marinas, SoC Team, Olof Johansson,
	Will Deacon, Linux ARM

On Thu, Feb 18, 2021 at 10:49 AM misono.tomohiro@fujitsu.com
<misono.tomohiro@fujitsu.com> wrote:
>
> > > > > Also, It is common usage that each running thread is bound to one PE in
> > > > > multi-threaded HPC applications.
> > > >
> > > > I think the expectation that all threads are bound to a physical CPU
> > > > makes sense for using this feature, but I think it would be necessary
> > > > to enforce that, e.g. by allowing only threads to enable it after they
> > > > are isolated to a non-shared CPU, and automatically disabling it
> > > > if the CPU isolation is changed.
> > > >
> > > > For the user space interface, something based on process IDs
> > > > seems to make more sense to me than something based on CPU
> > > > numbers. All of the above does require some level of integration
> > > > with the core kernel of course.
> > > >
> > > > I think the next step would be to try to come up with a high-level
> > > > user interface design that has a chance to get merged, rather than
> > > > addressing the review comments for the current implementation.
>
> Hello,
>
> Sorry for late response but while thinking new approaches, I come up with
> some different idea and want to hear your opinions. How about offload
> all control to user space while the driver just offers read/write access
> to the needed registers? Let me explain in detail.
>
> Although I searched similar functions in other products, I could not find
> it. Also, this hardware barrier performs intra-numa synchronization and
> it is hard to be used for general inter-process barrier. So I think
> generalizing this feature in kernel does not go well.

Ok, thank you for taking a look.

> As I said this is mainly for HPC application. In the usual situations, the
> user has full control of the PC nodes when running HPC application and
> thus the user has full responsibility of running processes on the machine.
> Offloading all controls to these registers to the user is acceptable in that
> case (i.e. the driver just offers access to the registers and does not control it).
> This is the safe for the kernel operation as manipulating barrier related
> registers just affects user application.
>
> In this approach we could remove ioctls or control logic in the driver but
> we need some way to access the needed registers. I firstly think if I can
> use x86's MSR driver like approach but I know the idea is rejected
> recently for security concerns:
>  https://lore.kernel.org/linux-arm-kernel/20201130174833.41315-1-rongwei.wang@linux.alibaba.com/
>
> Based on these observations, I have two ideas currently:
>  1) make the driver to only expose sysfs interface for reading/writing
>    A64FX's barrier registers
> or
>  2) generalizing (1) in some way; To make some mechanism to expose
>    CPU defined registers which can be safely accessed from user space
>
> Are these idea acceptable ways to explore to get merged in upstream?
> I'd appreciate any criticism/comments.

I'm also running out of ideas here. I don't think a sysfs interface would
be any different from your earlier ioctl interface or the /dev/msr approach:
they all share the same problem that they expose low-level access to
platform specific registers in a way that is neither portable nor safe to
use for general-purpose applications outside the very narrow scope
of running highly optimized HPC applications.

You can of course continue using the module you have as an external
module that gets built outside of the kernel itself, and shipped along
with the application or library using it, rather than with the kernel.
Obviously this is not something I would generally recommend either,
but this may be the last resort to fall back to when everything else
fails.

      Arnd

^ permalink raw reply	[flat|nested] 66+ messages in thread

* RE: [RFC PATCH 00/10] Add Fujitsu A64FX soc entry/hardware barrier driver
@ 2021-03-03 11:20                   ` misono.tomohiro
  0 siblings, 0 replies; 66+ messages in thread
From: misono.tomohiro @ 2021-03-03 11:20 UTC (permalink / raw)
  To: 'Arnd Bergmann'
  Cc: Mark Rutland, Catalin Marinas, SoC Team, Olof Johansson,
	Will Deacon, Linux ARM

> On Thu, Feb 18, 2021 at 10:49 AM misono.tomohiro@fujitsu.com
> <misono.tomohiro@fujitsu.com> wrote:
> >
> > > > > > Also, It is common usage that each running thread is bound to one PE in
> > > > > > multi-threaded HPC applications.
> > > > >
> > > > > I think the expectation that all threads are bound to a physical CPU
> > > > > makes sense for using this feature, but I think it would be necessary
> > > > > to enforce that, e.g. by allowing only threads to enable it after they
> > > > > are isolated to a non-shared CPU, and automatically disabling it
> > > > > if the CPU isolation is changed.
> > > > >
> > > > > For the user space interface, something based on process IDs
> > > > > seems to make more sense to me than something based on CPU
> > > > > numbers. All of the above does require some level of integration
> > > > > with the core kernel of course.
> > > > >
> > > > > I think the next step would be to try to come up with a high-level
> > > > > user interface design that has a chance to get merged, rather than
> > > > > addressing the review comments for the current implementation.
> >
> > Hello,
> >
> > Sorry for late response but while thinking new approaches, I come up with
> > some different idea and want to hear your opinions. How about offload
> > all control to user space while the driver just offers read/write access
> > to the needed registers? Let me explain in detail.
> >
> > Although I searched similar functions in other products, I could not find
> > it. Also, this hardware barrier performs intra-numa synchronization and
> > it is hard to be used for general inter-process barrier. So I think
> > generalizing this feature in kernel does not go well.
> 
> Ok, thank you for taking a look.
> 
> > As I said this is mainly for HPC application. In the usual situations, the
> > user has full control of the PC nodes when running HPC application and
> > thus the user has full responsibility of running processes on the machine.
> > Offloading all controls to these registers to the user is acceptable in that
> > case (i.e. the driver just offers access to the registers and does not control it).
> > This is the safe for the kernel operation as manipulating barrier related
> > registers just affects user application.
> >
> > With this approach we could remove the ioctls and control logic from the driver, but
> > we need some way to access the needed registers. I first considered an
> > approach like x86's MSR driver, but I know that idea was rejected
> > recently because of security concerns:
> >  https://lore.kernel.org/linux-arm-kernel/20201130174833.41315-1-rongwei.wang@linux.alibaba.com/
> >
> > Based on these observations, I currently have two ideas:
> >  1) make the driver expose only a sysfs interface for reading/writing
> >    A64FX's barrier registers
> > or
> >  2) generalize (1) in some way, i.e. create a mechanism to expose
> >    CPU-defined registers which can be safely accessed from user space
> >
> > Are these ideas acceptable directions to explore for getting merged upstream?
> > I'd appreciate any criticism/comments.
> 
> I'm also running out of ideas here. I don't think a sysfs interface would
> be any different to your earlier ioctl interface or the /dev/msr approach,
> they all share the same problem that they expose low-level access to
> platform specific registers in a way that is neither portable nor safe to
> use for general-purpose applications outside the very narrow scope
> of running highly optimized HPC applications.

Ok, but the ARM architecture permits implementation-defined registers in the
first place. So can we provide some method/interface to access them as a
CPU feature, at least if these registers do not affect kernel operation (like
this barrier) and only root can access them? A library could offer a portable way
for user applications (with root permission) to access them.

> You can of course continue using the module you have as an external
> module that gets built outside of the kernel itself, and shipped along
> with the application or library using it, rather than with the kernel.
> Obviously this is not something I would generally recommend either,
> but this may be the last resort to fall back to when everything else
> fails.

Understood. This is the last resort.
Thanks very much for your help.

Regards,
Tomohiro
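
For reference, the CPU-binding expectation quoted at the top of this message
(each thread bound to one PE before it uses the barrier) can be sketched from
user space as follows. This is only a minimal sketch, assuming Linux with
glibc; the helper names pin_to_cpu() and is_pinned_to_single_cpu() are
hypothetical and are not part of the posted driver or library. It shows how a
thread could pin itself with sched_setaffinity() and how the "bound to exactly
one CPU" condition could be checked, which is the kind of precondition a
kernel interface might want to enforce.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* for cpu_set_t macros, sched_setaffinity(), sched_getcpu() */
#endif
#include <sched.h>

/*
 * Hypothetical helper: pin the calling thread to a single CPU.
 * Returns 0 on success, -1 on failure.
 */
static int pin_to_cpu(int cpu)
{
	cpu_set_t set;

	if (cpu < 0 || cpu >= CPU_SETSIZE)
		return -1;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);

	/* A pid of 0 selects the calling thread. */
	return sched_setaffinity(0, sizeof(set), &set);
}

/*
 * Hypothetical helper: return 1 if the calling thread's affinity mask
 * contains exactly one CPU, 0 otherwise (including on error).
 */
static int is_pinned_to_single_cpu(void)
{
	cpu_set_t set;

	if (sched_getaffinity(0, sizeof(set), &set) != 0)
		return 0;

	return CPU_COUNT(&set) == 1;
}
```

A thread would call pin_to_cpu(sched_getcpu()) (or pass a CPU chosen by the
job scheduler) before joining a barrier. Note that nothing in this sketch
stops the affinity from being changed later, which is why enforcement on the
kernel side was requested in the review.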

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH 00/10] Add Fujitsu A64FX soc entry/hardware barrier driver
@ 2021-03-03 13:33                     ` Arnd Bergmann
  0 siblings, 0 replies; 66+ messages in thread
From: Arnd Bergmann @ 2021-03-03 13:33 UTC (permalink / raw)
  To: misono.tomohiro
  Cc: Mark Rutland, Catalin Marinas, SoC Team, Olof Johansson,
	Will Deacon, Linux ARM

On Wed, Mar 3, 2021 at 12:20 PM misono.tomohiro@fujitsu.com
<misono.tomohiro@fujitsu.com> wrote:
> On Thu, Feb 18, 2021 at 10:49 AM misono.tomohiro@fujitsu.com <misono.tomohiro@fujitsu.com> wrote:
>
> > I'm also running out of ideas here. I don't think a sysfs interface would
> > be any different to your earlier ioctl interface or the /dev/msr approach,
> > they all share the same problem that they expose low-level access to
> > platform specific registers in a way that is neither portable nor safe to
> > use for general-purpose applications outside the very narrow scope
> > of running highly optimized HPC applications.
>
> Ok, but the ARM architecture permits implementation-defined registers in the
> first place. So can we provide some method/interface to access them as a
> CPU feature, at least if these registers do not affect kernel operation (like
> this barrier) and only root can access them? A library could offer a portable way
> for user applications (with root permission) to access them.

The kernel is meant to provide an abstraction for any differences between the
CPUs, including implementation defined registers. While any such abstraction
will be leaky, just passing through the raw registers is generally not a helpful
abstraction at all, as seen from the x86 MSR discussion you pointed to.

One problem with having a root-only register level interface is that this
can break the boundary between kernel mode and root user space, and
this is something that a lot of people would like to strengthen for security
reasons (e.g. a root user should not be able to break secure boot).

Another problem is that exposing the raw registers from kernel space
creates an ABI, and if it turns out to be a bad idea later on, this is hard to
take back without breaking existing applications. Not breaking things that
used to work is the primary rule for the Linux kernel.

In order to merge anything into the mainline kernel, I think the requirement
would be that it does provide a sensible abstraction inside of the kernel
that can directly be used from applications without having to go through
another library that abstracts it, and that has a good chance of being
supportable forever.

      Arnd

^ permalink raw reply	[flat|nested] 66+ messages in thread

* RE: [RFC PATCH 00/10] Add Fujitsu A64FX soc entry/hardware barrier driver
@ 2021-03-04  7:03                       ` misono.tomohiro
  0 siblings, 0 replies; 66+ messages in thread
From: misono.tomohiro @ 2021-03-04  7:03 UTC (permalink / raw)
  To: 'Arnd Bergmann'
  Cc: Mark Rutland, Catalin Marinas, SoC Team, Olof Johansson,
	Will Deacon, Linux ARM

> > > I'm also running out of ideas here. I don't think a sysfs interface would
> > > be any different to your earlier ioctl interface or the /dev/msr approach,
> > > they all share the same problem that they expose low-level access to
> > > platform specific registers in a way that is neither portable nor safe to
> > > use for general-purpose applications outside the very narrow scope
> > > of running highly optimized HPC applications.
> >
> > Ok, but the ARM architecture permits implementation-defined registers in the
> > first place. So can we provide some method/interface to access them as a
> > CPU feature, at least if these registers do not affect kernel operation (like
> > this barrier) and only root can access them? A library could offer a portable way
> > for user applications (with root permission) to access them.
> 
> The kernel is meant to provide an abstraction for any differences between the
> CPUs, including implementation defined registers. While any such abstraction
> will be leaky, just passing through the raw registers is generally not a helpful
> abstraction at all, as seen from the x86 MSR discussion you pointed to.
> 
> One problem with having a root-only register level interface is that this
> can break the boundary between kernel mode and root user space, and
> this is something that a lot of people would like to strengthen for security
> reasons (e.g. a root user should not be able to break secure boot).
> 
> Another problem is that exposing the raw registers from kernel space
> creates an ABI, and if it turns out to be a bad idea later on, this is hard to
> take back without breaking existing applications. Not breaking things that
> used to work is the primary rule for the Linux kernel.

Ok, thanks for the thorough explanation. It helps my understanding.

> In order to merge anything into the mainline kernel, I think the requirement
> would be that it does provide a sensible abstraction inside of the kernel
> that can directly be used from applications without having to go through
> another library that abstracts it, and that has a good chance of being
> supportable forever.

As you mentioned the idea of a process-based approach earlier, I will
reconsider the possibility of a general abstraction interface along those lines.

Regards,
Tomohiro

^ permalink raw reply	[flat|nested] 66+ messages in thread

end of thread, other threads:[~2021-03-04  7:10 UTC | newest]

Thread overview: 66+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-08 10:52 [RFC PATCH 00/10] Add Fujitsu A64FX soc entry/hardware barrier driver Misono Tomohiro
2021-01-08 10:52 ` Misono Tomohiro
2021-01-08 10:52 ` [PATCH 01/10] soc: fujitsu: hwb: Add hardware barrier driver init/exit code Misono Tomohiro
2021-01-08 10:52   ` Misono Tomohiro
2021-01-08 10:52 ` [PATCH 02/10] soc: fujtisu: hwb: Add open operation Misono Tomohiro
2021-01-08 10:52   ` Misono Tomohiro
2021-01-08 10:52 ` [PATCH 03/10] soc: fujitsu: hwb: Add IOC_BB_ALLOC ioctl Misono Tomohiro
2021-01-08 10:52   ` Misono Tomohiro
2021-01-08 13:22   ` Arnd Bergmann
2021-01-08 13:22     ` Arnd Bergmann
2021-01-12 11:02     ` misono.tomohiro
2021-01-12 11:02       ` misono.tomohiro
2021-01-12 12:34       ` Arnd Bergmann
2021-01-12 12:34         ` Arnd Bergmann
2021-01-08 10:52 ` [PATCH 04/10] soc: fujitsu: hwb: Add IOC_BW_ASSIGN ioctl Misono Tomohiro
2021-01-08 10:52   ` Misono Tomohiro
2021-01-08 10:52 ` [PATCH 05/10] soc: fujitsu: hwb: Add IOC_BW_UNASSIGN ioctl Misono Tomohiro
2021-01-08 10:52   ` Misono Tomohiro
2021-01-08 10:52 ` [PATCH 06/10] soc: fujitsu: hwb: Add IOC_BB_FREE ioctl Misono Tomohiro
2021-01-08 10:52   ` Misono Tomohiro
2021-01-08 10:52 ` [PATCH 07/10] soc: fujitsu: hwb: Add IOC_GET_PE_INFO ioctl Misono Tomohiro
2021-01-08 10:52   ` Misono Tomohiro
2021-01-08 10:52 ` [PATCH 08/10] soc: fujitsu: hwb: Add release operation Misono Tomohiro
2021-01-08 10:52   ` Misono Tomohiro
2021-01-08 13:25   ` Arnd Bergmann
2021-01-08 13:25     ` Arnd Bergmann
2021-01-12 10:38     ` misono.tomohiro
2021-01-12 10:38       ` misono.tomohiro
2021-01-08 10:52 ` [PATCH 09/10] soc: fujitsu: hwb: Add sysfs entry Misono Tomohiro
2021-01-08 10:52   ` Misono Tomohiro
2021-01-08 13:27   ` Arnd Bergmann
2021-01-08 13:27     ` Arnd Bergmann
2021-01-12 10:40     ` misono.tomohiro
2021-01-12 10:40       ` misono.tomohiro
2021-01-08 10:52 ` [PATCH 10/10] soc: fujitsu: hwb: Add Kconfig/Makefile to build fujitsu_hwb driver Misono Tomohiro
2021-01-08 10:52   ` Misono Tomohiro
2021-01-08 12:54 ` [RFC PATCH 00/10] Add Fujitsu A64FX soc entry/hardware barrier driver Mark Rutland
2021-01-08 12:54   ` Mark Rutland
2021-01-08 14:23   ` Arnd Bergmann
2021-01-08 14:23     ` Arnd Bergmann
2021-01-08 15:51     ` Mark Rutland
2021-01-08 15:51       ` Mark Rutland
2021-01-12 10:24     ` misono.tomohiro
2021-01-12 10:24       ` misono.tomohiro
2021-01-12 14:22       ` Arnd Bergmann
2021-01-12 14:22         ` Arnd Bergmann
2021-01-15 11:10         ` misono.tomohiro
2021-01-15 11:10           ` misono.tomohiro
2021-01-15 12:24           ` Arnd Bergmann
2021-01-15 12:24             ` Arnd Bergmann
2021-01-19  5:30             ` misono.tomohiro
2021-01-19  5:30               ` misono.tomohiro
2021-02-18  9:49             ` misono.tomohiro
2021-02-18  9:49               ` misono.tomohiro
2021-03-01  7:53               ` misono.tomohiro
2021-03-01  7:53                 ` misono.tomohiro
2021-03-02 11:06               ` Arnd Bergmann
2021-03-02 11:06                 ` Arnd Bergmann
2021-03-03 11:20                 ` misono.tomohiro
2021-03-03 11:20                   ` misono.tomohiro
2021-03-03 13:33                   ` Arnd Bergmann
2021-03-03 13:33                     ` Arnd Bergmann
2021-03-04  7:03                     ` misono.tomohiro
2021-03-04  7:03                       ` misono.tomohiro
2021-01-12 10:32   ` misono.tomohiro
2021-01-12 10:32     ` misono.tomohiro
