The existing upstream kernel doesn't boot for non-SMP configurations.
This patch series addresses various issues with non-SMP configurations.

The patch series is based on 5.0-rc5.

Tested on QEMU and the HiFive Unleashed board using both OpenSBI & BBL.

Changes from v2->v3
1. Fixed spurious white space.
2. Added lockdep for smpboot completion variable.
3. Added a sanity check for hwcap.

Changes from v1->v2
1. Moved the cpuid to hartid map to smp.c from setup.c.
2. Split 3rd patch into several small patches based on logical grouping.
3. Added a new patch that fixes an issue in hwcap query.
4. Changed the title of the patch series.

Atish Patra (8):
  RISC-V: Do not wait indefinitely in __cpu_up
  RISC-V: Move cpuid to hartid mapping to SMP.
  RISC-V: Remove NR_CPUs check during hartid search from DT
  RISC-V: Allow hartid-to-cpuid function to fail.
  RISC-V: Compare cpuid with NR_CPUS before mapping.
  clocksource/drivers/riscv: Add required checks during clock source init
  irqchip/irq-sifive-plic:: Check and continue in case of an invalid cpuid.
  RISC-V: Assign hwcap only according to boot cpu.

 arch/riscv/include/asm/smp.h      | 14 ++++++++---
 arch/riscv/kernel/cpu.c           |  4 ---
 arch/riscv/kernel/cpufeature.c    | 52 +++++++++++++++++++++++++++------------
 arch/riscv/kernel/setup.c         |  9 -------
 arch/riscv/kernel/smp.c           | 10 +++++++-
 arch/riscv/kernel/smpboot.c       | 20 ++++++++++++---
 drivers/clocksource/timer-riscv.c | 23 ++++++++++++++---
 drivers/irqchip/irq-sifive-plic.c |  5 ++++
 8 files changed, 98 insertions(+), 39 deletions(-)

-- 
2.7.4

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
In the SMP path, __cpu_up waits indefinitely for the other CPU to come
online. This is wrong, as the other CPU might be disabled in machine
mode while the possible CPU mask is set from the cpus present in DT.

Introduce a completion variable and wait for only a second.

Signed-off-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
---
 arch/riscv/kernel/smpboot.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c
index 18cda0e8..669eb332 100644
--- a/arch/riscv/kernel/smpboot.c
+++ b/arch/riscv/kernel/smpboot.c
@@ -39,6 +39,7 @@
 
 void *__cpu_up_stack_pointer[NR_CPUS];
 void *__cpu_up_task_pointer[NR_CPUS];
+static DECLARE_COMPLETION(cpu_running);
 
 void __init smp_prepare_boot_cpu(void)
 {
@@ -77,6 +78,7 @@ void __init setup_smp(void)
 
 int __cpu_up(unsigned int cpu, struct task_struct *tidle)
 {
+	int ret = 0;
 	int hartid = cpuid_to_hartid_map(cpu);
 	tidle->thread_info.cpu = cpu;
 
@@ -92,10 +94,16 @@ int __cpu_up(unsigned int cpu, struct task_struct *tidle)
 		  task_stack_page(tidle) + THREAD_SIZE);
 	WRITE_ONCE(__cpu_up_task_pointer[hartid], tidle);
 
-	while (!cpu_online(cpu))
-		cpu_relax();
+	lockdep_assert_held(&cpu_running);
+	wait_for_completion_timeout(&cpu_running,
+				    msecs_to_jiffies(1000));
 
-	return 0;
+	if (!cpu_online(cpu)) {
+		pr_crit("CPU%u: failed to come online\n", cpu);
+		ret = -EIO;
+	}
+
+	return ret;
 }
 
 void __init smp_cpus_done(unsigned int max_cpus)
@@ -121,6 +129,7 @@ asmlinkage void __init smp_callin(void)
 	 * a local TLB flush right now just in case.
 	 */
 	local_flush_tlb_all();
+	complete(&cpu_running);
 	/*
 	 * Disable preemption before enabling interrupts, so we don't try to
 	 * schedule a CPU that hasn't actually started yet.
-- 
2.7.4
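For readers following the series, here is a minimal userspace sketch of the idea behind this patch: replace an unbounded wait with a bounded one and report failure. It is not the kernel code. It polls a hypothetical predicate (online_fn, standing in for cpu_online()) against a monotonic deadline instead of sleeping on a completion, and the names cpu_up_bounded, always_online and never_online are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical predicate standing in for the kernel's cpu_online(cpu). */
typedef bool (*online_fn)(unsigned int cpu);

/* Current monotonic time in milliseconds. */
static long long now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Wait for the CPU to come online, but give up after timeout_ms.
 * Returns 0 on success and -EIO on timeout, the error the patch uses.
 */
static int cpu_up_bounded(unsigned int cpu, online_fn is_online,
			  long timeout_ms)
{
	long long deadline;

	assert(is_online != NULL && timeout_ms >= 0);
	deadline = now_ms() + timeout_ms;

	while (!is_online(cpu)) {
		if (now_ms() >= deadline)
			return -EIO;	/* the hart never showed up */
	}
	return 0;
}

/* Two trivial predicates for trying the function out. */
static bool always_online(unsigned int cpu) { (void)cpu; return true; }
static bool never_online(unsigned int cpu) { (void)cpu; return false; }
```

With never_online the call returns -EIO once the timeout expires, so the caller can carry on without that hart instead of hanging in the old while (!cpu_online(cpu)) loop.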
Currently, logical CPU id to physical hartid mapping is defined for both smp and non-smp configurations. This is not required as we need this only for smp configuration. The mapping function can define directly boot_cpu_hartid for non-smp use case. The reverse mapping function i.e. hartid to cpuid can be called for any valid but not booted harts. So it should return default cpu 0 only if it is a boot hartid. Signed-off-by: Atish Patra <atish.patra@wdc.com> Reviewed-by: Anup Patel <anup@brainfault.org> --- arch/riscv/include/asm/smp.h | 14 +++++++++++--- arch/riscv/kernel/setup.c | 9 --------- arch/riscv/kernel/smp.c | 9 +++++++++ 3 files changed, 20 insertions(+), 12 deletions(-) diff --git a/arch/riscv/include/asm/smp.h b/arch/riscv/include/asm/smp.h index 41aa73b4..21fd2d75 100644 --- a/arch/riscv/include/asm/smp.h +++ b/arch/riscv/include/asm/smp.h @@ -22,12 +22,13 @@ /* * Mapping between linux logical cpu index and hartid. */ -extern unsigned long __cpuid_to_hartid_map[NR_CPUS]; -#define cpuid_to_hartid_map(cpu) __cpuid_to_hartid_map[cpu] +extern unsigned long boot_cpu_hartid; struct seq_file; #ifdef CONFIG_SMP +extern unsigned long __cpuid_to_hartid_map[NR_CPUS]; +#define cpuid_to_hartid_map(cpu) __cpuid_to_hartid_map[cpu] /* print IPI stats */ void show_ipi_stats(struct seq_file *p, int prec); @@ -58,7 +59,14 @@ static inline void show_ipi_stats(struct seq_file *p, int prec) static inline int riscv_hartid_to_cpuid(int hartid) { - return 0; + if (hartid == boot_cpu_hartid) + return 0; + + return -1; +} +static inline unsigned long cpuid_to_hartid_map(int cpu) +{ + return boot_cpu_hartid; } static inline void riscv_cpuid_to_hartid_mask(const struct cpumask *in, diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c index 77564310..45e9a2f0 100644 --- a/arch/riscv/kernel/setup.c +++ b/arch/riscv/kernel/setup.c @@ -61,15 +61,6 @@ EXPORT_SYMBOL(empty_zero_page); atomic_t hart_lottery; unsigned long boot_cpu_hartid; -unsigned long 
__cpuid_to_hartid_map[NR_CPUS] = { - [0 ... NR_CPUS-1] = INVALID_HARTID -}; - -void __init smp_setup_processor_id(void) -{ - cpuid_to_hartid_map(0) = boot_cpu_hartid; -} - #ifdef CONFIG_BLK_DEV_INITRD static void __init setup_initrd(void) { diff --git a/arch/riscv/kernel/smp.c b/arch/riscv/kernel/smp.c index 246635ea..b69883c6 100644 --- a/arch/riscv/kernel/smp.c +++ b/arch/riscv/kernel/smp.c @@ -36,6 +36,15 @@ enum ipi_message_type { IPI_MAX }; +unsigned long __cpuid_to_hartid_map[NR_CPUS] = { + [0 ... NR_CPUS-1] = INVALID_HARTID +}; + +void __init smp_setup_processor_id(void) +{ + cpuid_to_hartid_map(0) = boot_cpu_hartid; +} + /* A collection of single bit ipi messages. */ static struct { unsigned long stats[IPI_MAX] ____cacheline_aligned; -- 2.7.4 _______________________________________________ linux-riscv mailing list linux-riscv@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-riscv
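The two reverse-mapping behaviours described in this patch can be sketched in plain C as follows. This is an illustration under assumptions, not the kernel source: NR_CPUS_SKETCH, INVALID_HARTID_SKETCH, map_cpu and both lookup helpers are hypothetical names, and the table size is arbitrary.

```c
#include <assert.h>

#define NR_CPUS_SKETCH		4
#define INVALID_HARTID_SKETCH	(~0UL)

/* Logical cpuid -> hartid table, all slots invalid until mapped. */
static unsigned long cpuid_to_hartid_sketch[NR_CPUS_SKETCH] = {
	INVALID_HARTID_SKETCH, INVALID_HARTID_SKETCH,
	INVALID_HARTID_SKETCH, INVALID_HARTID_SKETCH,
};

/* Record that logical CPU cpuid runs on hart hartid. */
static void map_cpu(int cpuid, unsigned long hartid)
{
	assert(cpuid >= 0 && cpuid < NR_CPUS_SKETCH);
	cpuid_to_hartid_sketch[cpuid] = hartid;
}

/* SMP flavour: search the table, fail with -1 for unmapped harts. */
static int hartid_to_cpuid_smp(unsigned long hartid)
{
	if (hartid == INVALID_HARTID_SKETCH)
		return -1;
	for (int i = 0; i < NR_CPUS_SKETCH; i++)
		if (cpuid_to_hartid_sketch[i] == hartid)
			return i;
	return -1;
}

/* Non-SMP flavour: only the boot hart maps to a CPU, and that is cpu 0. */
static int hartid_to_cpuid_up(unsigned long hartid, unsigned long boot_hartid)
{
	return hartid == boot_hartid ? 0 : -1;
}
```

The point of the non-SMP variant is that a hart listed in DT but not booted gets an error rather than silently aliasing to cpu 0.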
In a non-SMP configuration, the hartid can be higher than NR_CPUS, so
riscv_of_processor_hartid should not compare the hartid to NR_CPUS in
that case. Moreover, this function checks all the DT properties of a
hart node; an NR_CPUS comparison seems out of place.

Signed-off-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Anup Patel <anup@brainfault.org>
---
 arch/riscv/kernel/cpu.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
index f8fa2c63..19edaeae 100644
--- a/arch/riscv/kernel/cpu.c
+++ b/arch/riscv/kernel/cpu.c
@@ -34,10 +34,6 @@ int riscv_of_processor_hartid(struct device_node *node)
 		pr_warn("Found CPU without hart ID\n");
 		return -(ENODEV);
 	}
-	if (hart >= NR_CPUS) {
-		pr_info("Found hart ID %d, which is above NR_CPUs. Disabling this hart\n", hart);
-		return -(ENODEV);
-	}
 
 	if (of_property_read_string(node, "status", &status)) {
 		pr_warn("CPU with hartid=%d has no \"status\" property\n", hart);
-- 
2.7.4
It is perfectly okay to call riscv_hartid_to_cpuid for a hartid that is
not mapped to a CPU id. This can happen if the calling function
retrieves the hartid from DT, but that hartid was never brought online
by the firmware or kernel for any reason. There is no need to BUG() in
that case. A negative error return is sufficient, and the calling
function should always check the return value.

Signed-off-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 arch/riscv/kernel/smp.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/riscv/kernel/smp.c b/arch/riscv/kernel/smp.c
index b69883c6..ca99f0fb 100644
--- a/arch/riscv/kernel/smp.c
+++ b/arch/riscv/kernel/smp.c
@@ -60,7 +60,6 @@ int riscv_hartid_to_cpuid(int hartid)
 			return i;
 
 	pr_err("Couldn't find cpu id for hartid [%d]\n", hartid);
-	BUG();
 	return i;
 }
 
-- 
2.7.4
We should never have a cpuid greater than or equal to NR_CPUS. Compare
with NR_CPUS before creating the mapping between logical and physical
CPU ids. This is also mandatory as the NR_CPUS check is removed from
riscv_of_processor_hartid.

Signed-off-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 arch/riscv/kernel/smpboot.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c
index 669eb332..f120d325 100644
--- a/arch/riscv/kernel/smpboot.c
+++ b/arch/riscv/kernel/smpboot.c
@@ -66,6 +66,11 @@ void __init setup_smp(void)
 			found_boot_cpu = 1;
 			continue;
 		}
+		if (cpuid >= NR_CPUS) {
+			pr_warn("Invalid cpuid [%d] for hartid [%d]\n",
+				cpuid, hart);
+			break;
+		}
 
 		cpuid_to_hartid_map(cpuid) = hart;
 		set_cpu_possible(cpuid, true);
-- 
2.7.4
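A rough sketch of the bounds check this patch adds, written as standalone C. Everything here is an assumption made for illustration: build_cpu_map and NR_CPUS_SKETCH are hypothetical, the DT walk is reduced to an array of hartids, and the sketch skips surplus harts with continue so that a boot hart listed late is still recorded, whereas the patch itself uses break.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_SKETCH 2

/*
 * Assign logical cpuids to the harts found in DT. The boot hart always
 * becomes cpu 0; other harts get cpuids 1, 2, ... until the table is
 * full. Returns the number of harts that received a cpuid.
 */
static int build_cpu_map(const unsigned long *dt_harts, size_t n,
			 unsigned long boot_hartid,
			 unsigned long map[NR_CPUS_SKETCH])
{
	int cpuid = 1;
	int mapped = 0;

	assert(dt_harts != NULL && map != NULL);

	for (size_t i = 0; i < n; i++) {
		if (dt_harts[i] == boot_hartid) {
			map[0] = boot_hartid;
			mapped++;
			continue;
		}
		/* Never index the table with a cpuid >= NR_CPUS. */
		if (cpuid >= NR_CPUS_SKETCH)
			continue;
		map[cpuid++] = dt_harts[i];
		mapped++;
	}
	return mapped;
}
```

With four harts in DT and room for two CPUs, only the boot hart and the first secondary hart end up in the table; the rest are ignored rather than written past the end of the array.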
Currently, clocksource registration happens for an invalid cpu for
non-SMP kernels. This leads to a kernel panic as cpu hotplug
registration will fail for those cpus. Moreover, riscv_hartid_to_cpuid
can return errors now.

Do not proceed if hartid or cpuid is invalid. Take this opportunity to
print appropriate error strings for different failure cases.

Signed-off-by: Atish Patra <atish.patra@wdc.com>
---
 drivers/clocksource/timer-riscv.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/clocksource/timer-riscv.c b/drivers/clocksource/timer-riscv.c
index 43189220..3c7ea75b 100644
--- a/drivers/clocksource/timer-riscv.c
+++ b/drivers/clocksource/timer-riscv.c
@@ -95,13 +95,30 @@ static int __init riscv_timer_init_dt(struct device_node *n)
 	struct clocksource *cs;
 
 	hartid = riscv_of_processor_hartid(n);
+	if (hartid < 0) {
+		pr_warn("Not valid hartid for node [%pOF] error = [%d]\n",
+			n, hartid);
+		return hartid;
+	}
+
 	cpuid = riscv_hartid_to_cpuid(hartid);
+	if (cpuid < 0) {
+		pr_warn("Invalid cpuid for hartid [%d]\n", hartid);
+		return cpuid;
+	}
 
 	if (cpuid != smp_processor_id())
 		return 0;
 
+	pr_err("%s: Registering clocksource cpuid [%d] hartid [%d]\n",
+	       __func__, cpuid, hartid);
 	cs = per_cpu_ptr(&riscv_clocksource, cpuid);
-	clocksource_register_hz(cs, riscv_timebase);
+	error = clocksource_register_hz(cs, riscv_timebase);
+	if (error) {
+		pr_err("RISCV timer register failed [%d] for cpu = [%d]\n",
+		       error, cpuid);
+		return error;
+	}
 
 	sched_clock_register(riscv_sched_clock, BITS_PER_LONG, riscv_timebase);
 
@@ -110,8 +127,8 @@ static int __init riscv_timer_init_dt(struct device_node *n)
 			 "clockevents/riscv/timer:starting",
 			 riscv_timer_starting_cpu, riscv_timer_dying_cpu);
 	if (error)
-		pr_err("RISCV timer register failed [%d] for cpu = [%d]\n",
-		       error, cpuid);
+		pr_err("cpu hp setup state failed for RISCV timer [%d]\n",
+		       error);
 	return error;
 }
 
-- 
2.7.4
riscv_hartid_to_cpuid can return an invalid cpuid for a hart that is
present in DT but was never brought up. Print an appropriate warning
message and continue.

Signed-off-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/irqchip/irq-sifive-plic.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/irqchip/irq-sifive-plic.c b/drivers/irqchip/irq-sifive-plic.c
index 357e9daf..254ecd76 100644
--- a/drivers/irqchip/irq-sifive-plic.c
+++ b/drivers/irqchip/irq-sifive-plic.c
@@ -237,6 +237,11 @@ static int __init plic_init(struct device_node *node,
 		}
 
 		cpu = riscv_hartid_to_cpuid(hartid);
+		if (cpu < 0) {
+			pr_warn("Invalid cpuid for context %d\n", i);
+			continue;
+		}
+
 		handler = per_cpu_ptr(&plic_handlers, cpu);
 		handler->present = true;
 		handler->ctxid = i;
-- 
2.7.4
Currently, we set hwcap based on first valid cpu from DT. This may not be correct always as that CPU might not be current booting cpu. Set hwcap based on the boot cpu instead of first valid CPU from DT. Add a sanity check to identify if any hwcap do not match. Signed-off-by: Atish Patra <atish.patra@wdc.com> --- arch/riscv/kernel/cpufeature.c | 52 +++++++++++++++++++++++++++++------------- 1 file changed, 36 insertions(+), 16 deletions(-) diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c index a6e369ed..ed8f0c28 100644 --- a/arch/riscv/kernel/cpufeature.c +++ b/arch/riscv/kernel/cpufeature.c @@ -20,6 +20,7 @@ #include <linux/of.h> #include <asm/processor.h> #include <asm/hwcap.h> +#include <asm/smp.h> unsigned long elf_hwcap __read_mostly; #ifdef CONFIG_FPU @@ -32,6 +33,8 @@ void riscv_fill_hwcap(void) const char *isa; size_t i; static unsigned long isa2hwcap[256] = {0}; + int hartid; + unsigned long temp_hwcap = 0, boot_hwcap = 0; isa2hwcap['i'] = isa2hwcap['I'] = COMPAT_HWCAP_ISA_I; isa2hwcap['m'] = isa2hwcap['M'] = COMPAT_HWCAP_ISA_M; @@ -43,27 +46,44 @@ void riscv_fill_hwcap(void) elf_hwcap = 0; /* - * We don't support running Linux on hertergenous ISA systems. For - * now, we just check the ISA of the first "okay" processor. + * We don't support running Linux on hertergenous ISA systems. + * But first "okay" processor might not be the boot cpu. + * Check the ISA of boot cpu. 
*/ - while ((node = of_find_node_by_type(node, "cpu"))) - if (riscv_of_processor_hartid(node) >= 0) - break; - if (!node) { - pr_warning("Unable to find \"cpu\" devicetree entry"); - return; - } + while ((node = of_find_node_by_type(node, "cpu"))) { + if (!node) { + pr_warn("Unable to find \"cpu\" devicetree entry"); + return; + } + + hartid = riscv_of_processor_hartid(node); + if (hartid < 0) + continue; - if (of_property_read_string(node, "riscv,isa", &isa)) { - pr_warning("Unable to find \"riscv,isa\" devicetree entry"); + if (of_property_read_string(node, "riscv,isa", &isa)) { + pr_warn("Unable to find \"riscv,isa\" devicetree entry"); + of_node_put(node); + return; + } of_node_put(node); - return; - } - of_node_put(node); - for (i = 0; i < strlen(isa); ++i) - elf_hwcap |= isa2hwcap[(unsigned char)(isa[i])]; + for (i = 0; i < strlen(isa); ++i) + temp_hwcap |= isa2hwcap[(unsigned char)(isa[i])]; + /* + * All "okay" hart should have same isa. We don't know how to + * handle if they don't. Throw a warning for now. + */ + if (elf_hwcap && temp_hwcap != elf_hwcap) + pr_warn("isa mismatch: 0x%lx != 0x%lx\n", + elf_hwcap, temp_hwcap); + + if (hartid == boot_cpu_hartid) + boot_hwcap = temp_hwcap; + elf_hwcap = temp_hwcap; + temp_hwcap = 0; + } + elf_hwcap = boot_hwcap; /* We don't support systems with F but without D, so mask those out * here. */ if ((elf_hwcap & COMPAT_HWCAP_ISA_F) && !(elf_hwcap & COMPAT_HWCAP_ISA_D)) { -- 2.7.4 _______________________________________________ linux-riscv mailing list linux-riscv@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-riscv
On Thu, Feb 07, 2019 at 05:51:14PM -0800, Atish Patra wrote:
> In SMP path, __cpu_up waits for other CPU to come online
> indefinitely. This is wrong as other CPU might be disabled
> in machine mode and possible CPU is set to the cpus present
> in DT.
>
> Introduce a completion variable and waits only for a second.
>
> Signed-off-by: Atish Patra <atish.patra@wdc.com>
> Reviewed-by: Anup Patel <anup@brainfault.org>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>
On Thu, Feb 07, 2019 at 05:51:15PM -0800, Atish Patra wrote:
> Currently, logical CPU id to physical hartid mapping is
> defined for both smp and non-smp configurations. This
> is not required as we need this only for smp configuration.
> The mapping function can define directly boot_cpu_hartid
> for non-smp use case.

Please use up your available 72 chars for the changelog (probably also
in other patches).

>
> The reverse mapping function i.e. hartid to cpuid can be called
> for any valid but not booted harts. So it should return default
> cpu 0 only if it is a boot hartid.
>
> Signed-off-by: Atish Patra <atish.patra@wdc.com>
> Reviewed-by: Anup Patel <anup@brainfault.org>
> ---
>  arch/riscv/include/asm/smp.h | 14 +++++++++++---
>  arch/riscv/kernel/setup.c    |  9 ---------
>  arch/riscv/kernel/smp.c      |  9 +++++++++
>  3 files changed, 20 insertions(+), 12 deletions(-)
>
> diff --git a/arch/riscv/include/asm/smp.h b/arch/riscv/include/asm/smp.h
> index 41aa73b4..21fd2d75 100644
> --- a/arch/riscv/include/asm/smp.h
> +++ b/arch/riscv/include/asm/smp.h
> @@ -22,12 +22,13 @@
>  /*
>   * Mapping between linux logical cpu index and hartid.
>   */
> -extern unsigned long __cpuid_to_hartid_map[NR_CPUS];
> -#define cpuid_to_hartid_map(cpu) __cpuid_to_hartid_map[cpu]
>
> +extern unsigned long boot_cpu_hartid;
>  struct seq_file;

We usually try to keep forward declarations at the top of the file.

Can you add the new external declaration below the forward one?

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>
On Thu, Feb 07, 2019 at 05:51:19PM -0800, Atish Patra wrote: > Currently, clocksource registration happens for an invalid cpu > for non-smp kernels. This lead to kernel panic as cpu hotplug > registration will fail for those cpus. Moreover, > riscv_hartid_to_cpuid can return errors now. > > Do not proceed if hartid or cpuid is invalid. Take this opprtunity > to print appropriate error strings for different failure cases. > > Signed-off-by: Atish Patra <atish.patra@wdc.com> > --- > drivers/clocksource/timer-riscv.c | 23 ++++++++++++++++++++--- > 1 file changed, 20 insertions(+), 3 deletions(-) > > diff --git a/drivers/clocksource/timer-riscv.c b/drivers/clocksource/timer-riscv.c > index 43189220..3c7ea75b 100644 > --- a/drivers/clocksource/timer-riscv.c > +++ b/drivers/clocksource/timer-riscv.c > @@ -95,13 +95,30 @@ static int __init riscv_timer_init_dt(struct device_node *n) > struct clocksource *cs; > > hartid = riscv_of_processor_hartid(n); > + if (hartid < 0) { > + pr_warn("Not valid hartid for node [%pOF] error = [%d]\n", > + n, hartid); > + return hartid; > + } > + > cpuid = riscv_hartid_to_cpuid(hartid); > + if (cpuid < 0) { > + pr_warn("Invalid cpuid for hartid [%d]\n", hartid); > + return cpuid; > + } > > if (cpuid != smp_processor_id()) > return 0; > > + pr_err("%s: Registering clocksource cpuid [%d] hartid [%d]\n", > + __func__, cpuid, hartid); This does not look like an error case to me. At best it is info, if not debug. _______________________________________________ linux-riscv mailing list linux-riscv@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-riscv
> +	 * We don't support running Linux on hertergenous ISA systems.
> +	 * But first "okay" processor might not be the boot cpu.
> +	 * Check the ISA of boot cpu.

Please use up your available 80 characters per line in comments.

> +		/*
> +		 * All "okay" hart should have same isa. We don't know how to
> +		 * handle if they don't. Throw a warning for now.
> +		 */
> +		if (elf_hwcap && temp_hwcap != elf_hwcap)
> +			pr_warn("isa mismatch: 0x%lx != 0x%lx\n",
> +				elf_hwcap, temp_hwcap);
> +
> +		if (hartid == boot_cpu_hartid)
> +			boot_hwcap = temp_hwcap;
> +		elf_hwcap = temp_hwcap;

So we always set elf_hwcap to the capabilities of the previous cpu.

> +		temp_hwcap = 0;

I think temp_hwcap should be declared and initialized inside the outer
loop instead of having to manually reset it like this.

> +	}
>
> +	elf_hwcap = boot_hwcap;

And then reset it here to the boot cpu.

Shouldn't we only report the features supported by all cores? Otherwise
we'll still have problems if the boot cpu supports a feature, but not
others.

Something like:

	for () {
		unsigned long this_hwcap = 0;

		for (i = 0; i < strlen(isa); i++)
			this_hwcap |= isa2hwcap[(unsigned char)(isa[i])];

		if (elf_hwcap)
			elf_hwcap &= this_hwcap;
		else
			elf_hwcap = this_hwcap;
	}
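The "only report what every core supports" suggestion from this review can be tried out in userspace with a sketch like the one below. It is an illustration only: isa_to_hwcap and common_hwcap are hypothetical helpers, the bit layout (one bit per lowercase extension letter) is an assumption, and an explicit "first" flag is used instead of testing elf_hwcap for zero, so that an empty intersection is not mistaken for "nothing seen yet".

```c
#include <assert.h>
#include <stddef.h>

/* One bit per single-letter extension, indexed from 'a' (assumed layout). */
#define ISA_BIT(letter) (1UL << ((letter) - 'a'))

/* Turn an ISA string such as "rv64imafdc" into a capability mask. */
static unsigned long isa_to_hwcap(const char *isa)
{
	static const char known[] = "imafdc";
	unsigned long cap = 0;

	assert(isa != NULL);
	for (size_t i = 0; isa[i] != '\0'; i++)
		for (size_t k = 0; known[k] != '\0'; k++)
			if (isa[i] == known[k])
				cap |= ISA_BIT(known[k]);
	return cap;
}

/* Intersect the capabilities of all harts; 0 if the list is empty. */
static unsigned long common_hwcap(const char *const *isas, size_t n)
{
	unsigned long common = 0;
	int first = 1;

	for (size_t i = 0; i < n; i++) {
		unsigned long this_hwcap = isa_to_hwcap(isas[i]);

		if (first) {
			common = this_hwcap;
			first = 0;
		} else {
			common &= this_hwcap;
		}
	}
	return common;
}
```

For one hart with "rv64imafdc" and another with "rv64imac", the result contains i, m, a and c but neither f nor d, which is the conservative answer userspace can rely on regardless of where a thread is scheduled.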
On 2/8/19 1:04 AM, Christoph Hellwig wrote: > On Thu, Feb 07, 2019 at 05:51:19PM -0800, Atish Patra wrote: >> Currently, clocksource registration happens for an invalid cpu >> for non-smp kernels. This lead to kernel panic as cpu hotplug >> registration will fail for those cpus. Moreover, >> riscv_hartid_to_cpuid can return errors now. >> >> Do not proceed if hartid or cpuid is invalid. Take this opprtunity >> to print appropriate error strings for different failure cases. >> >> Signed-off-by: Atish Patra <atish.patra@wdc.com> >> --- >> drivers/clocksource/timer-riscv.c | 23 ++++++++++++++++++++--- >> 1 file changed, 20 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/clocksource/timer-riscv.c b/drivers/clocksource/timer-riscv.c >> index 43189220..3c7ea75b 100644 >> --- a/drivers/clocksource/timer-riscv.c >> +++ b/drivers/clocksource/timer-riscv.c >> @@ -95,13 +95,30 @@ static int __init riscv_timer_init_dt(struct device_node *n) >> struct clocksource *cs; >> >> hartid = riscv_of_processor_hartid(n); >> + if (hartid < 0) { >> + pr_warn("Not valid hartid for node [%pOF] error = [%d]\n", >> + n, hartid); >> + return hartid; >> + } >> + >> cpuid = riscv_hartid_to_cpuid(hartid); >> + if (cpuid < 0) { >> + pr_warn("Invalid cpuid for hartid [%d]\n", hartid); >> + return cpuid; >> + } >> >> if (cpuid != smp_processor_id()) >> return 0; >> >> + pr_err("%s: Registering clocksource cpuid [%d] hartid [%d]\n", >> + __func__, cpuid, hartid); > > This does not look like an error case to me. At best it is info, > if not debug. > Thanks for catching. It was a typo. I will fix it in next version. Regards, Atish > _______________________________________________ > linux-riscv mailing list > linux-riscv@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-riscv > _______________________________________________ linux-riscv mailing list linux-riscv@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-riscv
On 2/8/19 1:03 AM, Christoph Hellwig wrote: > On Thu, Feb 07, 2019 at 05:51:15PM -0800, Atish Patra wrote: >> Currently, logical CPU id to physical hartid mapping is >> defined for both smp and non-smp configurations. This >> is not required as we need this only for smp configuration. >> The mapping function can define directly boot_cpu_hartid >> for non-smp use case. > > Please use up your available 72 chars for the changelog. (probably also > in other patches). > Sorry. I will fix all patches to use 72 chars. >> >> The reverse mapping function i.e. hartid to cpuid can be called >> for any valid but not booted harts. So it should return default >> cpu 0 only if it is a boot hartid. >> >> Signed-off-by: Atish Patra <atish.patra@wdc.com> >> Reviewed-by: Anup Patel <anup@brainfault.org> >> --- >> arch/riscv/include/asm/smp.h | 14 +++++++++++--- >> arch/riscv/kernel/setup.c | 9 --------- >> arch/riscv/kernel/smp.c | 9 +++++++++ >> 3 files changed, 20 insertions(+), 12 deletions(-) >> >> diff --git a/arch/riscv/include/asm/smp.h b/arch/riscv/include/asm/smp.h >> index 41aa73b4..21fd2d75 100644 >> --- a/arch/riscv/include/asm/smp.h >> +++ b/arch/riscv/include/asm/smp.h >> @@ -22,12 +22,13 @@ >> /* >> * Mapping between linux logical cpu index and hartid. >> */ >> -extern unsigned long __cpuid_to_hartid_map[NR_CPUS]; >> -#define cpuid_to_hartid_map(cpu) __cpuid_to_hartid_map[cpu] >> >> +extern unsigned long boot_cpu_hartid; >> struct seq_file; > > We usually try to keep forward declatations at the top of the file. > > Can you add the new external declaration below the forward one? > Sure. Regards, Atish > Otherwise looks good: > > Reviewed-by: Christoph Hellwig <hch@lst.de> > _______________________________________________ linux-riscv mailing list linux-riscv@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-riscv
On 2/8/19 1:11 AM, Christoph Hellwig wrote: >> + * We don't support running Linux on hertergenous ISA systems. >> + * But first "okay" processor might not be the boot cpu. >> + * Check the ISA of boot cpu. > > Please use up your available 80 characters per line in comments. > I will fix it. >> + /* >> + * All "okay" hart should have same isa. We don't know how to >> + * handle if they don't. Throw a warning for now. >> + */ >> + if (elf_hwcap && temp_hwcap != elf_hwcap) >> + pr_warn("isa mismatch: 0x%lx != 0x%lx\n", >> + elf_hwcap, temp_hwcap); >> + >> + if (hartid == boot_cpu_hartid) >> + boot_hwcap = temp_hwcap; >> + elf_hwcap = temp_hwcap; > > So we always set elf_hwcap to the capabilities of the previous cpu. > >> + temp_hwcap = 0; > > I think tmp_hwcap should be declared and initialized inside the outer loop > instead having to manually reset it like this. > >> + } >> >> + elf_hwcap = boot_hwcap; > > And then reset it here to the boot cpu. > > Shoudn't we only report the features supported by all cores? Otherwise > we'll still have problems if the boot cpu supports a feature, but not > others. > Hmm. The other side of the argument is boot cpu does have a feature that is not supported by other hart that didn't even boot. The user space may execute something based on boot cpu capability but that won't be enabled. At least, in this way we know that we are compatible completely with boot cpu capabilities. Thoughts ? Regards, Atish > Something like: > > for () { > unsigned long this_hwcap = 0; > > for (i = 0; i < strlen(isa); i++) > this_hwcap |= isa2hwcap[(unsigned char)(isa[i])]; > > if (elf_hwcap) > elf_hwcap &= this_hwcap; > else > elf_hwcap = this_hwcap; > } > > _______________________________________________ linux-riscv mailing list linux-riscv@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-riscv
On Sat, Feb 9, 2019 at 12:03 AM Atish Patra <atish.patra@wdc.com> wrote: > > On 2/8/19 1:11 AM, Christoph Hellwig wrote: > >> + * We don't support running Linux on hertergenous ISA systems. > >> + * But first "okay" processor might not be the boot cpu. > >> + * Check the ISA of boot cpu. > > > > Please use up your available 80 characters per line in comments. > > > I will fix it. > > >> + /* > >> + * All "okay" hart should have same isa. We don't know how to > >> + * handle if they don't. Throw a warning for now. > >> + */ > >> + if (elf_hwcap && temp_hwcap != elf_hwcap) > >> + pr_warn("isa mismatch: 0x%lx != 0x%lx\n", > >> + elf_hwcap, temp_hwcap); > >> + > >> + if (hartid == boot_cpu_hartid) > >> + boot_hwcap = temp_hwcap; > >> + elf_hwcap = temp_hwcap; > > > > So we always set elf_hwcap to the capabilities of the previous cpu. > > > >> + temp_hwcap = 0; > > > > I think tmp_hwcap should be declared and initialized inside the outer loop > > instead having to manually reset it like this. > > > >> + } > >> > >> + elf_hwcap = boot_hwcap; > > > > And then reset it here to the boot cpu. > > > > Shoudn't we only report the features supported by all cores? Otherwise > > we'll still have problems if the boot cpu supports a feature, but not > > others. > > > > Hmm. The other side of the argument is boot cpu does have a feature that > is not supported by other hart that didn't even boot. > The user space may execute something based on boot cpu capability but > that won't be enabled. > > At least, in this way we know that we are compatible completely with > boot cpu capabilities. Thoughts ? There is one example on the market, e.g., Samsung Exynos 9810. Mongoose 3 (big cores) only support ARMv8.0, while Cortex-A55 (little ones) support ARMv8.2 (and that brings atomics support). I think, it's the only ARM SOC that supports different ISA extensions between cores on the same package. 
The kernel scheduler doesn't know that the big cores are missing atomics
support, or that an application needs it, and moves the thread,
resulting in an illegal instruction.

E.g., see the Golang issue: https://github.com/golang/go/issues/28431

I also recall Jon Masters (Computer Architect at Red Hat) advocating
against having cores with mismatched capabilities on the server market.
It just causes more problems down the line.

david
On Sat, 09 Feb 2019 04:26:07 +0000, David Abdurachmanov <david.abdurachmanov@gmail.com> wrote: > > On Sat, Feb 9, 2019 at 12:03 AM Atish Patra <atish.patra@wdc.com> wrote: > > > > On 2/8/19 1:11 AM, Christoph Hellwig wrote: > > >> + * We don't support running Linux on hertergenous ISA systems. > > >> + * But first "okay" processor might not be the boot cpu. > > >> + * Check the ISA of boot cpu. > > > > > > Please use up your available 80 characters per line in comments. > > > > > I will fix it. > > > > >> + /* > > >> + * All "okay" hart should have same isa. We don't know how to > > >> + * handle if they don't. Throw a warning for now. > > >> + */ > > >> + if (elf_hwcap && temp_hwcap != elf_hwcap) > > >> + pr_warn("isa mismatch: 0x%lx != 0x%lx\n", > > >> + elf_hwcap, temp_hwcap); > > >> + > > >> + if (hartid == boot_cpu_hartid) > > >> + boot_hwcap = temp_hwcap; > > >> + elf_hwcap = temp_hwcap; > > > > > > So we always set elf_hwcap to the capabilities of the previous cpu. > > > > > >> + temp_hwcap = 0; > > > > > > I think tmp_hwcap should be declared and initialized inside the outer loop > > > instead having to manually reset it like this. > > > > > >> + } > > >> > > >> + elf_hwcap = boot_hwcap; > > > > > > And then reset it here to the boot cpu. > > > > > > Shoudn't we only report the features supported by all cores? Otherwise > > > we'll still have problems if the boot cpu supports a feature, but not > > > others. > > > > > > > Hmm. The other side of the argument is boot cpu does have a feature that > > is not supported by other hart that didn't even boot. > > The user space may execute something based on boot cpu capability but > > that won't be enabled. > > > > At least, in this way we know that we are compatible completely with > > boot cpu capabilities. Thoughts ? > > There is one example on the market, e.g., Samsung Exynos 9810. 
>
> Mongoose 3 (big cores) only support ARMv8.0, while Cortex-A55
> (little ones) support ARMv8.2 (and that brings atomics support).
> I think, it's the only ARM SOC that supports different ISA extensions
> between cores on the same package.
>
> Kernel scheduler doesn't know that big cores are missing atomics
> support or that applications needs it and moves the thread
> resulting in illegal instruction.

Not quite. The scheduler doesn't have to know (thankfully). The
problem is that the Samsung folks tampered with the detection logic in
the kernel, and ended up advertising the LSE atomics to userspace
(despite only being available on half the cores).

If you run a mainline kernel on this thing, it will just work, as the
LSE atomics are not advertised to userspace at all.

>
> E.g., see Golang issue: https://github.com/golang/go/issues/28431
>
> I also recall Jon Masters (Computer Architect at Red Hat) advocating
> against having cores with mismatched capabilities on the server
> market.

Well, nobody recommends that, server or not. That being said, it is
possible to handle it, and the arm64 kernel has been dealing with such
things from day 1. We can have CPUs with different PMUs, implemented
page sizes, VA and PA spaces... What it takes is some work in the
kernel to sanitize it, and care in what you expose to userspace.

The thing to realise is that people will build stupid systems, no
matter how loud you shout. You can either pretend they don't exist, or
try to deal with them.

Thanks,

	M.

-- 
Jazz is not dead, it just smell funny.
On Feb 07 2019, Atish Patra <atish.patra@wdc.com> wrote:

> +	while ((node = of_find_node_by_type(node, "cpu"))) {
> +		if (!node) {

That can never be true.

> +			pr_warn("Unable to find \"cpu\" devicetree entry");
> +			return;
> +		}
> +
> +		hartid = riscv_of_processor_hartid(node);
> +		if (hartid < 0)
> +			continue;
>
> -	if (of_property_read_string(node, "riscv,isa", &isa)) {
> -		pr_warning("Unable to find \"riscv,isa\" devicetree entry");
> +		if (of_property_read_string(node, "riscv,isa", &isa)) {
> +			pr_warn("Unable to find \"riscv,isa\" devicetree entry");
> +			of_node_put(node);
> +			return;
> +		}
>  	of_node_put(node);

[    0.000000] OF: ERROR: Bad of_node_put() on /cpus/cpu@1
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.0.0-rc6-00020-g5903f30f1310 #12
[    0.000000] Call Trace:
[    0.000000] [<ffffffe001076812>] walk_stackframe+0x0/0xa4
[    0.000000] [<ffffffe001076a12>] show_stack+0x2a/0x34
[    0.000000] [<ffffffe0015cf9ea>] dump_stack+0x62/0x7c
[    0.000000] [<ffffffe00149fed4>] of_node_release+0xbe/0xc0
[    0.000000] [<ffffffe0015d465a>] kobject_put+0xa6/0x1e8
[    0.000000] [<ffffffe00149f44e>] of_node_put+0x16/0x20
[    0.000000] [<ffffffe00149b45e>] of_find_node_by_type+0x66/0xa4
[    0.000000] [<ffffffe0010755ca>] riscv_fill_hwcap+0x14c/0x1ce
[    0.000000] [<ffffffe0000026d4>] 0xffffffe0000026d4
[    0.000000] [<ffffffe0000006ec>] 0xffffffe0000006ec
[    0.000000] [<ffffffe000000076>] 0xffffffe000000076

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970  F4BE 1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."
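The trace is consistent with the documented behaviour of of_find_node_by_type(): it calls of_node_put() on the node passed in as the starting point and returns the next match with a reference held. The of_node_put() at the bottom of the loop body therefore drops each node a second time. A minimal userspace sketch of that ownership rule follows; struct node, node_get(), node_put(), find_next() and walk() are hypothetical stand-ins for illustration, not the kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted device tree node. */
struct node {
	int refcount;		/* starts at 1: the reference held by the tree */
	struct node *next;
};

/* Number of times a node still attached to the tree hit refcount zero. */
static int bad_releases;

static struct node *node_get(struct node *n)
{
	if (n)
		n->refcount++;
	return n;
}

static void node_put(struct node *n)
{
	if (!n)
		return;
	assert(n->refcount > 0);
	if (--n->refcount == 0)
		bad_releases++;	/* models "OF: ERROR: Bad of_node_put()" */
}

/*
 * Iterator with the same ownership rule as of_find_node_by_type():
 * takes a reference on the next node, then drops the caller's
 * reference on 'from'.
 */
static struct node *find_next(struct node *head, struct node *from)
{
	struct node *n = node_get(from ? from->next : head);

	node_put(from);
	return n;
}

/* Walk the list; extra_put mimics the of_node_put() in the loop body. */
static int walk(struct node *head, int extra_put)
{
	struct node *n = NULL;

	bad_releases = 0;
	while ((n = find_next(head, n))) {
		if (extra_put)
			node_put(n);	/* the iterator will put this node again */
	}
	return bad_releases;
}
```

With two nodes that each start at a refcount of 1, walk(head, 0) leaves both counts at 1, while walk(head, 1) drives both to zero. In this model the fix is to drop the put from the loop body and keep it only on paths that leave the loop early.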
On Fri, 08 Feb 2019 20:26:07 PST (-0800), david.abdurachmanov@gmail.com wrote:
> On Sat, Feb 9, 2019 at 12:03 AM Atish Patra <atish.patra@wdc.com> wrote:
>> [...]
>> Hmm. The other side of the argument is boot cpu does have a feature that
>> is not supported by other hart that didn't even boot.
>> The user space may execute something based on boot cpu capability but
>> that won't be enabled.
>>
>> At least, in this way we know that we are compatible completely with
>> boot cpu capabilities. Thoughts ?
>
> There is one example on the market, e.g., Samsung Exynos 9810.
>
> Mongoose 3 (big cores) only support ARMv8.0, while Cortex-A55
> (little ones) support ARMv8.2 (and that brings atomics support).
> I think, it's the only ARM SOC that supports different ISA extensions
> between cores on the same package.
>
> Kernel scheduler doesn't know that big cores are missing atomics
> support or that applications need it and moves the thread
> resulting in illegal instruction.
>
> E.g., see Golang issue: https://github.com/golang/go/issues/28431
>
> I also recall Jon Masters (Computer Architect at Red Hat) advocating
> against having cores with mismatched capabilities on the server market.
>
> It just causes more problems down the line.

IMO the best bet is to only put extensions in HWCAP that are supported by all
the harts that userspace will be scheduled on.
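Reporting only what every hart supports amounts to intersecting the per-hart feature masks. The sketch below is a simplified userspace illustration, not the kernel's riscv_fill_hwcap(): it assumes each "riscv,isa" string starts with an "rv32" or "rv64" prefix followed by single-letter extensions, and it maps each letter to one bit. The helper names isa_to_mask() and common_hwcap() are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One bit per single-letter extension: 'a' -> bit 0, ..., 'z' -> bit 25. */
static unsigned long isa_to_mask(const char *isa)
{
	unsigned long mask = 0;
	size_t i;

	/* Assumption of this sketch: the string starts with "rv32" or "rv64". */
	assert(strlen(isa) >= 4);
	for (i = 4; isa[i] != '\0'; i++) {
		if (isa[i] >= 'a' && isa[i] <= 'z')
			mask |= 1UL << (isa[i] - 'a');
	}
	return mask;
}

/* Intersection over all harts: only extensions present on every hart survive. */
static unsigned long common_hwcap(const char *const *isa, size_t nr_harts)
{
	unsigned long hwcap = ~0UL;
	size_t i;

	if (nr_harts == 0)
		return 0;
	for (i = 0; i < nr_harts; i++)
		hwcap &= isa_to_mask(isa[i]);
	return hwcap;
}
```

For two harts described as "rv64imafdc" and "rv64imac", the result contains i, m, a and c but neither f nor d, so userspace would never be told about floating point that one of the harts lacks.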
On 2/11/19 11:02 AM, Palmer Dabbelt wrote:
> On Fri, 08 Feb 2019 20:26:07 PST (-0800), david.abdurachmanov@gmail.com wrote:
>> [...]
>> I also recall Jon Masters (Computer Architect at Red Hat) advocating
>> against having cores with mismatched capabilities on the server market.
>>
>> It just causes more problems down the line.
>
> IMO the best bet is to only put extensions in HWCAP that are supported by all
> the harts that userspace will be scheduled on.
>

Fair enough. Instead of setting HWCAP once in setup_arch(), we can set it
for the boot cpu only at first and update it as each cpu comes online.

Thus, HWCAP will consist of the extensions supported by all cpus that are
currently online.

Regards,
Atish
On Mon, 11 Feb 2019 12:03:30 -0800
Atish Patra <atish.patra@wdc.com> wrote:

> On 2/11/19 11:02 AM, Palmer Dabbelt wrote:
> > [...]
> > IMO the best bet is to only put extensions in HWCAP that are supported by all
> > the harts that userspace will be scheduled on.
> >
>
> Fair enough. Instead of setting HWCAP in setup_arch() once, we can set it
> only for boot cpu. It will be updated after every cpu comes up online.
>
> Thus, HWCAP will consist of all extensions supported by all cpus that are
> online currently.

You must thus prevent CPUs that have a different set of capabilities
from coming up late (once userspace has started).

	M.

-- 
Without deviation from the norm, progress is not possible.
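One way to picture the constraint: the advertised mask may shrink while harts come up during boot, but it is frozen once userspace runs, and a late hart is only admitted if it does not invalidate what was already advertised. The sketch below is a hypothetical userspace model; struct hwcap_state and hart_may_come_online() are invented names, not existing kernel interfaces. It makes one design choice that goes beyond the thread: it rejects only harts that lack an advertised feature, whereas a stricter policy could demand an exact match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Feature mask advertised to userspace; frozen once userspace has started. */
struct hwcap_state {
	unsigned long advertised;
	bool userspace_started;
};

/*
 * Decide whether a hart with capability mask 'caps' may come online.
 * Before userspace starts, the advertised set shrinks to the intersection.
 * Afterwards, a hart missing any advertised bit is refused, and the
 * advertised set is left untouched.
 */
static bool hart_may_come_online(struct hwcap_state *st, unsigned long caps)
{
	assert(st != NULL);
	if (!st->userspace_started) {
		st->advertised &= caps;
		return true;
	}
	return (st->advertised & ~caps) == 0;
}
```

A caller would flip userspace_started when init is launched. From then on, a false return means the hart has to stay offline.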
On Mon, 11 Feb 2019 14:13:25 PST (-0800), marc.zyngier@arm.com wrote:
> On Mon, 11 Feb 2019 12:03:30 -0800
> Atish Patra <atish.patra@wdc.com> wrote:
>
>> [...]
>> Fair enough. Instead of setting HWCAP in setup_arch() once, we can set it
>> only for boot cpu. It will be updated after every cpu comes up online.
>>
>> Thus, HWCAP will consist of all extensions supported by all cpus that are
>> online currently.
>
> You must thus prevent CPUs that have a different set of capabilities
> from coming up late (once userspace has started).

And we have no way to do that. I'd prefer if we just looked through the
entire device tree and only showed userspace the features that are on every
possible CPU from the start. Otherwise the HWCAP will shift around during a
userspace run, which seems odd.
> On Feb 11, 2019, at 2:23 PM, Palmer Dabbelt <palmer@sifive.com> wrote:
>
> On Mon, 11 Feb 2019 14:13:25 PST (-0800), marc.zyngier@arm.com wrote:
>> [...]
>> You must thus prevent CPUs that have a different set of capabilities
>> from coming up late (once userspace has started).
>
> and we have no way to do that. I'd prefer if we just looked through the
> entire device tree and only showed userspace the features that are on every
> possible CPU from the start. Otherwise the HWCAP will shift around during a
> userspace run, which seems odd.

OK, I will do this for now. Once we have cpu hotplug enabled, we can
revisit this.

Regards,
Atish