linux-arm-msm.vger.kernel.org archive mirror
* Relax CPU features sanity checking on heterogeneous architectures
@ 2019-10-11  5:49 Sai Prakash Ranjan
  2019-10-11  9:19 ` Marc Gonzalez
  2019-10-11 10:50 ` Mark Rutland
  0 siblings, 2 replies; 20+ messages in thread
From: Sai Prakash Ranjan @ 2019-10-11  5:49 UTC (permalink / raw)
  To: suzuki.poulose, mark.rutland, linux-arm-kernel, catalin.marinas,
	will, Dave.Martin, andrew.murray, jeremy.linton
  Cc: linux-arm-msm, linux-kernel, rnayak, bjorn.andersson, saiprakash.ranjan

On the latest QCOM SoCs with a big.LITTLE architecture, such as SM8150 and
SC7180, the warnings below are observed while the big CPU cores boot.

SM8150:

[    0.271177] CPU features: SANITY CHECK: Unexpected variation in 
SYS_ID_AA64PFR0_EL1. Boot CPU: 0x00000011112222, CPU4: 0x00000011111112
[    0.271184] CPU features: SANITY CHECK: Unexpected variation in 
SYS_ID_ISAR4_EL1. Boot CPU: 0x00000000011142, CPU4: 0x00000000010142
[    0.271189] CPU features: SANITY CHECK: Unexpected variation in 
SYS_ID_PFR1_EL1. Boot CPU: 0x00000010011011, CPU4: 0x00000010010000
[    0.271192] CPU features: Unsupported CPU feature variation detected.
[    0.271208] GICv3: CPU4: found redistributor 400 region 
0:0x0000000017ae0000
[    0.271237] CPU4: Booted secondary processor 0x0000000004 
[0x51df804e]
[    0.302919] Detected PIPT I-cache on CPU5
[    0.302930] CPU features: SANITY CHECK: Unexpected variation 
in SYS_ID_AA64PFR0_EL1. Boot CPU: 0x00000011112222, CPU5: 
0x00000011111112
[    0.302936] CPU features: SANITY CHECK: Unexpected variation in 
SYS_ID_ISAR4_EL1. Boot CPU: 0x00000000011142, CPU5: 0x00000000010142
[    0.302941] CPU features: SANITY CHECK: Unexpected variation in 
SYS_ID_PFR1_EL1. Boot CPU: 0x00000010011011, CPU5: 0x00000010010000
[    0.302957] GICv3: CPU5: found redistributor 500 region 
0:0x0000000017b00000
[    0.302987] CPU5: Booted secondary processor 0x0000000005 
[0x51df804e]
[    0.335066] Detected PIPT I-cache on CPU6
[    0.335076] CPU features: SANITY CHECK: Unexpected variation in 
SYS_ID_AA64PFR0_EL1. Boot CPU: 0x00000011112222, CPU6: 0x00000011111112
[    0.335082] CPU features: SANITY CHECK: Unexpected variation in 
SYS_ID_ISAR4_EL1. Boot CPU: 0x00000000011142, CPU6: 0x00000000010142
[    0.335087] CPU features: SANITY CHECK: Unexpected variation in 
SYS_ID_PFR1_EL1. Boot CPU: 0x00000010011011, CPU6: 0x00000010010000
[    0.335104] GICv3: CPU6: found redistributor 600 region 
0:0x0000000017b20000
[    0.335135] CPU6: Booted secondary processor 0x0000000006 
[0x51df804e]
[    0.367597] Detected PIPT I-cache on CPU7
[    0.367605] CPU features: SANITY CHECK: Unexpected variation in 
SYS_ID_AA64PFR0_EL1. Boot CPU: 0x00000011112222, CPU7: 0x00000011111112
[    0.367610] CPU features: SANITY CHECK: Unexpected variation in 
SYS_ID_ISAR4_EL1. Boot CPU: 0x00000000011142, CPU7: 0x00000000010142
[    0.367615] CPU features: SANITY CHECK: Unexpected variation in 
SYS_ID_PFR1_EL1. Boot CPU: 0x00000010011011, CPU7: 0x00000010010000
[    0.367632] GICv3: CPU7: found redistributor 700 region 
0:0x0000000017b40000
[    0.367661] CPU7: Booted secondary processor 0x0000000007 
[0x51df804e]

SC7180:

[    0.812770] CPU features: SANITY CHECK: Unexpected variation in 
SYS_CTR_EL0. Boot CPU: 0x00000084448004, CPU6: 0x0000009444c004
[    0.812838] CPU features: SANITY CHECK: Unexpected variation in 
SYS_ID_AA64MMFR2_EL1. Boot CPU: 0x00000000001011, CPU6: 0x00000000000011
[    0.812876] CPU features: SANITY CHECK: Unexpected variation in 
SYS_ID_AA64PFR0_EL1. Boot CPU: 0x00000011112222, CPU6: 
0x1100000011111112
[    0.812924] CPU features: SANITY CHECK: Unexpected variation in 
SYS_ID_ISAR4_EL1. Boot CPU: 0x00000000011142, CPU6: 0x00000000010142
[    0.812950] CPU features: SANITY CHECK: Unexpected variation in 
SYS_ID_PFR0_EL1. Boot CPU: 0x00000010000131, CPU6: 0x00000010010131
[    0.812977] CPU features: SANITY CHECK: Unexpected variation in 
SYS_ID_PFR1_EL1. Boot CPU: 0x00000010011011, CPU6: 0x00000010010000
[    0.813018] CPU features: Unsupported CPU feature variation detected.
[    0.813447] GICv3: CPU6: found redistributor 600 region 
0:0x0000000017b20000
[    0.814144] CPU6: Booted secondary processor 0x0000000600 
[0x51ff804f]
[    0.902441] Detected PIPT I-cache on CPU7
[    0.902528] CPU features: SANITY CHECK: Unexpected variation in 
SYS_CTR_EL0. Boot CPU: 0x00000084448004, CPU7: 0x0000009444c004
[    0.902591] CPU features: SANITY CHECK: Unexpected variation in 
SYS_ID_AA64MMFR2_EL1. Boot CPU: 0x00000000001011, CPU7: 0x00000000000011
[    0.902610] CPU features: SANITY CHECK: Unexpected variation in 
SYS_ID_AA64PFR0_EL1. Boot CPU: 0x00000011112222, CPU7: 
0x1100000011111112
[    0.902659] CPU features: SANITY CHECK: Unexpected variation in 
SYS_ID_ISAR4_EL1. Boot CPU: 0x00000000011142, CPU7: 0x00000000010142
[    0.902695] CPU features: SANITY CHECK: Unexpected variation in 
SYS_ID_PFR0_EL1. Boot CPU: 0x00000010000131, CPU7: 0x00000010010131
[    0.902713] CPU features: SANITY CHECK: Unexpected variation in 
SYS_ID_PFR1_EL1. Boot CPU: 0x00000010011011, CPU7: 0x00000010010000
[    0.903217] GICv3: CPU7: found redistributor 700 region 
0:0x0000000017b40000
[    0.903965] CPU7: Booted secondary processor 0x0000000700 
[0x51ff804f]


Can we relax some of the sanity checking for these, either by making the
fields FTR_NONSTRICT or by some other means? I roughly tried the change
below for SM8150, but I guess it is not correct. Maybe for
ftr_generic_32bits we should compare the part numbers of the boot CPU and
the non-boot CPUs (to identify big.LITTLE) and only then make it nonstrict?
These are all wild assumptions on my part, so please correct me if I am wrong.

diff --git a/arch/arm64/kernel/cpufeature.c 
b/arch/arm64/kernel/cpufeature.c
index cabebf1a7976..207197692caa 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -164,8 +164,8 @@ static const struct arm64_ftr_bits ftr_id_aa64pfr0[] 
= {
         S_ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, 
ID_AA64PFR0_FP_SHIFT, 4, ID_AA64PFR0_FP_NI),
         /* Linux doesn't care about the EL3 */
         ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, 
ID_AA64PFR0_EL3_SHIFT, 4, 0),
-       ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 
ID_AA64PFR0_EL2_SHIFT, 4, 0),
-       ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 
ID_AA64PFR0_EL1_SHIFT, 4, ID_AA64PFR0_EL1_64BIT_ONLY),
+       ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, 
ID_AA64PFR0_EL2_SHIFT, 4, 0),
+       ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, 
ID_AA64PFR0_EL1_SHIFT, 4, ID_AA64PFR0_EL1_64BIT_ONLY),
         ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 
ID_AA64PFR0_EL0_SHIFT, 4, ID_AA64PFR0_EL0_64BIT_ONLY),
         ARM64_FTR_END,
  };
@@ -345,10 +345,10 @@ static const struct arm64_ftr_bits 
ftr_generic_32bits[] = {
         ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 24, 4, 
0),
         ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 20, 4, 
0),
         ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 16, 4, 
0),
-       ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 12, 4, 
0),
+       ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, 12, 4, 
0),
         ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 8, 4, 0),
-       ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 4, 4, 0),
-       ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 0, 4, 0),
+       ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, 4, 4, 
0),
+       ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, 0, 4, 
0),
         ARM64_FTR_END,
  };


Thanks,
Sai

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member
of Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: Relax CPU features sanity checking on heterogeneous architectures
  2019-10-11  5:49 Relax CPU features sanity checking on heterogeneous architectures Sai Prakash Ranjan
@ 2019-10-11  9:19 ` Marc Gonzalez
  2019-10-11  9:57   ` Sai Prakash Ranjan
  2019-10-11 10:50 ` Mark Rutland
  1 sibling, 1 reply; 20+ messages in thread
From: Marc Gonzalez @ 2019-10-11  9:19 UTC (permalink / raw)
  To: Sai Prakash Ranjan; +Cc: MSM, Linux ARM

On 11/10/2019 07:49, Sai Prakash Ranjan wrote:

> diff --git a/arch/arm64/kernel/cpufeature.c 
> b/arch/arm64/kernel/cpufeature.c
> index cabebf1a7976..207197692caa 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -164,8 +164,8 @@ static const struct arm64_ftr_bits ftr_id_aa64pfr0[] 
> = {
>          S_ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, 
> ID_AA64PFR0_FP_SHIFT, 4, ID_AA64PFR0_FP_NI),
>          /* Linux doesn't care about the EL3 */
>          ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, 
> ID_AA64PFR0_EL3_SHIFT, 4, 0),
> -       ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 
> ID_AA64PFR0_EL2_SHIFT, 4, 0),
> -       ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 
> ID_AA64PFR0_EL1_SHIFT, 4, ID_AA64PFR0_EL1_64BIT_ONLY),
> +       ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, 
> ID_AA64PFR0_EL2_SHIFT, 4, 0),
> +       ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, 
> ID_AA64PFR0_EL1_SHIFT, 4, ID_AA64PFR0_EL1_64BIT_ONLY),
>          ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 
> ID_AA64PFR0_EL0_SHIFT, 4, ID_AA64PFR0_EL0_64BIT_ONLY),
>          ARM64_FTR_END,
>   };
> @@ -345,10 +345,10 @@ static const struct arm64_ftr_bits 
> ftr_generic_32bits[] = {
>          ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 24, 4, 
> 0),
>          ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 20, 4, 
> 0),
>          ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 16, 4, 
> 0),
> -       ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 12, 4, 
> 0),
> +       ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, 12, 4, 
> 0),
>          ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 8, 4, 0),
> -       ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 4, 4, 0),
> -       ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 0, 4, 0),
> +       ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, 4, 4, 
> 0),
> +       ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, 0, 4, 
> 0),
>          ARM64_FTR_END,
>   };

Hello Sai,

Could you configure your webmail client to not wrap "long" lines?

Wrapping might break the patch, and the kernel logs would look better
in their original form.

Regards.


* Re: Relax CPU features sanity checking on heterogeneous architectures
  2019-10-11  9:19 ` Marc Gonzalez
@ 2019-10-11  9:57   ` Sai Prakash Ranjan
  0 siblings, 0 replies; 20+ messages in thread
From: Sai Prakash Ranjan @ 2019-10-11  9:57 UTC (permalink / raw)
  To: Marc Gonzalez; +Cc: MSM, Linux ARM

Hi Marc,

On 2019-10-11 14:49, Marc Gonzalez wrote:
> 
> Hello Sai,
> 
> Could you configure your webmail client to not wrap "long" lines?
> 
> Wrapping might break the patch, and the kernel logs would look better
> in their original form.
> 

Oh right, sorry, I did not see that. I use git send-email for patches, so
wrapping is not a problem in those cases.
Here I composed the message in my webmail and just pasted the diff; I will
take care of it in future.

Thanks,
Sai

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member
of Code Aurora Forum, hosted by The Linux Foundation


* Re: Relax CPU features sanity checking on heterogeneous architectures
  2019-10-11  5:49 Relax CPU features sanity checking on heterogeneous architectures Sai Prakash Ranjan
  2019-10-11  9:19 ` Marc Gonzalez
@ 2019-10-11 10:50 ` Mark Rutland
  2019-10-11 11:09   ` Marc Gonzalez
                     ` (2 more replies)
  1 sibling, 3 replies; 20+ messages in thread
From: Mark Rutland @ 2019-10-11 10:50 UTC (permalink / raw)
  To: Sai Prakash Ranjan
  Cc: suzuki.poulose, linux-arm-kernel, catalin.marinas, will,
	Dave.Martin, andrew.murray, jeremy.linton, linux-arm-msm,
	linux-kernel, rnayak, bjorn.andersson

Hi,

On Fri, Oct 11, 2019 at 11:19:00AM +0530, Sai Prakash Ranjan wrote:
> On latest QCOM SoCs like SM8150 and SC7180 with big.LITTLE arch, below
> warnings are observed during bootup of big cpu cores.

For reference, which CPUs are in those SoCs?

> SM8150:
> 
> [    0.271177] CPU features: SANITY CHECK: Unexpected variation in
> SYS_ID_AA64PFR0_EL1. Boot CPU: 0x00000011112222, CPU4: 0x00000011111112

The differing fields are EL3, EL2, and EL1: the boot CPU supports
AArch64 and AArch32 at those exception levels, while the secondary only
supports AArch64.
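
For readers following along, the field-by-field comparison above can be reproduced with a small standalone C sketch. This is not kernel code: the helper names and the `PFR0_*_SHIFT` macros are made up for illustration. It assumes the ELn fields of ID_AA64PFR0_EL1 are 4-bit fields at bits [3:0], [7:4], [11:8] and [15:12], where 1 means AArch64 only and 2 means AArch64 plus AArch32.

```c
#include <stdint.h>

/* Hypothetical shift names for this sketch; not the kernel's macros. */
#define PFR0_EL0_SHIFT	0
#define PFR0_EL1_SHIFT	4
#define PFR0_EL2_SHIFT	8
#define PFR0_EL3_SHIFT	12

/* Extract one 4-bit ID field starting at the given bit position. */
static inline unsigned int pfr0_field(uint64_t reg, unsigned int shift)
{
	return (unsigned int)((reg >> shift) & 0xf);
}

/* Non-zero when the field reports AArch32 support at that level (value 2). */
static inline int pfr0_has_aarch32(uint64_t reg, unsigned int shift)
{
	return pfr0_field(reg, shift) == 2;
}
```

With the SM8150 values from the log, the EL1, EL2 and EL3 fields read 2 on the boot CPU (0x11112222) and 1 on CPU4 (0x11111112), while the EL0 field reads 2 on both.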

Do we handle this variation in KVM?

> [    0.271184] CPU features: SANITY CHECK: Unexpected variation in
> SYS_ID_ISAR4_EL1. Boot CPU: 0x00000000011142, CPU4: 0x00000000010142

The differing field is (AArch32) SMC: present on the boot CPU, but
missing on the secondary CPU.

This is mandated to be zero when AArch32 isn't implemented at EL1.

> [    0.271189] CPU features: SANITY CHECK: Unexpected variation in
> SYS_ID_PFR1_EL1. Boot CPU: 0x00000010011011, CPU4: 0x00000010010000

The differing fields are (AArch32) Virtualization, Security, and
ProgMod: all present on the boot CPU, but missing on the secondary
CPU.

All mandated to be zero when AArch32 isn't implemented at EL1.
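
A quick way to see which 4-bit fields differ between two logged register values is to XOR them and walk the result a nibble at a time. The following is only an illustrative sketch; `differing_fields` is a made-up helper, not something from cpufeature.c, and it assumes every field is exactly 4 bits wide.

```c
#include <stdint.h>

/*
 * Return a 16-bit mask with bit n set when the 4-bit field n
 * (bits [4n+3:4n]) differs between the two register values.
 */
static inline uint16_t differing_fields(uint64_t boot, uint64_t secondary)
{
	uint64_t diff = boot ^ secondary;
	uint16_t mask = 0;
	unsigned int n;

	for (n = 0; n < 16; n++) {
		if ((diff >> (4 * n)) & 0xf)
			mask |= (uint16_t)(1u << n);
	}
	return mask;
}
```

For the ID_ISAR4_EL1 pair (0x11142 and 0x10142) this gives 0x8, i.e. only the field at bits [15:12]. For the ID_PFR1_EL1 pair (0x10011011 and 0x10010000) it gives 0xb, i.e. the fields at bits [3:0], [7:4] and [15:12], which are the same three offsets (0, 4 and 12) touched in the ftr_generic_32bits hunk of the diff.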

> SC7180:
> 
> [    0.812770] CPU features: SANITY CHECK: Unexpected variation in
> SYS_CTR_EL0. Boot CPU: 0x00000084448004, CPU6: 0x0000009444c004

The differing fields are:

* IDC: present only on the secondary CPU. This is a worrying mismatch
  because it could mean that required cache maintenance is missed in
  some cases. Does the secondary CPU definitely broadcast PoU
  maintenance to the boot CPU that requires it?

* L1Ip: VIPT on the boot CPU, PIPT on the secondary CPU.
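
The two CTR_EL0 differences listed above can be checked by hand with a sketch like the one below. The helper names are hypothetical; the bit positions assumed are IDC at bit 28 and L1Ip at bits [15:14], with L1Ip encodings 0b10 for VIPT and 0b11 for PIPT.

```c
#include <stdint.h>

/* CTR_EL0.IDC is assumed to be bit 28. */
static inline unsigned int ctr_idc(uint64_t ctr)
{
	return (unsigned int)((ctr >> 28) & 0x1);
}

/* CTR_EL0.L1Ip is assumed to be bits [15:14]. */
static inline unsigned int ctr_l1ip(uint64_t ctr)
{
	return (unsigned int)((ctr >> 14) & 0x3);
}

/* Name for an L1Ip value as returned by ctr_l1ip(). */
static inline const char *ctr_l1ip_name(unsigned int l1ip)
{
	switch (l1ip) {
	case 1:
		return "AIVIVT";
	case 2:
		return "VIPT";
	case 3:
		return "PIPT";
	default:
		return "VPIPT";
	}
}
```

Applied to the SC7180 log, the boot CPU value 0x84448004 decodes to IDC = 0 and L1Ip = 2 (VIPT), and the CPU6 value 0x9444c004 decodes to IDC = 1 and L1Ip = 3 (PIPT).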

> [    0.812838] CPU features: SANITY CHECK: Unexpected variation in
> SYS_ID_AA64MMFR2_EL1. Boot CPU: 0x00000000001011, CPU6: 0x00000000000011

The differing field is IESB: present on the boot CPU, missing on the
secondary CPU.

> [    0.812876] CPU features: SANITY CHECK: Unexpected variation in
> SYS_ID_AA64PFR0_EL1. Boot CPU: 0x00000011112222, CPU6: 0x1100000011111112
> [    0.812924] CPU features: SANITY CHECK: Unexpected variation in
> SYS_ID_ISAR4_EL1. Boot CPU: 0x00000000011142, CPU6: 0x00000000010142
> [    0.812950] CPU features: SANITY CHECK: Unexpected variation in
> SYS_ID_PFR0_EL1. Boot CPU: 0x00000010000131, CPU6: 0x00000010010131
> [    0.812977] CPU features: SANITY CHECK: Unexpected variation in
> SYS_ID_PFR1_EL1. Boot CPU: 0x00000010011011, CPU6: 0x00000010010000

These are the same story as for SM8150.

> Can we relax some sanity checking for these by making it FTR_NONSTRICT or by
> some other means? I just tried below roughly for SM8150 but I guess this is
> not correct,
> maybe for ftr_generic_32bits we should be checking bootcpu and nonboot cpu
> partnum(to identify big.LITTLE) and then make it nonstrict?
> These are all my wild assumptions, please correct me if I am wrong.

Before we make any changes, we need to check whether we do actually
handle this variation in a safe way, and we need to consider what this
means w.r.t. late CPU hotplug.

Even if we can handle variation at boot time, once we've determined the
set of system-wide features we cannot allow those to regress, and I
believe we'll need new code to enforce that. I don't think it's
sufficient to mark these as NONSTRICT, though we might do that with
other changes.
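
To make "the set of system-wide features" concrete, here is a deliberately simplified sketch of combining two CPUs' register values into one system-wide value. It assumes every 4-bit field is unsigned and follows a lower-is-safe rule, which is not true in general: the real tables give each field its own policy and some fields are signed. The function name is made up for illustration.

```c
#include <stdint.h>

/*
 * Combine two ID register values by taking, for each 4-bit field,
 * the smaller of the two values (unsigned, lower-is-safe only).
 */
static inline uint64_t lower_safe_all(uint64_t a, uint64_t b)
{
	uint64_t out = 0;
	unsigned int shift;

	for (shift = 0; shift < 64; shift += 4) {
		uint64_t fa = (a >> shift) & 0xf;
		uint64_t fb = (b >> shift) & 0xf;

		out |= (fa < fb ? fa : fb) << shift;
	}
	return out;
}
```

Under those assumptions, combining the ID_AA64PFR0_EL1 values of the boot CPU (0x11112222) and CPU4 (0x11111112) yields 0x11111112: the system as a whole can only claim AArch32 at EL0. Once such a value is finalised, a CPU that would lower it further is what must not be allowed in later.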

We shouldn't look at the part number at all here. We care about
variation across CPUs regardless of whether this is big.LITTLE or some
variation in tie-offs, etc.

Thanks,
Mark.

> 
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index cabebf1a7976..207197692caa 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -164,8 +164,8 @@ static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = {
>         S_ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE,
> ID_AA64PFR0_FP_SHIFT, 4, ID_AA64PFR0_FP_NI),
>         /* Linux doesn't care about the EL3 */
>         ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE,
> ID_AA64PFR0_EL3_SHIFT, 4, 0),
> -       ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE,
> ID_AA64PFR0_EL2_SHIFT, 4, 0),
> -       ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE,
> ID_AA64PFR0_EL1_SHIFT, 4, ID_AA64PFR0_EL1_64BIT_ONLY),
> +       ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE,
> ID_AA64PFR0_EL2_SHIFT, 4, 0),
> +       ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE,
> ID_AA64PFR0_EL1_SHIFT, 4, ID_AA64PFR0_EL1_64BIT_ONLY),
>         ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE,
> ID_AA64PFR0_EL0_SHIFT, 4, ID_AA64PFR0_EL0_64BIT_ONLY),
>         ARM64_FTR_END,
>  };
> @@ -345,10 +345,10 @@ static const struct arm64_ftr_bits
> ftr_generic_32bits[] = {
>         ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 24, 4, 0),
>         ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 20, 4, 0),
>         ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 16, 4, 0),
> -       ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 12, 4, 0),
> +       ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, 12, 4, 0),
>         ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 8, 4, 0),
> -       ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 4, 4, 0),
> -       ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 0, 4, 0),
> +       ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, 4, 4, 0),
> +       ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, 0, 4, 0),
>         ARM64_FTR_END,
>  };
> 
> 
> Thanks,
> Sai
> 
> -- 
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
> of Code Aurora Forum, hosted by The Linux Foundation


* Re: Relax CPU features sanity checking on heterogeneous architectures
  2019-10-11 10:50 ` Mark Rutland
@ 2019-10-11 11:09   ` Marc Gonzalez
  2019-10-11 13:33     ` Sai Prakash Ranjan
  2019-10-11 13:17   ` Sai Prakash Ranjan
  2019-10-11 13:33   ` Marc Zyngier
  2 siblings, 1 reply; 20+ messages in thread
From: Marc Gonzalez @ 2019-10-11 11:09 UTC (permalink / raw)
  To: Mark Rutland, Sai Prakash Ranjan
  Cc: MSM, Linux ARM, Ard Biesheuvel, Suzuki K. Poulose, Catalin Marinas

On 11/10/2019 12:50, Mark Rutland wrote:

> Before we make any changes, we need to check whether we do actually
> handle this variation in a safe way, and we need to consider what this
> means w.r.t. late CPU hotplug.
> 
> Even if we can handle variation at boot time, once we've determined the
> set of system-wide features we cannot allow those to regress, and I
> believe we'll need new code to enforce that. I don't think it's
> sufficient to mark these as NONSTRICT, though we might do that with
> other changes.
> 
> We shouldn't look at the part number at all here. We care about
> variation across CPUs regardless of whether this is big.LITTLE or some
> variation in tie-offs, etc.

See also the "Unexpected variation in SYS_ID_AA64MMFR0_EL1" thread
from a year ago: (that was on msm8998)

	https://www.spinics.net/lists/arm-kernel/msg691242.html

Regards.


* Re: Relax CPU features sanity checking on heterogeneous architectures
  2019-10-11 10:50 ` Mark Rutland
  2019-10-11 11:09   ` Marc Gonzalez
@ 2019-10-11 13:17   ` Sai Prakash Ranjan
  2019-10-11 13:34     ` Marc Zyngier
  2019-10-11 13:33   ` Marc Zyngier
  2 siblings, 1 reply; 20+ messages in thread
From: Sai Prakash Ranjan @ 2019-10-11 13:17 UTC (permalink / raw)
  To: Mark Rutland
  Cc: rnayak, suzuki.poulose, catalin.marinas, linux-kernel,
	jeremy.linton, bjorn.andersson, linux-arm-msm, andrew.murray,
	will, Dave.Martin, linux-arm-kernel, linux-arm-kernel

Hi Mark,

Thanks a lot for the detailed explanations, I did have a look at all the 
variations before posting this.

On 2019-10-11 16:20, Mark Rutland wrote:
> Hi,
> 
> On Fri, Oct 11, 2019 at 11:19:00AM +0530, Sai Prakash Ranjan wrote:
>> On latest QCOM SoCs like SM8150 and SC7180 with big.LITTLE arch, below
>> warnings are observed during bootup of big cpu cores.
> 
> For reference, which CPUs are in those SoCs?
> 

SM8150 is based on Cortex-A55 (little cores) and Cortex-A76 (big cores).
I'm afraid I cannot give details about SC7180 yet.

>> SM8150:
>> 
>> [    0.271177] CPU features: SANITY CHECK: Unexpected variation in
>> SYS_ID_AA64PFR0_EL1. Boot CPU: 0x00000011112222, CPU4: 
>> 0x00000011111112
> 
> The differing fields are EL3, EL2, and EL1: the boot CPU supports
> AArch64 and AArch32 at those exception levels, while the secondary only
> supports AArch64.
> 
> Do we handle this variation in KVM?

We do not support KVM.

> 
>> [    0.271184] CPU features: SANITY CHECK: Unexpected variation in
>> SYS_ID_ISAR4_EL1. Boot CPU: 0x00000000011142, CPU4: 0x00000000010142
> 
> The differing field is (AArch32) SMC: present on the boot CPU, but
> missing on the secondary CPU.
> 
> This is mandated to be zero when AArch32 isn't implemented at EL1.
> 

So this need not be strict?

>> [    0.271189] CPU features: SANITY CHECK: Unexpected variation in
>> SYS_ID_PFR1_EL1. Boot CPU: 0x00000010011011, CPU4: 0x00000010010000
> 
> The differing fields are (AArch32) Virtualization, Security, and
> ProgMod: all present on the boot CPU, but missing on the secondary
> CPU.
> 
> All mandated to be zero when AArch32 isn't implemented at EL1.
> 

Same here, this need not be strict?

>> SC7180:
>> 
>> [    0.812770] CPU features: SANITY CHECK: Unexpected variation in
>> SYS_CTR_EL0. Boot CPU: 0x00000084448004, CPU6: 0x0000009444c004
> 
> The differing fields are:
> 
> * IDC: present only on the secondary CPU. This is a worrying mismatch
>   because it could mean that required cache maintenance is missed in
>   some cases. Does the secondary CPU definitely broadcast PoU
>   maintenance to the boot CPU that requires it?
> 

I will get some more details about this one from our internal CPU team.

> * L1Ip: VIPT on the boot CPU, PIPT on the secondary CPU.
> 
>> [    0.812838] CPU features: SANITY CHECK: Unexpected variation in
>> SYS_ID_AA64MMFR2_EL1. Boot CPU: 0x00000000001011, CPU6: 
>> 0x00000000000011
> 
> The differing field is IESB: present on the boot CPU, missing on the
> secondary CPU.
> 
>> [    0.812876] CPU features: SANITY CHECK: Unexpected variation in
>> SYS_ID_AA64PFR0_EL1. Boot CPU: 0x00000011112222, CPU6: 0x1100000011111112
>> [    0.812924] CPU features: SANITY CHECK: Unexpected variation in
>> SYS_ID_ISAR4_EL1. Boot CPU: 0x00000000011142, CPU6: 0x00000000010142
>> [    0.812950] CPU features: SANITY CHECK: Unexpected variation in
>> SYS_ID_PFR0_EL1. Boot CPU: 0x00000010000131, CPU6: 0x00000010010131
>> [    0.812977] CPU features: SANITY CHECK: Unexpected variation in
>> SYS_ID_PFR1_EL1. Boot CPU: 0x00000010011011, CPU6: 0x00000010010000
> 
> These are the same story as for SM8150.
> 
>> Can we relax some sanity checking for these by making it FTR_NONSTRICT or by
>> some other means? I just tried below roughly for SM8150 but I guess this is
>> not correct,
>> maybe for ftr_generic_32bits we should be checking bootcpu and nonboot cpu
>> partnum(to identify big.LITTLE) and then make it nonstrict?
>> These are all my wild assumptions, please correct me if I am wrong.
> 
> Before we make any changes, we need to check whether we do actually
> handle this variation in a safe way, and we need to consider what this
> means w.r.t. late CPU hotplug.
> 
> Even if we can handle variation at boot time, once we've determined the
> set of system-wide features we cannot allow those to regress, and I
> believe we'll need new code to enforce that. I don't think it's
> sufficient to mark these as NONSTRICT, though we might do that with
> other changes.
> 
> We shouldn't look at the part number at all here. We care about
> variation across CPUs regardless of whether this is big.LITTLE or some
> variation in tie-offs, etc.
> 

Thanks,
Sai

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member
of Code Aurora Forum, hosted by The Linux Foundation


* Re: Relax CPU features sanity checking on heterogeneous architectures
  2019-10-11 10:50 ` Mark Rutland
  2019-10-11 11:09   ` Marc Gonzalez
  2019-10-11 13:17   ` Sai Prakash Ranjan
@ 2019-10-11 13:33   ` Marc Zyngier
  2019-10-11 13:54     ` Mark Rutland
  2 siblings, 1 reply; 20+ messages in thread
From: Marc Zyngier @ 2019-10-11 13:33 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Sai Prakash Ranjan, rnayak, suzuki.poulose, catalin.marinas,
	linux-kernel, jeremy.linton, bjorn.andersson, linux-arm-msm,
	andrew.murray, will, Dave.Martin, linux-arm-kernel

On Fri, 11 Oct 2019 11:50:11 +0100
Mark Rutland <mark.rutland@arm.com> wrote:

> Hi,
> 
> On Fri, Oct 11, 2019 at 11:19:00AM +0530, Sai Prakash Ranjan wrote:
> > On latest QCOM SoCs like SM8150 and SC7180 with big.LITTLE arch, below
> > warnings are observed during bootup of big cpu cores.  
> 
> For reference, which CPUs are in those SoCs?
> 
> > SM8150:
> > 
> > [    0.271177] CPU features: SANITY CHECK: Unexpected variation in
> > SYS_ID_AA64PFR0_EL1. Boot CPU: 0x00000011112222, CPU4: 0x00000011111112  
> 
> The differing fields are EL3, EL2, and EL1: the boot CPU supports
> AArch64 and AArch32 at those exception levels, while the secondary only
> supports AArch64.
> 
> Do we handle this variation in KVM?

We do, at least at vcpu creation time (see kvm_reset_vcpu). But if one
of the !AArch32 CPUs comes in late in the game (after we've started a
guest), all bets are off (we'll schedule the 32bit guest on that CPU,
enter the guest, immediately take an Illegal Exception Return, and
return to userspace with KVM_EXIT_FAIL_ENTRY).

Not sure we could do better, given the HW. My preference would be to
fail these CPUs if they aren't present at boot time.

	M.
-- 
Jazz is not dead. It just smells funny...


* Re: Relax CPU features sanity checking on heterogeneous architectures
  2019-10-11 11:09   ` Marc Gonzalez
@ 2019-10-11 13:33     ` Sai Prakash Ranjan
  0 siblings, 0 replies; 20+ messages in thread
From: Sai Prakash Ranjan @ 2019-10-11 13:33 UTC (permalink / raw)
  To: Marc Gonzalez
  Cc: Mark Rutland, MSM, Suzuki K. Poulose, Catalin Marinas, Linux ARM,
	Ard Biesheuvel, linux-arm-kernel

On 2019-10-11 16:39, Marc Gonzalez wrote:
> On 11/10/2019 12:50, Mark Rutland wrote:
> 
>> Before we make any changes, we need to check whether we do actually
>> handle this variation in a safe way, and we need to consider what this
>> means w.r.t. late CPU hotplug.
>> 
>> Even if we can handle variation at boot time, once we've determined 
>> the
>> set of system-wide features we cannot allow those to regress, and I
>> believe we'll need new code to enforce that. I don't think it's
>> sufficient to mark these as NONSTRICT, though we might do that with
>> other changes.
>> 
>> We shouldn't look at the part number at all here. We care about
>> variation across CPUs regardless of whether this is big.LITTLE or some
>> variation in tie-offs, etc.
> 
> See also the "Unexpected variation in SYS_ID_AA64MMFR0_EL1" thread
> from a year ago: (that was on msm8998)
> 
> 	https://www.spinics.net/lists/arm-kernel/msg691242.html
> 

I think it was fixed by commit 5717fe5ab38f ("arm64: cpufeature: Don't
treat granule sizes as strict").

Thanks,
Sai

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member
of Code Aurora Forum, hosted by The Linux Foundation


* Re: Relax CPU features sanity checking on heterogeneous architectures
  2019-10-11 13:17   ` Sai Prakash Ranjan
@ 2019-10-11 13:34     ` Marc Zyngier
  2019-10-11 13:40       ` Sai Prakash Ranjan
  0 siblings, 1 reply; 20+ messages in thread
From: Marc Zyngier @ 2019-10-11 13:34 UTC (permalink / raw)
  To: Sai Prakash Ranjan
  Cc: Mark Rutland, rnayak, suzuki.poulose, catalin.marinas,
	linux-arm-kernel, linux-kernel, jeremy.linton, bjorn.andersson,
	linux-arm-msm, andrew.murray, will, Dave.Martin,
	linux-arm-kernel

On Fri, 11 Oct 2019 18:47:39 +0530
Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> wrote:

> Hi Mark,
> 
> Thanks a lot for the detailed explanations, I did have a look at all the variations before posting this.
> 
> On 2019-10-11 16:20, Mark Rutland wrote:
> > Hi,
> > 
> > On Fri, Oct 11, 2019 at 11:19:00AM +0530, Sai Prakash Ranjan wrote:  
> >> On latest QCOM SoCs like SM8150 and SC7180 with big.LITTLE arch, below
> >> warnings are observed during bootup of big cpu cores.  
> > 
> > For reference, which CPUs are in those SoCs?
> >   
> 
> SM8150 is based on Cortex-A55(little cores) and Cortex-A76(big cores). I'm afraid I cannot give details about SC7180 yet.
> 
> >> SM8150:
> >>
> >> [    0.271177] CPU features: SANITY CHECK: Unexpected variation in
> >> SYS_ID_AA64PFR0_EL1. Boot CPU: 0x00000011112222, CPU4: 0x00000011111112
> > 
> > The differing fields are EL3, EL2, and EL1: the boot CPU supports
> > AArch64 and AArch32 at those exception levels, while the secondary only
> > supports AArch64.
> > 
> > Do we handle this variation in KVM?  
> 
> We do not support KVM.

Mainline does. You don't get to pick and choose what is supported or
not.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


* Re: Relax CPU features sanity checking on heterogeneous architectures
  2019-10-11 13:34     ` Marc Zyngier
@ 2019-10-11 13:40       ` Sai Prakash Ranjan
  2019-10-17 20:00         ` Stephen Boyd
  0 siblings, 1 reply; 20+ messages in thread
From: Sai Prakash Ranjan @ 2019-10-11 13:40 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Mark Rutland, rnayak, suzuki.poulose, catalin.marinas,
	linux-arm-kernel, linux-kernel, jeremy.linton, bjorn.andersson,
	linux-arm-msm, andrew.murray, will, Dave.Martin,
	linux-arm-kernel, linux-arm-msm-owner, marc.w.gonzalez

On 2019-10-11 19:04, Marc Zyngier wrote:
> On Fri, 11 Oct 2019 18:47:39 +0530
> Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> wrote:
> 
>> Hi Mark,
>> 
>> Thanks a lot for the detailed explanations, I did have a look at all 
>> the variations before posting this.
>> 
>> On 2019-10-11 16:20, Mark Rutland wrote:
>> > Hi,
>> >
>> > On Fri, Oct 11, 2019 at 11:19:00AM +0530, Sai Prakash Ranjan wrote:
>> >> On latest QCOM SoCs like SM8150 and SC7180 with big.LITTLE arch, below
>> >> warnings are observed during bootup of big cpu cores.
>> >
>> > For reference, which CPUs are in those SoCs?
>> >
>> 
>> SM8150 is based on Cortex-A55(little cores) and Cortex-A76(big cores). 
>> I'm afraid I cannot give details about SC7180 yet.
>> 
>> >> SM8150:
>> >>
>> >> [    0.271177] CPU features: SANITY CHECK: Unexpected variation in
>> >> SYS_ID_AA64PFR0_EL1. Boot CPU: 0x00000011112222, CPU4: 0x00000011111112
>> >
>> > The differing fields are EL3, EL2, and EL1: the boot CPU supports
>> > AArch64 and AArch32 at those exception levels, while the secondary only
>> > supports AArch64.
>> >
>> > Do we handle this variation in KVM?
>> 
>> We do not support KVM.
> 
> Mainline does. You don't get to pick and choose what is supported or
> not.
> 

OK, that's good.

Thanks,
Sai

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member
of Code Aurora Forum, hosted by The Linux Foundation


* Re: Relax CPU features sanity checking on heterogeneous architectures
  2019-10-11 13:33   ` Marc Zyngier
@ 2019-10-11 13:54     ` Mark Rutland
  2019-10-11 14:06       ` Marc Zyngier
                         ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Mark Rutland @ 2019-10-11 13:54 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Sai Prakash Ranjan, rnayak, suzuki.poulose, catalin.marinas,
	linux-kernel, jeremy.linton, bjorn.andersson, linux-arm-msm,
	andrew.murray, will, Dave.Martin, linux-arm-kernel

On Fri, Oct 11, 2019 at 02:33:43PM +0100, Marc Zyngier wrote:
> On Fri, 11 Oct 2019 11:50:11 +0100
> Mark Rutland <mark.rutland@arm.com> wrote:
> 
> > Hi,
> > 
> > On Fri, Oct 11, 2019 at 11:19:00AM +0530, Sai Prakash Ranjan wrote:
> > > On latest QCOM SoCs like SM8150 and SC7180 with big.LITTLE arch, below
> > > warnings are observed during bootup of big cpu cores.  
> > 
> > For reference, which CPUs are in those SoCs?
> > 
> > > SM8150:
> > > 
> > > [    0.271177] CPU features: SANITY CHECK: Unexpected variation in
> > > SYS_ID_AA64PFR0_EL1. Boot CPU: 0x00000011112222, CPU4: 0x00000011111112  
> > 
> > The differing fields are EL3, EL2, and EL1: the boot CPU supports
> > AArch64 and AArch32 at those exception levels, while the secondary only
> > supports AArch64.
> > 
> > Do we handle this variation in KVM?
> 
> We do, at least at vcpu creation time (see kvm_reset_vcpu). But if one
> of the !AArch32 CPU comes in late in the game (after we've started a
> guest), all bets are off (we'll schedule the 32bit guest on that CPU,
> enter the guest, immediately take an Illegal Exception Return, and
> return to userspace with KVM_EXIT_FAIL_ENTRY).

Ouch. We certainly can't remove the warning until we deal with that
somehow, then.

> Not sure we could do better, given the HW. My preference would be to
> fail these CPUs if they aren't present at boot time.

I agree; I think we need logic to check the ID register fields against
their EXACT, {LOWER,HIGHER}_SAFE, etc rules regardless of whether we
have an associated cap. That can then abort a late onlining of a CPU
which violates those rules w.r.t. the finalised system value.
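
As a rough illustration of the check described above, here is a minimal
sketch in C. It is not the kernel's cpufeature code: the names ftr_rule,
ftr_field, late_field_ok and late_cpu_ok are hypothetical, every field is
assumed to be a 4-bit unsigned value, and the choice of rule per field is
an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rule names, modelled on the EXACT/LOWER_SAFE/HIGHER_SAFE idea. */
enum ftr_rule { RULE_EXACT, RULE_LOWER_SAFE, RULE_HIGHER_SAFE };

/* One 4-bit unsigned ID register field and the rule that governs it. */
struct ftr_field {
	unsigned int shift;
	enum ftr_rule rule;
};

static unsigned int field_val(uint64_t reg, unsigned int shift)
{
	assert(shift <= 60);
	return (unsigned int)((reg >> shift) & 0xf);
}

/* Can a late CPU reporting 'cpu' join a system finalised at 'sys'? */
static bool late_field_ok(const struct ftr_field *f, uint64_t sys, uint64_t cpu)
{
	unsigned int s = field_val(sys, f->shift);
	unsigned int c = field_val(cpu, f->shift);

	switch (f->rule) {
	case RULE_EXACT:
		return c == s;
	case RULE_LOWER_SAFE:
		/* The system settled on the lowest value; a lower one breaks it. */
		return c >= s;
	case RULE_HIGHER_SAFE:
		/* The system settled on the highest value; a higher one breaks it. */
		return c <= s;
	}
	return false;
}

/* Reject the late CPU if any listed field violates its rule. */
static bool late_cpu_ok(const struct ftr_field *fields, size_t n,
			uint64_t sys, uint64_t cpu)
{
	for (size_t i = 0; i < n; i++)
		if (!late_field_ok(&fields[i], sys, cpu))
			return false;
	return true;
}
```

With the EL1, EL2 and EL3 fields of ID_AA64PFR0_EL1 (shifts 4, 8 and 12)
treated as lower-safe, a system finalised at 0x11112222 would reject a late
CPU reporting 0x11111112, while the reverse order would be accepted.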

I suspect that we may want to split the notion of
safe-for-{user,kernel-guest} in the feature tables, as if nothing else
it will force us to consider those cases separately when adding new
stuff.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Relax CPU features sanity checking on heterogeneous architectures
  2019-10-11 13:54     ` Mark Rutland
@ 2019-10-11 14:06       ` Marc Zyngier
  2019-10-17 21:39       ` Jeremy Linton
  2020-01-20  2:47       ` Sai Prakash Ranjan
  2 siblings, 0 replies; 20+ messages in thread
From: Marc Zyngier @ 2019-10-11 14:06 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Sai Prakash Ranjan, rnayak, suzuki.poulose, catalin.marinas,
	linux-kernel, jeremy.linton, bjorn.andersson, linux-arm-msm,
	andrew.murray, will, Dave.Martin, linux-arm-kernel

On Fri, 11 Oct 2019 14:54:31 +0100
Mark Rutland <mark.rutland@arm.com> wrote:

> On Fri, Oct 11, 2019 at 02:33:43PM +0100, Marc Zyngier wrote:
> > On Fri, 11 Oct 2019 11:50:11 +0100
> > Mark Rutland <mark.rutland@arm.com> wrote:
> >   
> > > Hi,
> > > 
> > > On Fri, Oct 11, 2019 at 11:19:00AM +0530, Sai Prakash Ranjan wrote:  
> > > > On latest QCOM SoCs like SM8150 and SC7180 with big.LITTLE arch, below
> > > > warnings are observed during bootup of big cpu cores.    
> > > 
> > > For reference, which CPUs are in those SoCs?
> > >   
> > > > SM8150:
> > > > 
> > > > [    0.271177] CPU features: SANITY CHECK: Unexpected variation in
> > > > SYS_ID_AA64PFR0_EL1. Boot CPU: 0x00000011112222, CPU4: 0x00000011111112    
> > > 
> > > The differing fields are EL3, EL2, and EL1: the boot CPU supports
> > > AArch64 and AArch32 at those exception levels, while the secondary only
> > > supports AArch64.
> > > 
> > > Do we handle this variation in KVM?  
> > 
> > We do, at least at vcpu creation time (see kvm_reset_vcpu). But if one
> > of the !AArch32 CPU comes in late in the game (after we've started a
> > guest), all bets are off (we'll schedule the 32bit guest on that CPU,
> > enter the guest, immediately take an Illegal Exception Return, and
> > return to userspace with KVM_EXIT_FAIL_ENTRY).  
> 
> Ouch. We certainly can't remove the warning until we deal with that
> somehow, then.

Indeed. Same thing applies for hot-removing the AArch32-capable CPUs,
by the way. You'd end up in a situation where guests can't run, despite
the initial contract that we're happy with that configuration.

> > Not sure we could do better, given the HW. My preference would be to
> > fail these CPUs if they aren't present at boot time.  
> 
> I agree; I think we need logic to check the ID register fields against
> their EXACT, {LOWER,HIGHER}_SAFE, etc rules regardless of whether we
> have an associated cap. That can then abort a late onlining of a CPU
> which violates those rules w.r.t. the finalised system value.
> 
> I suspect that we may want to split the notion of
> safe-for-{user,kernel-guest} in the feature tables, as if nothing else
> it will force us to consider those cases separately when adding new
> stuff.

Probably. There are bizarre overlaps, in the sense that some
capabilities (such as this AArch32 EL1 support) are firmly kernel
related, and yet have a direct impact on userspace. KVM blurs the lines
in "interesting" ways... :-(.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Relax CPU features sanity checking on heterogeneous architectures
  2019-10-11 13:40       ` Sai Prakash Ranjan
@ 2019-10-17 20:00         ` Stephen Boyd
  2019-10-18  7:20           ` Marc Zyngier
  2019-10-18 10:18           ` Sai Prakash Ranjan
  0 siblings, 2 replies; 20+ messages in thread
From: Stephen Boyd @ 2019-10-17 20:00 UTC (permalink / raw)
  To: Marc Zyngier, Sai Prakash Ranjan
  Cc: Mark Rutland, rnayak, suzuki.poulose, catalin.marinas,
	linux-arm-kernel, linux-kernel, jeremy.linton, bjorn.andersson,
	linux-arm-msm, andrew.murray, will, Dave.Martin,
	linux-arm-kernel, marc.w.gonzalez

Quoting Sai Prakash Ranjan (2019-10-11 06:40:13)
> On 2019-10-11 19:04, Marc Zyngier wrote:
> > On Fri, 11 Oct 2019 18:47:39 +0530
> > Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> wrote:
> > 
> >> Hi Mark,
> >> 
> >> Thanks a lot for the detailed explanations, I did have a look at all 
> >> the variations before posting this.
> >> 
> >> On 2019-10-11 16:20, Mark Rutland wrote:
> >> > Hi,
> >> >
> >> > On Fri, Oct 11, 2019 at 11:19:00AM +0530, Sai Prakash Ranjan wrote:
> >> >> On latest QCOM SoCs like SM8150 and SC7180 with big.LITTLE arch, below
> >> >> warnings are observed during bootup of big cpu cores.
> >> >
> >> > For reference, which CPUs are in those SoCs?
> >> >
> >> 
> >> SM8150 is based on Cortex-A55(little cores) and Cortex-A76(big cores). 
> >> I'm afraid I cannot give details about SC7180 yet.
> >> 
> >> >> SM8150:
> >> >>
> >> >> [    0.271177] CPU features: SANITY CHECK: Unexpected variation in
> >> >> SYS_ID_AA64PFR0_EL1. Boot CPU: 0x00000011112222, CPU4: 0x00000011111112
> >> >
> >> > The differing fields are EL3, EL2, and EL1: the boot CPU supports
> >> > AArch64 and AArch32 at those exception levels, while the secondary only
> >> > supports AArch64.
> >> >
> >> > Do we handle this variation in KVM?
> >> 
> >> We do not support KVM.
> > 
> > Mainline does. You don't get to pick and choose what is supported or
> > not.
> > 
> 
> Ok, that's good.
> 

I want KVM on sc7180. How do I get it? Is something going to not work?


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Relax CPU features sanity checking on heterogeneous architectures
  2019-10-11 13:54     ` Mark Rutland
  2019-10-11 14:06       ` Marc Zyngier
@ 2019-10-17 21:39       ` Jeremy Linton
  2019-10-18  9:01         ` Catalin Marinas
  2020-01-20  2:47       ` Sai Prakash Ranjan
  2 siblings, 1 reply; 20+ messages in thread
From: Jeremy Linton @ 2019-10-17 21:39 UTC (permalink / raw)
  To: Mark Rutland, Marc Zyngier
  Cc: Sai Prakash Ranjan, rnayak, suzuki.poulose, catalin.marinas,
	linux-kernel, bjorn.andersson, linux-arm-msm, andrew.murray,
	will, Dave.Martin, linux-arm-kernel

Hi,

On 10/11/19 8:54 AM, Mark Rutland wrote:
> On Fri, Oct 11, 2019 at 02:33:43PM +0100, Marc Zyngier wrote:
>> On Fri, 11 Oct 2019 11:50:11 +0100
>> Mark Rutland <mark.rutland@arm.com> wrote:
>>
>>> Hi,
>>>
>>> On Fri, Oct 11, 2019 at 11:19:00AM +0530, Sai Prakash Ranjan wrote:
>>>> On latest QCOM SoCs like SM8150 and SC7180 with big.LITTLE arch, below
>>>> warnings are observed during bootup of big cpu cores.
>>>
>>> For reference, which CPUs are in those SoCs?
>>>
>>>> SM8150:
>>>>
>>>> [    0.271177] CPU features: SANITY CHECK: Unexpected variation in
>>>> SYS_ID_AA64PFR0_EL1. Boot CPU: 0x00000011112222, CPU4: 0x00000011111112
>>>
>>> The differing fields are EL3, EL2, and EL1: the boot CPU supports
>>> AArch64 and AArch32 at those exception levels, while the secondary only
>>> supports AArch64.
>>>
>>> Do we handle this variation in KVM?
>>
>> We do, at least at vcpu creation time (see kvm_reset_vcpu). But if one
>> of the !AArch32 CPU comes in late in the game (after we've started a
>> guest), all bets are off (we'll schedule the 32bit guest on that CPU,
>> enter the guest, immediately take an Illegal Exception Return, and
>> return to userspace with KVM_EXIT_FAIL_ENTRY).
> 
> Ouch. We certainly can't remove the warning until we deal with that
> somehow, then.
> 
>> Not sure we could do better, given the HW. My preference would be to
>> fail these CPUs if they aren't present at boot time.
> 
> I agree; I think we need logic to check the ID register fields against
> their EXACT, {LOWER,HIGHER}_SAFE, etc rules regardless of whether we
> have an associated cap. That can then abort a late onlining of a CPU
> which violates those rules w.r.t. the finalised system value.

Except one of the cases is the user who doesn't care about aarch32 @ 
el2/1 and just wants to add another core to their 64-bit "clean" OS.

So my $.02 is that the online should only fail if someone has actually
started a 32-bit guest on the machine.

> 
> I suspect that we may want to split the notion of
> safe-for-{user,kernel-guest} in the feature tables, as if nothing else
> it will force us to consider those cases separately when adding new
> stuff.

As I'm sure everyone knows, this is all going to happen again with el0
support. I wonder if some of this more "advanced" functionality should
be buried behind EXPERT. At least on ACPI it's possible to tell at early
boot if the machine is heterogeneous (not necessarily in which ways) and
just automatically sanitize away 32-bit support and some of the stickier
things when a heterogeneous machine is detected.





^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Relax CPU features sanity checking on heterogeneous architectures
  2019-10-17 20:00         ` Stephen Boyd
@ 2019-10-18  7:20           ` Marc Zyngier
  2019-10-18 14:33             ` Stephen Boyd
  2019-10-18 10:18           ` Sai Prakash Ranjan
  1 sibling, 1 reply; 20+ messages in thread
From: Marc Zyngier @ 2019-10-18  7:20 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Sai Prakash Ranjan, Mark Rutland, rnayak, suzuki.poulose,
	catalin.marinas, linux-arm-kernel, linux-kernel, jeremy.linton,
	bjorn.andersson, linux-arm-msm, andrew.murray, will, dave.martin,
	linux-arm-kernel, marc.w.gonzalez

On 2019-10-17 21:00, Stephen Boyd wrote:
> Quoting Sai Prakash Ranjan (2019-10-11 06:40:13)
>> On 2019-10-11 19:04, Marc Zyngier wrote:
>> > On Fri, 11 Oct 2019 18:47:39 +0530
>> > Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> wrote:
>> >
>> >> Hi Mark,
>> >>
>> >> Thanks a lot for the detailed explanations, I did have a look at all
>> >> the variations before posting this.
>> >>
>> >> On 2019-10-11 16:20, Mark Rutland wrote:
>> >> > Hi,
>> >> >
>> >> > On Fri, Oct 11, 2019 at 11:19:00AM +0530, Sai Prakash Ranjan wrote:
>> >> >> On latest QCOM SoCs like SM8150 and SC7180 with big.LITTLE arch, below
>> >> >> warnings are observed during bootup of big cpu cores.
>> >> >
>> >> > For reference, which CPUs are in those SoCs?
>> >> >
>> >>
>> >> SM8150 is based on Cortex-A55(little cores) and Cortex-A76(big cores).
>> >> I'm afraid I cannot give details about SC7180 yet.
>> >>
>> >> >> SM8150:
>> >> >>
>> >> >> [    0.271177] CPU features: SANITY CHECK: Unexpected variation in
>> >> >> SYS_ID_AA64PFR0_EL1. Boot CPU: 0x00000011112222, CPU4: 0x00000011111112
>> >> >
>> >> > The differing fields are EL3, EL2, and EL1: the boot CPU supports
>> >> > AArch64 and AArch32 at those exception levels, while the secondary only
>> >> > supports AArch64.
>> >> >
>> >> > Do we handle this variation in KVM?
>> >>
>> >> We do not support KVM.
>> >
>> > Mainline does. You don't get to pick and choose what is supported or
>> > not.
>> >
>>
>> Ok, that's good.
>>
>
> I want KVM on sc7180. How do I get it? Is something going to not work?

If this SoC is anything like SM8150, 32bit guests will be hit and miss,
depending on the CPU your guest runs on, or is migrated to. We need to
either drop capabilities from the 32bit-capable CPU, or prevent the
non-32bit capable CPU from booting if a 32bit guest has been started.

You just have to hope that the kernel is entered at EL2, and that QC's
"value add" has been moved somewhere else...

         M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Relax CPU features sanity checking on heterogeneous architectures
  2019-10-17 21:39       ` Jeremy Linton
@ 2019-10-18  9:01         ` Catalin Marinas
  0 siblings, 0 replies; 20+ messages in thread
From: Catalin Marinas @ 2019-10-18  9:01 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: Mark Rutland, Marc Zyngier, Sai Prakash Ranjan, rnayak,
	suzuki.poulose, linux-kernel, bjorn.andersson, linux-arm-msm,
	andrew.murray, will, Dave.Martin, linux-arm-kernel

On Thu, Oct 17, 2019 at 04:39:23PM -0500, Jeremy Linton wrote:
> On 10/11/19 8:54 AM, Mark Rutland wrote:
> > On Fri, Oct 11, 2019 at 02:33:43PM +0100, Marc Zyngier wrote:
> > > On Fri, 11 Oct 2019 11:50:11 +0100
> > > Mark Rutland <mark.rutland@arm.com> wrote:
> > > > On Fri, Oct 11, 2019 at 11:19:00AM +0530, Sai Prakash Ranjan wrote:
> > > > > On latest QCOM SoCs like SM8150 and SC7180 with big.LITTLE arch, below
> > > > > warnings are observed during bootup of big cpu cores.
> > > > 
> > > > For reference, which CPUs are in those SoCs?
> > > > 
> > > > > SM8150:
> > > > > 
> > > > > [    0.271177] CPU features: SANITY CHECK: Unexpected variation in
> > > > > SYS_ID_AA64PFR0_EL1. Boot CPU: 0x00000011112222, CPU4: 0x00000011111112
> > > > 
> > > > The differing fields are EL3, EL2, and EL1: the boot CPU supports
> > > > AArch64 and AArch32 at those exception levels, while the secondary only
> > > > supports AArch64.
> > > > 
> > > > Do we handle this variation in KVM?
> > > 
> > > We do, at least at vcpu creation time (see kvm_reset_vcpu). But if one
> > > of the !AArch32 CPU comes in late in the game (after we've started a
> > > guest), all bets are off (we'll schedule the 32bit guest on that CPU,
> > > enter the guest, immediately take an Illegal Exception Return, and
> > > return to userspace with KVM_EXIT_FAIL_ENTRY).
> > 
> > Ouch. We certainly can't remove the warning until we deal with that
> > somehow, then.

Luckily, qemu refuses to start a guest on two different CPU types.

> > > Not sure we could do better, given the HW. My preference would be to
> > > fail these CPUs if they aren't present at boot time.

That's my preference as well.

> > I agree; I think we need logic to check the ID register fields against
> > their EXACT, {LOWER,HIGHER}_SAFE, etc rules regardless of whether we
> > have an associated cap. That can then abort a late onlining of a CPU
> > which violates those rules w.r.t. the finalised system value.
> 
> Except one of the cases is the user who doesn't care about aarch32 @ el2/1
> and just wants to add another core to their 64-bit "clean" OS.
> 
> So my $.02 is the online should only fail if someone has actually started a
> 32-bit guest on the machine.

I don't really think it's worth the hassle. This could even be racy
(a 32-bit guest starting at the same time as a CPU being onlined), so it
needs extra care.

If you have such a platform, just make sure that you don't have
incompatible CPUs coming up late (during boot it should be fine).

> > I suspect that we may want to split the notion of
> > safe-for-{user,kernel-guest} in the feature tables, as if nothing else
> > it will force us to consider those cases separately when adding new
> > stuff.
> 
> As I'm sure everyone knows, this is all going to happen again with el0
> support. I wonder if some of this more "advanced" functionality should be
> buried behind EXPERT. At least on ACPI it's possible to tell at early boot if
> the machine is heterogeneous (not necessarily in which ways) and just
> automatically sanitize away 32-bit support and some of the stickier things
> when a heterogeneous machine is detected.

We should improve (remove) the warnings for things we know the kernel
can handle during boot. For example, 32-bit not available on all CPUs
during boot should be fine as we just disable the feature. However, late
onlining of a CPU that does not support the already advertised features
should be blocked.
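
To make the boot-time half of this concrete, here is a minimal sketch in C
of how a system-wide value could be derived when the lower value of a field
is the safe one. It is an illustration only, not the kernel's
implementation: sanitise_el_fields is a hypothetical helper, and it looks
only at the EL0..EL3 fields (bits [15:0]) of ID_AA64PFR0_EL1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Combine the EL0..EL3 fields of ID_AA64PFR0_EL1 across the CPUs seen at
 * boot, keeping the lowest value of each 4-bit field. Only bits [15:0]
 * of the result are populated.
 */
static uint64_t sanitise_el_fields(const uint64_t *regs, size_t ncpus)
{
	uint64_t sys = 0;

	assert(ncpus > 0);
	for (unsigned int shift = 0; shift < 16; shift += 4) {
		uint64_t lowest = (regs[0] >> shift) & 0xf;

		for (size_t i = 1; i < ncpus; i++) {
			uint64_t v = (regs[i] >> shift) & 0xf;

			if (v < lowest)
				lowest = v;
		}
		sys |= lowest << shift;
	}
	return sys;
}
```

Fed the two SM8150 values from the log (0x11112222 and 0x11111112), this
yields 0x1112: AArch32 remains at EL0 only, which is the "just disable the
feature" outcome for EL1 and above.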

-- 
Catalin

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Relax CPU features sanity checking on heterogeneous architectures
  2019-10-17 20:00         ` Stephen Boyd
  2019-10-18  7:20           ` Marc Zyngier
@ 2019-10-18 10:18           ` Sai Prakash Ranjan
  1 sibling, 0 replies; 20+ messages in thread
From: Sai Prakash Ranjan @ 2019-10-18 10:18 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Marc Zyngier, Mark Rutland, rnayak, suzuki.poulose,
	catalin.marinas, linux-arm-kernel, linux-kernel, jeremy.linton,
	bjorn.andersson, linux-arm-msm, andrew.murray, will, Dave.Martin,
	linux-arm-kernel, marc.w.gonzalez, linux-arm-msm-owner

On 2019-10-18 01:30, Stephen Boyd wrote:
> Quoting Sai Prakash Ranjan (2019-10-11 06:40:13)
>> On 2019-10-11 19:04, Marc Zyngier wrote:
>> > On Fri, 11 Oct 2019 18:47:39 +0530
>> > Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> wrote:
>> >
>> >> Hi Mark,
>> >>
>> >> Thanks a lot for the detailed explanations, I did have a look at all
>> >> the variations before posting this.
>> >>
>> >> On 2019-10-11 16:20, Mark Rutland wrote:
>> >> > Hi,
>> >> >
>> >> > On Fri, Oct 11, 2019 at 11:19:00AM +0530, Sai Prakash Ranjan wrote:
>> >> >> On latest QCOM SoCs like SM8150 and SC7180 with big.LITTLE arch, below
>> >> >> warnings are observed during bootup of big cpu cores.
>> >> >
>> >> > For reference, which CPUs are in those SoCs?
>> >> >
>> >>
>> >> SM8150 is based on Cortex-A55(little cores) and Cortex-A76(big cores).
>> >> I'm afraid I cannot give details about SC7180 yet.
>> >>
>> >> >> SM8150:
>> >> >>
>> >> >> [    0.271177] CPU features: SANITY CHECK: Unexpected variation in
>> >> >> SYS_ID_AA64PFR0_EL1. Boot CPU: 0x00000011112222, CPU4: 0x00000011111112
>> >> >
>> >> > The differing fields are EL3, EL2, and EL1: the boot CPU supports
>> >> > AArch64 and AArch32 at those exception levels, while the secondary only
>> >> > supports AArch64.
>> >> >
>> >> > Do we handle this variation in KVM?
>> >>
>> >> We do not support KVM.
>> >
>> > Mainline does. You don't get to pick and choose what is supported or
>> > not.
>> >
>> 
>> Ok, that's good.
>> 
> 
> I want KVM on sc7180. How do I get it? Is something going to not work?

I meant that KVM is not supported in the downstream Android case, where the
kernel is not booted at EL2. And obviously I am wrong, because SC7180 is not
for Android, so my bad. I think Mark R's question about handling the KVM
variation was for Marc Z, not me :p

As for something not working: as Mark said, this warning does indicate that
32-bit EL1 guests won't be able to run on the big CPU cores.
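
For reference, the two ID_AA64PFR0_EL1 values from the SM8150 log can be
decoded with a few lines of C. This is only a sketch: id_field and
pfr0_el_diff are hypothetical helper names. The field layout used is the
architectural one, with EL0 in bits [3:0], EL1 in [7:4], EL2 in [11:8] and
EL3 in [15:12], where 0x1 means AArch64 only and 0x2 means AArch64 and
AArch32.

```c
#include <assert.h>
#include <stdint.h>

/* Extract one 4-bit ID register field starting at bit 'shift'. */
static unsigned int id_field(uint64_t reg, unsigned int shift)
{
	assert(shift <= 60);
	return (unsigned int)((reg >> shift) & 0xf);
}

/* Bit n of the result is set when the ELn field differs between a and b. */
static unsigned int pfr0_el_diff(uint64_t a, uint64_t b)
{
	unsigned int mask = 0;

	for (unsigned int el = 0; el < 4; el++)
		if (id_field(a, el * 4) != id_field(b, el * 4))
			mask |= 1u << el;
	return mask;
}
```

For the boot CPU value 0x11112222 and the CPU4 value 0x11111112, the EL1,
EL2 and EL3 fields differ (2 versus 1) while EL0 is 2 on both.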

- Sai

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Relax CPU features sanity checking on heterogeneous architectures
  2019-10-18  7:20           ` Marc Zyngier
@ 2019-10-18 14:33             ` Stephen Boyd
  2019-10-18 16:40               ` Marc Zyngier
  0 siblings, 1 reply; 20+ messages in thread
From: Stephen Boyd @ 2019-10-18 14:33 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Sai Prakash Ranjan, Mark Rutland, rnayak, suzuki.poulose,
	catalin.marinas, linux-arm-kernel, linux-kernel, jeremy.linton,
	bjorn.andersson, linux-arm-msm, andrew.murray, will, dave.martin,
	linux-arm-kernel, marc.w.gonzalez

Quoting Marc Zyngier (2019-10-18 00:20:56)
> 
> If this SoC is anything like SM8150, 32bit guests will be hit and miss,
> depending on the CPU your guest runs on, or is migrated to. We need to
> either drop capabilities from the 32bit-capable CPU, or prevent the
> non-32bit capable CPU from booting if a 32bit guest has been started.
> 
> You just have to hope that the kernel is entered at EL2, and that QC's
> "value add" has been moved somewhere else...
> 

Ok that's good.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Relax CPU features sanity checking on heterogeneous architectures
  2019-10-18 14:33             ` Stephen Boyd
@ 2019-10-18 16:40               ` Marc Zyngier
  0 siblings, 0 replies; 20+ messages in thread
From: Marc Zyngier @ 2019-10-18 16:40 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Sai Prakash Ranjan, Mark Rutland, rnayak, suzuki.poulose,
	catalin.marinas, linux-arm-kernel, linux-kernel, jeremy.linton,
	bjorn.andersson, linux-arm-msm, andrew.murray, will, dave.martin,
	linux-arm-kernel, marc.w.gonzalez

On Fri, 18 Oct 2019 15:33:29 +0100,
Stephen Boyd <swboyd@chromium.org> wrote:
> 
> Quoting Marc Zyngier (2019-10-18 00:20:56)
> > 
> > If this SoC is anything like SM8150, 32bit guests will be hit and miss,
> > depending on the CPU your guest runs on, or is migrated to. We need to
> > either drop capabilities from the 32bit-capable CPU, or prevent the
> > non-32bit capable CPU from booting if a 32bit guest has been started.
> > 
> > You just have to hope that the kernel is entered at EL2, and that QC's
> > "value add" has been moved somewhere else...
> > 
> 
> Ok that's good.

I need a new signature.

	M.

-- 
Jazz is not dead, it just smells funny.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Relax CPU features sanity checking on heterogeneous architectures
  2019-10-11 13:54     ` Mark Rutland
  2019-10-11 14:06       ` Marc Zyngier
  2019-10-17 21:39       ` Jeremy Linton
@ 2020-01-20  2:47       ` Sai Prakash Ranjan
  2 siblings, 0 replies; 20+ messages in thread
From: Sai Prakash Ranjan @ 2020-01-20  2:47 UTC (permalink / raw)
  To: Mark Rutland, Marc Zyngier, catalin.marinas
  Cc: suzuki.poulose, linux-kernel, jeremy.linton, bjorn.andersson,
	linux-arm-msm, andrew.murray, will, Dave.Martin,
	linux-arm-kernel, Stephen Boyd, Douglas Anderson

Hi Mark,

On 2019-10-11 19:24, Mark Rutland wrote:
> On Fri, Oct 11, 2019 at 02:33:43PM +0100, Marc Zyngier wrote:
>> On Fri, 11 Oct 2019 11:50:11 +0100
>> Mark Rutland <mark.rutland@arm.com> wrote:
>> 
>> > Hi,
>> >
>> > On Fri, Oct 11, 2019 at 11:19:00AM +0530, Sai Prakash Ranjan wrote:
>> > > On latest QCOM SoCs like SM8150 and SC7180 with big.LITTLE arch, below
>> > > warnings are observed during bootup of big cpu cores.
>> >
>> > For reference, which CPUs are in those SoCs?
>> >
>> > > SM8150:
>> > >
>> > > [    0.271177] CPU features: SANITY CHECK: Unexpected variation in
>> > > SYS_ID_AA64PFR0_EL1. Boot CPU: 0x00000011112222, CPU4: 0x00000011111112
>> >
>> > The differing fields are EL3, EL2, and EL1: the boot CPU supports
>> > AArch64 and AArch32 at those exception levels, while the secondary only
>> > supports AArch64.
>> >
>> > Do we handle this variation in KVM?
>> 
>> We do, at least at vcpu creation time (see kvm_reset_vcpu). But if one
>> of the !AArch32 CPU comes in late in the game (after we've started a
>> guest), all bets are off (we'll schedule the 32bit guest on that CPU,
>> enter the guest, immediately take an Illegal Exception Return, and
>> return to userspace with KVM_EXIT_FAIL_ENTRY).
> 
> Ouch. We certainly can't remove the warning until we deal with that
> somehow, then.
> 
>> Not sure we could do better, given the HW. My preference would be to
>> fail these CPUs if they aren't present at boot time.
> 
> I agree; I think we need logic to check the ID register fields against
> their EXACT, {LOWER,HIGHER}_SAFE, etc rules regardless of whether we
> have an associated cap. That can then abort a late onlining of a CPU
> which violates those rules w.r.t. the finalised system value.
> 
> I suspect that we may want to split the notion of
> safe-for-{user,kernel-guest} in the feature tables, as if nothing else
> it will force us to consider those cases separately when adding new
> stuff.
> 

I can help with testing these if you have any sample patches.

Thanks,
Sai

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2020-01-20  2:47 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-11  5:49 Relax CPU features sanity checking on heterogeneous architectures Sai Prakash Ranjan
2019-10-11  9:19 ` Marc Gonzalez
2019-10-11  9:57   ` Sai Prakash Ranjan
2019-10-11 10:50 ` Mark Rutland
2019-10-11 11:09   ` Marc Gonzalez
2019-10-11 13:33     ` Sai Prakash Ranjan
2019-10-11 13:17   ` Sai Prakash Ranjan
2019-10-11 13:34     ` Marc Zyngier
2019-10-11 13:40       ` Sai Prakash Ranjan
2019-10-17 20:00         ` Stephen Boyd
2019-10-18  7:20           ` Marc Zyngier
2019-10-18 14:33             ` Stephen Boyd
2019-10-18 16:40               ` Marc Zyngier
2019-10-18 10:18           ` Sai Prakash Ranjan
2019-10-11 13:33   ` Marc Zyngier
2019-10-11 13:54     ` Mark Rutland
2019-10-11 14:06       ` Marc Zyngier
2019-10-17 21:39       ` Jeremy Linton
2019-10-18  9:01         ` Catalin Marinas
2020-01-20  2:47       ` Sai Prakash Ranjan
