All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]
@ 2022-04-25 17:41 ` Naveen N. Rao
  0 siblings, 0 replies; 33+ messages in thread
From: Naveen N. Rao @ 2022-04-25 17:41 UTC (permalink / raw)
  To: Michael Ellerman, Eric Biederman; +Cc: linuxppc-dev, linux-kernel, kexec

kexec_load_purgatory() can fail for many reasons - there is no need to
print an error when encountering unsupported relocations.

This solves a build issue on powerpc with binutils v2.36 and newer [1].
Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
symbols") [2], binutils started dropping section symbols that it thought
were unused.  This isn't an issue in general, but with kexec_file.c, gcc
is placing kexec_arch_apply_relocations[_add] into a separate
.text.unlikely section and the section symbol ".text.unlikely" is being
dropped. Due to this, recordmcount is unable to find a non-weak symbol
in .text.unlikely to generate a relocation record against. Dropping
pr_err() calls results in these functions being left in .text section,
enabling recordmcount to emit a proper relocation record.

[1] https://github.com/linuxppc/issues/issues/388
[2] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
 kernel/kexec_file.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 8347fc158d2b96..55d144c58b5278 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -121,7 +121,6 @@ int __weak
 arch_kexec_apply_relocations_add(struct purgatory_info *pi, Elf_Shdr *section,
 				 const Elf_Shdr *relsec, const Elf_Shdr *symtab)
 {
-	pr_err("RELA relocation unsupported.\n");
 	return -ENOEXEC;
 }
 
@@ -138,7 +137,6 @@ int __weak
 arch_kexec_apply_relocations(struct purgatory_info *pi, Elf_Shdr *section,
 			     const Elf_Shdr *relsec, const Elf_Shdr *symtab)
 {
-	pr_err("REL relocation unsupported.\n");
 	return -ENOEXEC;
 }
 

base-commit: 83d8a0d166119de813cad27ae7d61f54f9aea707
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]
@ 2022-04-25 17:41 ` Naveen N. Rao
  0 siblings, 0 replies; 33+ messages in thread
From: Naveen N. Rao @ 2022-04-25 17:41 UTC (permalink / raw)
  To: Michael Ellerman, Eric Biederman; +Cc: kexec, linuxppc-dev, linux-kernel

kexec_load_purgatory() can fail for many reasons - there is no need to
print an error when encountering unsupported relocations.

This solves a build issue on powerpc with binutils v2.36 and newer [1].
Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
symbols") [2], binutils started dropping section symbols that it thought
were unused.  This isn't an issue in general, but with kexec_file.c, gcc
is placing kexec_arch_apply_relocations[_add] into a separate
.text.unlikely section and the section symbol ".text.unlikely" is being
dropped. Due to this, recordmcount is unable to find a non-weak symbol
in .text.unlikely to generate a relocation record against. Dropping
pr_err() calls results in these functions being left in .text section,
enabling recordmcount to emit a proper relocation record.

[1] https://github.com/linuxppc/issues/issues/388
[2] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
 kernel/kexec_file.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 8347fc158d2b96..55d144c58b5278 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -121,7 +121,6 @@ int __weak
 arch_kexec_apply_relocations_add(struct purgatory_info *pi, Elf_Shdr *section,
 				 const Elf_Shdr *relsec, const Elf_Shdr *symtab)
 {
-	pr_err("RELA relocation unsupported.\n");
 	return -ENOEXEC;
 }
 
@@ -138,7 +137,6 @@ int __weak
 arch_kexec_apply_relocations(struct purgatory_info *pi, Elf_Shdr *section,
 			     const Elf_Shdr *relsec, const Elf_Shdr *symtab)
 {
-	pr_err("REL relocation unsupported.\n");
 	return -ENOEXEC;
 }
 

base-commit: 83d8a0d166119de813cad27ae7d61f54f9aea707
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]
@ 2022-04-25 17:41 ` Naveen N. Rao
  0 siblings, 0 replies; 33+ messages in thread
From: Naveen N. Rao @ 2022-04-25 17:41 UTC (permalink / raw)
  To: kexec

kexec_load_purgatory() can fail for many reasons - there is no need to
print an error when encountering unsupported relocations.

This solves a build issue on powerpc with binutils v2.36 and newer [1].
Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
symbols") [2], binutils started dropping section symbols that it thought
were unused.  This isn't an issue in general, but with kexec_file.c, gcc
is placing kexec_arch_apply_relocations[_add] into a separate
.text.unlikely section and the section symbol ".text.unlikely" is being
dropped. Due to this, recordmcount is unable to find a non-weak symbol
in .text.unlikely to generate a relocation record against. Dropping
pr_err() calls results in these functions being left in .text section,
enabling recordmcount to emit a proper relocation record.

[1] https://github.com/linuxppc/issues/issues/388
[2] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
 kernel/kexec_file.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 8347fc158d2b96..55d144c58b5278 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -121,7 +121,6 @@ int __weak
 arch_kexec_apply_relocations_add(struct purgatory_info *pi, Elf_Shdr *section,
 				 const Elf_Shdr *relsec, const Elf_Shdr *symtab)
 {
-	pr_err("RELA relocation unsupported.\n");
 	return -ENOEXEC;
 }
 
@@ -138,7 +137,6 @@ int __weak
 arch_kexec_apply_relocations(struct purgatory_info *pi, Elf_Shdr *section,
 			     const Elf_Shdr *relsec, const Elf_Shdr *symtab)
 {
-	pr_err("REL relocation unsupported.\n");
 	return -ENOEXEC;
 }
 

base-commit: 83d8a0d166119de813cad27ae7d61f54f9aea707
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]
  2022-04-25 17:41 ` Naveen N. Rao
  (?)
@ 2022-05-17  7:58   ` Naveen N. Rao
  -1 siblings, 0 replies; 33+ messages in thread
From: Naveen N. Rao @ 2022-05-17  7:58 UTC (permalink / raw)
  To: Eric Biederman, Michael Ellerman; +Cc: kexec, linux-kernel, linuxppc-dev

Naveen N. Rao wrote:
> kexec_load_purgatory() can fail for many reasons - there is no need to
> print an error when encountering unsupported relocations.
> 
> This solves a build issue on powerpc with binutils v2.36 and newer [1].
> Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
> symbols") [2], binutils started dropping section symbols that it thought
> were unused.  This isn't an issue in general, but with kexec_file.c, gcc
> is placing kexec_arch_apply_relocations[_add] into a separate
> .text.unlikely section and the section symbol ".text.unlikely" is being
> dropped. Due to this, recordmcount is unable to find a non-weak symbol
> in .text.unlikely to generate a relocation record against. Dropping
> pr_err() calls results in these functions being left in .text section,
> enabling recordmcount to emit a proper relocation record.
> 
> [1] https://github.com/linuxppc/issues/issues/388
> [2] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1
> 
> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
> ---
>  kernel/kexec_file.c | 2 --
>  1 file changed, 2 deletions(-)

Eric,
Any comments on this? There have been many reports of build breakages 
due to this.

FWIW, there have been similar fixes in the past:
- https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/init/initramfs.c?id=55d5b7dd6451b58489ce384282ca5a4a289eb8d5
- https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/elfcore.h?id=6e7b64b9dd6d96537d816ea07ec26b7dedd397b9


- Naveen


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]
@ 2022-05-17  7:58   ` Naveen N. Rao
  0 siblings, 0 replies; 33+ messages in thread
From: Naveen N. Rao @ 2022-05-17  7:58 UTC (permalink / raw)
  To: Eric Biederman, Michael Ellerman; +Cc: linuxppc-dev, kexec, linux-kernel

Naveen N. Rao wrote:
> kexec_load_purgatory() can fail for many reasons - there is no need to
> print an error when encountering unsupported relocations.
> 
> This solves a build issue on powerpc with binutils v2.36 and newer [1].
> Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
> symbols") [2], binutils started dropping section symbols that it thought
> were unused.  This isn't an issue in general, but with kexec_file.c, gcc
> is placing kexec_arch_apply_relocations[_add] into a separate
> .text.unlikely section and the section symbol ".text.unlikely" is being
> dropped. Due to this, recordmcount is unable to find a non-weak symbol
> in .text.unlikely to generate a relocation record against. Dropping
> pr_err() calls results in these functions being left in .text section,
> enabling recordmcount to emit a proper relocation record.
> 
> [1] https://github.com/linuxppc/issues/issues/388
> [2] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1
> 
> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
> ---
>  kernel/kexec_file.c | 2 --
>  1 file changed, 2 deletions(-)

Eric,
Any comments on this? There have been many reports of build breakages 
due to this.

FWIW, there have been similar fixes in the past:
- https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/init/initramfs.c?id=55d5b7dd6451b58489ce384282ca5a4a289eb8d5
- https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/elfcore.h?id=6e7b64b9dd6d96537d816ea07ec26b7dedd397b9


- Naveen


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]
@ 2022-05-17  7:58   ` Naveen N. Rao
  0 siblings, 0 replies; 33+ messages in thread
From: Naveen N. Rao @ 2022-05-17  7:58 UTC (permalink / raw)
  To: kexec

Naveen N. Rao wrote:
> kexec_load_purgatory() can fail for many reasons - there is no need to
> print an error when encountering unsupported relocations.
> 
> This solves a build issue on powerpc with binutils v2.36 and newer [1].
> Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
> symbols") [2], binutils started dropping section symbols that it thought
> were unused.  This isn't an issue in general, but with kexec_file.c, gcc
> is placing kexec_arch_apply_relocations[_add] into a separate
> .text.unlikely section and the section symbol ".text.unlikely" is being
> dropped. Due to this, recordmcount is unable to find a non-weak symbol
> in .text.unlikely to generate a relocation record against. Dropping
> pr_err() calls results in these functions being left in .text section,
> enabling recordmcount to emit a proper relocation record.
> 
> [1] https://github.com/linuxppc/issues/issues/388
> [2] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1
> 
> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
> ---
>  kernel/kexec_file.c | 2 --
>  1 file changed, 2 deletions(-)

Eric,
Any comments on this? There have been many reports of build breakages 
due to this.

FWIW, there have been similar fixes in the past:
- https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/init/initramfs.c?id=55d5b7dd6451b58489ce384282ca5a4a289eb8d5
- https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/elfcore.h?id=6e7b64b9dd6d96537d816ea07ec26b7dedd397b9


- Naveen



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]
  2022-04-25 17:41 ` Naveen N. Rao
  (?)
@ 2022-05-17  9:25   ` Baoquan He
  -1 siblings, 0 replies; 33+ messages in thread
From: Baoquan He @ 2022-05-17  9:25 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: Michael Ellerman, Eric Biederman, linuxppc-dev, linux-kernel, kexec

On 04/25/22 at 11:11pm, Naveen N. Rao wrote:
> kexec_load_purgatory() can fail for many reasons - there is no need to
> print an error when encountering unsupported relocations.
> 
> This solves a build issue on powerpc with binutils v2.36 and newer [1].
> Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
> symbols") [2], binutils started dropping section symbols that it thought

I am not familiar with binutils, while wondering if this exists in other
ARCHes except of ppc. Arm64 doesn't have the ARCH override either, do we
have problem with it?

> were unused.  This isn't an issue in general, but with kexec_file.c, gcc
> is placing kexec_arch_apply_relocations[_add] into a separate
> .text.unlikely section and the section symbol ".text.unlikely" is being
> dropped. Due to this, recordmcount is unable to find a non-weak symbol

But arch_kexec_apply_relocations_add is weak symbol on ppc.

> in .text.unlikely to generate a relocation record against. Dropping
> pr_err() calls results in these functions being left in .text section,

Why dropping pr_err() can make arch_kexec_apply_relocations_add put in
.text?

> enabling recordmcount to emit a proper relocation record.
> 
> [1] https://github.com/linuxppc/issues/issues/388
> [2] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1
> 
> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
> ---
>  kernel/kexec_file.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index 8347fc158d2b96..55d144c58b5278 100644
> --- a/kernel/kexec_file.c
> +++ b/kernel/kexec_file.c
> @@ -121,7 +121,6 @@ int __weak
>  arch_kexec_apply_relocations_add(struct purgatory_info *pi, Elf_Shdr *section,
>  				 const Elf_Shdr *relsec, const Elf_Shdr *symtab)
>  {
> -	pr_err("RELA relocation unsupported.\n");
>  	return -ENOEXEC;
>  }
>  
> @@ -138,7 +137,6 @@ int __weak
>  arch_kexec_apply_relocations(struct purgatory_info *pi, Elf_Shdr *section,
>  			     const Elf_Shdr *relsec, const Elf_Shdr *symtab)
>  {
> -	pr_err("REL relocation unsupported.\n");
>  	return -ENOEXEC;
>  }
>  
> 
> base-commit: 83d8a0d166119de813cad27ae7d61f54f9aea707
> -- 
> 2.35.1
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]
@ 2022-05-17  9:25   ` Baoquan He
  0 siblings, 0 replies; 33+ messages in thread
From: Baoquan He @ 2022-05-17  9:25 UTC (permalink / raw)
  To: Naveen N. Rao; +Cc: linuxppc-dev, Eric Biederman, kexec, linux-kernel

On 04/25/22 at 11:11pm, Naveen N. Rao wrote:
> kexec_load_purgatory() can fail for many reasons - there is no need to
> print an error when encountering unsupported relocations.
> 
> This solves a build issue on powerpc with binutils v2.36 and newer [1].
> Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
> symbols") [2], binutils started dropping section symbols that it thought

I am not familiar with binutils, while wondering if this exists in other
ARCHes except of ppc. Arm64 doesn't have the ARCH override either, do we
have problem with it?

> were unused.  This isn't an issue in general, but with kexec_file.c, gcc
> is placing kexec_arch_apply_relocations[_add] into a separate
> .text.unlikely section and the section symbol ".text.unlikely" is being
> dropped. Due to this, recordmcount is unable to find a non-weak symbol

But arch_kexec_apply_relocations_add is weak symbol on ppc.

> in .text.unlikely to generate a relocation record against. Dropping
> pr_err() calls results in these functions being left in .text section,

Why dropping pr_err() can make arch_kexec_apply_relocations_add put in
.text?

> enabling recordmcount to emit a proper relocation record.
> 
> [1] https://github.com/linuxppc/issues/issues/388
> [2] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1
> 
> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
> ---
>  kernel/kexec_file.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index 8347fc158d2b96..55d144c58b5278 100644
> --- a/kernel/kexec_file.c
> +++ b/kernel/kexec_file.c
> @@ -121,7 +121,6 @@ int __weak
>  arch_kexec_apply_relocations_add(struct purgatory_info *pi, Elf_Shdr *section,
>  				 const Elf_Shdr *relsec, const Elf_Shdr *symtab)
>  {
> -	pr_err("RELA relocation unsupported.\n");
>  	return -ENOEXEC;
>  }
>  
> @@ -138,7 +137,6 @@ int __weak
>  arch_kexec_apply_relocations(struct purgatory_info *pi, Elf_Shdr *section,
>  			     const Elf_Shdr *relsec, const Elf_Shdr *symtab)
>  {
> -	pr_err("REL relocation unsupported.\n");
>  	return -ENOEXEC;
>  }
>  
> 
> base-commit: 83d8a0d166119de813cad27ae7d61f54f9aea707
> -- 
> 2.35.1
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]
@ 2022-05-17  9:25   ` Baoquan He
  0 siblings, 0 replies; 33+ messages in thread
From: Baoquan He @ 2022-05-17  9:25 UTC (permalink / raw)
  To: kexec

On 04/25/22 at 11:11pm, Naveen N. Rao wrote:
> kexec_load_purgatory() can fail for many reasons - there is no need to
> print an error when encountering unsupported relocations.
> 
> This solves a build issue on powerpc with binutils v2.36 and newer [1].
> Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
> symbols") [2], binutils started dropping section symbols that it thought

I am not familiar with binutils, while wondering if this exists in other
ARCHes except of ppc. Arm64 doesn't have the ARCH override either, do we
have problem with it?

> were unused.  This isn't an issue in general, but with kexec_file.c, gcc
> is placing kexec_arch_apply_relocations[_add] into a separate
> .text.unlikely section and the section symbol ".text.unlikely" is being
> dropped. Due to this, recordmcount is unable to find a non-weak symbol

But arch_kexec_apply_relocations_add is weak symbol on ppc.

> in .text.unlikely to generate a relocation record against. Dropping
> pr_err() calls results in these functions being left in .text section,

Why dropping pr_err() can make arch_kexec_apply_relocations_add put in
.text?

> enabling recordmcount to emit a proper relocation record.
> 
> [1] https://github.com/linuxppc/issues/issues/388
> [2] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1
> 
> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
> ---
>  kernel/kexec_file.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index 8347fc158d2b96..55d144c58b5278 100644
> --- a/kernel/kexec_file.c
> +++ b/kernel/kexec_file.c
> @@ -121,7 +121,6 @@ int __weak
>  arch_kexec_apply_relocations_add(struct purgatory_info *pi, Elf_Shdr *section,
>  				 const Elf_Shdr *relsec, const Elf_Shdr *symtab)
>  {
> -	pr_err("RELA relocation unsupported.\n");
>  	return -ENOEXEC;
>  }
>  
> @@ -138,7 +137,6 @@ int __weak
>  arch_kexec_apply_relocations(struct purgatory_info *pi, Elf_Shdr *section,
>  			     const Elf_Shdr *relsec, const Elf_Shdr *symtab)
>  {
> -	pr_err("REL relocation unsupported.\n");
>  	return -ENOEXEC;
>  }
>  
> 
> base-commit: 83d8a0d166119de813cad27ae7d61f54f9aea707
> -- 
> 2.35.1
> 



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]
  2022-05-17  9:25   ` Baoquan He
  (?)
@ 2022-05-17 10:19     ` Naveen N. Rao
  -1 siblings, 0 replies; 33+ messages in thread
From: Naveen N. Rao @ 2022-05-17 10:19 UTC (permalink / raw)
  To: Baoquan He; +Cc: linuxppc-dev, kexec, Eric Biederman, linux-kernel

Baoquan He wrote:
> On 04/25/22 at 11:11pm, Naveen N. Rao wrote:
>> kexec_load_purgatory() can fail for many reasons - there is no need to
>> print an error when encountering unsupported relocations.
>> 
>> This solves a build issue on powerpc with binutils v2.36 and newer [1].
>> Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
>> symbols") [2], binutils started dropping section symbols that it thought
> 
> I am not familiar with binutils, while wondering if this exists in other
> ARCHes except of ppc. Arm64 doesn't have the ARCH override either, do we
> have problem with it?

I'm not aware of this specific file causing a problem on other 
architectures - perhaps the config options differ enough. There are 
however more reports of similar issues affecting other architectures 
with the llvm integrated assembler:
https://github.com/ClangBuiltLinux/linux/issues/981

> 
>> were unused.  This isn't an issue in general, but with kexec_file.c, gcc
>> is placing kexec_arch_apply_relocations[_add] into a separate
>> .text.unlikely section and the section symbol ".text.unlikely" is being
>> dropped. Due to this, recordmcount is unable to find a non-weak symbol
> 
> But arch_kexec_apply_relocations_add is weak symbol on ppc.

Yes. Note that it is just the section symbol that gets dropped. The 
section is still present and will continue to hold the symbols for the 
functions themselves.

> 
>> in .text.unlikely to generate a relocation record against. Dropping
>> pr_err() calls results in these functions being left in .text section,
> 
> Why dropping pr_err() can make arch_kexec_apply_relocations_add put in
> .text?

I'm not actually sure, though Josh suspected that printk() might be 
cold:
http://lkml.kernel.org/r/20210214155147.3owdimqv2lyhu6by@treble


- Naveen


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]
@ 2022-05-17 10:19     ` Naveen N. Rao
  0 siblings, 0 replies; 33+ messages in thread
From: Naveen N. Rao @ 2022-05-17 10:19 UTC (permalink / raw)
  To: Baoquan He
  Cc: Eric Biederman, kexec, linux-kernel, linuxppc-dev, Michael Ellerman

Baoquan He wrote:
> On 04/25/22 at 11:11pm, Naveen N. Rao wrote:
>> kexec_load_purgatory() can fail for many reasons - there is no need to
>> print an error when encountering unsupported relocations.
>> 
>> This solves a build issue on powerpc with binutils v2.36 and newer [1].
>> Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
>> symbols") [2], binutils started dropping section symbols that it thought
> 
> I am not familiar with binutils, while wondering if this exists in other
> ARCHes except of ppc. Arm64 doesn't have the ARCH override either, do we
> have problem with it?

I'm not aware of this specific file causing a problem on other 
architectures - perhaps the config options differ enough. There are 
however more reports of similar issues affecting other architectures 
with the llvm integrated assembler:
https://github.com/ClangBuiltLinux/linux/issues/981

> 
>> were unused.  This isn't an issue in general, but with kexec_file.c, gcc
>> is placing kexec_arch_apply_relocations[_add] into a separate
>> .text.unlikely section and the section symbol ".text.unlikely" is being
>> dropped. Due to this, recordmcount is unable to find a non-weak symbol
> 
> But arch_kexec_apply_relocations_add is weak symbol on ppc.

Yes. Note that it is just the section symbol that gets dropped. The 
section is still present and will continue to hold the symbols for the 
functions themselves.

> 
>> in .text.unlikely to generate a relocation record against. Dropping
>> pr_err() calls results in these functions being left in .text section,
> 
> Why dropping pr_err() can make arch_kexec_apply_relocations_add put in
> .text?

I'm not actually sure, though Josh suspected that printk() might be 
cold:
http://lkml.kernel.org/r/20210214155147.3owdimqv2lyhu6by@treble


- Naveen


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]
@ 2022-05-17 10:19     ` Naveen N. Rao
  0 siblings, 0 replies; 33+ messages in thread
From: Naveen N. Rao @ 2022-05-17 10:19 UTC (permalink / raw)
  To: kexec

Baoquan He wrote:
> On 04/25/22 at 11:11pm, Naveen N. Rao wrote:
>> kexec_load_purgatory() can fail for many reasons - there is no need to
>> print an error when encountering unsupported relocations.
>> 
>> This solves a build issue on powerpc with binutils v2.36 and newer [1].
>> Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
>> symbols") [2], binutils started dropping section symbols that it thought
> 
> I am not familiar with binutils, while wondering if this exists in other
> ARCHes except of ppc. Arm64 doesn't have the ARCH override either, do we
> have problem with it?

I'm not aware of this specific file causing a problem on other 
architectures - perhaps the config options differ enough. There are 
however more reports of similar issues affecting other architectures 
with the llvm integrated assembler:
https://github.com/ClangBuiltLinux/linux/issues/981

> 
>> were unused.  This isn't an issue in general, but with kexec_file.c, gcc
>> is placing kexec_arch_apply_relocations[_add] into a separate
>> .text.unlikely section and the section symbol ".text.unlikely" is being
>> dropped. Due to this, recordmcount is unable to find a non-weak symbol
> 
> But arch_kexec_apply_relocations_add is weak symbol on ppc.

Yes. Note that it is just the section symbol that gets dropped. The 
section is still present and will continue to hold the symbols for the 
functions themselves.

> 
>> in .text.unlikely to generate a relocation record against. Dropping
>> pr_err() calls results in these functions being left in .text section,
> 
> Why dropping pr_err() can make arch_kexec_apply_relocations_add put in
> .text?

I'm not actually sure, though Josh suspected that printk() might be 
cold:
http://lkml.kernel.org/r/20210214155147.3owdimqv2lyhu6by at treble


- Naveen



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]
  2022-05-17 10:19     ` Naveen N. Rao
  (?)
@ 2022-05-17 15:32       ` Eric W. Biederman
  -1 siblings, 0 replies; 33+ messages in thread
From: Eric W. Biederman @ 2022-05-17 15:32 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: Baoquan He, kexec, linux-kernel, linuxppc-dev, Michael Ellerman



Looking at this the pr_err is absolutely needed.  If an unsupported case
winds up in the purgatory blob and the code can't handle it things
will fail silently much worse later.  So the proposed patch is
unfortunately the wrong direction.

"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> writes:

> Baoquan He wrote:
>> On 04/25/22 at 11:11pm, Naveen N. Rao wrote:
>>> kexec_load_purgatory() can fail for many reasons - there is no need to
>>> print an error when encountering unsupported relocations.
>>> This solves a build issue on powerpc with binutils v2.36 and newer [1].
>>> Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
>>> symbols") [2], binutils started dropping section symbols that it thought
>> I am not familiar with binutils, while wondering if this exists in other
>> ARCHes except of ppc. Arm64 doesn't have the ARCH override either, do we
>> have problem with it?
>
> I'm not aware of this specific file causing a problem on other architectures -
> perhaps the config options differ enough. There are however more reports of
> similar issues affecting other architectures with the llvm integrated assembler:
> https://github.com/ClangBuiltLinux/linux/issues/981
>
>> 
>>> were unused.  This isn't an issue in general, but with kexec_file.c, gcc
>>> is placing kexec_arch_apply_relocations[_add] into a separate
>>> .text.unlikely section and the section symbol ".text.unlikely" is being
>>> dropped. Due to this, recordmcount is unable to find a non-weak symbol
>> But arch_kexec_apply_relocations_add is weak symbol on ppc.
>
> Yes. Note that it is just the section symbol that gets dropped. The section is
> still present and will continue to hold the symbols for the functions
> themselves.

So we have a case where binutils thinks it is doing something useful
and our kernel specific tool gets tripped up by it.

Reading the recordmcount code it looks like it is finding any symbol
within a section but ignoring weak symbols.  So I suspect the only
remaining symbol in the section is __weak and that confuses
recordmcount.

Does removing the __weak annotation on those functions fix the build
error?  If so we can restructure the kexec code to simply not use __weak
symbols.

Otherwise the fix needs to be in recordmcount or binutils, and we should
loop whoever maintains recordmcount in to see what they can do.

Eric

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]
@ 2022-05-17 15:32       ` Eric W. Biederman
  0 siblings, 0 replies; 33+ messages in thread
From: Eric W. Biederman @ 2022-05-17 15:32 UTC (permalink / raw)
  To: Naveen N. Rao; +Cc: linuxppc-dev, kexec, linux-kernel, Baoquan He



Looking at this the pr_err is absolutely needed.  If an unsupported case
winds up in the purgatory blob and the code can't handle it things
will fail silently much worse later.  So the proposed patch is
unfortunately the wrong direction.

"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> writes:

> Baoquan He wrote:
>> On 04/25/22 at 11:11pm, Naveen N. Rao wrote:
>>> kexec_load_purgatory() can fail for many reasons - there is no need to
>>> print an error when encountering unsupported relocations.
>>> This solves a build issue on powerpc with binutils v2.36 and newer [1].
>>> Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
>>> symbols") [2], binutils started dropping section symbols that it thought
>> I am not familiar with binutils, while wondering if this exists in other
>> ARCHes except of ppc. Arm64 doesn't have the ARCH override either, do we
>> have problem with it?
>
> I'm not aware of this specific file causing a problem on other architectures -
> perhaps the config options differ enough. There are however more reports of
> similar issues affecting other architectures with the llvm integrated assembler:
> https://github.com/ClangBuiltLinux/linux/issues/981
>
>> 
>>> were unused.  This isn't an issue in general, but with kexec_file.c, gcc
>>> is placing kexec_arch_apply_relocations[_add] into a separate
>>> .text.unlikely section and the section symbol ".text.unlikely" is being
>>> dropped. Due to this, recordmcount is unable to find a non-weak symbol
>> But arch_kexec_apply_relocations_add is weak symbol on ppc.
>
> Yes. Note that it is just the section symbol that gets dropped. The section is
> still present and will continue to hold the symbols for the functions
> themselves.

So we have a case where binutils thinks it is doing something useful
and our kernel specific tool gets tripped up by it.

Reading the recordmcount code it looks like it is finding any symbol
within a section but ignoring weak symbols.  So I suspect the only
remaining symbol in the section is __weak and that confuses
recordmcount.

Does removing the __weak annotation on those functions fix the build
error?  If so we can restructure the kexec code to simply not use __weak
symbols.

Otherwise the fix needs to be in recordmcount or binutils, and we should
loop whoever maintains recordmcount in to see what they can do.

Eric

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]
@ 2022-05-17 15:32       ` Eric W. Biederman
  0 siblings, 0 replies; 33+ messages in thread
From: Eric W. Biederman @ 2022-05-17 15:32 UTC (permalink / raw)
  To: kexec



Looking at this the pr_err is absolutely needed.  If an unsupported case
winds up in the purgatory blob and the code can't handle it things
will fail silently much worse later.  So the proposed patch is
unfortunately the wrong direction.

"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> writes:

> Baoquan He wrote:
>> On 04/25/22 at 11:11pm, Naveen N. Rao wrote:
>>> kexec_load_purgatory() can fail for many reasons - there is no need to
>>> print an error when encountering unsupported relocations.
>>> This solves a build issue on powerpc with binutils v2.36 and newer [1].
>>> Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
>>> symbols") [2], binutils started dropping section symbols that it thought
>> I am not familiar with binutils, while wondering if this exists in other
>> ARCHes except of ppc. Arm64 doesn't have the ARCH override either, do we
>> have problem with it?
>
> I'm not aware of this specific file causing a problem on other architectures -
> perhaps the config options differ enough. There are however more reports of
> similar issues affecting other architectures with the llvm integrated assembler:
> https://github.com/ClangBuiltLinux/linux/issues/981
>
>> 
>>> were unused.  This isn't an issue in general, but with kexec_file.c, gcc
>>> is placing kexec_arch_apply_relocations[_add] into a separate
>>> .text.unlikely section and the section symbol ".text.unlikely" is being
>>> dropped. Due to this, recordmcount is unable to find a non-weak symbol
>> But arch_kexec_apply_relocations_add is weak symbol on ppc.
>
> Yes. Note that it is just the section symbol that gets dropped. The section is
> still present and will continue to hold the symbols for the functions
> themselves.

So we have a case where binutils thinks it is doing something useful
and our kernel specific tool gets tripped up by it.

Reading the recordmcount code it looks like it is finding any symbol
within a section but ignoring weak symbols.  So I suspect the only
remaining symbol in the section is __weak and that confuses
recordmcount.

Does removing the __weak annotation on those functions fix the build
error?  If so we can restructure the kexec code to simply not use __weak
symbols.

Otherwise the fix needs to be in recordmcount or binutils, and we should
loop whoever maintains recordmcount in to see what they can do.

Eric


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]
  2022-05-17 15:32       ` Eric W. Biederman
  (?)
@ 2022-05-18  2:26         ` Michael Ellerman
  -1 siblings, 0 replies; 33+ messages in thread
From: Michael Ellerman @ 2022-05-18  2:26 UTC (permalink / raw)
  To: Eric W. Biederman, Naveen N. Rao
  Cc: Baoquan He, kexec, linux-kernel, linuxppc-dev

"Eric W. Biederman" <ebiederm@xmission.com> writes:
> Looking at this the pr_err is absolutely needed.  If an unsupported case
> winds up in the purgatory blob and the code can't handle it things
> will fail silently much worse later.

It won't fail later, it will fail the syscall.

sys_kexec_file_load()
  kimage_file_alloc_init()
    kimage_file_prepare_segments()
      arch_kexec_kernel_image_load()
        kexec_image_load_default()
          image->fops->load()
            elf64_load()        # powerpc
            bzImage64_load()    # x86
              kexec_load_purgatory()
                kexec_apply_relocations()

Which does:

	if (relsec->sh_type == SHT_RELA)
		ret = arch_kexec_apply_relocations_add(pi, section,
						       relsec, symtab);
	else if (relsec->sh_type == SHT_REL)
		ret = arch_kexec_apply_relocations(pi, section,
						   relsec, symtab);
	if (ret)
		return ret;

And that error is bubbled all the way back up. So as long as
arch_kexec_apply_relocations() returns an error the syscall will fail
back to userspace and there'll be an error message at that level.

It's true that having nothing printed in dmesg makes it harder to work
out why the syscall failed. But it's a kernel bug if there are unhandled
relocations in the kernel-supplied purgatory code, so a user really has
no way to do anything about the error even if it is printed.

> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> writes:
>
>> Baoquan He wrote:
>>> On 04/25/22 at 11:11pm, Naveen N. Rao wrote:
>>>> kexec_load_purgatory() can fail for many reasons - there is no need to
>>>> print an error when encountering unsupported relocations.
>>>> This solves a build issue on powerpc with binutils v2.36 and newer [1].
>>>> Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
>>>> symbols") [2], binutils started dropping section symbols that it thought
>>> I am not familiar with binutils, while wondering if this exists in other
>>> ARCHes except of ppc. Arm64 doesn't have the ARCH override either, do we
>>> have problem with it?
>>
>> I'm not aware of this specific file causing a problem on other architectures -
>> perhaps the config options differ enough. There are however more reports of
>> similar issues affecting other architectures with the llvm integrated assembler:
>> https://github.com/ClangBuiltLinux/linux/issues/981
>>
>>>
>>>> were unused.  This isn't an issue in general, but with kexec_file.c, gcc
>>>> is placing kexec_arch_apply_relocations[_add] into a separate
>>>> .text.unlikely section and the section symbol ".text.unlikely" is being
>>>> dropped. Due to this, recordmcount is unable to find a non-weak symbol
>>> But arch_kexec_apply_relocations_add is weak symbol on ppc.
>>
>> Yes. Note that it is just the section symbol that gets dropped. The section is
>> still present and will continue to hold the symbols for the functions
>> themselves.
>
> So we have a case where binutils thinks it is doing something useful
> and our kernel specific tool gets tripped up by it.

It's not just binutils, the LLVM assembler has the same behavior.

> Reading the recordmcount code it looks like it is finding any symbol
> within a section but ignoring weak symbols.  So I suspect the only
> remaining symbol in the section is __weak and that confuses
> recordmcount.
>
> Does removing the __weak annotation on those functions fix the build
> error?  If so we can restructure the kexec code to simply not use __weak
> symbols.
>
> Otherwise the fix needs to be in recordmcount or binutils, and we should
> loop whoever maintains recordmcount in to see what they can do.

It seems that recordmcount is not really maintained anymore now that x86
uses objtool?

There've been several threads about fixing recordmcount, but none of
them seem to have lead to a solution.

These weak symbol vs recordmcount problems have been worked around going
back as far as 2020:

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/elfcore.h?id=6e7b64b9dd6d96537d816ea07ec26b7dedd397b9

cheers

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]
@ 2022-05-18  2:26         ` Michael Ellerman
  0 siblings, 0 replies; 33+ messages in thread
From: Michael Ellerman @ 2022-05-18  2:26 UTC (permalink / raw)
  To: Eric W. Biederman, Naveen N. Rao
  Cc: linuxppc-dev, kexec, linux-kernel, Baoquan He

"Eric W. Biederman" <ebiederm@xmission.com> writes:
> Looking at this the pr_err is absolutely needed.  If an unsupported case
> winds up in the purgatory blob and the code can't handle it things
> will fail silently much worse later.

It won't fail later, it will fail the syscall.

sys_kexec_file_load()
  kimage_file_alloc_init()
    kimage_file_prepare_segments()
      arch_kexec_kernel_image_load()
        kexec_image_load_default()
          image->fops->load()
            elf64_load()        # powerpc
            bzImage64_load()    # x86
              kexec_load_purgatory()
                kexec_apply_relocations()

Which does:

	if (relsec->sh_type == SHT_RELA)
		ret = arch_kexec_apply_relocations_add(pi, section,
						       relsec, symtab);
	else if (relsec->sh_type == SHT_REL)
		ret = arch_kexec_apply_relocations(pi, section,
						   relsec, symtab);
	if (ret)
		return ret;

And that error is bubbled all the way back up. So as long as
arch_kexec_apply_relocations() returns an error the syscall will fail
back to userspace and there'll be an error message at that level.

It's true that having nothing printed in dmesg makes it harder to work
out why the syscall failed. But it's a kernel bug if there are unhandled
relocations in the kernel-supplied purgatory code, so a user really has
no way to do anything about the error even if it is printed.

> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> writes:
>
>> Baoquan He wrote:
>>> On 04/25/22 at 11:11pm, Naveen N. Rao wrote:
>>>> kexec_load_purgatory() can fail for many reasons - there is no need to
>>>> print an error when encountering unsupported relocations.
>>>> This solves a build issue on powerpc with binutils v2.36 and newer [1].
>>>> Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
>>>> symbols") [2], binutils started dropping section symbols that it thought
>>> I am not familiar with binutils, while wondering if this exists in other
>>> ARCHes except of ppc. Arm64 doesn't have the ARCH override either, do we
>>> have problem with it?
>>
>> I'm not aware of this specific file causing a problem on other architectures -
>> perhaps the config options differ enough. There are however more reports of
>> similar issues affecting other architectures with the llvm integrated assembler:
>> https://github.com/ClangBuiltLinux/linux/issues/981
>>
>>>
>>>> were unused.  This isn't an issue in general, but with kexec_file.c, gcc
>>>> is placing kexec_arch_apply_relocations[_add] into a separate
>>>> .text.unlikely section and the section symbol ".text.unlikely" is being
>>>> dropped. Due to this, recordmcount is unable to find a non-weak symbol
>>> But arch_kexec_apply_relocations_add is weak symbol on ppc.
>>
>> Yes. Note that it is just the section symbol that gets dropped. The section is
>> still present and will continue to hold the symbols for the functions
>> themselves.
>
> So we have a case where binutils thinks it is doing something useful
> and our kernel specific tool gets tripped up by it.

It's not just binutils, the LLVM assembler has the same behavior.

> Reading the recordmcount code it looks like it is finding any symbol
> within a section but ignoring weak symbols.  So I suspect the only
> remaining symbol in the section is __weak and that confuses
> recordmcount.
>
> Does removing the __weak annotation on those functions fix the build
> error?  If so we can restructure the kexec code to simply not use __weak
> symbols.
>
> Otherwise the fix needs to be in recordmcount or binutils, and we should
> loop whoever maintains recordmcount in to see what they can do.

It seems that recordmcount is not really maintained anymore now that x86
uses objtool?

There've been several threads about fixing recordmcount, but none of
them seem to have lead to a solution.

These weak symbol vs recordmcount problems have been worked around going
back as far as 2020:

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/elfcore.h?id=6e7b64b9dd6d96537d816ea07ec26b7dedd397b9

cheers

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]
@ 2022-05-18  2:26         ` Michael Ellerman
  0 siblings, 0 replies; 33+ messages in thread
From: Michael Ellerman @ 2022-05-18  2:26 UTC (permalink / raw)
  To: kexec

"Eric W. Biederman" <ebiederm@xmission.com> writes:
> Looking at this the pr_err is absolutely needed.  If an unsupported case
> winds up in the purgatory blob and the code can't handle it things
> will fail silently much worse later.

It won't fail later, it will fail the syscall.

sys_kexec_file_load()
  kimage_file_alloc_init()
    kimage_file_prepare_segments()
      arch_kexec_kernel_image_load()
        kexec_image_load_default()
          image->fops->load()
            elf64_load()        # powerpc
            bzImage64_load()    # x86
              kexec_load_purgatory()
                kexec_apply_relocations()

Which does:

	if (relsec->sh_type == SHT_RELA)
		ret = arch_kexec_apply_relocations_add(pi, section,
						       relsec, symtab);
	else if (relsec->sh_type == SHT_REL)
		ret = arch_kexec_apply_relocations(pi, section,
						   relsec, symtab);
	if (ret)
		return ret;

And that error is bubbled all the way back up. So as long as
arch_kexec_apply_relocations() returns an error the syscall will fail
back to userspace and there'll be an error message at that level.

It's true that having nothing printed in dmesg makes it harder to work
out why the syscall failed. But it's a kernel bug if there are unhandled
relocations in the kernel-supplied purgatory code, so a user really has
no way to do anything about the error even if it is printed.

> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> writes:
>
>> Baoquan He wrote:
>>> On 04/25/22 at 11:11pm, Naveen N. Rao wrote:
>>>> kexec_load_purgatory() can fail for many reasons - there is no need to
>>>> print an error when encountering unsupported relocations.
>>>> This solves a build issue on powerpc with binutils v2.36 and newer [1].
>>>> Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
>>>> symbols") [2], binutils started dropping section symbols that it thought
>>> I am not familiar with binutils, while wondering if this exists in other
>>> ARCHes except of ppc. Arm64 doesn't have the ARCH override either, do we
>>> have problem with it?
>>
>> I'm not aware of this specific file causing a problem on other architectures -
>> perhaps the config options differ enough. There are however more reports of
>> similar issues affecting other architectures with the llvm integrated assembler:
>> https://github.com/ClangBuiltLinux/linux/issues/981
>>
>>>
>>>> were unused.  This isn't an issue in general, but with kexec_file.c, gcc
>>>> is placing kexec_arch_apply_relocations[_add] into a separate
>>>> .text.unlikely section and the section symbol ".text.unlikely" is being
>>>> dropped. Due to this, recordmcount is unable to find a non-weak symbol
>>> But arch_kexec_apply_relocations_add is weak symbol on ppc.
>>
>> Yes. Note that it is just the section symbol that gets dropped. The section is
>> still present and will continue to hold the symbols for the functions
>> themselves.
>
> So we have a case where binutils thinks it is doing something useful
> and our kernel specific tool gets tripped up by it.

It's not just binutils, the LLVM assembler has the same behavior.

> Reading the recordmcount code it looks like it is finding any symbol
> within a section but ignoring weak symbols.  So I suspect the only
> remaining symbol in the section is __weak and that confuses
> recordmcount.
>
> Does removing the __weak annotation on those functions fix the build
> error?  If so we can restructure the kexec code to simply not use __weak
> symbols.
>
> Otherwise the fix needs to be in recordmcount or binutils, and we should
> loop whoever maintains recordmcount in to see what they can do.

It seems that recordmcount is not really maintained anymore now that x86
uses objtool?

There've been several threads about fixing recordmcount, but none of
them seem to have lead to a solution.

These weak symbol vs recordmcount problems have been worked around going
back as far as 2020:

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/elfcore.h?id=6e7b64b9dd6d96537d816ea07ec26b7dedd397b9

cheers


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]
  2022-05-18  2:26         ` Michael Ellerman
  (?)
@ 2022-05-18  7:49           ` Baoquan He
  -1 siblings, 0 replies; 33+ messages in thread
From: Baoquan He @ 2022-05-18  7:49 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Eric W. Biederman, Naveen N. Rao, kexec, linux-kernel, linuxppc-dev

On 05/18/22 at 12:26pm, Michael Ellerman wrote:
> "Eric W. Biederman" <ebiederm@xmission.com> writes:
> > Looking at this the pr_err is absolutely needed.  If an unsupported case
> > winds up in the purgatory blob and the code can't handle it things
> > will fail silently much worse later.
> 
> It won't fail later, it will fail the syscall.
> 
> sys_kexec_file_load()
>   kimage_file_alloc_init()
>     kimage_file_prepare_segments()
>       arch_kexec_kernel_image_load()
>         kexec_image_load_default()
>           image->fops->load()
>             elf64_load()        # powerpc
>             bzImage64_load()    # x86
>               kexec_load_purgatory()
>                 kexec_apply_relocations()
> 
> Which does:
> 
> 	if (relsec->sh_type == SHT_RELA)
> 		ret = arch_kexec_apply_relocations_add(pi, section,
> 						       relsec, symtab);
> 	else if (relsec->sh_type == SHT_REL)
> 		ret = arch_kexec_apply_relocations(pi, section,
> 						   relsec, symtab);
> 	if (ret)
> 		return ret;
> 
> And that error is bubbled all the way back up. So as long as
> arch_kexec_apply_relocations() returns an error the syscall will fail
> back to userspace and there'll be an error message at that level.
> 
> It's true that having nothing printed in dmesg makes it harder to work
> out why the syscall failed. But it's a kernel bug if there are unhandled
> relocations in the kernel-supplied purgatory code, so a user really has
> no way to do anything about the error even if it is printed.
> 
> > "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> writes:
> >
> >> Baoquan He wrote:
> >>> On 04/25/22 at 11:11pm, Naveen N. Rao wrote:
> >>>> kexec_load_purgatory() can fail for many reasons - there is no need to
> >>>> print an error when encountering unsupported relocations.
> >>>> This solves a build issue on powerpc with binutils v2.36 and newer [1].
> >>>> Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
> >>>> symbols") [2], binutils started dropping section symbols that it thought
> >>> I am not familiar with binutils, while wondering if this exists in other
> >>> ARCHes except of ppc. Arm64 doesn't have the ARCH override either, do we
> >>> have problem with it?
> >>
> >> I'm not aware of this specific file causing a problem on other architectures -
> >> perhaps the config options differ enough. There are however more reports of
> >> similar issues affecting other architectures with the llvm integrated assembler:
> >> https://github.com/ClangBuiltLinux/linux/issues/981
> >>
> >>>
> >>>> were unused.  This isn't an issue in general, but with kexec_file.c, gcc
> >>>> is placing kexec_arch_apply_relocations[_add] into a separate
> >>>> .text.unlikely section and the section symbol ".text.unlikely" is being
> >>>> dropped. Due to this, recordmcount is unable to find a non-weak symbol
> >>> But arch_kexec_apply_relocations_add is weak symbol on ppc.
> >>
> >> Yes. Note that it is just the section symbol that gets dropped. The section is
> >> still present and will continue to hold the symbols for the functions
> >> themselves.
> >
> > So we have a case where binutils thinks it is doing something useful
> > and our kernel specific tool gets tripped up by it.
> 
> It's not just binutils, the LLVM assembler has the same behavior.
> 
> > Reading the recordmcount code it looks like it is finding any symbol
> > within a section but ignoring weak symbols.  So I suspect the only
> > remaining symbol in the section is __weak and that confuses
> > recordmcount.
> >
> > Does removing the __weak annotation on those functions fix the build
> > error?  If so we can restructure the kexec code to simply not use __weak
> > symbols.
> >
> > Otherwise the fix needs to be in recordmcount or binutils, and we should
> > loop whoever maintains recordmcount in to see what they can do.
> 
> It seems that recordmcount is not really maintained anymore now that x86
> uses objtool?
> 
> There've been several threads about fixing recordmcount, but none of
> them seem to have lead to a solution.
> 
> These weak symbol vs recordmcount problems have been worked around going
> back as far as 2020:

It gives me feeling that llvm or recordmcount should make adjustment,
but not innocent kernel code, if there are a lot of places reported.
I am curious how llvm or recordmcount dev respond to this.

> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/elfcore.h?id=6e7b64b9dd6d96537d816ea07ec26b7dedd397b9
> 
> cheers
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]
@ 2022-05-18  7:49           ` Baoquan He
  0 siblings, 0 replies; 33+ messages in thread
From: Baoquan He @ 2022-05-18  7:49 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: linuxppc-dev, Naveen N. Rao, kexec, Eric W. Biederman, linux-kernel

On 05/18/22 at 12:26pm, Michael Ellerman wrote:
> "Eric W. Biederman" <ebiederm@xmission.com> writes:
> > Looking at this the pr_err is absolutely needed.  If an unsupported case
> > winds up in the purgatory blob and the code can't handle it things
> > will fail silently much worse later.
> 
> It won't fail later, it will fail the syscall.
> 
> sys_kexec_file_load()
>   kimage_file_alloc_init()
>     kimage_file_prepare_segments()
>       arch_kexec_kernel_image_load()
>         kexec_image_load_default()
>           image->fops->load()
>             elf64_load()        # powerpc
>             bzImage64_load()    # x86
>               kexec_load_purgatory()
>                 kexec_apply_relocations()
> 
> Which does:
> 
> 	if (relsec->sh_type == SHT_RELA)
> 		ret = arch_kexec_apply_relocations_add(pi, section,
> 						       relsec, symtab);
> 	else if (relsec->sh_type == SHT_REL)
> 		ret = arch_kexec_apply_relocations(pi, section,
> 						   relsec, symtab);
> 	if (ret)
> 		return ret;
> 
> And that error is bubbled all the way back up. So as long as
> arch_kexec_apply_relocations() returns an error the syscall will fail
> back to userspace and there'll be an error message at that level.
> 
> It's true that having nothing printed in dmesg makes it harder to work
> out why the syscall failed. But it's a kernel bug if there are unhandled
> relocations in the kernel-supplied purgatory code, so a user really has
> no way to do anything about the error even if it is printed.
> 
> > "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> writes:
> >
> >> Baoquan He wrote:
> >>> On 04/25/22 at 11:11pm, Naveen N. Rao wrote:
> >>>> kexec_load_purgatory() can fail for many reasons - there is no need to
> >>>> print an error when encountering unsupported relocations.
> >>>> This solves a build issue on powerpc with binutils v2.36 and newer [1].
> >>>> Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
> >>>> symbols") [2], binutils started dropping section symbols that it thought
> >>> I am not familiar with binutils, while wondering if this exists in other
> >>> ARCHes except of ppc. Arm64 doesn't have the ARCH override either, do we
> >>> have problem with it?
> >>
> >> I'm not aware of this specific file causing a problem on other architectures -
> >> perhaps the config options differ enough. There are however more reports of
> >> similar issues affecting other architectures with the llvm integrated assembler:
> >> https://github.com/ClangBuiltLinux/linux/issues/981
> >>
> >>>
> >>>> were unused.  This isn't an issue in general, but with kexec_file.c, gcc
> >>>> is placing kexec_arch_apply_relocations[_add] into a separate
> >>>> .text.unlikely section and the section symbol ".text.unlikely" is being
> >>>> dropped. Due to this, recordmcount is unable to find a non-weak symbol
> >>> But arch_kexec_apply_relocations_add is weak symbol on ppc.
> >>
> >> Yes. Note that it is just the section symbol that gets dropped. The section is
> >> still present and will continue to hold the symbols for the functions
> >> themselves.
> >
> > So we have a case where binutils thinks it is doing something useful
> > and our kernel specific tool gets tripped up by it.
> 
> It's not just binutils, the LLVM assembler has the same behavior.
> 
> > Reading the recordmcount code it looks like it is finding any symbol
> > within a section but ignoring weak symbols.  So I suspect the only
> > remaining symbol in the section is __weak and that confuses
> > recordmcount.
> >
> > Does removing the __weak annotation on those functions fix the build
> > error?  If so we can restructure the kexec code to simply not use __weak
> > symbols.
> >
> > Otherwise the fix needs to be in recordmcount or binutils, and we should
> > loop whoever maintains recordmcount in to see what they can do.
> 
> It seems that recordmcount is not really maintained anymore now that x86
> uses objtool?
> 
> There've been several threads about fixing recordmcount, but none of
> them seem to have lead to a solution.
> 
> These weak symbol vs recordmcount problems have been worked around going
> back as far as 2020:

It gives me feeling that llvm or recordmcount should make adjustment,
but not innocent kernel code, if there are a lot of places reported.
I am curious how llvm or recordmcount dev respond to this.

> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/elfcore.h?id=6e7b64b9dd6d96537d816ea07ec26b7dedd397b9
> 
> cheers
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]
@ 2022-05-18  7:49           ` Baoquan He
  0 siblings, 0 replies; 33+ messages in thread
From: Baoquan He @ 2022-05-18  7:49 UTC (permalink / raw)
  To: kexec

On 05/18/22 at 12:26pm, Michael Ellerman wrote:
> "Eric W. Biederman" <ebiederm@xmission.com> writes:
> > Looking at this the pr_err is absolutely needed.  If an unsupported case
> > winds up in the purgatory blob and the code can't handle it things
> > will fail silently much worse later.
> 
> It won't fail later, it will fail the syscall.
> 
> sys_kexec_file_load()
>   kimage_file_alloc_init()
>     kimage_file_prepare_segments()
>       arch_kexec_kernel_image_load()
>         kexec_image_load_default()
>           image->fops->load()
>             elf64_load()        # powerpc
>             bzImage64_load()    # x86
>               kexec_load_purgatory()
>                 kexec_apply_relocations()
> 
> Which does:
> 
> 	if (relsec->sh_type == SHT_RELA)
> 		ret = arch_kexec_apply_relocations_add(pi, section,
> 						       relsec, symtab);
> 	else if (relsec->sh_type == SHT_REL)
> 		ret = arch_kexec_apply_relocations(pi, section,
> 						   relsec, symtab);
> 	if (ret)
> 		return ret;
> 
> And that error is bubbled all the way back up. So as long as
> arch_kexec_apply_relocations() returns an error the syscall will fail
> back to userspace and there'll be an error message at that level.
> 
> It's true that having nothing printed in dmesg makes it harder to work
> out why the syscall failed. But it's a kernel bug if there are unhandled
> relocations in the kernel-supplied purgatory code, so a user really has
> no way to do anything about the error even if it is printed.
> 
> > "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> writes:
> >
> >> Baoquan He wrote:
> >>> On 04/25/22 at 11:11pm, Naveen N. Rao wrote:
> >>>> kexec_load_purgatory() can fail for many reasons - there is no need to
> >>>> print an error when encountering unsupported relocations.
> >>>> This solves a build issue on powerpc with binutils v2.36 and newer [1].
> >>>> Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
> >>>> symbols") [2], binutils started dropping section symbols that it thought
> >>> I am not familiar with binutils, while wondering if this exists in other
> >>> ARCHes except of ppc. Arm64 doesn't have the ARCH override either, do we
> >>> have problem with it?
> >>
> >> I'm not aware of this specific file causing a problem on other architectures -
> >> perhaps the config options differ enough. There are however more reports of
> >> similar issues affecting other architectures with the llvm integrated assembler:
> >> https://github.com/ClangBuiltLinux/linux/issues/981
> >>
> >>>
> >>>> were unused.  This isn't an issue in general, but with kexec_file.c, gcc
> >>>> is placing kexec_arch_apply_relocations[_add] into a separate
> >>>> .text.unlikely section and the section symbol ".text.unlikely" is being
> >>>> dropped. Due to this, recordmcount is unable to find a non-weak symbol
> >>> But arch_kexec_apply_relocations_add is weak symbol on ppc.
> >>
> >> Yes. Note that it is just the section symbol that gets dropped. The section is
> >> still present and will continue to hold the symbols for the functions
> >> themselves.
> >
> > So we have a case where binutils thinks it is doing something useful
> > and our kernel specific tool gets tripped up by it.
> 
> It's not just binutils, the LLVM assembler has the same behavior.
> 
> > Reading the recordmcount code it looks like it is finding any symbol
> > within a section but ignoring weak symbols.  So I suspect the only
> > remaining symbol in the section is __weak and that confuses
> > recordmcount.
> >
> > Does removing the __weak annotation on those functions fix the build
> > error?  If so we can restructure the kexec code to simply not use __weak
> > symbols.
> >
> > Otherwise the fix needs to be in recordmcount or binutils, and we should
> > loop whoever maintains recordmcount in to see what they can do.
> 
> It seems that recordmcount is not really maintained anymore now that x86
> uses objtool?
> 
> There've been several threads about fixing recordmcount, but none of
> them seem to have lead to a solution.
> 
> These weak symbol vs recordmcount problems have been worked around going
> back as far as 2020:

It gives me feeling that llvm or recordmcount should make adjustment,
but not innocent kernel code, if there are a lot of places reported.
I am curious how llvm or recordmcount dev respond to this.

> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/elfcore.h?id=6e7b64b9dd6d96537d816ea07ec26b7dedd397b9
> 
> cheers
> 



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]
  2022-05-18  7:49           ` Baoquan He
  (?)
@ 2022-05-18  9:18             ` Naveen N. Rao
  -1 siblings, 0 replies; 33+ messages in thread
From: Naveen N. Rao @ 2022-05-18  9:18 UTC (permalink / raw)
  To: Baoquan He, Michael Ellerman
  Cc: Eric W. Biederman, kexec, linux-kernel, linuxppc-dev

Baoquan He wrote:
> On 05/18/22 at 12:26pm, Michael Ellerman wrote:
>> 
>> It seems that recordmcount is not really maintained anymore now that x86
>> uses objtool?
>> 
>> There've been several threads about fixing recordmcount, but none of
>> them seem to have lead to a solution.
>> 
>> These weak symbol vs recordmcount problems have been worked around going
>> back as far as 2020:
> 
> It gives me feeling that llvm or recordmcount should make adjustment,
> but not innocent kernel code, if there are a lot of places reported.
> I am curious how llvm or recordmcount dev respond to this.

As Michael stated, this is not just llvm - binutils has also adopted the 
same and "unused" section symbols are being dropped.

For recordmcount, there were a few threads and approaches that have been 
tried:
- https://patchwork.ozlabs.org/project/linuxppc-dev/patch/cd0f6bdfdf1ee096fb2c07e7b38940921b8e9118.1637764848.git.christophe.leroy@csgroup.eu/
- https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=297434&state=*

Objtool has picked up a more appropriate fix for this recently, and 
long-term, we would like to move to using objtool for ftrace purposes:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/tools/objtool/elf.c?id=4abff6d48dbcea8200c7ea35ba70c242d128ebf3

While that is being pursued, we want to unbreak some of the CI and users 
who are hitting this.


- Naveen


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]
@ 2022-05-18  9:18             ` Naveen N. Rao
  0 siblings, 0 replies; 33+ messages in thread
From: Naveen N. Rao @ 2022-05-18  9:18 UTC (permalink / raw)
  To: Baoquan He, Michael Ellerman
  Cc: linuxppc-dev, kexec, Eric W. Biederman, linux-kernel

Baoquan He wrote:
> On 05/18/22 at 12:26pm, Michael Ellerman wrote:
>> 
>> It seems that recordmcount is not really maintained anymore now that x86
>> uses objtool?
>> 
>> There've been several threads about fixing recordmcount, but none of
>> them seem to have lead to a solution.
>> 
>> These weak symbol vs recordmcount problems have been worked around going
>> back as far as 2020:
> 
> It gives me feeling that llvm or recordmcount should make adjustment,
> but not innocent kernel code, if there are a lot of places reported.
> I am curious how llvm or recordmcount dev respond to this.

As Michael stated, this is not just llvm - binutils has also adopted the 
same and "unused" section symbols are being dropped.

For recordmcount, there were a few threads and approaches that have been 
tried:
- https://patchwork.ozlabs.org/project/linuxppc-dev/patch/cd0f6bdfdf1ee096fb2c07e7b38940921b8e9118.1637764848.git.christophe.leroy@csgroup.eu/
- https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=297434&state=*

Objtool has picked up a more appropriate fix for this recently, and 
long-term, we would like to move to using objtool for ftrace purposes:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/tools/objtool/elf.c?id=4abff6d48dbcea8200c7ea35ba70c242d128ebf3

While that is being pursued, we want to unbreak some of the CI and users 
who are hitting this.


- Naveen


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]
@ 2022-05-18  9:18             ` Naveen N. Rao
  0 siblings, 0 replies; 33+ messages in thread
From: Naveen N. Rao @ 2022-05-18  9:18 UTC (permalink / raw)
  To: kexec

Baoquan He wrote:
> On 05/18/22 at 12:26pm, Michael Ellerman wrote:
>> 
>> It seems that recordmcount is not really maintained anymore now that x86
>> uses objtool?
>> 
>> There've been several threads about fixing recordmcount, but none of
>> them seem to have lead to a solution.
>> 
>> These weak symbol vs recordmcount problems have been worked around going
>> back as far as 2020:
> 
> It gives me feeling that llvm or recordmcount should make adjustment,
> but not innocent kernel code, if there are a lot of places reported.
> I am curious how llvm or recordmcount dev respond to this.

As Michael stated, this is not just llvm - binutils has also adopted the 
same and "unused" section symbols are being dropped.

For recordmcount, there were a few threads and approaches that have been 
tried:
- https://patchwork.ozlabs.org/project/linuxppc-dev/patch/cd0f6bdfdf1ee096fb2c07e7b38940921b8e9118.1637764848.git.christophe.leroy at csgroup.eu/
- https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=297434&state=*

Objtool has picked up a more appropriate fix for this recently, and 
long-term, we would like to move to using objtool for ftrace purposes:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/tools/objtool/elf.c?id=4abff6d48dbcea8200c7ea35ba70c242d128ebf3

While that is being pursued, we want to unbreak some of the CI and users 
who are hitting this.


- Naveen



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]
  2022-05-18  9:18             ` Naveen N. Rao
  (?)
@ 2022-05-18 10:11               ` Baoquan He
  -1 siblings, 0 replies; 33+ messages in thread
From: Baoquan He @ 2022-05-18 10:11 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: Michael Ellerman, Eric W. Biederman, kexec, linux-kernel, linuxppc-dev

On 05/18/22 at 02:48pm, Naveen N. Rao wrote:
> Baoquan He wrote:
> > On 05/18/22 at 12:26pm, Michael Ellerman wrote:
> > > 
> > > It seems that recordmcount is not really maintained anymore now that x86
> > > uses objtool?
> > > 
> > > There've been several threads about fixing recordmcount, but none of
> > > them seem to have lead to a solution.
> > > 
> > > These weak symbol vs recordmcount problems have been worked around going
> > > back as far as 2020:
> > 
> > It gives me feeling that llvm or recordmcount should make adjustment,
> > but not innocent kernel code, if there are a lot of places reported.
> > I am curious how llvm or recordmcount dev respond to this.
> 
> As Michael stated, this is not just llvm - binutils has also adopted the
> same and "unused" section symbols are being dropped.
> 
> For recordmcount, there were a few threads and approaches that have been
> tried:
> - https://patchwork.ozlabs.org/project/linuxppc-dev/patch/cd0f6bdfdf1ee096fb2c07e7b38940921b8e9118.1637764848.git.christophe.leroy@csgroup.eu/
> - https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=297434&state=*
> 
> Objtool has picked up a more appropriate fix for this recently, and
> long-term, we would like to move to using objtool for ftrace purposes:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/tools/objtool/elf.c?id=4abff6d48dbcea8200c7ea35ba70c242d128ebf3
> 
> While that is being pursued, we want to unbreak some of the CI and users who
> are hitting this.

I see, thanks for the details. I would persue fix in recordmcount if
possible, while has no objection to fix it in kernel with justification
if have to. Given my limited linking knowledge, leave this to other
expert to decide.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]
@ 2022-05-18 10:11               ` Baoquan He
  0 siblings, 0 replies; 33+ messages in thread
From: Baoquan He @ 2022-05-18 10:11 UTC (permalink / raw)
  To: Naveen N. Rao; +Cc: linuxppc-dev, kexec, Eric W. Biederman, linux-kernel

On 05/18/22 at 02:48pm, Naveen N. Rao wrote:
> Baoquan He wrote:
> > On 05/18/22 at 12:26pm, Michael Ellerman wrote:
> > > 
> > > It seems that recordmcount is not really maintained anymore now that x86
> > > uses objtool?
> > > 
> > > There've been several threads about fixing recordmcount, but none of
> > > them seem to have lead to a solution.
> > > 
> > > These weak symbol vs recordmcount problems have been worked around going
> > > back as far as 2020:
> > 
> > It gives me feeling that llvm or recordmcount should make adjustment,
> > but not innocent kernel code, if there are a lot of places reported.
> > I am curious how llvm or recordmcount dev respond to this.
> 
> As Michael stated, this is not just llvm - binutils has also adopted the
> same and "unused" section symbols are being dropped.
> 
> For recordmcount, there were a few threads and approaches that have been
> tried:
> - https://patchwork.ozlabs.org/project/linuxppc-dev/patch/cd0f6bdfdf1ee096fb2c07e7b38940921b8e9118.1637764848.git.christophe.leroy@csgroup.eu/
> - https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=297434&state=*
> 
> Objtool has picked up a more appropriate fix for this recently, and
> long-term, we would like to move to using objtool for ftrace purposes:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/tools/objtool/elf.c?id=4abff6d48dbcea8200c7ea35ba70c242d128ebf3
> 
> While that is being pursued, we want to unbreak some of the CI and users who
> are hitting this.

I see, thanks for the details. I would persue fix in recordmcount if
possible, while has no objection to fix it in kernel with justification
if have to. Given my limited linking knowledge, leave this to other
expert to decide.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]
@ 2022-05-18 10:11               ` Baoquan He
  0 siblings, 0 replies; 33+ messages in thread
From: Baoquan He @ 2022-05-18 10:11 UTC (permalink / raw)
  To: kexec

On 05/18/22 at 02:48pm, Naveen N. Rao wrote:
> Baoquan He wrote:
> > On 05/18/22 at 12:26pm, Michael Ellerman wrote:
> > > 
> > > It seems that recordmcount is not really maintained anymore now that x86
> > > uses objtool?
> > > 
> > > There've been several threads about fixing recordmcount, but none of
> > > them seem to have lead to a solution.
> > > 
> > > These weak symbol vs recordmcount problems have been worked around going
> > > back as far as 2020:
> > 
> > It gives me feeling that llvm or recordmcount should make adjustment,
> > but not innocent kernel code, if there are a lot of places reported.
> > I am curious how llvm or recordmcount dev respond to this.
> 
> As Michael stated, this is not just llvm - binutils has also adopted the
> same and "unused" section symbols are being dropped.
> 
> For recordmcount, there were a few threads and approaches that have been
> tried:
> - https://patchwork.ozlabs.org/project/linuxppc-dev/patch/cd0f6bdfdf1ee096fb2c07e7b38940921b8e9118.1637764848.git.christophe.leroy at csgroup.eu/
> - https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=297434&state=*
> 
> Objtool has picked up a more appropriate fix for this recently, and
> long-term, we would like to move to using objtool for ftrace purposes:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/tools/objtool/elf.c?id=4abff6d48dbcea8200c7ea35ba70c242d128ebf3
> 
> While that is being pursued, we want to unbreak some of the CI and users who
> are hitting this.

I see, thanks for the details. I would persue fix in recordmcount if
possible, while has no objection to fix it in kernel with justification
if have to. Given my limited linking knowledge, leave this to other
expert to decide.



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]
  2022-05-18  2:26         ` Michael Ellerman
  (?)
@ 2022-05-18 14:48           ` Eric W. Biederman
  -1 siblings, 0 replies; 33+ messages in thread
From: Eric W. Biederman @ 2022-05-18 14:48 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Naveen N. Rao, Baoquan He, kexec, linux-kernel, linuxppc-dev

Michael Ellerman <mpe@ellerman.id.au> writes:

> "Eric W. Biederman" <ebiederm@xmission.com> writes:
>> Looking at this the pr_err is absolutely needed.  If an unsupported case
>> winds up in the purgatory blob and the code can't handle it things
>> will fail silently much worse later.
>
> It won't fail later, it will fail the syscall.
>
> sys_kexec_file_load()
>   kimage_file_alloc_init()
>     kimage_file_prepare_segments()
>       arch_kexec_kernel_image_load()
>         kexec_image_load_default()
>           image->fops->load()
>             elf64_load()        # powerpc
>             bzImage64_load()    # x86
>               kexec_load_purgatory()
>                 kexec_apply_relocations()
>
> Which does:
>
> 	if (relsec->sh_type == SHT_RELA)
> 		ret = arch_kexec_apply_relocations_add(pi, section,
> 						       relsec, symtab);
> 	else if (relsec->sh_type == SHT_REL)
> 		ret = arch_kexec_apply_relocations(pi, section,
> 						   relsec, symtab);
> 	if (ret)
> 		return ret;
>
> And that error is bubbled all the way back up. So as long as
> arch_kexec_apply_relocations() returns an error the syscall will fail
> back to userspace and there'll be an error message at that level.
>
> It's true that having nothing printed in dmesg makes it harder to work
> out why the syscall failed. But it's a kernel bug if there are unhandled
> relocations in the kernel-supplied purgatory code, so a user really has
> no way to do anything about the error even if it is printed.

Good point.  I really hadn't noticed the error code in there when I
looked.

I still don't think changing the functionality of the code because of
a tool issue is the right solution.


>> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> writes:
>>
>>> Baoquan He wrote:
>>>> On 04/25/22 at 11:11pm, Naveen N. Rao wrote:
>>>>> kexec_load_purgatory() can fail for many reasons - there is no need to
>>>>> print an error when encountering unsupported relocations.
>>>>> This solves a build issue on powerpc with binutils v2.36 and newer [1].
>>>>> Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
>>>>> symbols") [2], binutils started dropping section symbols that it thought
>>>> I am not familiar with binutils, while wondering if this exists in other
>>>> ARCHes except of ppc. Arm64 doesn't have the ARCH override either, do we
>>>> have problem with it?
>>>
>>> I'm not aware of this specific file causing a problem on other architectures -
>>> perhaps the config options differ enough. There are however more reports of
>>> similar issues affecting other architectures with the llvm integrated assembler:
>>> https://github.com/ClangBuiltLinux/linux/issues/981
>>>
>>>>
>>>>> were unused.  This isn't an issue in general, but with kexec_file.c, gcc
>>>>> is placing kexec_arch_apply_relocations[_add] into a separate
>>>>> .text.unlikely section and the section symbol ".text.unlikely" is being
>>>>> dropped. Due to this, recordmcount is unable to find a non-weak symbol
>>>> But arch_kexec_apply_relocations_add is weak symbol on ppc.
>>>
>>> Yes. Note that it is just the section symbol that gets dropped. The section is
>>> still present and will continue to hold the symbols for the functions
>>> themselves.
>>
>> So we have a case where binutils thinks it is doing something useful
>> and our kernel specific tool gets tripped up by it.
>
> It's not just binutils, the LLVM assembler has the same behavior.
>
>> Reading the recordmcount code it looks like it is finding any symbol
>> within a section but ignoring weak symbols.  So I suspect the only
>> remaining symbol in the section is __weak and that confuses
>> recordmcount.
>>
>> Does removing the __weak annotation on those functions fix the build
>> error?  If so we can restructure the kexec code to simply not use __weak
>> symbols.
>>
>> Otherwise the fix needs to be in recordmcount or binutils, and we should
>> loop whoever maintains recordmcount in to see what they can do.
>
> It seems that recordmcount is not really maintained anymore now that x86
> uses objtool?
>
> There've been several threads about fixing recordmcount, but none of
> them seem to have lead to a solution.

That is unfortunate.

> These weak symbol vs recordmcount problems have been worked around going
> back as far as 2020:
>
>   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/elfcore.h?id=6e7b64b9dd6d96537d816ea07ec26b7dedd397b9

I am more than happy to adopt the kind of solution that was adopted
there in elfcore.h and simply get rid of __weak symbols in the kexec
code.

Using __weak symbols is really not the common kernel way of doing
things.  Using __weak symbols introduces a bit of magic in how the
kernel gets built that is unnecessary.

Can someone verify that deleting __weak is enough to get powerpc to
build?  AKA:

diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 8347fc158d2b..7f4ca8dbe26f 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -117,7 +117,7 @@ int __weak arch_kexec_kernel_verify_sig(struct kimage *image, void *buf,
  *
  * Return: 0 on success, negative errno on error.
  */
-int __weak
+int
 arch_kexec_apply_relocations_add(struct purgatory_info *pi, Elf_Shdr *section,
                                 const Elf_Shdr *relsec, const Elf_Shdr *symtab)
 {
@@ -134,7 +134,7 @@ arch_kexec_apply_relocations_add(struct purgatory_info *pi, Elf_Shdr *section,
  *
  * Return: 0 on success, negative errno on error.
  */
-int __weak
+int
 arch_kexec_apply_relocations(struct purgatory_info *pi, Elf_Shdr *section,
                             const Elf_Shdr *relsec, const Elf_Shdr *symtab)
 {

If that change is verified to work a proper patch that keeps x86 and
s390 building that have actual implementations should not be too
difficult to write.

Eric

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]
@ 2022-05-18 14:48           ` Eric W. Biederman
  0 siblings, 0 replies; 33+ messages in thread
From: Eric W. Biederman @ 2022-05-18 14:48 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: linuxppc-dev, Naveen N. Rao, kexec, linux-kernel, Baoquan He

Michael Ellerman <mpe@ellerman.id.au> writes:

> "Eric W. Biederman" <ebiederm@xmission.com> writes:
>> Looking at this the pr_err is absolutely needed.  If an unsupported case
>> winds up in the purgatory blob and the code can't handle it things
>> will fail silently much worse later.
>
> It won't fail later, it will fail the syscall.
>
> sys_kexec_file_load()
>   kimage_file_alloc_init()
>     kimage_file_prepare_segments()
>       arch_kexec_kernel_image_load()
>         kexec_image_load_default()
>           image->fops->load()
>             elf64_load()        # powerpc
>             bzImage64_load()    # x86
>               kexec_load_purgatory()
>                 kexec_apply_relocations()
>
> Which does:
>
> 	if (relsec->sh_type == SHT_RELA)
> 		ret = arch_kexec_apply_relocations_add(pi, section,
> 						       relsec, symtab);
> 	else if (relsec->sh_type == SHT_REL)
> 		ret = arch_kexec_apply_relocations(pi, section,
> 						   relsec, symtab);
> 	if (ret)
> 		return ret;
>
> And that error is bubbled all the way back up. So as long as
> arch_kexec_apply_relocations() returns an error the syscall will fail
> back to userspace and there'll be an error message at that level.
>
> It's true that having nothing printed in dmesg makes it harder to work
> out why the syscall failed. But it's a kernel bug if there are unhandled
> relocations in the kernel-supplied purgatory code, so a user really has
> no way to do anything about the error even if it is printed.

Good point.  I really hadn't noticed the error code in there when I
looked.

I still don't think changing the functionality of the code because of
a tool issue is the right solution.


>> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> writes:
>>
>>> Baoquan He wrote:
>>>> On 04/25/22 at 11:11pm, Naveen N. Rao wrote:
>>>>> kexec_load_purgatory() can fail for many reasons - there is no need to
>>>>> print an error when encountering unsupported relocations.
>>>>> This solves a build issue on powerpc with binutils v2.36 and newer [1].
>>>>> Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
>>>>> symbols") [2], binutils started dropping section symbols that it thought
>>>> I am not familiar with binutils, while wondering if this exists in other
>>>> ARCHes except of ppc. Arm64 doesn't have the ARCH override either, do we
>>>> have problem with it?
>>>
>>> I'm not aware of this specific file causing a problem on other architectures -
>>> perhaps the config options differ enough. There are however more reports of
>>> similar issues affecting other architectures with the llvm integrated assembler:
>>> https://github.com/ClangBuiltLinux/linux/issues/981
>>>
>>>>
>>>>> were unused.  This isn't an issue in general, but with kexec_file.c, gcc
>>>>> is placing kexec_arch_apply_relocations[_add] into a separate
>>>>> .text.unlikely section and the section symbol ".text.unlikely" is being
>>>>> dropped. Due to this, recordmcount is unable to find a non-weak symbol
>>>> But arch_kexec_apply_relocations_add is weak symbol on ppc.
>>>
>>> Yes. Note that it is just the section symbol that gets dropped. The section is
>>> still present and will continue to hold the symbols for the functions
>>> themselves.
>>
>> So we have a case where binutils thinks it is doing something useful
>> and our kernel specific tool gets tripped up by it.
>
> It's not just binutils, the LLVM assembler has the same behavior.
>
>> Reading the recordmcount code it looks like it is finding any symbol
>> within a section but ignoring weak symbols.  So I suspect the only
>> remaining symbol in the section is __weak and that confuses
>> recordmcount.
>>
>> Does removing the __weak annotation on those functions fix the build
>> error?  If so we can restructure the kexec code to simply not use __weak
>> symbols.
>>
>> Otherwise the fix needs to be in recordmcount or binutils, and we should
>> loop whoever maintains recordmcount in to see what they can do.
>
> It seems that recordmcount is not really maintained anymore now that x86
> uses objtool?
>
> There've been several threads about fixing recordmcount, but none of
> them seem to have lead to a solution.

That is unfortunate.

> These weak symbol vs recordmcount problems have been worked around going
> back as far as 2020:
>
>   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/elfcore.h?id=6e7b64b9dd6d96537d816ea07ec26b7dedd397b9

I am more than happy to adopt the kind of solution that was adopted
there in elfcore.h and simply get rid of __weak symbols in the kexec
code.

Using __weak symbols is really not the common kernel way of doing
things.  Using __weak symbols introduces a bit of magic in how the
kernel gets built that is unnecessary.

Can someone verify that deleting __weak is enough to get powerpc to
build?  AKA:

diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 8347fc158d2b..7f4ca8dbe26f 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -117,7 +117,7 @@ int __weak arch_kexec_kernel_verify_sig(struct kimage *image, void *buf,
  *
  * Return: 0 on success, negative errno on error.
  */
-int __weak
+int
 arch_kexec_apply_relocations_add(struct purgatory_info *pi, Elf_Shdr *section,
                                 const Elf_Shdr *relsec, const Elf_Shdr *symtab)
 {
@@ -134,7 +134,7 @@ arch_kexec_apply_relocations_add(struct purgatory_info *pi, Elf_Shdr *section,
  *
  * Return: 0 on success, negative errno on error.
  */
-int __weak
+int
 arch_kexec_apply_relocations(struct purgatory_info *pi, Elf_Shdr *section,
                             const Elf_Shdr *relsec, const Elf_Shdr *symtab)
 {

If that change is verified to work a proper patch that keeps x86 and
s390 building that have actual implementations should not be too
difficult to write.

Eric

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]
@ 2022-05-18 14:48           ` Eric W. Biederman
  0 siblings, 0 replies; 33+ messages in thread
From: Eric W. Biederman @ 2022-05-18 14:48 UTC (permalink / raw)
  To: kexec

Michael Ellerman <mpe@ellerman.id.au> writes:

> "Eric W. Biederman" <ebiederm@xmission.com> writes:
>> Looking at this the pr_err is absolutely needed.  If an unsupported case
>> winds up in the purgatory blob and the code can't handle it things
>> will fail silently much worse later.
>
> It won't fail later, it will fail the syscall.
>
> sys_kexec_file_load()
>   kimage_file_alloc_init()
>     kimage_file_prepare_segments()
>       arch_kexec_kernel_image_load()
>         kexec_image_load_default()
>           image->fops->load()
>             elf64_load()        # powerpc
>             bzImage64_load()    # x86
>               kexec_load_purgatory()
>                 kexec_apply_relocations()
>
> Which does:
>
> 	if (relsec->sh_type == SHT_RELA)
> 		ret = arch_kexec_apply_relocations_add(pi, section,
> 						       relsec, symtab);
> 	else if (relsec->sh_type == SHT_REL)
> 		ret = arch_kexec_apply_relocations(pi, section,
> 						   relsec, symtab);
> 	if (ret)
> 		return ret;
>
> And that error is bubbled all the way back up. So as long as
> arch_kexec_apply_relocations() returns an error the syscall will fail
> back to userspace and there'll be an error message at that level.
>
> It's true that having nothing printed in dmesg makes it harder to work
> out why the syscall failed. But it's a kernel bug if there are unhandled
> relocations in the kernel-supplied purgatory code, so a user really has
> no way to do anything about the error even if it is printed.

Good point.  I really hadn't noticed the error code in there when I
looked.

I still don't think changing the functionality of the code because of
a tool issue is the right solution.


>> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> writes:
>>
>>> Baoquan He wrote:
>>>> On 04/25/22 at 11:11pm, Naveen N. Rao wrote:
>>>>> kexec_load_purgatory() can fail for many reasons - there is no need to
>>>>> print an error when encountering unsupported relocations.
>>>>> This solves a build issue on powerpc with binutils v2.36 and newer [1].
>>>>> Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
>>>>> symbols") [2], binutils started dropping section symbols that it thought
>>>> I am not familiar with binutils, while wondering if this exists in other
>>>> ARCHes except of ppc. Arm64 doesn't have the ARCH override either, do we
>>>> have problem with it?
>>>
>>> I'm not aware of this specific file causing a problem on other architectures -
>>> perhaps the config options differ enough. There are however more reports of
>>> similar issues affecting other architectures with the llvm integrated assembler:
>>> https://github.com/ClangBuiltLinux/linux/issues/981
>>>
>>>>
>>>>> were unused.  This isn't an issue in general, but with kexec_file.c, gcc
>>>>> is placing kexec_arch_apply_relocations[_add] into a separate
>>>>> .text.unlikely section and the section symbol ".text.unlikely" is being
>>>>> dropped. Due to this, recordmcount is unable to find a non-weak symbol
>>>> But arch_kexec_apply_relocations_add is weak symbol on ppc.
>>>
>>> Yes. Note that it is just the section symbol that gets dropped. The section is
>>> still present and will continue to hold the symbols for the functions
>>> themselves.
>>
>> So we have a case where binutils thinks it is doing something useful
>> and our kernel specific tool gets tripped up by it.
>
> It's not just binutils, the LLVM assembler has the same behavior.
>
>> Reading the recordmcount code it looks like it is finding any symbol
>> within a section but ignoring weak symbols.  So I suspect the only
>> remaining symbol in the section is __weak and that confuses
>> recordmcount.
>>
>> Does removing the __weak annotation on those functions fix the build
>> error?  If so we can restructure the kexec code to simply not use __weak
>> symbols.
>>
>> Otherwise the fix needs to be in recordmcount or binutils, and we should
>> loop whoever maintains recordmcount in to see what they can do.
>
> It seems that recordmcount is not really maintained anymore now that x86
> uses objtool?
>
> There've been several threads about fixing recordmcount, but none of
> them seem to have lead to a solution.

That is unfortunate.

> These weak symbol vs recordmcount problems have been worked around going
> back as far as 2020:
>
>   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/elfcore.h?id=6e7b64b9dd6d96537d816ea07ec26b7dedd397b9

I am more than happy to adopt the kind of solution that was adopted
there in elfcore.h and simply get rid of __weak symbols in the kexec
code.

Using __weak symbols is really not the common kernel way of doing
things.  Using __weak symbols introduces a bit of magic in how the
kernel gets built that is unnecessary.

Can someone verify that deleting __weak is enough to get powerpc to
build?  AKA:

diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 8347fc158d2b..7f4ca8dbe26f 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -117,7 +117,7 @@ int __weak arch_kexec_kernel_verify_sig(struct kimage *image, void *buf,
  *
  * Return: 0 on success, negative errno on error.
  */
-int __weak
+int
 arch_kexec_apply_relocations_add(struct purgatory_info *pi, Elf_Shdr *section,
                                 const Elf_Shdr *relsec, const Elf_Shdr *symtab)
 {
@@ -134,7 +134,7 @@ arch_kexec_apply_relocations_add(struct purgatory_info *pi, Elf_Shdr *section,
  *
  * Return: 0 on success, negative errno on error.
  */
-int __weak
+int
 arch_kexec_apply_relocations(struct purgatory_info *pi, Elf_Shdr *section,
                             const Elf_Shdr *relsec, const Elf_Shdr *symtab)
 {

If that change is verified to work a proper patch that keeps x86 and
s390 building that have actual implementations should not be too
difficult to write.

Eric


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]
  2022-05-18 14:48           ` Eric W. Biederman
  (?)
@ 2022-05-18 16:48             ` Naveen N. Rao
  -1 siblings, 0 replies; 33+ messages in thread
From: Naveen N. Rao @ 2022-05-18 16:48 UTC (permalink / raw)
  To: Eric W. Biederman, Michael Ellerman
  Cc: Baoquan He, kexec, linux-kernel, linuxppc-dev

Eric W. Biederman wrote:
> Michael Ellerman <mpe@ellerman.id.au> writes:
> 
>> "Eric W. Biederman" <ebiederm@xmission.com> writes:
>>> Looking at this the pr_err is absolutely needed.  If an unsupported case
>>> winds up in the purgatory blob and the code can't handle it things
>>> will fail silently much worse later.
>>
>> It won't fail later, it will fail the syscall.
>>
>> sys_kexec_file_load()
>>   kimage_file_alloc_init()
>>     kimage_file_prepare_segments()
>>       arch_kexec_kernel_image_load()
>>         kexec_image_load_default()
>>           image->fops->load()
>>             elf64_load()        # powerpc
>>             bzImage64_load()    # x86
>>               kexec_load_purgatory()
>>                 kexec_apply_relocations()
>>
>> Which does:
>>
>> 	if (relsec->sh_type == SHT_RELA)
>> 		ret = arch_kexec_apply_relocations_add(pi, section,
>> 						       relsec, symtab);
>> 	else if (relsec->sh_type == SHT_REL)
>> 		ret = arch_kexec_apply_relocations(pi, section,
>> 						   relsec, symtab);
>> 	if (ret)
>> 		return ret;
>>
>> And that error is bubbled all the way back up. So as long as
>> arch_kexec_apply_relocations() returns an error the syscall will fail
>> back to userspace and there'll be an error message at that level.
>>
>> It's true that having nothing printed in dmesg makes it harder to work
>> out why the syscall failed. But it's a kernel bug if there are unhandled
>> relocations in the kernel-supplied purgatory code, so a user really has
>> no way to do anything about the error even if it is printed.
> 
> Good point.  I really hadn't noticed the error code in there when I
> looked.
> 
> I still don't think changing the functionality of the code because of
> a tool issue is the right solution.

Ok.

> 
> 
>>> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> writes:
>>>
>>>> Baoquan He wrote:
>>>>> On 04/25/22 at 11:11pm, Naveen N. Rao wrote:
>>>>>> kexec_load_purgatory() can fail for many reasons - there is no need to
>>>>>> print an error when encountering unsupported relocations.
>>>>>> This solves a build issue on powerpc with binutils v2.36 and newer [1].
>>>>>> Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
>>>>>> symbols") [2], binutils started dropping section symbols that it thought
>>>>> I am not familiar with binutils, while wondering if this exists in other
>>>>> ARCHes except of ppc. Arm64 doesn't have the ARCH override either, do we
>>>>> have problem with it?
>>>>
>>>> I'm not aware of this specific file causing a problem on other architectures -
>>>> perhaps the config options differ enough. There are however more reports of
>>>> similar issues affecting other architectures with the llvm integrated assembler:
>>>> https://github.com/ClangBuiltLinux/linux/issues/981
>>>>
>>>>>
>>>>>> were unused.  This isn't an issue in general, but with kexec_file.c, gcc
>>>>>> is placing kexec_arch_apply_relocations[_add] into a separate
>>>>>> .text.unlikely section and the section symbol ".text.unlikely" is being
>>>>>> dropped. Due to this, recordmcount is unable to find a non-weak symbol
>>>>> But arch_kexec_apply_relocations_add is weak symbol on ppc.
>>>>
>>>> Yes. Note that it is just the section symbol that gets dropped. The section is
>>>> still present and will continue to hold the symbols for the functions
>>>> themselves.
>>>
>>> So we have a case where binutils thinks it is doing something useful
>>> and our kernel specific tool gets tripped up by it.
>>
>> It's not just binutils, the LLVM assembler has the same behavior.
>>
>>> Reading the recordmcount code it looks like it is finding any symbol
>>> within a section but ignoring weak symbols.  So I suspect the only
>>> remaining symbol in the section is __weak and that confuses
>>> recordmcount.
>>>
>>> Does removing the __weak annotation on those functions fix the build
>>> error?  If so we can restructure the kexec code to simply not use __weak
>>> symbols.
>>>
>>> Otherwise the fix needs to be in recordmcount or binutils, and we should
>>> loop whoever maintains recordmcount in to see what they can do.
>>
>> It seems that recordmcount is not really maintained anymore now that x86
>> uses objtool?
>>
>> There've been several threads about fixing recordmcount, but none of
>> them seem to have lead to a solution.
> 
> That is unfortunate.
> 
>> These weak symbol vs recordmcount problems have been worked around going
>> back as far as 2020:
>>
>>   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/elfcore.h?id=6e7b64b9dd6d96537d816ea07ec26b7dedd397b9
> 
> I am more than happy to adopt the kind of solution that was adopted
> there in elfcore.h and simply get rid of __weak symbols in the kexec
> code.
> 
> Using __weak symbols is really not the common kernel way of doing
> things.  Using __weak symbols introduces a bit of magic in how the
> kernel gets built that is unnecessary.
> 
> Can someone verify that deleting __weak is enough to get powerpc to
> build?  AKA:
> 
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index 8347fc158d2b..7f4ca8dbe26f 100644
> --- a/kernel/kexec_file.c
> +++ b/kernel/kexec_file.c
> @@ -117,7 +117,7 @@ int __weak arch_kexec_kernel_verify_sig(struct kimage *image, void *buf,
>   *
>   * Return: 0 on success, negative errno on error.
>   */
> -int __weak
> +int
>  arch_kexec_apply_relocations_add(struct purgatory_info *pi, Elf_Shdr *section,
>                                  const Elf_Shdr *relsec, const Elf_Shdr *symtab)
>  {
> @@ -134,7 +134,7 @@ arch_kexec_apply_relocations_add(struct purgatory_info *pi, Elf_Shdr *section,
>   *
>   * Return: 0 on success, negative errno on error.
>   */
> -int __weak
> +int
>  arch_kexec_apply_relocations(struct purgatory_info *pi, Elf_Shdr *section,
>                              const Elf_Shdr *relsec, const Elf_Shdr *symtab)
>  {

Yes, dropping the __weak attribute allows recordmcount to emit a 
relocation using those symbols, so that resolves the problem.

> 
> If that change is verified to work a proper patch that keeps x86 and
> s390 building that have actual implementations should not be too
> difficult to write.

Sure, I will post a patch for that.


Thanks,
Naveen


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]
@ 2022-05-18 16:48             ` Naveen N. Rao
  0 siblings, 0 replies; 33+ messages in thread
From: Naveen N. Rao @ 2022-05-18 16:48 UTC (permalink / raw)
  To: Eric W. Biederman, Michael Ellerman
  Cc: linuxppc-dev, kexec, linux-kernel, Baoquan He

Eric W. Biederman wrote:
> Michael Ellerman <mpe@ellerman.id.au> writes:
> 
>> "Eric W. Biederman" <ebiederm@xmission.com> writes:
>>> Looking at this the pr_err is absolutely needed.  If an unsupported case
>>> winds up in the purgatory blob and the code can't handle it things
>>> will fail silently much worse later.
>>
>> It won't fail later, it will fail the syscall.
>>
>> sys_kexec_file_load()
>>   kimage_file_alloc_init()
>>     kimage_file_prepare_segments()
>>       arch_kexec_kernel_image_load()
>>         kexec_image_load_default()
>>           image->fops->load()
>>             elf64_load()        # powerpc
>>             bzImage64_load()    # x86
>>               kexec_load_purgatory()
>>                 kexec_apply_relocations()
>>
>> Which does:
>>
>> 	if (relsec->sh_type == SHT_RELA)
>> 		ret = arch_kexec_apply_relocations_add(pi, section,
>> 						       relsec, symtab);
>> 	else if (relsec->sh_type == SHT_REL)
>> 		ret = arch_kexec_apply_relocations(pi, section,
>> 						   relsec, symtab);
>> 	if (ret)
>> 		return ret;
>>
>> And that error is bubbled all the way back up. So as long as
>> arch_kexec_apply_relocations() returns an error the syscall will fail
>> back to userspace and there'll be an error message at that level.
>>
>> It's true that having nothing printed in dmesg makes it harder to work
>> out why the syscall failed. But it's a kernel bug if there are unhandled
>> relocations in the kernel-supplied purgatory code, so a user really has
>> no way to do anything about the error even if it is printed.
> 
> Good point.  I really hadn't noticed the error code in there when I
> looked.
> 
> I still don't think changing the functionality of the code because of
> a tool issue is the right solution.

Ok.

> 
> 
>>> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> writes:
>>>
>>>> Baoquan He wrote:
>>>>> On 04/25/22 at 11:11pm, Naveen N. Rao wrote:
>>>>>> kexec_load_purgatory() can fail for many reasons - there is no need to
>>>>>> print an error when encountering unsupported relocations.
>>>>>> This solves a build issue on powerpc with binutils v2.36 and newer [1].
>>>>>> Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
>>>>>> symbols") [2], binutils started dropping section symbols that it thought
>>>>> I am not familiar with binutils, while wondering if this exists in other
>>>>> ARCHes except of ppc. Arm64 doesn't have the ARCH override either, do we
>>>>> have problem with it?
>>>>
>>>> I'm not aware of this specific file causing a problem on other architectures -
>>>> perhaps the config options differ enough. There are however more reports of
>>>> similar issues affecting other architectures with the llvm integrated assembler:
>>>> https://github.com/ClangBuiltLinux/linux/issues/981
>>>>
>>>>>
>>>>>> were unused.  This isn't an issue in general, but with kexec_file.c, gcc
>>>>>> is placing kexec_arch_apply_relocations[_add] into a separate
>>>>>> .text.unlikely section and the section symbol ".text.unlikely" is being
>>>>>> dropped. Due to this, recordmcount is unable to find a non-weak symbol
>>>>> But arch_kexec_apply_relocations_add is weak symbol on ppc.
>>>>
>>>> Yes. Note that it is just the section symbol that gets dropped. The section is
>>>> still present and will continue to hold the symbols for the functions
>>>> themselves.
>>>
>>> So we have a case where binutils thinks it is doing something useful
>>> and our kernel specific tool gets tripped up by it.
>>
>> It's not just binutils, the LLVM assembler has the same behavior.
>>
>>> Reading the recordmcount code it looks like it is finding any symbol
>>> within a section but ignoring weak symbols.  So I suspect the only
>>> remaining symbol in the section is __weak and that confuses
>>> recordmcount.
>>>
>>> Does removing the __weak annotation on those functions fix the build
>>> error?  If so we can restructure the kexec code to simply not use __weak
>>> symbols.
>>>
>>> Otherwise the fix needs to be in recordmcount or binutils, and we should
>>> loop whoever maintains recordmcount in to see what they can do.
>>
>> It seems that recordmcount is not really maintained anymore now that x86
>> uses objtool?
>>
>> There've been several threads about fixing recordmcount, but none of
>> them seem to have lead to a solution.
> 
> That is unfortunate.
> 
>> These weak symbol vs recordmcount problems have been worked around going
>> back as far as 2020:
>>
>>   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/elfcore.h?id=6e7b64b9dd6d96537d816ea07ec26b7dedd397b9
> 
> I am more than happy to adopt the kind of solution that was adopted
> there in elfcore.h and simply get rid of __weak symbols in the kexec
> code.
> 
> Using __weak symbols is really not the common kernel way of doing
> things.  Using __weak symbols introduces a bit of magic in how the
> kernel gets built that is unnecessary.
> 
> Can someone verify that deleting __weak is enough to get powerpc to
> build?  AKA:
> 
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index 8347fc158d2b..7f4ca8dbe26f 100644
> --- a/kernel/kexec_file.c
> +++ b/kernel/kexec_file.c
> @@ -117,7 +117,7 @@ int __weak arch_kexec_kernel_verify_sig(struct kimage *image, void *buf,
>   *
>   * Return: 0 on success, negative errno on error.
>   */
> -int __weak
> +int
>  arch_kexec_apply_relocations_add(struct purgatory_info *pi, Elf_Shdr *section,
>                                  const Elf_Shdr *relsec, const Elf_Shdr *symtab)
>  {
> @@ -134,7 +134,7 @@ arch_kexec_apply_relocations_add(struct purgatory_info *pi, Elf_Shdr *section,
>   *
>   * Return: 0 on success, negative errno on error.
>   */
> -int __weak
> +int
>  arch_kexec_apply_relocations(struct purgatory_info *pi, Elf_Shdr *section,
>                              const Elf_Shdr *relsec, const Elf_Shdr *symtab)
>  {

Yes, dropping the __weak attribute allows recordmcount to emit a 
relocation using those symbols, so that resolves the problem.

> 
> If that change is verified to work a proper patch that keeps x86 and
> s390 building that have actual implementations should not be too
> difficult to write.

Sure, I will post a patch for that.


Thanks,
Naveen


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]
@ 2022-05-18 16:48             ` Naveen N. Rao
  0 siblings, 0 replies; 33+ messages in thread
From: Naveen N. Rao @ 2022-05-18 16:48 UTC (permalink / raw)
  To: kexec

Eric W. Biederman wrote:
> Michael Ellerman <mpe@ellerman.id.au> writes:
> 
>> "Eric W. Biederman" <ebiederm@xmission.com> writes:
>>> Looking at this the pr_err is absolutely needed.  If an unsupported case
>>> winds up in the purgatory blob and the code can't handle it things
>>> will fail silently much worse later.
>>
>> It won't fail later, it will fail the syscall.
>>
>> sys_kexec_file_load()
>>   kimage_file_alloc_init()
>>     kimage_file_prepare_segments()
>>       arch_kexec_kernel_image_load()
>>         kexec_image_load_default()
>>           image->fops->load()
>>             elf64_load()        # powerpc
>>             bzImage64_load()    # x86
>>               kexec_load_purgatory()
>>                 kexec_apply_relocations()
>>
>> Which does:
>>
>> 	if (relsec->sh_type == SHT_RELA)
>> 		ret = arch_kexec_apply_relocations_add(pi, section,
>> 						       relsec, symtab);
>> 	else if (relsec->sh_type == SHT_REL)
>> 		ret = arch_kexec_apply_relocations(pi, section,
>> 						   relsec, symtab);
>> 	if (ret)
>> 		return ret;
>>
>> And that error is bubbled all the way back up. So as long as
>> arch_kexec_apply_relocations() returns an error the syscall will fail
>> back to userspace and there'll be an error message at that level.
>>
>> It's true that having nothing printed in dmesg makes it harder to work
>> out why the syscall failed. But it's a kernel bug if there are unhandled
>> relocations in the kernel-supplied purgatory code, so a user really has
>> no way to do anything about the error even if it is printed.
> 
> Good point.  I really hadn't noticed the error code in there when I
> looked.
> 
> I still don't think changing the functionality of the code because of
> a tool issue is the right solution.

Ok.

> 
> 
>>> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> writes:
>>>
>>>> Baoquan He wrote:
>>>>> On 04/25/22 at 11:11pm, Naveen N. Rao wrote:
>>>>>> kexec_load_purgatory() can fail for many reasons - there is no need to
>>>>>> print an error when encountering unsupported relocations.
>>>>>> This solves a build issue on powerpc with binutils v2.36 and newer [1].
>>>>>> Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
>>>>>> symbols") [2], binutils started dropping section symbols that it thought
>>>>> I am not familiar with binutils, while wondering if this exists in other
>>>>> ARCHes except of ppc. Arm64 doesn't have the ARCH override either, do we
>>>>> have problem with it?
>>>>
>>>> I'm not aware of this specific file causing a problem on other architectures -
>>>> perhaps the config options differ enough. There are however more reports of
>>>> similar issues affecting other architectures with the llvm integrated assembler:
>>>> https://github.com/ClangBuiltLinux/linux/issues/981
>>>>
>>>>>
>>>>>> were unused.  This isn't an issue in general, but with kexec_file.c, gcc
>>>>>> is placing kexec_arch_apply_relocations[_add] into a separate
>>>>>> .text.unlikely section and the section symbol ".text.unlikely" is being
>>>>>> dropped. Due to this, recordmcount is unable to find a non-weak symbol
>>>>> But arch_kexec_apply_relocations_add is weak symbol on ppc.
>>>>
>>>> Yes. Note that it is just the section symbol that gets dropped. The section is
>>>> still present and will continue to hold the symbols for the functions
>>>> themselves.
>>>
>>> So we have a case where binutils thinks it is doing something useful
>>> and our kernel specific tool gets tripped up by it.
>>
>> It's not just binutils, the LLVM assembler has the same behavior.
>>
>>> Reading the recordmcount code it looks like it is finding any symbol
>>> within a section but ignoring weak symbols.  So I suspect the only
>>> remaining symbol in the section is __weak and that confuses
>>> recordmcount.
>>>
>>> Does removing the __weak annotation on those functions fix the build
>>> error?  If so we can restructure the kexec code to simply not use __weak
>>> symbols.
>>>
>>> Otherwise the fix needs to be in recordmcount or binutils, and we should
>>> loop whoever maintains recordmcount in to see what they can do.
>>
>> It seems that recordmcount is not really maintained anymore now that x86
>> uses objtool?
>>
>> There've been several threads about fixing recordmcount, but none of
>> them seem to have lead to a solution.
> 
> That is unfortunate.
> 
>> These weak symbol vs recordmcount problems have been worked around going
>> back as far as 2020:
>>
>>   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/elfcore.h?id=6e7b64b9dd6d96537d816ea07ec26b7dedd397b9
> 
> I am more than happy to adopt the kind of solution that was adopted
> there in elfcore.h and simply get rid of __weak symbols in the kexec
> code.
> 
> Using __weak symbols is really not the common kernel way of doing
> things.  Using __weak symbols introduces a bit of magic in how the
> kernel gets built that is unnecessary.
> 
> Can someone verify that deleting __weak is enough to get powerpc to
> build?  AKA:
> 
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index 8347fc158d2b..7f4ca8dbe26f 100644
> --- a/kernel/kexec_file.c
> +++ b/kernel/kexec_file.c
> @@ -117,7 +117,7 @@ int __weak arch_kexec_kernel_verify_sig(struct kimage *image, void *buf,
>   *
>   * Return: 0 on success, negative errno on error.
>   */
> -int __weak
> +int
>  arch_kexec_apply_relocations_add(struct purgatory_info *pi, Elf_Shdr *section,
>                                  const Elf_Shdr *relsec, const Elf_Shdr *symtab)
>  {
> @@ -134,7 +134,7 @@ arch_kexec_apply_relocations_add(struct purgatory_info *pi, Elf_Shdr *section,
>   *
>   * Return: 0 on success, negative errno on error.
>   */
> -int __weak
> +int
>  arch_kexec_apply_relocations(struct purgatory_info *pi, Elf_Shdr *section,
>                              const Elf_Shdr *relsec, const Elf_Shdr *symtab)
>  {

Yes, dropping the __weak attribute allows recordmcount to emit a 
relocation using those symbols, so that resolves the problem.

> 
> If that change is verified to work a proper patch that keeps x86 and
> s390 building that have actual implementations should not be too
> difficult to write.

Sure, I will post a patch for that.


Thanks,
Naveen



^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2022-05-18 16:49 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-25 17:41 [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add] Naveen N. Rao
2022-04-25 17:41 ` Naveen N. Rao
2022-04-25 17:41 ` Naveen N. Rao
2022-05-17  7:58 ` Naveen N. Rao
2022-05-17  7:58   ` Naveen N. Rao
2022-05-17  7:58   ` Naveen N. Rao
2022-05-17  9:25 ` Baoquan He
2022-05-17  9:25   ` Baoquan He
2022-05-17  9:25   ` Baoquan He
2022-05-17 10:19   ` Naveen N. Rao
2022-05-17 10:19     ` Naveen N. Rao
2022-05-17 10:19     ` Naveen N. Rao
2022-05-17 15:32     ` Eric W. Biederman
2022-05-17 15:32       ` Eric W. Biederman
2022-05-17 15:32       ` Eric W. Biederman
2022-05-18  2:26       ` Michael Ellerman
2022-05-18  2:26         ` Michael Ellerman
2022-05-18  2:26         ` Michael Ellerman
2022-05-18  7:49         ` Baoquan He
2022-05-18  7:49           ` Baoquan He
2022-05-18  7:49           ` Baoquan He
2022-05-18  9:18           ` Naveen N. Rao
2022-05-18  9:18             ` Naveen N. Rao
2022-05-18  9:18             ` Naveen N. Rao
2022-05-18 10:11             ` Baoquan He
2022-05-18 10:11               ` Baoquan He
2022-05-18 10:11               ` Baoquan He
2022-05-18 14:48         ` Eric W. Biederman
2022-05-18 14:48           ` Eric W. Biederman
2022-05-18 14:48           ` Eric W. Biederman
2022-05-18 16:48           ` Naveen N. Rao
2022-05-18 16:48             ` Naveen N. Rao
2022-05-18 16:48             ` Naveen N. Rao

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.