On Sun, Mar 17, 2024 at 07:49:43PM -0700, David Wei wrote:
> I'm working on a similar proposal for zero copy Rx but to host memory
> and depend on this memory provider API.

Why do you need a different provider for that vs just udmabuf?

> Jakub also designed this API for hugepages too IIRC. Basically there's
> going to be at least three fancy ways of providing pages (one of which
> isn't actually pages, hence the merged netmem_t series) to drivers.

How do hugepages differ from a normal page allocation? They should just
be a different order passed to the page allocator.
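[Editor's note: a minimal sketch of the point being made here, not code
from the series. From the buddy allocator's perspective, a hugepage-sized
chunk is simply a higher-order allocation; alloc_pages(), __GFP_COMP, and
HUGETLB_PAGE_ORDER are standard kernel definitions, while the helper name
below is ours.]

#include <linux/gfp.h>
#include <linux/hugetlb.h>

/* A 2 MiB chunk on x86-64 (4 KiB base pages) is order 9, i.e. 2^9 pages. */
static struct page *alloc_hugepage_sized_chunk(void)
{
	return alloc_pages(GFP_KERNEL | __GFP_COMP, HUGETLB_PAGE_ORDER);
}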
From: Jason A. Donenfeld <Jason@zx2c4.com> Sent: Monday, March 18, 2024 2:03 PM
>
> On Mon, Mar 18, 2024 at 08:54:08AM -0700, mhkelley58@gmail.com wrote:
> > From: Michael Kelley <mhklinux@outlook.com>
> >
> > A Hyper-V host provides its guest VMs with entropy in a custom ACPI
> > table named "OEM0". The entropy bits are updated each time Hyper-V
> > boots the VM, and are suitable for seeding the Linux guest random
> > number generator (rng). See a brief description of OEM0 in [1].
> >
> > Generation 2 VMs on Hyper-V use UEFI to boot. Existing EFI code in
> > Linux seeds the rng with entropy bits from the EFI_RNG_PROTOCOL.
> > Via this path, the rng is seeded very early during boot with good
> > entropy. The ACPI OEM0 table provided in such VMs is an additional
> > source of entropy.
> >
> > Generation 1 VMs on Hyper-V boot from BIOS. For these VMs, Linux
> > doesn't currently get any entropy from the Hyper-V host. While this
> > is not fundamentally broken because Linux can generate its own entropy,
> > using the Hyper-V host provided entropy would get the rng off to a
> > better start and would do so earlier in the boot process.
> >
> > Improve the rng seeding for Generation 1 VMs by having Hyper-V specific
> > code in Linux take advantage of the OEM0 table to seed the rng. For
> > Generation 2 VMs, use the OEM0 table to provide additional entropy
> > beyond the EFI_RNG_PROTOCOL. Because the OEM0 table is custom to
> > Hyper-V, parse it directly in the Hyper-V code in the Linux kernel
> > and use add_bootloader_randomness() to add it to the rng. Once the
> > entropy bits are read from OEM0, zero them out in the table so
> > they don't appear in /sys/firmware/acpi/tables/OEM0 in the running
> > VM. The zero'ing is done out of an abundance of caution to avoid
> > potential security risks to the rng. Also set the OEM0 data length
> > to zero so a kexec or other subsequent use of the table won't try
> > to use the zero'ed bits.
> >
> > [1] https://download.microsoft.com/download/1/c/9/1c9813b8-089c-4fef-b2ad-ad80e79403ba/Whitepaper%20-%20The%20Windows%2010%20random%20number%20generation%20infrastructure.pdf
>
> Looks good to me. Assuming you've tested this and it works,

Yes, tested on both x86 and arm64.  Thanks.

Michael

>
> Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
>
> Thanks for the v3.
>
> Jason
On Mon, Mar 18, 2024 at 08:54:08AM -0700, mhkelley58@gmail.com wrote:
> From: Michael Kelley <mhklinux@outlook.com>
>
> A Hyper-V host provides its guest VMs with entropy in a custom ACPI
> table named "OEM0". The entropy bits are updated each time Hyper-V
> boots the VM, and are suitable for seeding the Linux guest random
> number generator (rng). See a brief description of OEM0 in [1].
>
> Generation 2 VMs on Hyper-V use UEFI to boot. Existing EFI code in
> Linux seeds the rng with entropy bits from the EFI_RNG_PROTOCOL.
> Via this path, the rng is seeded very early during boot with good
> entropy. The ACPI OEM0 table provided in such VMs is an additional
> source of entropy.
>
> Generation 1 VMs on Hyper-V boot from BIOS. For these VMs, Linux
> doesn't currently get any entropy from the Hyper-V host. While this
> is not fundamentally broken because Linux can generate its own entropy,
> using the Hyper-V host provided entropy would get the rng off to a
> better start and would do so earlier in the boot process.
>
> Improve the rng seeding for Generation 1 VMs by having Hyper-V specific
> code in Linux take advantage of the OEM0 table to seed the rng. For
> Generation 2 VMs, use the OEM0 table to provide additional entropy
> beyond the EFI_RNG_PROTOCOL. Because the OEM0 table is custom to
> Hyper-V, parse it directly in the Hyper-V code in the Linux kernel
> and use add_bootloader_randomness() to add it to the rng. Once the
> entropy bits are read from OEM0, zero them out in the table so
> they don't appear in /sys/firmware/acpi/tables/OEM0 in the running
> VM. The zero'ing is done out of an abundance of caution to avoid
> potential security risks to the rng. Also set the OEM0 data length
> to zero so a kexec or other subsequent use of the table won't try
> to use the zero'ed bits.
>
> [1] https://download.microsoft.com/download/1/c/9/1c9813b8-089c-4fef-b2ad-ad80e79403ba/Whitepaper%20-%20The%20Windows%2010%20random%20number%20generation%20infrastructure.pdf
>
> Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Applied to hyperv-next. Thanks.
Long gave his review in the last version. I assumed he's at least
onboard with the general idea here. Time is tight, so I applied this
patch without waiting for Long's ack again.
Long, if you have objections, please speak up.
Hi Michael,
On Mon, Mar 18, 2024 at 08:54:08AM -0700, mhkelley58@gmail.com wrote:
> From: Michael Kelley <mhklinux@outlook.com>
>
> A Hyper-V host provides its guest VMs with entropy in a custom ACPI
> table named "OEM0". The entropy bits are updated each time Hyper-V
> boots the VM, and are suitable for seeding the Linux guest random
> number generator (rng). See a brief description of OEM0 in [1].
>
> Generation 2 VMs on Hyper-V use UEFI to boot. Existing EFI code in
> Linux seeds the rng with entropy bits from the EFI_RNG_PROTOCOL.
> Via this path, the rng is seeded very early during boot with good
> entropy. The ACPI OEM0 table provided in such VMs is an additional
> source of entropy.
>
> Generation 1 VMs on Hyper-V boot from BIOS. For these VMs, Linux
> doesn't currently get any entropy from the Hyper-V host. While this
> is not fundamentally broken because Linux can generate its own entropy,
> using the Hyper-V host provided entropy would get the rng off to a
> better start and would do so earlier in the boot process.
>
> Improve the rng seeding for Generation 1 VMs by having Hyper-V specific
> code in Linux take advantage of the OEM0 table to seed the rng. For
> Generation 2 VMs, use the OEM0 table to provide additional entropy
> beyond the EFI_RNG_PROTOCOL. Because the OEM0 table is custom to
> Hyper-V, parse it directly in the Hyper-V code in the Linux kernel
> and use add_bootloader_randomness() to add it to the rng. Once the
> entropy bits are read from OEM0, zero them out in the table so
> they don't appear in /sys/firmware/acpi/tables/OEM0 in the running
> VM. The zero'ing is done out of an abundance of caution to avoid
> potential security risks to the rng. Also set the OEM0 data length
> to zero so a kexec or other subsequent use of the table won't try
> to use the zero'ed bits.
>
> [1] https://download.microsoft.com/download/1/c/9/1c9813b8-089c-4fef-b2ad-ad80e79403ba/Whitepaper%20-%20The%20Windows%2010%20random%20number%20generation%20infrastructure.pdf
Looks good to me. Assuming you've tested this and it works,
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Thanks for the v3.
Jason
On 3/18/2024 8:54 AM, mhkelley58@gmail.com wrote:
> From: Michael Kelley <mhklinux@outlook.com>
>
> A Hyper-V host provides its guest VMs with entropy in a custom ACPI
> table named "OEM0". The entropy bits are updated each time Hyper-V
> boots the VM, and are suitable for seeding the Linux guest random
> number generator (rng). See a brief description of OEM0 in [1].
>
> Generation 2 VMs on Hyper-V use UEFI to boot. Existing EFI code in
> Linux seeds the rng with entropy bits from the EFI_RNG_PROTOCOL.
> Via this path, the rng is seeded very early during boot with good
> entropy. The ACPI OEM0 table provided in such VMs is an additional
> source of entropy.
>
> Generation 1 VMs on Hyper-V boot from BIOS. For these VMs, Linux
> doesn't currently get any entropy from the Hyper-V host. While this
> is not fundamentally broken because Linux can generate its own entropy,
> using the Hyper-V host provided entropy would get the rng off to a
> better start and would do so earlier in the boot process.
>
> Improve the rng seeding for Generation 1 VMs by having Hyper-V specific
> code in Linux take advantage of the OEM0 table to seed the rng. For
> Generation 2 VMs, use the OEM0 table to provide additional entropy
> beyond the EFI_RNG_PROTOCOL. Because the OEM0 table is custom to
> Hyper-V, parse it directly in the Hyper-V code in the Linux kernel
> and use add_bootloader_randomness() to add it to the rng. Once the
> entropy bits are read from OEM0, zero them out in the table so
> they don't appear in /sys/firmware/acpi/tables/OEM0 in the running
> VM. The zero'ing is done out of an abundance of caution to avoid
> potential security risks to the rng. Also set the OEM0 data length
> to zero so a kexec or other subsequent use of the table won't try
> to use the zero'ed bits.
>
> [1] https://download.microsoft.com/download/1/c/9/1c9813b8-089c-4fef-b2ad-ad80e79403ba/Whitepaper%20-%20The%20Windows%2010%20random%20number%20generation%20infrastructure.pdf
>
> Signed-off-by: Michael Kelley <mhklinux@outlook.com>
> ---
> Changes in v3:
> * Removed restriction to just Generation 1 VMs. Generation 2 VMs
> now also use the additional entropy even though they also get
> initial entropy via EFI_RNG_PROTOCOL [Jason Donenfeld]
> * Process the OEM0 table on ARM64 systems in addition to x86/x64,
> as a result of no longer excluding Generation 2 VM.
> * Enlarge the range of entropy byte counts that are considered valid
> in the OEM0 table. New range is 8 to 4K; previously the range was
> 32 to 256. [Jason Donenfeld]
> * After processing the entropy bits in OEM0, also set the OEM0
> table length to indicate that the entropy byte count is zero,
> to prevent a subsequent kexec or other use of the table from
> trying to use the zero'ed bits. [Jason Donenfeld]
>
> Changes in v2:
> * Tweaked commit message [Wei Liu]
> * Removed message when OEM0 table isn't found. Added debug-level
> message when OEM0 is successfully used to add randomness. [Wei Liu]
>
> arch/arm64/hyperv/mshyperv.c | 2 +
> arch/x86/kernel/cpu/mshyperv.c | 1 +
> drivers/hv/hv_common.c | 69 ++++++++++++++++++++++++++++++++++
> include/asm-generic/mshyperv.h | 2 +
> 4 files changed, 74 insertions(+)
>
> diff --git a/arch/arm64/hyperv/mshyperv.c b/arch/arm64/hyperv/mshyperv.c
> index f1b8a04ee9f2..c8193cec1b90 100644
> --- a/arch/arm64/hyperv/mshyperv.c
> +++ b/arch/arm64/hyperv/mshyperv.c
> @@ -74,6 +74,8 @@ static int __init hyperv_init(void)
> return ret;
> }
>
> + ms_hyperv_late_init();
> +
> hyperv_initialized = true;
> return 0;
> }
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 303fef824167..65c9cbdd2282 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -648,6 +648,7 @@ const __initconst struct hypervisor_x86 x86_hyper_ms_hyperv = {
> .init.x2apic_available = ms_hyperv_x2apic_available,
> .init.msi_ext_dest_id = ms_hyperv_msi_ext_dest_id,
> .init.init_platform = ms_hyperv_init_platform,
> + .init.guest_late_init = ms_hyperv_late_init,
> #ifdef CONFIG_AMD_MEM_ENCRYPT
> .runtime.sev_es_hcall_prepare = hv_sev_es_hcall_prepare,
> .runtime.sev_es_hcall_finish = hv_sev_es_hcall_finish,
> diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> index 0285a74363b3..724de94d885f 100644
> --- a/drivers/hv/hv_common.c
> +++ b/drivers/hv/hv_common.c
> @@ -20,8 +20,11 @@
> #include <linux/sched/task_stack.h>
> #include <linux/panic_notifier.h>
> #include <linux/ptrace.h>
> +#include <linux/random.h>
> +#include <linux/efi.h>
> #include <linux/kdebug.h>
> #include <linux/kmsg_dump.h>
> +#include <linux/sizes.h>
> #include <linux/slab.h>
> #include <linux/dma-map-ops.h>
> #include <linux/set_memory.h>
> @@ -347,6 +350,72 @@ int __init hv_common_init(void)
> return 0;
> }
>
> +void __init ms_hyperv_late_init(void)
> +{
> + struct acpi_table_header *header;
> + acpi_status status;
> + u8 *randomdata;
> + u32 length, i;
> +
> + /*
> + * Seed the Linux random number generator with entropy provided by
> + * the Hyper-V host in ACPI table OEM0.
> + */
> + if (!IS_ENABLED(CONFIG_ACPI))
> + return;
> +
> + status = acpi_get_table("OEM0", 0, &header);
> + if (ACPI_FAILURE(status) || !header)
> + return;
> +
> + /*
> + * Since the "OEM0" table name is for OEM specific usage, verify
> + * that what we're seeing purports to be from Microsoft.
> + */
> + if (strncmp(header->oem_table_id, "MICROSFT", 8))
> + goto error;
> +
> + /*
> + * Ensure the length is reasonable. Requiring at least 8 bytes and
> + * no more than 4K bytes is somewhat arbitrary and just protects
> + * against a malformed table. Hyper-V currently provides 64 bytes,
> + * but allow for a change in a later version.
> + */
> + if (header->length < sizeof(*header) + 8 ||
> > + header->length > sizeof(*header) + SZ_4K)
> > + goto error;
> +
> + length = header->length - sizeof(*header);
> + randomdata = (u8 *)(header + 1);
> +
> + pr_debug("Hyper-V: Seeding rng with %d random bytes from ACPI table OEM0\n",
> + length);
> +
> + add_bootloader_randomness(randomdata, length);
> +
> + /*
> + * To prevent the seed data from being visible in /sys/firmware/acpi,
> + * zero out the random data in the ACPI table and fixup the checksum.
> + * The zero'ing is done out of an abundance of caution in avoiding
> + * potential security risks to the rng. Similarly, reset the table
> + * length to just the header size so that a subsequent kexec doesn't
> + * try to use the zero'ed out random data.
> + */
> + for (i = 0; i < length; i++) {
> + header->checksum += randomdata[i];
> + randomdata[i] = 0;
> + }
> +
> + for (i = 0; i < sizeof(header->length); i++)
> + header->checksum += ((u8 *)&header->length)[i];
> + header->length = sizeof(*header);
> + for (i = 0; i < sizeof(header->length); i++)
> + header->checksum -= ((u8 *)&header->length)[i];
> +
> +error:
> + acpi_put_table(header);
> +}
> +
> /*
> * Hyper-V specific initialization and die code for
> * individual CPUs that is common across all architectures.
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> index 430f0ae0dde2..e861223093df 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -193,6 +193,7 @@ extern u64 (*hv_read_reference_counter)(void);
>
> int __init hv_common_init(void);
> void __init hv_common_free(void);
> +void __init ms_hyperv_late_init(void);
> int hv_common_cpu_init(unsigned int cpu);
> int hv_common_cpu_die(unsigned int cpu);
>
> @@ -290,6 +291,7 @@ void hv_setup_dma_ops(struct device *dev, bool coherent);
> static inline bool hv_is_hyperv_initialized(void) { return false; }
> static inline bool hv_is_hibernation_supported(void) { return false; }
> static inline void hyperv_cleanup(void) {}
> +static inline void ms_hyperv_late_init(void) {}
> static inline bool hv_is_isolation_supported(void) { return false; }
> static inline enum hv_isolation_type hv_get_isolation_type(void)
> {
This patch looks good to me. The code comments were very helpful in explaining
what is going on.
Nuno
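[Editor's note: a short sketch of the invariant that the checksum fixup in
the patch above preserves; this is illustrative code, not part of the patch.
ACPI requires that all bytes of a table, including the checksum byte, sum to
zero mod 256. Zeroing a data byte subtracts its value from that total, so
adding the old value back into the checksum byte keeps the table valid; the
same reasoning covers rewriting the length field.]

#include <stdint.h>
#include <stddef.h>

/* Returns 0 for a valid ACPI table. */
static uint8_t acpi_table_sum(const uint8_t *table, size_t len)
{
	uint8_t sum = 0;

	while (len--)
		sum += *table++;
	return sum;
}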
Hi Huacai,
On Mon, 2024-03-18 at 22:21 +0800, Huacai Chen wrote:
> Hi, SuperH maintainers,
>
> On Wed, Feb 8, 2023 at 8:59 PM John Paul Adrian Glaubitz
> <glaubitz@physik.fu-berlin.de> wrote:
> >
> > On Thu, 2022-07-14 at 16:41 +0800, Huacai Chen wrote:
> > > When CONFIG_CPUMASK_OFFSTACK and CONFIG_DEBUG_PER_CPU_MAPS is selected,
> > > cpu_max_bits_warn() generates a runtime warning similar as below while
> > > we show /proc/cpuinfo. Fix this by using nr_cpu_ids (the runtime limit)
> > > instead of NR_CPUS to iterate CPUs.
> > >
> > > [ 3.052463] ------------[ cut here ]------------
> > > [ 3.059679] WARNING: CPU: 3 PID: 1 at include/linux/cpumask.h:108 show_cpuinfo+0x5e8/0x5f0
> > > [ 3.070072] Modules linked in: efivarfs autofs4
> > > [ 3.076257] CPU: 0 PID: 1 Comm: systemd Not tainted 5.19-rc5+ #1052
> > > [ 3.099465] Stack : 9000000100157b08 9000000000f18530 9000000000cf846c 9000000100154000
> > > [ 3.109127] 9000000100157a50 0000000000000000 9000000100157a58 9000000000ef7430
> > > [ 3.118774] 90000001001578e8 0000000000000040 0000000000000020 ffffffffffffffff
> > > [ 3.128412] 0000000000aaaaaa 1ab25f00eec96a37 900000010021de80 900000000101c890
> > > [ 3.138056] 0000000000000000 0000000000000000 0000000000000000 0000000000aaaaaa
> > > [ 3.147711] ffff8000339dc220 0000000000000001 0000000006ab4000 0000000000000000
> > > [ 3.157364] 900000000101c998 0000000000000004 9000000000ef7430 0000000000000000
> > > [ 3.167012] 0000000000000009 000000000000006c 0000000000000000 0000000000000000
> > > [ 3.176641] 9000000000d3de08 9000000001639390 90000000002086d8 00007ffff0080286
> > > [ 3.186260] 00000000000000b0 0000000000000004 0000000000000000 0000000000071c1c
> > > [ 3.195868] ...
> > > [ 3.199917] Call Trace:
> > > [ 3.203941] [<90000000002086d8>] show_stack+0x38/0x14c
> > > [ 3.210666] [<9000000000cf846c>] dump_stack_lvl+0x60/0x88
> > > [ 3.217625] [<900000000023d268>] __warn+0xd0/0x100
> > > [ 3.223958] [<9000000000cf3c90>] warn_slowpath_fmt+0x7c/0xcc
> > > [ 3.231150] [<9000000000210220>] show_cpuinfo+0x5e8/0x5f0
> > > [ 3.238080] [<90000000004f578c>] seq_read_iter+0x354/0x4b4
> > > [ 3.245098] [<90000000004c2e90>] new_sync_read+0x17c/0x1c4
> > > [ 3.252114] [<90000000004c5174>] vfs_read+0x138/0x1d0
> > > [ 3.258694] [<90000000004c55f8>] ksys_read+0x70/0x100
> > > [ 3.265265] [<9000000000cfde9c>] do_syscall+0x7c/0x94
> > > [ 3.271820] [<9000000000202fe4>] handle_syscall+0xc4/0x160
> > > [ 3.281824] ---[ end trace 8b484262b4b8c24c ]---
> > >
> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> > > ---
> > > arch/sh/kernel/cpu/proc.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/arch/sh/kernel/cpu/proc.c b/arch/sh/kernel/cpu/proc.c
> > > index a306bcd6b341..5f6d0e827bae 100644
> > > --- a/arch/sh/kernel/cpu/proc.c
> > > +++ b/arch/sh/kernel/cpu/proc.c
> > > @@ -132,7 +132,7 @@ static int show_cpuinfo(struct seq_file *m, void *v)
> > >
> > > static void *c_start(struct seq_file *m, loff_t *pos)
> > > {
> > > - return *pos < NR_CPUS ? cpu_data + *pos : NULL;
> > > + return *pos < nr_cpu_ids ? cpu_data + *pos : NULL;
> > > }
> > > static void *c_next(struct seq_file *m, void *v, loff_t *pos)
> > > {
> >
> > I build-tested the patch and also booted the patched kernel successfully
> > on my SH-7785LCR board.
> >
> > Showing the contents of /proc/cpuinfo works fine, too:
> >
> > root@tirpitz:~> cat /proc/cpuinfo
> > machine : SH7785LCR
> > processor : 0
> > cpu family : sh4a
> > cpu type : SH7785
> > cut : 7.x
> > cpu flags : fpu perfctr llsc
> > cache type : split (harvard)
> > icache size : 32KiB (4-way)
> > dcache size : 32KiB (4-way)
> > address sizes : 32 bits physical
> > bogomips : 599.99
> > root@tirpitz:~>
> >
> > Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
> >
> > I am not sure yet whether the change is also correct as I don't know whether
> > it's possible to change the number of CPUs at runtime on SuperH.
> Can this patch be merged? This is the only one still unmerged in the
> whole series.
Thanks for the reminder. I will pick it up for 6.10.
Got sick this week, so I can't pick up any more patches for 6.9 and will just
send Linus a PR later this week.
Adrian
--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer
`. `' Physicist
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
> On Mar 18, 2024, at 07:02, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Mon, Mar 18, 2024 at 01:47:43AM +0800, Alan Huang wrote:
>> Hi,
>>
>> I'm playing with the LKMM, then I saw the ISA2+pooncelock+pooncelock+pombonce.
>>
>> The original litmus is as follows:
>> ------------------------------------------------------
>> P0(int *x, int *y, spinlock_t *mylock)
>> {
>> 	spin_lock(mylock);
>> 	WRITE_ONCE(*x, 1);
>> 	WRITE_ONCE(*y, 1);
>> 	spin_unlock(mylock);
>> }
>>
>> P1(int *y, int *z, spinlock_t *mylock)
>> {
>> 	int r0;
>>
>> 	spin_lock(mylock);
>> 	r0 = READ_ONCE(*y);
>> 	WRITE_ONCE(*z, 1);
>> 	spin_unlock(mylock);
>> }
>>
>> P2(int *x, int *z)
>> {
>> 	int r1;
>> 	int r2;
>>
>> 	r2 = READ_ONCE(*z);
>> 	smp_mb();
>> 	r1 = READ_ONCE(*x);
>> }
>>
>> exists (1:r0=1 /\ 2:r2=1 /\ 2:r1=0)
>> ------------------------------------------------------
>> Of course, the result is Never.
>>
>> But when I delete P0's spin_lock and P1's spin_unlock:
>> -------------------------------------------------------
>> P0(int *x, int *y, spinlock_t *mylock)
>> {
>> 	WRITE_ONCE(*x, 1);
>> 	WRITE_ONCE(*y, 1);
>> 	spin_unlock(mylock);
>> }
>>
>> P1(int *y, int *z, spinlock_t *mylock)
>> {
>> 	int r0;
>>
>> 	spin_lock(mylock);
>> 	r0 = READ_ONCE(*y);
>> 	WRITE_ONCE(*z, 1);
>> }
>>
>> P2(int *x, int *z)
>> {
>> 	int r1;
>> 	int r2;
>>
>> 	r2 = READ_ONCE(*z);
>> 	smp_mb();
>> 	r1 = READ_ONCE(*x);
>> }
>>
>> exists (1:r0=1 /\ 2:r2=1 /\ 2:r1=0)
>> ------------------------------------------------------
>> Then herd told me the result is Sometimes.
>
> You mean like this?
>
> Test ISA2+pooncelock+pooncelock+pombonce Allowed
> States 8
> 1:r0=0; 2:r1=0; 2:r2=0;
> 1:r0=0; 2:r1=0; 2:r2=1;
> 1:r0=0; 2:r1=1; 2:r2=0;
> 1:r0=0; 2:r1=1; 2:r2=1;
> 1:r0=1; 2:r1=0; 2:r2=0;
> 1:r0=1; 2:r1=0; 2:r2=1;
> 1:r0=1; 2:r1=1; 2:r2=0;
> 1:r0=1; 2:r1=1; 2:r2=1;
> Ok
> Witnesses
> Positive: 1 Negative: 7
> Flag unmatched-unlock
> Condition exists (1:r0=1 /\ 2:r2=1 /\ 2:r1=0)
> Observation ISA2+pooncelock+pooncelock+pombonce Sometimes 1 7
> Time ISA2+pooncelock+pooncelock+pombonce 0.01
> Hash=f55b8515e48310f812aa676084f2cc88
>
>> Is this expected?
>
> There are no locks held initially, so why can't the following
> sequence of events unfold:
>
> o	P1() acquires the lock.
>
> o	P0() does WRITE_ONCE(*y, 1). (Yes, out of order)
>
> o	P1() does READ_ONCE(*y), and gets 1.
>
> o	P1() does WRITE_ONCE(*z, 1).
>
> o	P2() does READ_ONCE(*z) and gets 1.
>
> o	P2() does smp_mb(), but there is nothing to order with.
>
> o	P2() does READ_ONCE(*x) and gets 0.
>
> o	P0() does WRITE_ONCE(*x, 1), but too late to affect P2().
>
> o	P0() releases the lock that it does not hold, which is why you see
>	the "Flag unmatched-unlock" in the output. LKMM is complaining
>	that the litmus test is not legitimate, and rightly so!

Oh! I missed that line, thank you for pointing this out! :)

>
> Or am I missing your point?
>
> 							Thanx, Paul
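[Editor's note: the results quoted in this thread can be reproduced with the
herd7 invocation documented in tools/memory-model/README; the modified litmus
test with the unbalanced lock/unlock just needs to be saved to its own file
first.]

  $ cd tools/memory-model
  $ herd7 -conf linux-kernel.cfg litmus-tests/ISA2+pooncelock+pooncelock+pombonce.litmus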
On 18.03.24 08:14, Xin Li (Intel) wrote:
> The stack of a task has been separated from the memory of a task_struct
> structure for a long time on x86; as a result, __{start,end}_init_task no
> longer mark the start and end of the init_task structure, but its stack
> only.
>
> Rename __{start,end}_init_task to __{start,end}_init_stack.
>
> Note other architectures are not affected because __{start,end}_init_task
> are used on x86 only.
>
> Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Juergen
The stack of a task has been separated from the memory of a task_struct
structure for a long time on x86; as a result, __{start,end}_init_task no
longer mark the start and end of the init_task structure, but its stack
only.

Rename __{start,end}_init_task to __{start,end}_init_stack.

Note other architectures are not affected because __{start,end}_init_task
are used on x86 only.

Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---

Change since v1:
* Reverted an accidental change of init_task to init_stack (Jürgen Groß).

---
 arch/x86/include/asm/processor.h  | 4 ++--
 arch/x86/kernel/head_64.S         | 2 +-
 arch/x86/xen/xen-head.S           | 2 +-
 include/asm-generic/vmlinux.lds.h | 6 +++---
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 811548f131f4..8b3a3f3bb859 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -636,10 +636,10 @@ static __always_inline void prefetchw(const void *x)
 #define KSTK_ESP(task)		(task_pt_regs(task)->sp)

 #else
-extern unsigned long __end_init_task[];
+extern unsigned long __end_init_stack[];

 #define INIT_THREAD { \
-	.sp = (unsigned long)&__end_init_task - \
+	.sp = (unsigned long)&__end_init_stack - \
 	      TOP_OF_KERNEL_STACK_PADDING - \
 	      sizeof(struct pt_regs), \
 }

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index d8198fbd70e5..c7babd7ebb0f 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -66,7 +66,7 @@ SYM_CODE_START_NOALIGN(startup_64)
 	mov	%rsi, %r15

 	/* Set up the stack for verify_cpu() */
-	leaq	(__end_init_task - TOP_OF_KERNEL_STACK_PADDING - PTREGS_SIZE)(%rip), %rsp
+	leaq	(__end_init_stack - TOP_OF_KERNEL_STACK_PADDING - PTREGS_SIZE)(%rip), %rsp

 	/* Setup GSBASE to allow stack canary access for C code */
 	movl	$MSR_GS_BASE, %ecx
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index 04101b984f24..43eadf03f46d 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -49,7 +49,7 @@ SYM_CODE_START(startup_xen)
 	ANNOTATE_NOENDBR
 	cld

-	leaq	(__end_init_task - TOP_OF_KERNEL_STACK_PADDING - PTREGS_SIZE)(%rip), %rsp
+	leaq	(__end_init_stack - TOP_OF_KERNEL_STACK_PADDING - PTREGS_SIZE)(%rip), %rsp

 	/* Set up %gs.
 	 *
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 514d3002ad8a..cdfdcca23045 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -399,13 +399,13 @@

 #define INIT_TASK_DATA(align) \
 	. = ALIGN(align); \
-	__start_init_task = .; \
+	__start_init_stack = .; \
 	init_thread_union = .; \
 	init_stack = .; \
 	KEEP(*(.data..init_task)) \
 	KEEP(*(.data..init_thread_info)) \
-	. = __start_init_task + THREAD_SIZE; \
-	__end_init_task = .;
+	. = __start_init_stack + THREAD_SIZE; \
+	__end_init_stack = .;

 #define JUMP_TABLE_DATA \
 	. = ALIGN(8); \

base-commit: 7e19a79344df2ed5e106091c29338962261b0290
--
2.44.0
Guenter Roeck <linux@roeck-us.net> writes:
> On 3/14/24 07:37, Guenter Roeck wrote:
>> On 3/14/24 06:36, Geert Uytterhoeven wrote:
>>> Hi Günter,
>>>
>>> On Tue, Mar 12, 2024 at 6:03 PM Guenter Roeck <linux@roeck-us.net> wrote:
>>>> Some unit tests intentionally trigger warning backtraces by passing bad
>>>> parameters to kernel API functions. Such unit tests typically check the
>>>> return value from such calls, not the existence of the warning backtrace.
>>>>
>>>> Such intentionally generated warning backtraces are neither desirable
>>>> nor useful for a number of reasons.
>>>> - They can result in overlooked real problems.
>>>> - A warning that suddenly starts to show up in unit tests needs to be
>>>> investigated and has to be marked to be ignored, for example by
>>>> adjusting filter scripts. Such filters are ad-hoc because there is
>>>> no real standard format for warnings. On top of that, such filter
>>>> scripts would require constant maintenance.
>>>>
> >>>> One option to address the problem would be to add messages such as "expected
>>>> warning backtraces start / end here" to the kernel log. However, that
>>>> would again require filter scripts, it might result in missing real
>>>> problematic warning backtraces triggered while the test is running, and
>>>> the irrelevant backtrace(s) would still clog the kernel log.
>>>>
>>>> Solve the problem by providing a means to identify and suppress specific
>>>> warning backtraces while executing test code. Support suppressing multiple
>>>> backtraces while at the same time limiting changes to generic code to the
>>>> absolute minimum. Architecture specific changes are kept at minimum by
>>>> retaining function names only if both CONFIG_DEBUG_BUGVERBOSE and
>>>> CONFIG_KUNIT are enabled.
>>>>
>>>> The first patch of the series introduces the necessary infrastructure.
>>>> The second patch introduces support for counting suppressed backtraces.
>>>> This capability is used in patch three to implement unit tests.
>>>> Patch four documents the new API.
>>>> The next two patches add support for suppressing backtraces in drm_rect
>>>> and dev_addr_lists unit tests. These patches are intended to serve as
>>>> examples for the use of the functionality introduced with this series.
>>>> The remaining patches implement the necessary changes for all
>>>> architectures with GENERIC_BUG support.
>>>
>>> Thanks for your series!
>>>
>>> I gave it a try on m68k, just running backtrace-suppression-test,
>>> and that seems to work fine.
>>>
>>>> Design note:
>>>> Function pointers are only added to the __bug_table section if both
>>>> CONFIG_KUNIT and CONFIG_DEBUG_BUGVERBOSE are enabled to avoid image
>>>> size increases if CONFIG_KUNIT=n. There would be some benefits to
>>>> adding those pointers all the time (reduced complexity, ability to
>>>> display function names in BUG/WARNING messages). That change, if
>>>> desired, can be made later.
>>>
>>> Unfortunately this also increases kernel size in the CONFIG_KUNIT=m
>>> case (ca. 80 KiB for atari_defconfig), making it less attractive to have
>>> kunit and all tests enabled as modules in my standard kernel.
>>>
>>
>> Good point. Indeed, it does. I wanted to avoid adding a configuration option,
>> but maybe I should add it after all. How about something like this ?
>>
>> +config KUNIT_SUPPRESS_BACKTRACE
>> + bool "KUnit - Enable backtrace suppression"
>> + default y
>> + help
>> + Enable backtrace suppression for KUnit. If enabled, backtraces
>> + generated intentionally by KUnit tests can be suppressed. Disable
>> + to reduce kernel image size if image size is more important than
>> + suppression of backtraces generated by KUnit tests.
>
> Any more comments / feedback on this ? Otherwise I'll introduce the
> above configuration option in v2 of the series.
>
> In this context, any suggestions if it should be enabled or disabled by
> default ? I personally think it would be more important to be able to
> suppress backtraces, but I understand that others may not be willing to
> accept a ~1% image size increase with CONFIG_KUNIT=m unless
> KUNIT_SUPPRESS_BACKTRACE is explicitly disabled.
Please enable it by default.
There are multiple CI systems that will benefit from it, whereas the
number of users enabling KUNIT in severely space-constrained
environments is surely small - perhaps just Geert ;).
cheers
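[Editor's note: a sketch of the mechanism and its size cost as described in
the cover letter above; the struct below is illustrative, not the actual
patch. Every WARN/BUG site emits a struct bug_entry into the __bug_table
section, and suppressing backtraces per function requires each entry to
additionally record which function contains it: one extra pointer per site,
which is where the ~80 KiB / ~1% image-size growth comes from.]

struct bug_entry_sketch {
	unsigned long	bug_addr;	/* address of the BUG/WARN instruction */
#ifdef CONFIG_DEBUG_BUGVERBOSE
	const char	*file;		/* source file of the site */
	unsigned short	line;
#endif
#ifdef CONFIG_KUNIT_SUPPRESS_BACKTRACE
	const char	*function;	/* lets KUnit tests match sites by function */
#endif
	unsigned short	flags;
};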
On 2024-03-17 19:02, Christoph Hellwig wrote:
> On Mon, Mar 04, 2024 at 06:01:37PM -0800, Mina Almasry wrote:
>> From: Jakub Kicinski <kuba@kernel.org>
>>
>> The page providers which try to reuse the same pages will
>> need to hold onto the ref, even if page gets released from
>> the pool - as in releasing the page from the pp just transfers
>> the "ownership" reference from pp to the provider, and provider
>> will wait for other references to be gone before feeding this
>> page back into the pool.
>
> The word hook always rings a giant warning bell for me, and looking into
> this series I am concerned indeed.
>
> The only provider provided here is the dma-buf one, and that basically
> is the only sensible one for the documented design. So instead of
> adding hooks that random proprietary crap can hook into, why not hard
> code the dma buf provider and just use a flag? That'll also avoid
> expensive indirect calls.

I'm working on a similar proposal for zero copy Rx but to host memory
and depend on this memory provider API.

Jakub also designed this API for hugepages too IIRC. Basically there's
going to be at least three fancy ways of providing pages (one of which
isn't actually pages, hence the merged netmem_t series) to drivers.
On Mon, Mar 04, 2024 at 06:01:37PM -0800, Mina Almasry wrote:
> From: Jakub Kicinski <kuba@kernel.org>
>
> The page providers which try to reuse the same pages will
> need to hold onto the ref, even if page gets released from
> the pool - as in releasing the page from the pp just transfers
> the "ownership" reference from pp to the provider, and provider
> will wait for other references to be gone before feeding this
> page back into the pool.
The word hook always rings a giant warning bell for me, and looking into
this series I am concerned indeed.
The only provider provided here is the dma-buf one, and that basically
is the only sensible one for the documented design. So instead of
adding hooks that random proprietary crap can hook into, why not hard
code the dma buf provider and just use a flag? That'll also avoid
expensive indirect calls.
On Mon, Mar 18, 2024 at 01:47:43AM +0800, Alan Huang wrote:
> Hi,
>
> I'm playing with the LKMM, then I saw the ISA2+pooncelock+pooncelock+pombonce.
>
> The original litmus is as follows:
> ------------------------------------------------------
> P0(int *x, int *y, spinlock_t *mylock)
> {
> 	spin_lock(mylock);
> 	WRITE_ONCE(*x, 1);
> 	WRITE_ONCE(*y, 1);
> 	spin_unlock(mylock);
> }
>
> P1(int *y, int *z, spinlock_t *mylock)
> {
> 	int r0;
>
> 	spin_lock(mylock);
> 	r0 = READ_ONCE(*y);
> 	WRITE_ONCE(*z, 1);
> 	spin_unlock(mylock);
> }
>
> P2(int *x, int *z)
> {
> 	int r1;
> 	int r2;
>
> 	r2 = READ_ONCE(*z);
> 	smp_mb();
> 	r1 = READ_ONCE(*x);
> }
>
> exists (1:r0=1 /\ 2:r2=1 /\ 2:r1=0)
> ------------------------------------------------------
> Of course, the result is Never.
>
> But when I delete P0's spin_lock and P1's spin_unlock:
> -------------------------------------------------------
> P0(int *x, int *y, spinlock_t *mylock)
> {
> 	WRITE_ONCE(*x, 1);
> 	WRITE_ONCE(*y, 1);
> 	spin_unlock(mylock);
> }
>
> P1(int *y, int *z, spinlock_t *mylock)
> {
> 	int r0;
>
> 	spin_lock(mylock);
> 	r0 = READ_ONCE(*y);
> 	WRITE_ONCE(*z, 1);
> }
>
> P2(int *x, int *z)
> {
> 	int r1;
> 	int r2;
>
> 	r2 = READ_ONCE(*z);
> 	smp_mb();
> 	r1 = READ_ONCE(*x);
> }
>
> exists (1:r0=1 /\ 2:r2=1 /\ 2:r1=0)
> ------------------------------------------------------
> Then herd told me the result is Sometimes.

You mean like this?

Test ISA2+pooncelock+pooncelock+pombonce Allowed
States 8
1:r0=0; 2:r1=0; 2:r2=0;
1:r0=0; 2:r1=0; 2:r2=1;
1:r0=0; 2:r1=1; 2:r2=0;
1:r0=0; 2:r1=1; 2:r2=1;
1:r0=1; 2:r1=0; 2:r2=0;
1:r0=1; 2:r1=0; 2:r2=1;
1:r0=1; 2:r1=1; 2:r2=0;
1:r0=1; 2:r1=1; 2:r2=1;
Ok
Witnesses
Positive: 1 Negative: 7
Flag unmatched-unlock
Condition exists (1:r0=1 /\ 2:r2=1 /\ 2:r1=0)
Observation ISA2+pooncelock+pooncelock+pombonce Sometimes 1 7
Time ISA2+pooncelock+pooncelock+pombonce 0.01
Hash=f55b8515e48310f812aa676084f2cc88

> Is this expected?

There are no locks held initially, so why can't the following
sequence of events unfold:

o	P1() acquires the lock.

o	P0() does WRITE_ONCE(*y, 1). (Yes, out of order)

o	P1() does READ_ONCE(*y), and gets 1.

o	P1() does WRITE_ONCE(*z, 1).

o	P2() does READ_ONCE(*z) and gets 1.

o	P2() does smp_mb(), but there is nothing to order with.

o	P2() does READ_ONCE(*x) and gets 0.

o	P0() does WRITE_ONCE(*x, 1), but too late to affect P2().

o	P0() releases the lock that it does not hold, which is why you see
	the "Flag unmatched-unlock" in the output. LKMM is complaining
	that the litmus test is not legitimate, and rightly so!

Or am I missing your point?

							Thanx, Paul
Hi,

I'm playing with the LKMM, then I saw the ISA2+pooncelock+pooncelock+pombonce.

The original litmus is as follows:
------------------------------------------------------
P0(int *x, int *y, spinlock_t *mylock)
{
	spin_lock(mylock);
	WRITE_ONCE(*x, 1);
	WRITE_ONCE(*y, 1);
	spin_unlock(mylock);
}

P1(int *y, int *z, spinlock_t *mylock)
{
	int r0;

	spin_lock(mylock);
	r0 = READ_ONCE(*y);
	WRITE_ONCE(*z, 1);
	spin_unlock(mylock);
}

P2(int *x, int *z)
{
	int r1;
	int r2;

	r2 = READ_ONCE(*z);
	smp_mb();
	r1 = READ_ONCE(*x);
}

exists (1:r0=1 /\ 2:r2=1 /\ 2:r1=0)
------------------------------------------------------
Of course, the result is Never.

But when I delete P0's spin_lock and P1's spin_unlock:
-------------------------------------------------------
P0(int *x, int *y, spinlock_t *mylock)
{
	WRITE_ONCE(*x, 1);
	WRITE_ONCE(*y, 1);
	spin_unlock(mylock);
}

P1(int *y, int *z, spinlock_t *mylock)
{
	int r0;

	spin_lock(mylock);
	r0 = READ_ONCE(*y);
	WRITE_ONCE(*z, 1);
}

P2(int *x, int *z)
{
	int r1;
	int r2;

	r2 = READ_ONCE(*z);
	smp_mb();
	r1 = READ_ONCE(*x);
}

exists (1:r0=1 /\ 2:r2=1 /\ 2:r1=0)
------------------------------------------------------
Then herd told me the result is Sometimes.

Is this expected?
On 3/14/24 07:37, Guenter Roeck wrote:
> On 3/14/24 06:36, Geert Uytterhoeven wrote:
>> Hi Günter,
>>
>> On Tue, Mar 12, 2024 at 6:03 PM Guenter Roeck <linux@roeck-us.net> wrote:
>>> Some unit tests intentionally trigger warning backtraces by passing bad
>>> parameters to kernel API functions. Such unit tests typically check the
>>> return value from such calls, not the existence of the warning backtrace.
>>>
>>> Such intentionally generated warning backtraces are neither desirable
>>> nor useful for a number of reasons.
>>> - They can result in overlooked real problems.
>>> - A warning that suddenly starts to show up in unit tests needs to be
>>> investigated and has to be marked to be ignored, for example by
>>> adjusting filter scripts. Such filters are ad-hoc because there is
>>> no real standard format for warnings. On top of that, such filter
>>> scripts would require constant maintenance.
>>>
> >>> One option to address the problem would be to add messages such as "expected
>>> warning backtraces start / end here" to the kernel log. However, that
>>> would again require filter scripts, it might result in missing real
>>> problematic warning backtraces triggered while the test is running, and
>>> the irrelevant backtrace(s) would still clog the kernel log.
>>>
>>> Solve the problem by providing a means to identify and suppress specific
>>> warning backtraces while executing test code. Support suppressing multiple
>>> backtraces while at the same time limiting changes to generic code to the
>>> absolute minimum. Architecture specific changes are kept at minimum by
>>> retaining function names only if both CONFIG_DEBUG_BUGVERBOSE and
>>> CONFIG_KUNIT are enabled.
>>>
>>> The first patch of the series introduces the necessary infrastructure.
>>> The second patch introduces support for counting suppressed backtraces.
>>> This capability is used in patch three to implement unit tests.
>>> Patch four documents the new API.
>>> The next two patches add support for suppressing backtraces in drm_rect
>>> and dev_addr_lists unit tests. These patches are intended to serve as
>>> examples for the use of the functionality introduced with this series.
>>> The remaining patches implement the necessary changes for all
>>> architectures with GENERIC_BUG support.
>>
>> Thanks for your series!
>>
>> I gave it a try on m68k, just running backtrace-suppression-test,
>> and that seems to work fine.
>>
>>> Design note:
>>> Function pointers are only added to the __bug_table section if both
>>> CONFIG_KUNIT and CONFIG_DEBUG_BUGVERBOSE are enabled to avoid image
>>> size increases if CONFIG_KUNIT=n. There would be some benefits to
>>> adding those pointers all the time (reduced complexity, ability to
>>> display function names in BUG/WARNING messages). That change, if
>>> desired, can be made later.
>>
>> Unfortunately this also increases kernel size in the CONFIG_KUNIT=m
>> case (ca. 80 KiB for atari_defconfig), making it less attractive to have
>> kunit and all tests enabled as modules in my standard kernel.
>>
>
> Good point. Indeed, it does. I wanted to avoid adding a configuration option,
> but maybe I should add it after all. How about something like this ?
>
> +config KUNIT_SUPPRESS_BACKTRACE
> + bool "KUnit - Enable backtrace suppression"
> + default y
> + help
> + Enable backtrace suppression for KUnit. If enabled, backtraces
> + generated intentionally by KUnit tests can be suppressed. Disable
> + to reduce kernel image size if image size is more important than
> + suppression of backtraces generated by KUnit tests.
> +
>
Any more comments / feedback on this ? Otherwise I'll introduce the
above configuration option in v2 of the series.
In this context, any suggestions if it should be enabled or disabled by
default ? I personally think it would be more important to be able to
suppress backtraces, but I understand that others may not be willing to
accept a ~1% image size increase with CONFIG_KUNIT=m unless
KUNIT_SUPPRESS_BACKTRACE is explicitly disabled.
Thanks,
Guenter
On Fri, Mar 15, 2024 at 4:52 PM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 3/15/24 16:43, Suren Baghdasaryan wrote:
> > On Fri, Mar 15, 2024 at 3:58 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> >>
> >> On 3/6/24 19:24, Suren Baghdasaryan wrote:
> >> > Account slab allocations using codetag reference embedded into slabobj_ext.
> >> >
> >> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> >> > Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev>
> >> > Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> >> > Reviewed-by: Kees Cook <keescook@chromium.org>
> >>
> >> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> >>
> >> Nit below:
> >>
> >> > @@ -3833,6 +3913,7 @@ void slab_post_alloc_hook(struct kmem_cache *s, struct obj_cgroup *objcg,
> >> >                                         unsigned int orig_size)
> >> >  {
> >> >         unsigned int zero_size = s->object_size;
> >> > +       struct slabobj_ext *obj_exts;
> >> >         bool kasan_init = init;
> >> >         size_t i;
> >> >         gfp_t init_flags = flags & gfp_allowed_mask;
> >> > @@ -3875,6 +3956,12 @@ void slab_post_alloc_hook(struct kmem_cache *s, struct obj_cgroup *objcg,
> >> >                 kmemleak_alloc_recursive(p[i], s->object_size, 1,
> >> >                                          s->flags, init_flags);
> >> >                 kmsan_slab_alloc(s, p[i], init_flags);
> >> > +               obj_exts = prepare_slab_obj_exts_hook(s, flags, p[i]);
> >> > +#ifdef CONFIG_MEM_ALLOC_PROFILING
> >> > +               /* obj_exts can be allocated for other reasons */
> >> > +               if (likely(obj_exts) && mem_alloc_profiling_enabled())
>
> Could you at least flip these two checks then so the static key one goes first?

Yes, definitely. I was thinking about removing need_slab_obj_ext()
from prepare_slab_obj_exts_hook() and adding this instead of the above
code:

+               if (need_slab_obj_ext()) {
+                       obj_exts = prepare_slab_obj_exts_hook(s, flags, p[i]);
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+                       /*
+                        * Currently obj_exts is used only for allocation
+                        * profiling. If other users appear then
+                        * mem_alloc_profiling_enabled() check should be
+                        * added here.
+                        */
+                       if (likely(obj_exts))
+                               alloc_tag_add(&obj_exts->ref, current->alloc_tag, s->size);
+#endif
+               }

Does that look good?

> >> > +#ifdef CONFIG_MEM_ALLOC_PROFILING
> >> > +               /* obj_exts can be allocated for other reasons */
> >> > +               if (likely(obj_exts) && mem_alloc_profiling_enabled())
> >> > +                       alloc_tag_add(&obj_exts->ref, current->alloc_tag, s->size);
> >> > +#endif
> >>
> >> I think you could still do this a bit better:
> >>
> >> Check mem_alloc_profiling_enabled() once before the whole block calling
> >> prepare_slab_obj_exts_hook() and alloc_tag_add()
> >> Remove need_slab_obj_ext() check from prepare_slab_obj_exts_hook()
> >
> > Agree about checking mem_alloc_profiling_enabled() early and one time,
> > except I would like to use need_slab_obj_ext() instead of
> > mem_alloc_profiling_enabled() for that check. Currently they are
> > equivalent but if there are more slab_obj_ext users in the future then
> > there will be cases when we need to prepare_slab_obj_exts_hook() even
> > when mem_alloc_profiling_enabled()==false. need_slab_obj_ext() will be
> > easy to extend for such cases.
>
> I thought we don't generally future-proof internal implementation details
> like this until it's actually needed. But at least what I suggested above
> would help, thanks.
>
> > Thanks,
> > Suren.
> >
> >>
> >> >  }
> >> >
> >> >         memcg_slab_post_alloc_hook(s, objcg, flags, size, p);
> >> > @@ -4353,6 +4440,7 @@ void slab_free(struct kmem_cache *s, struct slab *slab, void *object,
> >> >                 unsigned long addr)
> >> >  {
> >> >         memcg_slab_free_hook(s, slab, &object, 1);
> >> > +       alloc_tagging_slab_free_hook(s, slab, &object, 1);
> >> >
> >> >         if (likely(slab_free_hook(s, object, slab_want_init_on_free(s))))
> >> >                 do_slab_free(s, slab, object, object, 1, addr);
> >> > @@ -4363,6 +4451,7 @@ void slab_free_bulk(struct kmem_cache *s, struct slab *slab, void *head,
> >> >                 void *tail, void **p, int cnt, unsigned long addr)
> >> >  {
> >> >         memcg_slab_free_hook(s, slab, p, cnt);
> >> > +       alloc_tagging_slab_free_hook(s, slab, p, cnt);
> >> >         /*
> >> >          * With KASAN enabled slab_free_freelist_hook modifies the freelist
> >> >          * to remove objects, whose reuse must be delayed.
> >>
On 3/15/24 16:43, Suren Baghdasaryan wrote:
> On Fri, Mar 15, 2024 at 3:58 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>>
>> On 3/6/24 19:24, Suren Baghdasaryan wrote:
>> > Account slab allocations using codetag reference embedded into slabobj_ext.
>> >
>> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
>> > Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev>
>> > Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
>> > Reviewed-by: Kees Cook <keescook@chromium.org>
>>
>> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>>
>> Nit below:
>>
>> > @@ -3833,6 +3913,7 @@ void slab_post_alloc_hook(struct kmem_cache *s, struct obj_cgroup *objcg,
>> >                                        unsigned int orig_size)
>> >  {
>> >         unsigned int zero_size = s->object_size;
>> > +       struct slabobj_ext *obj_exts;
>> >         bool kasan_init = init;
>> >         size_t i;
>> >         gfp_t init_flags = flags & gfp_allowed_mask;
>> > @@ -3875,6 +3956,12 @@ void slab_post_alloc_hook(struct kmem_cache *s, struct obj_cgroup *objcg,
>> >                 kmemleak_alloc_recursive(p[i], s->object_size, 1,
>> >                                          s->flags, init_flags);
>> >                 kmsan_slab_alloc(s, p[i], init_flags);
>> > +               obj_exts = prepare_slab_obj_exts_hook(s, flags, p[i]);
>> > +#ifdef CONFIG_MEM_ALLOC_PROFILING
>> > +               /* obj_exts can be allocated for other reasons */
>> > +               if (likely(obj_exts) && mem_alloc_profiling_enabled())

Could you at least flip these two checks then so the static key one goes first?

>> > +                       alloc_tag_add(&obj_exts->ref, current->alloc_tag, s->size);
>> > +#endif
>>
>> I think you could still do this a bit better:
>>
>> Check mem_alloc_profiling_enabled() once before the whole block calling
>> prepare_slab_obj_exts_hook() and alloc_tag_add()
>> Remove need_slab_obj_ext() check from prepare_slab_obj_exts_hook()
>
> Agree about checking mem_alloc_profiling_enabled() early and one time,
> except I would like to use need_slab_obj_ext() instead of
> mem_alloc_profiling_enabled() for that check. Currently they are
> equivalent but if there are more slab_obj_ext users in the future then
> there will be cases when we need to prepare_slab_obj_exts_hook() even
> when mem_alloc_profiling_enabled()==false. need_slab_obj_ext() will be
> easy to extend for such cases.

I thought we don't generally future-proof internal implementation details
like this until it's actually needed. But at least what I suggested above
would help, thanks.

> Thanks,
> Suren.
>
>>
>> >  }
>> >
>> >         memcg_slab_post_alloc_hook(s, objcg, flags, size, p);
>> > @@ -4353,6 +4440,7 @@ void slab_free(struct kmem_cache *s, struct slab *slab, void *object,
>> >                 unsigned long addr)
>> >  {
>> >         memcg_slab_free_hook(s, slab, &object, 1);
>> > +       alloc_tagging_slab_free_hook(s, slab, &object, 1);
>> >
>> >         if (likely(slab_free_hook(s, object, slab_want_init_on_free(s))))
>> >                 do_slab_free(s, slab, object, object, 1, addr);
>> > @@ -4363,6 +4451,7 @@ void slab_free_bulk(struct kmem_cache *s, struct slab *slab, void *head,
>> >                 void *tail, void **p, int cnt, unsigned long addr)
>> >  {
>> >         memcg_slab_free_hook(s, slab, p, cnt);
>> > +       alloc_tagging_slab_free_hook(s, slab, p, cnt);
>> >         /*
>> >          * With KASAN enabled slab_free_freelist_hook modifies the freelist
>> >          * to remove objects, whose reuse must be delayed.
>>
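[Editor's note: an illustrative fragment of the check ordering being
requested above, not the final patch. mem_alloc_profiling_enabled() is
backed by a static key, so testing it first lets the whole tagging path be
patched out when profiling is disabled, instead of always loading and
testing obj_exts.]

	/* static-key test first: the branch is patched out when profiling is off */
	if (mem_alloc_profiling_enabled() && likely(obj_exts))
		alloc_tag_add(&obj_exts->ref, current->alloc_tag, s->size);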
On Fri, Mar 15, 2024 at 7:24 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Mar 06, 2024 at 10:24:12AM -0800, Suren Baghdasaryan wrote:
> > +static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
> > +				   unsigned int order)
>
> If you make this "unsigned int nr" instead of order, (a) it won't look
> completely insane (what does adding an order even mean?) and (b) you
> can reuse it from the __free_pages path.

Sounds good to me.

>
> > @@ -1101,6 +1102,7 @@ __always_inline bool free_pages_prepare(struct page *page,
> >  	/* Do not let hwpoison pages hit pcplists/buddy */
> >  	reset_page_owner(page, order);
> >  	page_table_check_free(page, order);
> > +	pgalloc_tag_sub(page, order);
>
> Obviously you'll need to make sure all the callers now pass in 1 <<
> order instead of just order.

Ack.

>
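For reference, a hedged sketch of what the agreed rename could look like
(hypothetical; get_page_tag_ref()/put_page_tag_ref() are assumed from the
same alloc_tag series):

	/* Sketch: count pages rather than orders, so __free_pages can
	 * reuse the helper for the tail pages it releases individually. */
	static inline void pgalloc_tag_add(struct page *page,
					   struct task_struct *task,
					   unsigned int nr)
	{
		union codetag_ref *ref = get_page_tag_ref(page);

		if (ref) {
			alloc_tag_add(ref, task->alloc_tag, PAGE_SIZE * nr);
			put_page_tag_ref(ref);
		}
	}

Call sites then pass a page count instead of an order:

	pgalloc_tag_add(page, current, 1 << order);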
On Fri, Mar 15, 2024 at 3:58 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 3/6/24 19:24, Suren Baghdasaryan wrote:
> > Account slab allocations using codetag reference embedded into slabobj_ext.
> >
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev>
> > Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> > Reviewed-by: Kees Cook <keescook@chromium.org>
>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>
> Nit below:
>
> > @@ -3833,6 +3913,7 @@ void slab_post_alloc_hook(struct kmem_cache *s, struct obj_cgroup *objcg,
> >  			  unsigned int orig_size)
> >  {
> >  	unsigned int zero_size = s->object_size;
> > +	struct slabobj_ext *obj_exts;
> >  	bool kasan_init = init;
> >  	size_t i;
> >  	gfp_t init_flags = flags & gfp_allowed_mask;
> > @@ -3875,6 +3956,12 @@ void slab_post_alloc_hook(struct kmem_cache *s, struct obj_cgroup *objcg,
> >  		kmemleak_alloc_recursive(p[i], s->object_size, 1,
> >  					 s->flags, init_flags);
> >  		kmsan_slab_alloc(s, p[i], init_flags);
> > +		obj_exts = prepare_slab_obj_exts_hook(s, flags, p[i]);
> > +#ifdef CONFIG_MEM_ALLOC_PROFILING
> > +		/* obj_exts can be allocated for other reasons */
> > +		if (likely(obj_exts) && mem_alloc_profiling_enabled())
> > +			alloc_tag_add(&obj_exts->ref, current->alloc_tag, s->size);
> > +#endif
>
> I think you could still do this a bit better:
>
> Check mem_alloc_profiling_enabled() once before the whole block calling
> prepare_slab_obj_exts_hook() and alloc_tag_add()
> Remove need_slab_obj_ext() check from prepare_slab_obj_exts_hook()

Agree about checking mem_alloc_profiling_enabled() early and one time,
except I would like to use need_slab_obj_ext() instead of
mem_alloc_profiling_enabled() for that check. Currently they are
equivalent but if there are more slab_obj_ext users in the future then
there will be cases when we need to prepare_slab_obj_exts_hook() even
when mem_alloc_profiling_enabled()==false. need_slab_obj_ext() will be
easy to extend for such cases.

Thanks,
Suren.

>
> >  	}
> >
> >  	memcg_slab_post_alloc_hook(s, objcg, flags, size, p);
> > @@ -4353,6 +4440,7 @@ void slab_free(struct kmem_cache *s, struct slab *slab, void *object,
> >  		unsigned long addr)
> >  {
> >  	memcg_slab_free_hook(s, slab, &object, 1);
> > +	alloc_tagging_slab_free_hook(s, slab, &object, 1);
> >
> >  	if (likely(slab_free_hook(s, object, slab_want_init_on_free(s))))
> >  		do_slab_free(s, slab, object, object, 1, addr);
> > @@ -4363,6 +4451,7 @@ void slab_free_bulk(struct kmem_cache *s, struct slab *slab, void *head,
> >  		void *tail, void **p, int cnt, unsigned long addr)
> >  {
> >  	memcg_slab_free_hook(s, slab, p, cnt);
> > +	alloc_tagging_slab_free_hook(s, slab, p, cnt);
> >  	/*
> >  	 * With KASAN enabled slab_free_freelist_hook modifies the freelist
> >  	 * to remove objects, whose reuse must be delayed.
>
On Wed, Mar 06, 2024 at 10:24:12AM -0800, Suren Baghdasaryan wrote:
> +static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
> +				   unsigned int order)

If you make this "unsigned int nr" instead of order, (a) it won't look
completely insane (what does adding an order even mean?) and (b) you
can reuse it from the __free_pages path.

> @@ -1101,6 +1102,7 @@ __always_inline bool free_pages_prepare(struct page *page,
>  	/* Do not let hwpoison pages hit pcplists/buddy */
>  	reset_page_owner(page, order);
>  	page_table_check_free(page, order);
> +	pgalloc_tag_sub(page, order);

Obviously you'll need to make sure all the callers now pass in 1 <<
order instead of just order.
On 3/12/24 18:03, Guenter Roeck wrote:
> Add name of functions triggering warning backtraces to the __bug_table
> object section to enable support for suppressing WARNING backtraces.
>
> To limit image size impact, the pointer to the function name is only added
> to the __bug_table section if both CONFIG_KUNIT and CONFIG_DEBUG_BUGVERBOSE
> are enabled. Otherwise, the __func__ assembly parameter is replaced with a
> (dummy) NULL parameter to avoid an image size increase due to unused
> __func__ entries (this is necessary because __func__ is not a define but a
> virtual variable).
>
> While at it, declare assembler parameters as constants where possible.
> Refine .blockz instructions to calculate the necessary padding instead
> of using fixed values.
>
> Signed-off-by: Guenter Roeck <linux@roeck-us.net>

Acked-by: Helge Deller <deller@gmx.de>

Helge

> ---
>  arch/parisc/include/asm/bug.h | 29 +++++++++++++++++++++--------
>  1 file changed, 21 insertions(+), 8 deletions(-)
>
> diff --git a/arch/parisc/include/asm/bug.h b/arch/parisc/include/asm/bug.h
> index 833555f74ffa..792dacc2a653 100644
> --- a/arch/parisc/include/asm/bug.h
> +++ b/arch/parisc/include/asm/bug.h
> @@ -23,8 +23,17 @@
>  # define __BUG_REL(val) ".word " __stringify(val)
>  #endif
>
> -
>  #ifdef CONFIG_DEBUG_BUGVERBOSE
> +
> +#if IS_ENABLED(CONFIG_KUNIT)
> +# define HAVE_BUG_FUNCTION
> +# define __BUG_FUNC_PTR	__BUG_REL(%c1)
> +# define __BUG_FUNC	__func__
> +#else
> +# define __BUG_FUNC_PTR
> +# define __BUG_FUNC	NULL
> +#endif /* IS_ENABLED(CONFIG_KUNIT) */
> +
>  #define BUG()							\
>  	do {							\
>  		asm volatile("\n"				\
> @@ -33,10 +42,12 @@
>  			     "\t.align 4\n"			\
>  			     "2:\t" __BUG_REL(1b) "\n"		\
>  			     "\t" __BUG_REL(%c0) "\n"		\
> -			     "\t.short %1, %2\n"		\
> -			     "\t.blockz %3-2*4-2*2\n"		\
> +			     "\t" __BUG_FUNC_PTR "\n"		\
> +			     "\t.short %c2, %c3\n"		\
> +			     "\t.blockz %c4-(.-2b)\n"		\
>  			     "\t.popsection"			\
> -			     : : "i" (__FILE__), "i" (__LINE__),	\
> +			     : : "i" (__FILE__), "i" (__BUG_FUNC),	\
> +			     "i" (__LINE__),			\
>  			     "i" (0), "i" (sizeof(struct bug_entry)) ); \
>  		unreachable();					\
>  	} while(0)
> @@ -58,10 +69,12 @@
>  			     "\t.align 4\n"			\
>  			     "2:\t" __BUG_REL(1b) "\n"		\
>  			     "\t" __BUG_REL(%c0) "\n"		\
> -			     "\t.short %1, %2\n"		\
> -			     "\t.blockz %3-2*4-2*2\n"		\
> +			     "\t" __BUG_FUNC_PTR "\n"		\
> +			     "\t.short %c2, %3\n"		\
> +			     "\t.blockz %c4-(.-2b)\n"		\
>  			     "\t.popsection"			\
> -			     : : "i" (__FILE__), "i" (__LINE__),	\
> +			     : : "i" (__FILE__), "i" (__BUG_FUNC),	\
> +			     "i" (__LINE__),			\
>  			     "i" (BUGFLAG_WARNING|(flags)),	\
>  			     "i" (sizeof(struct bug_entry)) ); \
>  	} while(0)
> @@ -74,7 +87,7 @@
>  			     "\t.align 4\n"			\
>  			     "2:\t" __BUG_REL(1b) "\n"		\
>  			     "\t.short %0\n"			\
> -			     "\t.blockz %1-4-2\n"		\
> +			     "\t.blockz %c1-(.-2b)\n"		\
>  			     "\t.popsection"			\
>  			     : : "i" (BUGFLAG_WARNING|(flags)),	\
>  			     "i" (sizeof(struct bug_entry)) ); \
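To make the layout concrete, here is a hedged sketch of what the generic
struct bug_entry could look like once an optional function pointer is folded
in. This is an assumption based on the patch description (the actual series
may define it differently, and on parisc the entry is emitted by the asm
above rather than by this C declaration):

	struct bug_entry {
	#ifndef CONFIG_GENERIC_BUG_RELATIVE_POINTERS
		unsigned long	bug_addr;
	#else
		signed int	bug_addr_disp;
	#endif
	#ifdef CONFIG_DEBUG_BUGVERBOSE
	#ifndef CONFIG_GENERIC_BUG_RELATIVE_POINTERS
		const char	*file;
	#else
		signed int	file_disp;
	#endif
	#ifdef HAVE_BUG_FUNCTION
		/* assumed: only present with CONFIG_KUNIT+CONFIG_DEBUG_BUGVERBOSE */
		const char	*function;
	#endif
		unsigned short	line;
	#endif
		unsigned short	flags;
	};

The ".blockz %c4-(.-2b)" change then pads each entry to
sizeof(struct bug_entry) from the actual bytes emitted since label 2, instead
of relying on a hand-computed constant that breaks when fields are added.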
On 3/6/24 19:24, Suren Baghdasaryan wrote:
> Account slab allocations using codetag reference embedded into slabobj_ext.
>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev>
> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> Reviewed-by: Kees Cook <keescook@chromium.org>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

Nit below:

> @@ -3833,6 +3913,7 @@ void slab_post_alloc_hook(struct kmem_cache *s, struct obj_cgroup *objcg,
>  			  unsigned int orig_size)
>  {
>  	unsigned int zero_size = s->object_size;
> +	struct slabobj_ext *obj_exts;
>  	bool kasan_init = init;
>  	size_t i;
>  	gfp_t init_flags = flags & gfp_allowed_mask;
> @@ -3875,6 +3956,12 @@ void slab_post_alloc_hook(struct kmem_cache *s, struct obj_cgroup *objcg,
>  		kmemleak_alloc_recursive(p[i], s->object_size, 1,
>  					 s->flags, init_flags);
>  		kmsan_slab_alloc(s, p[i], init_flags);
> +		obj_exts = prepare_slab_obj_exts_hook(s, flags, p[i]);
> +#ifdef CONFIG_MEM_ALLOC_PROFILING
> +		/* obj_exts can be allocated for other reasons */
> +		if (likely(obj_exts) && mem_alloc_profiling_enabled())
> +			alloc_tag_add(&obj_exts->ref, current->alloc_tag, s->size);
> +#endif

I think you could still do this a bit better:

Check mem_alloc_profiling_enabled() once before the whole block calling
prepare_slab_obj_exts_hook() and alloc_tag_add()
Remove need_slab_obj_ext() check from prepare_slab_obj_exts_hook()

>  	}
>
>  	memcg_slab_post_alloc_hook(s, objcg, flags, size, p);
> @@ -4353,6 +4440,7 @@ void slab_free(struct kmem_cache *s, struct slab *slab, void *object,
>  		unsigned long addr)
>  {
>  	memcg_slab_free_hook(s, slab, &object, 1);
> +	alloc_tagging_slab_free_hook(s, slab, &object, 1);
>
>  	if (likely(slab_free_hook(s, object, slab_want_init_on_free(s))))
>  		do_slab_free(s, slab, object, object, 1, addr);
> @@ -4363,6 +4451,7 @@ void slab_free_bulk(struct kmem_cache *s, struct slab *slab, void *head,
>  		void *tail, void **p, int cnt, unsigned long addr)
>  {
>  	memcg_slab_free_hook(s, slab, p, cnt);
> +	alloc_tagging_slab_free_hook(s, slab, p, cnt);
>  	/*
>  	 * With KASAN enabled slab_free_freelist_hook modifies the freelist
>  	 * to remove objects, whose reuse must be delayed.
From GCC commit 3f13154553f8546a ("df-scan: remove ad-hoc handling of
global regs in asms"), global registers are no longer forced into the
def-use chain. As a result, current_thread_info(), current_stack_pointer
and __my_cpu_offset may be lifted out of a loop because they are no
longer treated as "volatile variables".

This optimization is still correct for current_thread_info() and
current_stack_pointer because they are associated with a thread. However
it is wrong for __my_cpu_offset, which is associated with a CPU rather
than a thread: if the thread migrates to a different CPU inside the
loop, __my_cpu_offset must change as well.

Change the __my_cpu_offset definition so that it is treated as a
"volatile variable", in order to avoid such a mis-optimization.

Cc: stable@vger.kernel.org
Reported-by: Xiaotian Wu <wuxiaotian@loongson.cn>
Reported-by: Miao Wang <shankerwangmiao@gmail.com>
Signed-off-by: Xing Li <lixing@loongson.cn>
Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn>
Signed-off-by: Rui Wang <wangrui@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
---
 arch/loongarch/include/asm/percpu.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/loongarch/include/asm/percpu.h b/arch/loongarch/include/asm/percpu.h
index 9b36ac003f89..03b98491d301 100644
--- a/arch/loongarch/include/asm/percpu.h
+++ b/arch/loongarch/include/asm/percpu.h
@@ -29,7 +29,12 @@ static inline void set_my_cpu_offset(unsigned long off)
 	__my_cpu_offset = off;
 	csr_write64(off, PERCPU_BASE_KS);
 }
-#define __my_cpu_offset __my_cpu_offset
+
+#define __my_cpu_offset					\
+({							\
+	__asm__ __volatile__("":"+r"(__my_cpu_offset));	\
+	__my_cpu_offset;				\
+})
 
 #define PERCPU_OP(op, asm_op, c_op)					\
 static __always_inline unsigned long __percpu_##op(void *ptr,		\
-- 
2.43.0
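As an illustration of the hazard the "+r" volatile asm prevents
(hypothetical code, not from the patch): without it, the compiler may
compute the per-CPU base once, before the loop, even though the task can
migrate between iterations.

	static DEFINE_PER_CPU(unsigned long, demo_counter);

	static void demo(void)
	{
		int i;

		for (i = 0; i < 1000; i++) {
			/* expands to an access via __my_cpu_offset */
			this_cpu_inc(demo_counter);
			cond_resched();	/* may migrate this task to another CPU */
		}
	}

With the volatile asm in place, every expansion of __my_cpu_offset is a
fresh read of the global register, so each iteration addresses the per-CPU
data of whichever CPU the task is currently running on.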