* linux uml segfault
@ 2021-02-23  8:06 Ritesh Raj Sarraf
  2021-02-23 10:50 ` Anton Ivanov
  2021-03-05 20:43 ` [PATCH] um: mark all kernel symbols as local Johannes Berg
  0 siblings, 2 replies; 41+ messages in thread
From: Ritesh Raj Sarraf @ 2021-02-23  8:06 UTC (permalink / raw)
  To: linux-um



Hi,

Recently, with the Linux 5.10 release, I have run into the following
segfault on UML. I was a little disappointed in myself that this
slipped past my regular set of tests before being pushed to Debian. It
is now part of Debian Testing too, and I'd hate to have it removed from
the Bullseye release.

What is worse is that, to do some quick tests, I reverted to an older
UML (5.9), which I recall working, and that too failed on these
setups.

As for the setups, I reproduced the issue on 3 different machines, all
Intel hardware and all running a 5.10 host kernel.


It would really help if others on this mailing list could check
whether they run into this problem. So far, I have had 1 report from
someone else who could reproduce this bug and 1 report from someone
who could not.


Thanks,
Ritesh


```
rrs@priyasi:~$ linux ubd0=~/rrs-home/Libvirt-Images/uml.img vec0:transport=tap,ifname=tap0,gro=1 mem=1024M rw
Core dump limits :
        soft - 0
        hard - NONE
Checking that ptrace can change system call numbers...OK
Checking syscall emulation patch for ptrace...OK
Checking advanced syscall emulation patch for ptrace...OK
Checking environment variables for a tempdir...none found
Checking if /dev/shm is on tmpfs...OK
Checking PROT_EXEC mmap in /dev/shm...OK
Adding 5906432 bytes to physical memory to account for exec-shield gap
kmsg_dump:
<5>Linux version 5.10.5 (buildd@x86-conova-01) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.1) #1 Mon Jan 11 20:40:53 UTC 2021
<6>Zone ranges:
<6>  Normal   [mem 0x0000000000000000-0x00000000a05a1fff]
<6>Movable zone start for each node
<6>Early memory node ranges
<6>  node   0: [mem 0x0000000000000000-0x00000000405a1fff]
<6>Initmem setup node 0 [mem 0x0000000000000000-0x00000000405a1fff]
<7>On node 0 totalpages: 263586
<7>  Normal zone: 4119 pages used for memmap
<7>  Normal zone: 0 pages reserved
<7>  Normal zone: 263586 pages, LIFO batch:63
<7>pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
<7>pcpu-alloc: [0] 0 
<6>Built 1 zonelists, mobility grouping on.  Total pages: 259467
<5>Kernel command line: ubd0=/home/rrs/rrs-home/Libvirt-Images/uml.img vec0:transport=tap,ifname=tap0,gro=1 mem=1024M rw root=98:0
<6>Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
<6>Inode-cache hash table entries: 65536 (order: 7, 524288 bytes, linear)
<6>mem auto-init: stack:off, heap alloc:off, heap free:off
<6>Memory: 1016464K/1054344K available (5830K kernel code, 1535K rwdata, 1744K rodata, 191K init, 225K bss, 37880K reserved, 0K cma-reserved)
<6>SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
<6>NR_IRQS: 24
<6>clocksource: timer: mask: 0xffffffffffffffff max_cycles: 0x1cd42e205, max_idle_ns: 881590404426 ns
<6>Calibrating delay loop... 5731.94 BogoMIPS (lpj=28659712)
<6>pid_max: default: 32768 minimum: 301
<6>LSM: Security Framework initializing
<6>Yama: disabled by default; enable with sysctl kernel.yama.*
<6>SELinux:  Initializing.
<6>TOMOYO Linux initialized
<6>Mount-cache hash table entries: 2048 (order: 2, 16384 bytes, linear)
<6>Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes, linear)
<4>
<4>Modules linked in:
<6>Pid: 0, comm: swapper Not tainted 5.10.5
<6>RIP: 0033:[<00000000604d4201>]
<6>RSP: 00007ffca56a8890  EFLAGS: 00010206
<6>RAX: 0000000600000000 RBX: 0000000000000059 RCX: 00007ffca56a8000
<6>RDX: 0000000000000035 RSI: 0000000060b69a71 RDI: 0000000060d8ac3b
<6>RBP: 0000000000000000 R08: 0000000060b69a72 R09: 0000000060d8abe2
<6>R10: 0000000080000000 R11: 3d74696e695f676e R12: 0000000000000002
<6>R13: 0000000000000005 R14: 0000000000000000 R15: 0000000000000001
<0>Kernel panic - not syncing: Segfault with no mm
<4>CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.5 #1
<4>Stack:
<4> 61335b50 8000000000000000 7fae69465908 7fae69465ae5
<4> 7fae698ae9e8 00000000 7ffca56a88d0 00000400
<4> 7fae6985bf20 7fae698ae9e8 00000000 00000000Call Trace:
<4> [<604d4fa3>] ? __printk_safe_enter+0x0/0x35
<4> [<604d154a>] ? arch_local_irq_save+0x0/0x22
<4> [<604d46f5>] ? vprintk_emit+0x9d/0x185
<4> [<604d49d3>] ? vprintk_deferred+0x1d/0x32
<4> [<60a26ee2>] ? printk_deferred+0x93/0x9b
<4> [<6088f79f>] ? bucket_table_alloc.isra.0+0x115/0x13d
<4> [<60a26e4f>] ? printk_deferred+0x0/0x9b
<4> [<6049cddb>] ? set_signals+0x0/0x38
<4> [<60589588>] ? arch_local_irq_save+0x0/0x22
<4> [<6055c928>] ? kvmalloc_node+0x56/0x96
<4> [<6058d3c0>] ? __kmalloc+0x1e2/0x1f9
<4> [<608e3d32>] ? ___ratelimit+0xd0/0xde
<4> [<6088f79f>] ? bucket_table_alloc.isra.0+0x115/0x13d
<4> [<60901485>] ? _warn_unseeded_randomness+0x60/0x8f
<4> [<6090295b>] ? get_random_u32+0x29/0x98
<4> [<6088f79f>] ? bucket_table_alloc.isra.0+0x115/0x13d
<4> [<6088f68a>] ? bucket_table_alloc.isra.0+0x0/0x13d
<4> [<6088ff7a>] ? rhashtable_init+0x175/0x1ca
<4> [<607ef317>] ? ipc_init_ids+0x4e/0x6f
<4> [<600153bd>] ? sem_init+0x17/0x45
<4> [<6049d0e5>] ? start_ptraced_child+0x0/0x180
<4> [<604a0ce0>] ? kernel_longjmp+0x0/0x20
<4> [<6049cc3d>] ? set_handler+0x123/0x15b
<4> [<6049c9ee>] ? hard_handler+0x0/0xcd
<4> [<604a0ce0>] ? kernel_longjmp+0x0/0x20
<4> [<6049c3a6>] ? openpty_cb+0x22/0x3b
<4> [<6049fb4b>] ? start_idle_thread+0x66/0x116
<4> [<60004613>] ? linux_main+0x2e7/0x2f9
<4> [<6049cc86>] ? change_sig+0x0/0x6a
<4> [<6000565e>] ? main+0x230/0x2dc
<4> [<60a256b0>] ? __libc_csu_init+0x0/0x60
<4> [<604827d0>] ? _start+0x0/0x30
<4> [<6000542e>] ? main+0x0/0x2dc
<4> [<604827d0>] ? _start+0x0/0x30
<4> [<604827d0>] ? _start+0x0/0x30
<4> [<604827fa>] ? _start+0x2a/0x30
<4> [<604827d0>] ? _start+0x0/0x30
Aborted (core dumped)
```
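
For anyone triaging the dump above, here is a minimal sketch (not part of the original report) that pulls the symbol entries out of the `Call Trace` lines and decodes a register value as ASCII. The names `parse_trace` and `reg_as_ascii` are made up for illustration; the sample lines are copied from the log. Every entry in this trace is marked `?`, meaning the unwinder does not vouch for it, and entries at `+0x0` are often just function pointers left on the stack, so filtering on a nonzero offset is only a rough heuristic, not a reliable backtrace.

```python
import re

# Matches lines such as: <4> [<604d46f5>] ? vprintk_emit+0x9d/0x185
TRACE_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<sym>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)

def parse_trace(lines):
    """Return (symbol, offset, size) for every call-trace line found."""
    entries = []
    for line in lines:
        m = TRACE_RE.search(line)
        if m:
            entries.append((m["sym"], int(m["off"], 16), int(m["size"], 16)))
    return entries

def reg_as_ascii(hex_value):
    """Interpret a register value as little-endian bytes, printed as ASCII."""
    raw = bytes.fromhex(hex_value)[::-1]  # x86-64 stores the low byte first
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)

sample = [
    "<4> [<604d46f5>] ? vprintk_emit+0x9d/0x185",
    "<4> [<60a26ee2>] ? printk_deferred+0x93/0x9b",
    "<4> [<6088f79f>] ? bucket_table_alloc.isra.0+0x115/0x13d",
    "<4> [<6049cddb>] ? set_signals+0x0/0x38",
]

# Keep only entries with a nonzero offset (rough heuristic, see above).
mid_function = [sym for sym, off, _ in parse_trace(sample) if off]
print(mid_function)
print(reg_as_ascii("3d74696e695f676e"))  # R11 from the register dump
```

On the values above, R11 decodes to `ng_init=`, which looks like a fragment of a string rather than a pointer. Whether it relates to the `_warn_unseeded_randomness` entry in the trace is only a guess.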
-- 
Ritesh Raj Sarraf | http://people.debian.org/~rrs
Debian - The Universal Operating System

_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um


* Re: linux uml segfault
  2021-02-23  8:06 linux uml segfault Ritesh Raj Sarraf
@ 2021-02-23 10:50 ` Anton Ivanov
  2021-02-23 12:12   ` Christopher Obbard
  2021-03-05 20:43 ` [PATCH] um: mark all kernel symbols as local Johannes Berg
  1 sibling, 1 reply; 41+ messages in thread
From: Anton Ivanov @ 2021-02-23 10:50 UTC (permalink / raw)
  To: rrs, linux-um



On 23/02/2021 08:06, Ritesh Raj Sarraf wrote:
> Hi,
> 
> Recently, with the Linux 5.10 release, I have run into the following
> segfault on UML. I was a little disappointed in myself that this
> slipped past my regular set of tests before being pushed to Debian. It
> is now part of Debian Testing too, and I'd hate to have it removed from
> the Bullseye release.
> 
> What is worse is that, to do some quick tests, I reverted to an older
> UML (5.9), which I recall working, and that too failed on these
> setups.
> 
> As for the setups, I reproduced the issue on 3 different machines, all
> Intel hardware and all running a 5.10 host kernel.
> 
> 
> It would really help if others on this mailing list could check
> whether they run into this problem. So far, I have had 1 report from
> someone else who could reproduce this bug and 1 report from someone
> who could not.

Confirmed. This is the asprintf issue. It is usually just a warning, but for your config it causes a guaranteed segfault.

You need commit 97be7ceaf7fea68104824b6aa874cff235333ac1 ("um: Remove use of asprinf in umid.c") in the patchset for the Debian package.

A.

-- 
Anton R. Ivanov
https://www.kot-begemot.co.uk/


* Re: linux uml segfault
  2021-02-23 10:50 ` Anton Ivanov
@ 2021-02-23 12:12   ` Christopher Obbard
  2021-02-23 12:24     ` Anton Ivanov
  2021-02-23 17:19     ` Anton Ivanov
  0 siblings, 2 replies; 41+ messages in thread
From: Christopher Obbard @ 2021-02-23 12:12 UTC (permalink / raw)
  To: Anton Ivanov, rrs, linux-um

Hi Anton,

On 23/02/2021 10:50, Anton Ivanov wrote:
> 
> Confirmed. This is the asprintf issue. It is usually just a warning, but 
> for your config it causes a guaranteed segfault.
> 
> You need 97be7ceaf7fea68104824b6aa874cff235333ac1 um: Remove use of 
> asprinf in umid.c
> 
> In the patchset for the debian package.

The current Debian user-mode-linux package in unstable is based on the
5.10.5 stable source, which includes the mentioned patch, but it still
causes an error for some users.

thanks!
Chris


* Re: linux uml segfault
  2021-02-23 12:12   ` Christopher Obbard
@ 2021-02-23 12:24     ` Anton Ivanov
  2021-02-23 17:19     ` Anton Ivanov
  1 sibling, 0 replies; 41+ messages in thread
From: Anton Ivanov @ 2021-02-23 12:24 UTC (permalink / raw)
  To: Christopher Obbard, rrs, linux-um

On 23/02/2021 12:12, Christopher Obbard wrote:
> Hi Anton,
> 
> On 23/02/2021 10:50, Anton Ivanov wrote:
>> Confirmed. This is the asprintf issue. It is usually just a warning, 
>> but for your config it causes a guaranteed segfault.
>>
>> You need 97be7ceaf7fea68104824b6aa874cff235333ac1 um: Remove use of 
>> asprinf in umid.c
>>
>> In the patchset for the debian package.
> 
> The current Debian user-mode-linux package in unstable is based on the 
> 5.10.5 stable source which includes the mentioned patch, but is still 
> causing an error for some users.

OK, let me dig a bit further into this.

Brgds,

A.



-- 
Anton R. Ivanov
https://www.kot-begemot.co.uk/


* Re: linux uml segfault
  2021-02-23 12:12   ` Christopher Obbard
  2021-02-23 12:24     ` Anton Ivanov
@ 2021-02-23 17:19     ` Anton Ivanov
  2021-02-23 17:26       ` Ritesh Raj Sarraf
  1 sibling, 1 reply; 41+ messages in thread
From: Anton Ivanov @ 2021-02-23 17:19 UTC (permalink / raw)
  To: Christopher Obbard, rrs, linux-um



On 23/02/2021 12:12, Christopher Obbard wrote:
> Hi Anton,
> 
> On 23/02/2021 10:50, Anton Ivanov wrote:
>> Confirmed. This is the asprintf issue. It is usually just a warning, but for your config it causes a guaranteed segfault.
>>
>> You need 97be7ceaf7fea68104824b6aa874cff235333ac1 um: Remove use of asprinf in umid.c
>>
>> In the patchset for the debian package.
> 
> The current Debian user-mode-linux package in unstable is based on the 5.10.5 stable source which includes the mentioned patch, but is still causing an error for some users.

After updating the tree to 5.10.5 and applying all Debian patches from the package, I cannot reproduce the bug.

I am running it on 5.10, 5.2 and 4.19 hosts with the same parameters without issues. Hosts are all up to date Debian 10.8 and so is the UML userspace.
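
Since the crash reproduces on some hosts and not on others, it may help to compare host details side by side. Below is a minimal sketch, assuming Python 3 is available on each host. `host_summary` is a hypothetical helper name, and the choice of fields (kernel release, architecture, C library) is an assumption about what might differ between setups, not something established in this thread. Note that `platform.libc_ver()` reports the C library the Python interpreter itself is linked against, which is normally the system one.

```python
import platform

def host_summary():
    """Collect basic host details worth attaching to a reproduction report."""
    libc_name, libc_version = platform.libc_ver()
    return {
        "kernel": platform.release(),   # host kernel release string
        "machine": platform.machine(),  # CPU architecture of the host
        "libc": f"{libc_name} {libc_version}".strip(),
    }

for key, value in host_summary().items():
    print(f"{key}: {value}")
```

Running this on both a failing and a working host and diffing the output gives a quick first comparison before digging into kernel configs.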

I looked at the commit history around the 5.9-5.10 time-frame. Nothing rings any bells in this area.

Also, apologies for barking up the wrong tree with asprintf. That was fixed around that time and this was my first thought.

A.

> 
> thanks!
> Chris
> 
>>
>> A.
>>>
>>>
>>> Thanks,
>>> Ritesh
>>>
>>>
>>> ```
>>> rrs@priyasi:~$ linux ubd0=~/rrs-home/Libvirt-Images/uml.img
>>> vec0:transport=tap,ifname=tap0,gro=1 mem=1024M rw
>>> Core dump limits :
>>>          soft - 0
>>>          hard - NONE
>>> Checking that ptrace can change system call numbers...OK
>>> Checking syscall emulation patch for ptrace...OK
>>> Checking advanced syscall emulation patch for ptrace...OK
>>> Checking environment variables for a tempdir...none found
>>> Checking if /dev/shm is on tmpfs...OK
>>> Checking PROT_EXEC mmap in /dev/shm...OK
>>> Adding 5906432 bytes to physical memory to account for exec-shield gap
>>> kmsg_dump:
>>> <5>Linux version 5.10.5 (buildd@x86-conova-01) (gcc (Debian 10.2.1-6)
>>> 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.1) #1 Mon Jan 11
>>> 20:40:53 UTC 2021
>>> <6>Zone ranges:
>>> <6>  Normal   [mem 0x0000000000000000-0x00000000a05a1fff]
>>> <6>Movable zone start for each node
>>> <6>Early memory node ranges
>>> <6>  node   0: [mem 0x0000000000000000-0x00000000405a1fff]
>>> <6>Initmem setup node 0 [mem 0x0000000000000000-0x00000000405a1fff]
>>> <7>On node 0 totalpages: 263586
>>> <7>  Normal zone: 4119 pages used for memmap
>>> <7>  Normal zone: 0 pages reserved
>>> <7>  Normal zone: 263586 pages, LIFO batch:63
>>> <7>pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
>>> <7>pcpu-alloc: [0] 0
>>> <6>Built 1 zonelists, mobility grouping on.  Total pages: 259467
>>> <5>Kernel command line: ubd0=/home/rrs/rrs-home/Libvirt-Images/uml.img
>>> vec0:transport=tap,ifname=tap0,gro=1 mem=1024M rw root=98:0
>>> <6>Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes,
>>> linear)
>>> <6>Inode-cache hash table entries: 65536 (order: 7, 524288 bytes,
>>> linear)
>>> <6>mem auto-init: stack:off, heap alloc:off, heap free:off
>>> <6>Memory: 1016464K/1054344K available (5830K kernel code, 1535K
>>> rwdata, 1744K rodata, 191K init, 225K bss, 37880K reserved, 0K cma-
>>> reserved)
>>> <6>SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
>>> <6>NR_IRQS: 24
>>> <6>clocksource: timer: mask: 0xffffffffffffffff max_cycles:
>>> 0x1cd42e205, max_idle_ns: 881590404426 ns
>>> <6>Calibrating delay loop... 5731.94 BogoMIPS (lpj=28659712)
>>> <6>pid_max: default: 32768 minimum: 301
>>> <6>LSM: Security Framework initializing
>>> <6>Yama: disabled by default; enable with sysctl kernel.yama.*
>>> <6>SELinux:  Initializing.
>>> <6>TOMOYO Linux initialized
>>> <6>Mount-cache hash table entries: 2048 (order: 2, 16384 bytes, linear)
>>> <6>Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes,
>>> linear)
>>> <4>
>>> <4>Modules linked in:
>>> <6>Pid: 0, comm: swapper Not tainted 5.10.5
>>> <6>RIP: 0033:[<00000000604d4201>]
>>> <6>RSP: 00007ffca56a8890  EFLAGS: 00010206
>>> <6>RAX: 0000000600000000 RBX: 0000000000000059 RCX: 00007ffca56a8000
>>> <6>RDX: 0000000000000035 RSI: 0000000060b69a71 RDI: 0000000060d8ac3b
>>> <6>RBP: 0000000000000000 R08: 0000000060b69a72 R09: 0000000060d8abe2
>>> <6>R10: 0000000080000000 R11: 3d74696e695f676e R12: 0000000000000002
>>> <6>R13: 0000000000000005 R14: 0000000000000000 R15: 0000000000000001
>>> <0>Kernel panic - not syncing: Segfault with no mm
>>> <4>CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.5 #1
>>> <4>Stack:
>>> <4> 61335b50 8000000000000000 7fae69465908 7fae69465ae5
>>> <4> 7fae698ae9e8 00000000 7ffca56a88d0 00000400
>>> <4> 7fae6985bf20 7fae698ae9e8 00000000 00000000Call Trace:
>>> <4> [<604d4fa3>] ? __printk_safe_enter+0x0/0x35
>>> <4> [<604d154a>] ? arch_local_irq_save+0x0/0x22
>>> <4> [<604d46f5>] ? vprintk_emit+0x9d/0x185
>>> <4> [<604d49d3>] ? vprintk_deferred+0x1d/0x32
>>> <4> [<60a26ee2>] ? printk_deferred+0x93/0x9b
>>> <4> [<6088f79f>] ? bucket_table_alloc.isra.0+0x115/0x13d
>>> <4> [<60a26e4f>] ? printk_deferred+0x0/0x9b
>>> <4> [<6049cddb>] ? set_signals+0x0/0x38
>>> <4> [<60589588>] ? arch_local_irq_save+0x0/0x22
>>> <4> [<6055c928>] ? kvmalloc_node+0x56/0x96
>>> <4> [<6058d3c0>] ? __kmalloc+0x1e2/0x1f9
>>> <4> [<608e3d32>] ? ___ratelimit+0xd0/0xde
>>> <4> [<6088f79f>] ? bucket_table_alloc.isra.0+0x115/0x13d
>>> <4> [<60901485>] ? _warn_unseeded_randomness+0x60/0x8f
>>> <4> [<6090295b>] ? get_random_u32+0x29/0x98
>>> <4> [<6088f79f>] ? bucket_table_alloc.isra.0+0x115/0x13d
>>> <4> [<6088f68a>] ? bucket_table_alloc.isra.0+0x0/0x13d
>>> <4> [<6088ff7a>] ? rhashtable_init+0x175/0x1ca
>>> <4> [<607ef317>] ? ipc_init_ids+0x4e/0x6f
>>> <4> [<600153bd>] ? sem_init+0x17/0x45
>>> <4> [<6049d0e5>] ? start_ptraced_child+0x0/0x180
>>> <4> [<604a0ce0>] ? kernel_longjmp+0x0/0x20
>>> <4> [<6049cc3d>] ? set_handler+0x123/0x15b
>>> <4> [<6049c9ee>] ? hard_handler+0x0/0xcd
>>> <4> [<604a0ce0>] ? kernel_longjmp+0x0/0x20
>>> <4> [<6049c3a6>] ? openpty_cb+0x22/0x3b
>>> <4> [<6049fb4b>] ? start_idle_thread+0x66/0x116
>>> <4> [<60004613>] ? linux_main+0x2e7/0x2f9
>>> <4> [<6049cc86>] ? change_sig+0x0/0x6a
>>> <4> [<6000565e>] ? main+0x230/0x2dc
>>> <4> [<60a256b0>] ? __libc_csu_init+0x0/0x60
>>> <4> [<604827d0>] ? _start+0x0/0x30
>>> <4> [<6000542e>] ? main+0x0/0x2dc
>>> <4> [<604827d0>] ? _start+0x0/0x30
>>> <4> [<604827d0>] ? _start+0x0/0x30
>>> <4> [<604827fa>] ? _start+0x2a/0x30
>>> <4> [<604827d0>] ? _start+0x0/0x30
>>> Aborted (core dumped)
>>> ```
>>>
>>>
>>>
>>
> 

-- 
Anton R. Ivanov
https://www.kot-begemot.co.uk/


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: linux uml segfault
  2021-02-23 17:19     ` Anton Ivanov
@ 2021-02-23 17:26       ` Ritesh Raj Sarraf
  2021-02-23 18:02         ` Anton Ivanov
  2021-02-24 11:44         ` Anton Ivanov
  0 siblings, 2 replies; 41+ messages in thread
From: Ritesh Raj Sarraf @ 2021-02-23 17:26 UTC (permalink / raw)
  To: Anton Ivanov, Christopher Obbard, linux-um; +Cc: 983379


[-- Attachment #1.1: Type: text/plain, Size: 970 bytes --]

Added the debian bug report in CC.

On Tue, 2021-02-23 at 17:19 +0000, Anton Ivanov wrote:
> > The current Debian user-mode-linux package in unstable is based on
> > the 5.10.5 stable source which includes the mentioned patch, but is
> > still causing an error for some users.
> 
> After updating the tree to 5.10.5 and applying all Debian patches
> from the package, I cannot reproduce the bug.
> 
> I am running it on 5.10, 5.2 and 4.19 hosts with the same parameters
> without issues. Hosts are all up to date Debian 10.8 and so is the
> UML userspace.
> 

Did you mean 5.10, 5.2 and 4.19 (UML) guests ?

We've seen this happen on Debian Testing and Unstable Host (of which
the former would soon be the next stable i.e. Debian Bullseye).

In our tests, when running the same linux uml binary (5.10) on a Debian
Stable Host, it is working fine.


-- 
Ritesh Raj Sarraf | http://people.debian.org/~rrs
Debian - The Universal Operating System


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: linux uml segfault
  2021-02-23 17:26       ` Ritesh Raj Sarraf
@ 2021-02-23 18:02         ` Anton Ivanov
  2021-02-24 11:44         ` Anton Ivanov
  1 sibling, 0 replies; 41+ messages in thread
From: Anton Ivanov @ 2021-02-23 18:02 UTC (permalink / raw)
  To: rrs, Christopher Obbard, linux-um; +Cc: 983379

On 23/02/2021 17:26, Ritesh Raj Sarraf wrote:
> Added the debian bug report in CC.
> 
> On Tue, 2021-02-23 at 17:19 +0000, Anton Ivanov wrote:
>>> The current Debian user-mode-linux package in unstable is based on
>>> the 5.10.5 stable source which includes the mentioned patch, but is
>>> still causing an error for some users.
>>
>> After updating the tree to 5.10.5 and applying all Debian patches
>> from the package, I cannot reproduce the bug.
>>
>> I am running it on 5.10, 5.2 and 4.19 hosts with the same parameters
>> without issues. Hosts are all up to date Debian 10.8 and so is the
>> UML userspace.
>>
> 
> Did you mean 5.10, 5.2 and 4.19 (UML) guests ?

No. Hosts.

I have several 6-core/12-thread Ryzens which are used for development
testing.

They all use identical userspace with the sole difference being the 
kernel. They all use a selection of 5.x because 4.19 does not support 
the hardware properly.

The 4.19 testing is done on my old "test farm" which is all A8s and 
Athlon X760.

> 
> We've seen this happen on Debian Testing and Unstable Host (of which
> the former would soon be the next stable i.e. Debian Bullseye).



> 
> In our tests, when running the same linux uml binary (5.10) on a Debian
> Stable Host, it is working fine.
> 


OK. I will upgrade one of my systems to Debian testing to try to 
reproduce this.


-- 
Anton R. Ivanov
https://www.kot-begemot.co.uk/



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: linux uml segfault
  2021-02-23 17:26       ` Ritesh Raj Sarraf
  2021-02-23 18:02         ` Anton Ivanov
@ 2021-02-24 11:44         ` Anton Ivanov
  2021-03-02  9:09           ` Ritesh Raj Sarraf
  1 sibling, 1 reply; 41+ messages in thread
From: Anton Ivanov @ 2021-02-24 11:44 UTC (permalink / raw)
  To: rrs, Christopher Obbard, linux-um; +Cc: 983379



On 23/02/2021 17:26, Ritesh Raj Sarraf wrote:
> Added the debian bug report in CC.
> 
> On Tue, 2021-02-23 at 17:19 +0000, Anton Ivanov wrote:
>>> The current Debian user-mode-linux package in unstable is based on
>>> the 5.10.5 stable source which includes the mentioned patch, but is
>>> still causing an error for some users.
>>
>> After updating the tree to 5.10.5 and applying all Debian patches
>> from the package, I cannot reproduce the bug.
>>
>> I am running it on 5.10, 5.2 and 4.19 hosts with the same parameters
>> without issues. Hosts are all up to date Debian 10.8 and so is the
>> UML userspace.
>>
> 
> Did you mean 5.10, 5.2 and 4.19 (UML) guests ?
> 
> We've seen this happen on Debian Testing and Unstable Host (of which
> the former would soon be the next stable i.e. Debian Bullseye).
> 
> In our tests, when running the same linux uml binary (5.10) on a Debian
> Stable Host, it is working fine.

I cannot reproduce it on a physical Bullseye host using the Debian user-mode-linux package compiled from source.

Environment - Bullseye minimal install and build deps. 6 cores/12 threads Ryzen

I cannot reproduce it using the upstream source and the patches from the user-mode-linux package

Environment - same as above.

I cannot reproduce it using the upstream source + patches and compiling on Buster using the following:

1. Bullseye physical host, minimal install, same hardware

2. Bullseye VM, minimal install, running with 4 vCPUs on the same host

3. Bullseye LXC container running on a Debian Buster host, minimal install, same hardware

In all cases it boots cleanly and there are no segfaults.

So, frankly, I have no idea what is causing it to crash. I have run most combinations of a 5.10 UML on a 5.10 host, and all work fine here.

-- 
Anton R. Ivanov
https://www.kot-begemot.co.uk/



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: linux uml segfault
  2021-02-24 11:44         ` Anton Ivanov
@ 2021-03-02  9:09           ` Ritesh Raj Sarraf
  2021-03-02 11:34             ` Anton Ivanov
  0 siblings, 1 reply; 41+ messages in thread
From: Ritesh Raj Sarraf @ 2021-03-02  9:09 UTC (permalink / raw)
  To: Anton Ivanov, Christopher Obbard, linux-um; +Cc: 983379


[-- Attachment #1.1: Type: text/plain, Size: 462 bytes --]

On Wed, 2021-02-24 at 11:44 +0000, Anton Ivanov wrote:
> In all cases it boots cleanly and there are no segfaults.
> 
> So, frankly, no idea what is causing it to crash - I have run most
> combinations of 5.10 on a 5.10, all work fine here.

Is there any other way I can help you with this issue ?
I do have the core dump available on my local machine.


-- 
Ritesh Raj Sarraf | http://people.debian.org/~rrs
Debian - The Universal Operating System


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: linux uml segfault
  2021-03-02  9:09           ` Ritesh Raj Sarraf
@ 2021-03-02 11:34             ` Anton Ivanov
  2021-03-02 14:23               ` Ritesh Raj Sarraf
  0 siblings, 1 reply; 41+ messages in thread
From: Anton Ivanov @ 2021-03-02 11:34 UTC (permalink / raw)
  To: rrs, Christopher Obbard, linux-um; +Cc: 983379



On 02/03/2021 09:09, Ritesh Raj Sarraf wrote:
> On Wed, 2021-02-24 at 11:44 +0000, Anton Ivanov wrote:
>> In all cases it boots cleanly and there are no segfaults.
>>
>> So, frankly, no idea what is causing it to crash - I have run most
>> combinations of 5.10 on a 5.10, all work fine here.
> 
> Is there any other way I can help you with this issue ?
> I do have the core dump available on my local machine.

If gdb gives you the exact lines, that may be helpful.

I have looked through the backtrace several times; it is a code path which my set-up cruises through without trouble.

The actual moment you see in the backtrace is this one:

[    0.080000] random: get_random_u32 called from bucket_table_alloc.isra.0+0x115/0x13d with crng_init=0

However, in your case, instead of printing this printk warning, it blows up.

Why - I don't know.
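
As a cross-check on that reading, the register dump in the original report has R11: 3d74696e695f676e. Read as little-endian bytes this is printable ASCII, and it appears to be the tail of the "crng_init=" text in the warning above, which would fit a fault while that message was being handled. A minimal user-space sketch of the decoding (reg_to_ascii is a hypothetical helper written for this illustration, not a kernel function):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy the eight bytes of a 64-bit register value into `out`, lowest
 * byte first (the order x86-64 uses when the value sits in memory),
 * and NUL-terminate. Non-printable bytes are replaced with '.'.
 * Hypothetical helper for illustration only.
 */
static void reg_to_ascii(uint64_t reg, char out[9])
{
	for (size_t i = 0; i < 8; i++) {
		unsigned char c = (unsigned char)(reg >> (8 * i));

		out[i] = (c >= 0x20 && c < 0x7f) ? (char)c : '.';
	}
	out[8] = '\0';
}
```

Calling reg_to_ascii(0x3d74696e695f676e, buf) leaves "ng_init=" in buf.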

A.

> 
> 
> 
> 

-- 
Anton R. Ivanov
https://www.kot-begemot.co.uk/



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: linux uml segfault
  2021-03-02 11:34             ` Anton Ivanov
@ 2021-03-02 14:23               ` Ritesh Raj Sarraf
  2021-03-02 17:05                 ` Anton Ivanov
  0 siblings, 1 reply; 41+ messages in thread
From: Ritesh Raj Sarraf @ 2021-03-02 14:23 UTC (permalink / raw)
  To: Anton Ivanov, Christopher Obbard, linux-um; +Cc: 983379


[-- Attachment #1.1: Type: text/plain, Size: 2223 bytes --]

On Tue, 2021-03-02 at 11:34 +0000, Anton Ivanov wrote:
> If gdb gives you the exact lines, that may be helpful.

It doesn't. But it does show drawbacks in my packaging. The debug
symbols packaged are not read/honored by gdb at all.

```
Reading symbols from /usr/bin/linux.uml...
Reading symbols from /usr/lib/debug/.build-id/6f/ea141539149074c72e80fb8004de124fda115b.debug...
(No debugging symbols found in /usr/lib/debug/.build-id/6f/ea141539149074c72e80fb8004de124fda115b.debug)

warning: Can't open file /dev/shm/#20817 (deleted) during file-backed mapping note processing
[New LWP 18788]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `linux ubd0=qemu-linux-image.img'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f51842c0087 in kill () at ../sysdeps/unix/syscall-template.S:120
120     ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0  0x00007f51842c0087 in kill () at ../sysdeps/unix/syscall-template.S:120
#1  0x000000006049dc20 in uml_abort ()
#2  0x000000006049de7a in os_dump_core ()
#3  0x0000000060486e47 in panic_exit ()
#4  0x00000000604c0a03 in notifier_call_chain ()
#5  0x00000000604c0a98 in atomic_notifier_call_chain ()
#6  0x0000000060a26b85 in panic ()
#7  0x00000000604869e1 in segv ()
#8  0x0000000060486ba9 in segv_handler ()
#9  0x000000006049ccc0 in sig_handler_common ()
#10 0x000000006049d1ec in sig_handler ()
#11 0x000000006049cdc6 in hard_handler ()
#12 <signal handler called>
#13 0x00000000604d45b4 in vprintk_store ()
#14 0x00000000604d4aa8 in vprintk_emit ()
#15 0x00000000604d4d86 in vprintk_deferred ()
#16 0x0000000060a27a02 in printk_deferred ()
#17 0x00000000609031b2 in get_random_u32 ()
#18 0x000000006088ff65 in bucket_table_alloc.isra ()
#19 0x0000000060890740 in rhashtable_init ()
#20 0x00000000607efaa2 in ipc_init_ids ()
#21 0x00000000600153c9 in sem_init ()
```

So the best I can do for you is to recompile the kernel with as much
debug information as possible.

Thanks,
Ritesh

-- 
Ritesh Raj Sarraf | http://people.debian.org/~rrs
Debian - The Universal Operating System


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: linux uml segfault
  2021-03-02 14:23               ` Ritesh Raj Sarraf
@ 2021-03-02 17:05                 ` Anton Ivanov
  2021-03-02 17:27                   ` Ritesh Raj Sarraf
  0 siblings, 1 reply; 41+ messages in thread
From: Anton Ivanov @ 2021-03-02 17:05 UTC (permalink / raw)
  To: rrs, Christopher Obbard, linux-um; +Cc: 983379



On 02/03/2021 14:23, Ritesh Raj Sarraf wrote:
> On Tue, 2021-03-02 at 11:34 +0000, Anton Ivanov wrote:
>> If gdb gives you the exact lines, that may be helpful.
> 
> It doesn't. But it does show drawbacks in my packaging. The debug
> symbols packaged are not read/honored by gdb at all.
> 
> ```
> Reading symbols from /usr/bin/linux.uml...
> Reading symbols from /usr/lib/debug/.build-
> id/6f/ea141539149074c72e80fb8004de124fda115b.debug...
> (No debugging symbols found in /usr/lib/debug/.build-
> id/6f/ea141539149074c72e80fb8004de124fda115b.debug)
> 
> warning: Can't open file /dev/shm/#20817 (deleted) during file-backed
> mapping note processing
> [New LWP 18788]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-
> gnu/libthread_db.so.1".
> Core was generated by `linux ubd0=qemu-linux-image.img'.
> Program terminated with signal SIGABRT, Aborted.
> #0  0x00007f51842c0087 in kill () at ../sysdeps/unix/syscall-
> template.S:120
> 120     ../sysdeps/unix/syscall-template.S: No such file or directory.
> (gdb) bt
> #0  0x00007f51842c0087 in kill () at ../sysdeps/unix/syscall-
> template.S:120
> #1  0x000000006049dc20 in uml_abort ()
> #2  0x000000006049de7a in os_dump_core ()
> #3  0x0000000060486e47 in panic_exit ()
> #4  0x00000000604c0a03 in notifier_call_chain ()
> #5  0x00000000604c0a98 in atomic_notifier_call_chain ()
> #6  0x0000000060a26b85 in panic ()
> #7  0x00000000604869e1 in segv ()
> #8  0x0000000060486ba9 in segv_handler ()
> #9  0x000000006049ccc0 in sig_handler_common ()
> #10 0x000000006049d1ec in sig_handler ()
> #11 0x000000006049cdc6 in hard_handler ()
> #12 <signal handler called>
> #13 0x00000000604d45b4 in vprintk_store ()
> #14 0x00000000604d4aa8 in vprintk_emit ()
> #15 0x00000000604d4d86 in vprintk_deferred ()
> #16 0x0000000060a27a02 in printk_deferred ()
> #17 0x00000000609031b2 in get_random_u32 ()
> #18 0x000000006088ff65 in bucket_table_alloc.isra ()
> #19 0x0000000060890740 in rhashtable_init ()
> #20 0x00000000607efaa2 in ipc_init_ids ()
> #21 0x00000000600153c9 in sem_init ()
> ```
> 
> So the best I can extract for you is to compile the kernel with as much
> information as possible.

Can you try using one of the older kernels so we can verify whether this is indeed a 5.10 thing?

I will do a bisect the moment I figure out how to reproduce it. I will try to do some more experiments on that tomorrow.

> 
> Thanks,
> Ritesh
> 

-- 
Anton R. Ivanov
https://www.kot-begemot.co.uk/



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: linux uml segfault
  2021-03-02 17:05                 ` Anton Ivanov
@ 2021-03-02 17:27                   ` Ritesh Raj Sarraf
  2021-03-03  9:30                     ` Anton Ivanov
  2021-03-03 22:40                     ` Johannes Berg
  0 siblings, 2 replies; 41+ messages in thread
From: Ritesh Raj Sarraf @ 2021-03-02 17:27 UTC (permalink / raw)
  To: Anton Ivanov, Christopher Obbard, linux-um; +Cc: 983379


[-- Attachment #1.1: Type: text/plain, Size: 7860 bytes --]

On Tue, 2021-03-02 at 17:05 +0000, Anton Ivanov wrote:
> > So the best I can extract for you is to compile the kernel with as
> > much
> > information as possible.
> 
> Can you try using one of the older kernels so we can verify if this
> is indeed a 5.10 thing.
> 

That was the first thing I tried. I tested it with 5.10, 5.9 and 5.4.
All 3 crashed. That's when I knew this one was going to be a painful one
to track down.

The only other input I have is that one more user has reported being
able to reproduce the issue.

OTOH, I have one more user (other than you) who's not been able to
reproduce the issue.

> I will do a dissect the moment I figure out how to reproduce it. I
> will try to do some more experiments on that tomorrow.


Meanwhile, I enabled some debug info in the kernel. Here's what I have
got so far:

```
(gdb) bt
#0  0x00007f89908dc087 in kill () at ../sysdeps/unix/syscall-template.S:120
#1  0x00000000604a3514 in uml_abort () at arch/um/os-Linux/util.c:94
#2  0x00000000604a3791 in os_dump_core () at arch/um/os-Linux/util.c:149
#3  0x000000006048d126 in panic_exit (self=0x2e66d5, unused1=6, unused2=0x0) at arch/um/kernel/um_arch.c:217
#4  0x00000000604c725a in notifier_call_chain (nl=0x2e66d5, val=0, v=0x60d82f40 <buf>, nr_to_call=-1, nr_calls=0x0) at kernel/notifier.c:83
#5  0x00000000604c72f6 in atomic_notifier_call_chain (nh=0x2e66d5, val=6, v=0x0) at kernel/notifier.c:217
#6  0x0000000060a54607 in panic (fmt=0x60a55225 <printk> "UH\211\345H\201\354", <incomplete sequence \320>) at kernel/panic.c:272
#7  0x000000006048cca3 in segv (fi=<incomplete type>, ip=1615717312, is_user=0, regs=0x60c2ee58 <cpu0_irqstack+11864>) at arch/um/kernel/trap.c:246
#8  0x000000006048ce64 in segv_handler (sig=3040981, unused_si=0x6, regs=0x60c2ee58 <cpu0_irqstack+11864>) at arch/um/kernel/trap.c:190
#9  0x00000000604a2556 in sig_handler_common (sig=11, si=0x60c2fbf0 <cpu0_irqstack+15344>, mc=0x60c2fae8 <cpu0_irqstack+15080>) at arch/um/os-Linux/signal.c:48
#10 0x00000000604a2aa2 in sig_handler (sig=3040981, si=0x6, mc=0x0) at arch/um/os-Linux/signal.c:81
#11 0x00000000604a265f in hard_handler (sig=3040981, si=0x60c2fbf0 <cpu0_irqstack+15344>, p=0x0) at arch/um/os-Linux/signal.c:180
#12 <signal handler called>
#13 0x00000000604de3c0 in printk_caller_id () at kernel/printk/printk.c:1924
#14 log_output (text_len=<optimized out>, text=<optimized out>, dev_info=<optimized out>, lflags=<optimized out>, level=<optimized out>, facility=<optimized out>) at kernel/printk/printk.c:1932
#15 vprintk_store (facility=1624806843, level=5, dev_info=0x0, fmt=0x35 <error: Cannot access memory at address 0x35>, args=0x1) at kernel/printk/printk.c:2004
#16 0x00000000604de8b7 in vprintk_emit (facility=1624806843, level=1622768673, dev_info=0x35, fmt=0x1 <error: Cannot access memory at address 0x1>, args=0x60b97c22) at kernel/printk/printk.c:2029
#17 0x00000000604debad in vprintk_deferred (fmt=0x1 <error: Cannot access memory at address 0x1>, args=0x60b97c21) at kernel/printk/printk.c:3079
#18 0x0000000060a554de in printk_deferred (fmt=0x60d895bb <textbuf+91> "\n") at kernel/printk/printk.c:3091
#19 0x000000006092680f in _warn_unseeded_randomness (previous=<optimized out>, caller=<optimized out>, func_name=<optimized out>) at drivers/char/random.c:1534
#20 _warn_unseeded_randomness (func_name=0x60abf380 <__func__.38> "get_random_u32", caller=0x608b5f25 <bucket_table_alloc+287>, previous=0x35) at drivers/char/random.c:1516
#21 0x0000000060927d47 in get_random_u32 () at drivers/char/random.c:2221
#22 0x00000000608b5f25 in bucket_table_alloc (nbuckets=64, gfp=3264, ht=<optimized out>) at lib/rhashtable.c:203
#23 0x00000000608b6733 in rhashtable_init (ht=0x60c60e30 <init_ipc_ns+80>, params=0x608b5e06 <bucket_table_alloc>) at lib/rhashtable.c:1061
#24 0x000000006080f234 in ipc_init_ids (ids=0x60c60de8 <init_ipc_ns+8>) at ipc/util.c:119
#25 0x0000000060813c6d in sem_init_ns (ns=0x60d895bb <textbuf+91>) at ipc/sem.c:254
#26 0x0000000060015b5d in sem_init () at ipc/sem.c:268
#27 0x00007f89906d92f7 in ?? () from /lib/x86_64-linux-gnu/libcom_err.so.2
#28 0x00007f8990ab8fb2 in call_init (l=<optimized out>, argc=argc@entry=5, argv=argv@entry=0x7ffe3e7a4c98, env=env@entry=0x7ffe3e7a4cc8) at dl-init.c:72
#29 0x00007f8990ab90b9 in call_init (env=0x7ffe3e7a4cc8, argv=0x7ffe3e7a4c98, argc=5, l=<optimized out>) at dl-init.c:30
#30 _dl_init (main_map=0x61497ea0, argc=5, argv=0x7ffe3e7a4c98, env=0x7ffe3e7a4cc8) at dl-init.c:119
#31 0x00007f89909d82bd in __GI__dl_catch_exception (exception=exception@entry=0x0, operate=operate@entry=0x7f8990abc5a0 <call_dl_init>, args=args@entry=0x7ffe3e7a1e80) at dl-error-skeleton.c:182
#32 0x00007f8990abd028 in dl_open_worker (a=a@entry=0x7ffe3e7a2020) at dl-open.c:758
#33 0x00007f89909d8260 in __GI__dl_catch_exception (exception=exception@entry=0x7ffe3e7a2000, operate=operate@entry=0x7f8990abcc70 <dl_open_worker>, args=args@entry=0x7ffe3e7a2020) at dl-error-skeleton.c:208
#34 0x00007f8990abc8ca in _dl_open (file=0x7ffe3e7a22a0 "libnss_nis.so.2", mode=-2147483646, caller_dlopen=0x7f89909bf3a6 <nss_load_library+294>, nsid=-2, argc=5, argv=0x7ffe3e7a2000, env=0x7ffe3e7a4cc8) at dl-open.c:837
#35 0x00007f89909d76dd in do_dlopen (ptr=ptr@entry=0x7ffe3e7a2260) at dl-libc.c:96
#36 0x00007f89909d8260 in __GI__dl_catch_exception (exception=exception@entry=0x7ffe3e7a21e0, operate=operate@entry=0x7f89909d76a0 <do_dlopen>, args=args@entry=0x7ffe3e7a2260) at dl-error-skeleton.c:208
#37 0x00007f89909d831f in __GI__dl_catch_error (objname=objname@entry=0x7ffe3e7a2238, errstring=errstring@entry=0x7ffe3e7a2240, mallocedp=mallocedp@entry=0x7ffe3e7a2237, operate=operate@entry=0x7f89909d76a0 <do_dlopen>, args=args@entry=0x7ffe3e7a2260) at dl-error-skeleton.c:227
#38 0x00007f89909d77b7 in dlerror_run (operate=operate@entry=0x7f89909d76a0 <do_dlopen>, args=args@entry=0x7ffe3e7a2260) at dl-libc.c:46
#39 0x00007f89909d7846 in __GI___libc_dlopen_mode (name=name@entry=0x7ffe3e7a22a0 "libnss_nis.so.2", mode=mode@entry=-2147483646) at dl-libc.c:195
#40 0x00007f89909bf3a6 in nss_load_library (ni=ni@entry=0x61497db0) at nsswitch.c:359
#41 0x00007f89909bfc39 in __GI___nss_lookup_function (ni=0x61497db0, fct_name=<optimized out>, fct_name@entry=0x7f899089b020 "setgrent") at nsswitch.c:467
#42 0x00007f899089554b in init_nss_interface () at nss_compat/compat-grp.c:83
#43 init_nss_interface () at nss_compat/compat-grp.c:79
#44 0x00007f8990895e35 in _nss_compat_getgrnam_r (name=0x7f8990a2a1e0 "tty", grp=0x7ffe3e7a2910, buffer=0x7ffe3e7a24e0 "", buflen=1024, errnop=0x7f899089eb00) at nss_compat/compat-grp.c:486
#45 0x00007f8990968b85 in __getgrnam_r (name=name@entry=0x7f8990a2a1e0 "tty", resbuf=resbuf@entry=0x7ffe3e7a2910, buffer=buffer@entry=0x7ffe3e7a24e0 "", buflen=1024, result=result@entry=0x7ffe3e7a2908) at ../nss/getXXbyYY_r.c:315
#46 0x00007f89909d6b77 in grantpt (fd=fd@entry=5) at ../sysdeps/unix/grantpt.c:152
#47 0x00007f8990a9394e in __GI_openpty (amaster=0x60c2bd94, aslave=0x60c2bd98, name=0x0, termp=0x0, winp=0x0) at openpty.c:103
#48 0x00000000604a1f65 in openpty_cb (arg=0x60c2bd94) at arch/um/os-Linux/sigio.c:407
#49 0x00000000604a58d0 in start_idle_thread (stack=0x60c28000 <init_thread_info>, switch_buf=0x60c31e08 <init_task+4936>) at arch/um/os-Linux/skas/process.c:598
#50 0x0000000060004a3d in start_uml () at arch/um/kernel/skas/process.c:45
#51 0x00000000600047b2 in linux_main (argc=1624806843, argv=0x40709000) at arch/um/kernel/um_arch.c:334
#52 0x000000006000574f in main (argc=5, argv=0x7ffe3e7a4c98, envp=0x35) at arch/um/os-Linux/main.c:144
(gdb) 

```


-- 
Ritesh Raj Sarraf | http://people.debian.org/~rrs
Debian - The Universal Operating System


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: linux uml segfault
  2021-03-02 17:27                   ` Ritesh Raj Sarraf
@ 2021-03-03  9:30                     ` Anton Ivanov
  2021-03-03 10:45                       ` Bug#983379: " Ritesh Raj Sarraf
  2021-03-03 22:40                     ` Johannes Berg
  1 sibling, 1 reply; 41+ messages in thread
From: Anton Ivanov @ 2021-03-03  9:30 UTC (permalink / raw)
  To: rrs, Christopher Obbard, linux-um; +Cc: 983379



On 02/03/2021 17:27, Ritesh Raj Sarraf wrote:
> On Tue, 2021-03-02 at 17:05 +0000, Anton Ivanov wrote:
>>> So the best I can extract for you is to compile the kernel with as
>>> much
>>> information as possible.
>>
>> Can you try using one of the older kernels so we can verify if this
>> is indeed a 5.10 thing.
>>
> 
> That was the first thing I tried. I tested it with 5.10, 5.9 and 5.4.
> All 3 crashed. That's when I knew this one was going to be painful one
> to conclude.
> 
> The only other input I have is that I have one more user who's reported
> to be able to reproduce the issue.
> 
> OTOH, I have one more user (other than you) who's not been able to
> reproduce the issue.
> 
>> I will do a dissect the moment I figure out how to reproduce it. I
>> will try to do some more experiments on that tomorrow.

I tried to alter the userspace a bit, but it makes no difference.

Out of curiosity, what are you running it on?

> 
> 
> Meanwhile, I enabled some debug info in the kernel. Here's what I have
> got so far:
> 
> ```
> (gdb) bt
> #0  0x00007f89908dc087 in kill () at ../sysdeps/unix/syscall-
> template.S:120
> #1  0x00000000604a3514 in uml_abort () at arch/um/os-Linux/util.c:94
> #2  0x00000000604a3791 in os_dump_core () at arch/um/os-
> Linux/util.c:149
> #3  0x000000006048d126 in panic_exit (self=0x2e66d5, unused1=6,
> unused2=0x0) at arch/um/kernel/um_arch.c:217
> #4  0x00000000604c725a in notifier_call_chain (nl=0x2e66d5, val=0,
> v=0x60d82f40 <buf>, nr_to_call=-1, nr_calls=0x0) at
> kernel/notifier.c:83
> #5  0x00000000604c72f6 in atomic_notifier_call_chain (nh=0x2e66d5,
> val=6, v=0x0) at kernel/notifier.c:217
> #6  0x0000000060a54607 in panic (fmt=0x60a55225 <printk>
> "UH\211\345H\201\354", <incomplete sequence \320>) at
> kernel/panic.c:272
> #7  0x000000006048cca3 in segv (fi=<incomplete type>, ip=1615717312,
> is_user=0, regs=0x60c2ee58 <cpu0_irqstack+11864>) at
> arch/um/kernel/trap.c:246
> #8  0x000000006048ce64 in segv_handler (sig=3040981, unused_si=0x6,
> regs=0x60c2ee58 <cpu0_irqstack+11864>) at arch/um/kernel/trap.c:190
> #9  0x00000000604a2556 in sig_handler_common (sig=11, si=0x60c2fbf0
> <cpu0_irqstack+15344>, mc=0x60c2fae8 <cpu0_irqstack+15080>) at
> arch/um/os-Linux/signal.c:48
> #10 0x00000000604a2aa2 in sig_handler (sig=3040981, si=0x6, mc=0x0) at
> arch/um/os-Linux/signal.c:81
> #11 0x00000000604a265f in hard_handler (sig=3040981, si=0x60c2fbf0
> <cpu0_irqstack+15344>, p=0x0) at arch/um/os-Linux/signal.c:180
> #12 <signal handler called>

The code here is:

static inline u32 printk_caller_id(void)
{
	return in_task() ? task_pid_nr(current) :
		0x80000000 + raw_smp_processor_id();
}


That is something which should not bomb out unless we have memory corruption or something along those lines, for example current being invalid.
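
To make the two branches concrete, here is a small user-space model of that helper. It is only a sketch: struct task_model and caller_id_model() are hypothetical names, the in_task flag and the CPU number are passed in as plain arguments, and the NULL check stands in for the fault an invalid current would cause in the kernel, which has no such check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's task_struct; only the pid matters here. */
struct task_model {
	int pid;
};

/*
 * Model of printk_caller_id(): in task context the id is the pid of the
 * current task, otherwise it is 0x80000000 plus the CPU number.
 * current_task is dereferenced only on the in-task branch, so that is
 * the only place where an invalid pointer could fault.
 */
static uint32_t caller_id_model(bool in_task,
				const struct task_model *current_task,
				unsigned int cpu)
{
	if (in_task) {
		/* Stand-in for the crash; the kernel does not check this. */
		assert(current_task != NULL);
		return (uint32_t)current_task->pid;
	}
	return UINT32_C(0x80000000) + cpu;
}
```

With a valid task whose pid is 0 (the swapper in the report), the in-task branch returns 0; outside task context on CPU 0 the result is 0x80000000.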

A.

> #13 0x00000000604de3c0 in printk_caller_id () at
> kernel/printk/printk.c:1924
> #14 log_output (text_len=<optimized out>, text=<optimized out>,
> dev_info=<optimized out>, lflags=<optimized out>, level=<optimized
> out>, facility=<optimized out>) at kernel/printk/printk.c:1932
> #15 vprintk_store (facility=1624806843, level=5, dev_info=0x0, fmt=0x35
> <error: Cannot access memory at address 0x35>, args=0x1) at
> kernel/printk/printk.c:2004
> #16 0x00000000604de8b7 in vprintk_emit (facility=1624806843,
> level=1622768673, dev_info=0x35, fmt=0x1 <error: Cannot access memory
> at address 0x1>, args=0x60b97c22) at kernel/printk/printk.c:2029
> #17 0x00000000604debad in vprintk_deferred (fmt=0x1 <error: Cannot
> access memory at address 0x1>, args=0x60b97c21) at
> kernel/printk/printk.c:3079
> #18 0x0000000060a554de in printk_deferred (fmt=0x60d895bb <textbuf+91>
> "\n") at kernel/printk/printk.c:3091
> #19 0x000000006092680f in _warn_unseeded_randomness
> (previous=<optimized out>, caller=<optimized out>, func_name=<optimized
> out>) at drivers/char/random.c:1534
> #20 _warn_unseeded_randomness (func_name=0x60abf380 <__func__.38>
> "get_random_u32", caller=0x608b5f25 <bucket_table_alloc+287>,
> previous=0x35) at drivers/char/random.c:1516
> #21 0x0000000060927d47 in get_random_u32 () at
> drivers/char/random.c:2221
> #22 0x00000000608b5f25 in bucket_table_alloc (nbuckets=64, gfp=3264,
> ht=<optimized out>) at lib/rhashtable.c:203
> #23 0x00000000608b6733 in rhashtable_init (ht=0x60c60e30
> <init_ipc_ns+80>, params=0x608b5e06 <bucket_table_alloc>) at
> lib/rhashtable.c:1061
> #24 0x000000006080f234 in ipc_init_ids (ids=0x60c60de8 <init_ipc_ns+8>)
> at ipc/util.c:119
> #25 0x0000000060813c6d in sem_init_ns (ns=0x60d895bb <textbuf+91>) at
> ipc/sem.c:254
> #26 0x0000000060015b5d in sem_init () at ipc/sem.c:268
> #27 0x00007f89906d92f7 in ?? () from /lib/x86_64-linux-
> gnu/libcom_err.so.2
> #28 0x00007f8990ab8fb2 in call_init (l=<optimized out>,
> argc=argc@entry=5, argv=argv@entry=0x7ffe3e7a4c98,
> env=env@entry=0x7ffe3e7a4cc8) at dl-init.c:72
> #29 0x00007f8990ab90b9 in call_init (env=0x7ffe3e7a4cc8,
> argv=0x7ffe3e7a4c98, argc=5, l=<optimized out>) at dl-init.c:30
> #30 _dl_init (main_map=0x61497ea0, argc=5, argv=0x7ffe3e7a4c98,
> env=0x7ffe3e7a4cc8) at dl-init.c:119
> #31 0x00007f89909d82bd in __GI__dl_catch_exception
> (exception=exception@entry=0x0, operate=operate@entry=0x7f8990abc5a0
> <call_dl_init>, args=args@entry=0x7ffe3e7a1e80) at dl-error-
> skeleton.c:182
> #32 0x00007f8990abd028 in dl_open_worker (a=a@entry=0x7ffe3e7a2020) at
> dl-open.c:758
> #33 0x00007f89909d8260 in __GI__dl_catch_exception
> (exception=exception@entry=0x7ffe3e7a2000,
> operate=operate@entry=0x7f8990abcc70 <dl_open_worker>,
> args=args@entry=0x7ffe3e7a2020) at dl-error-skeleton.c:208
> #34 0x00007f8990abc8ca in _dl_open (file=0x7ffe3e7a22a0
> "libnss_nis.so.2", mode=-2147483646, caller_dlopen=0x7f89909bf3a6
> <nss_load_library+294>, nsid=-2, argc=5, argv=0x7ffe3e7a2000,
> env=0x7ffe3e7a4cc8)
>      at dl-open.c:837
> #35 0x00007f89909d76dd in do_dlopen (ptr=ptr@entry=0x7ffe3e7a2260) at
> dl-libc.c:96
> #36 0x00007f89909d8260 in __GI__dl_catch_exception
> (exception=exception@entry=0x7ffe3e7a21e0,
> operate=operate@entry=0x7f89909d76a0 <do_dlopen>,
> args=args@entry=0x7ffe3e7a2260) at dl-error-skeleton.c:208
> #37 0x00007f89909d831f in __GI__dl_catch_error
> (objname=objname@entry=0x7ffe3e7a2238,
> errstring=errstring@entry=0x7ffe3e7a2240,
> mallocedp=mallocedp@entry=0x7ffe3e7a2237,
>      operate=operate@entry=0x7f89909d76a0 <do_dlopen>,
> args=args@entry=0x7ffe3e7a2260) at dl-error-skeleton.c:227
> #38 0x00007f89909d77b7 in dlerror_run
> (operate=operate@entry=0x7f89909d76a0 <do_dlopen>,
> args=args@entry=0x7ffe3e7a2260) at dl-libc.c:46
> #39 0x00007f89909d7846 in __GI___libc_dlopen_mode
> (name=name@entry=0x7ffe3e7a22a0 "libnss_nis.so.2", mode=mode@entry=-
> 2147483646) at dl-libc.c:195
> #40 0x00007f89909bf3a6 in nss_load_library (ni=ni@entry=0x61497db0) at
> nsswitch.c:359
> #41 0x00007f89909bfc39 in __GI___nss_lookup_function (ni=0x61497db0,
> fct_name=<optimized out>, fct_name@entry=0x7f899089b020 "setgrent") at
> nsswitch.c:467
> #42 0x00007f899089554b in init_nss_interface () at nss_compat/compat-
> grp.c:83
> #43 init_nss_interface () at nss_compat/compat-grp.c:79
> #44 0x00007f8990895e35 in _nss_compat_getgrnam_r (name=0x7f8990a2a1e0
> "tty", grp=0x7ffe3e7a2910, buffer=0x7ffe3e7a24e0 "", buflen=1024,
> errnop=0x7f899089eb00) at nss_compat/compat-grp.c:486
> #45 0x00007f8990968b85 in __getgrnam_r (name=name@entry=0x7f8990a2a1e0
> "tty", resbuf=resbuf@entry=0x7ffe3e7a2910,
> buffer=buffer@entry=0x7ffe3e7a24e0 "", buflen=1024,
> result=result@entry=0x7ffe3e7a2908)
>      at ../nss/getXXbyYY_r.c:315
> #46 0x00007f89909d6b77 in grantpt (fd=fd@entry=5) at
> ../sysdeps/unix/grantpt.c:152
> #47 0x00007f8990a9394e in __GI_openpty (amaster=0x60c2bd94,
> aslave=0x60c2bd98, name=0x0, termp=0x0, winp=0x0) at openpty.c:103
> #48 0x00000000604a1f65 in openpty_cb (arg=0x60c2bd94) at arch/um/os-
> Linux/sigio.c:407
> #49 0x00000000604a58d0 in start_idle_thread (stack=0x60c28000
> <init_thread_info>, switch_buf=0x60c31e08 <init_task+4936>) at
> arch/um/os-Linux/skas/process.c:598
> #50 0x0000000060004a3d in start_uml () at
> arch/um/kernel/skas/process.c:45
> #51 0x00000000600047b2 in linux_main (argc=1624806843, argv=0x40709000)
> at arch/um/kernel/um_arch.c:334
> #52 0x000000006000574f in main (argc=5, argv=0x7ffe3e7a4c98, envp=0x35)
> at arch/um/os-Linux/main.c:144
> (gdb)
> 
> ```
> 
> 

-- 
Anton R. Ivanov
https://www.kot-begemot.co.uk/

_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Bug#983379: linux uml segfault
  2021-03-03  9:30                     ` Anton Ivanov
@ 2021-03-03 10:45                       ` Ritesh Raj Sarraf
  2021-03-03 10:53                         ` Anton Ivanov
  0 siblings, 1 reply; 41+ messages in thread
From: Ritesh Raj Sarraf @ 2021-03-03 10:45 UTC (permalink / raw)
  To: Anton Ivanov, 983379, Christopher Obbard, linux-um


[-- Attachment #1.1: Type: text/plain, Size: 1221 bytes --]

Hi Anton,

On Wed, 2021-03-03 at 09:30 +0000, Anton Ivanov wrote:
> 
> > 
> > OTOH, I have one more user (other than you) who's not been able to
> > reproduce the issue.
> > 
> > > I will do a dissect the moment I figure out how to reproduce it.
> > > I
> > > will try to do some more experiments on that tomorrow.
> 
> I tried to alter the userspace a bit, but it makes no difference.
> 
> Out of curiosity, what are you running it on?
> 

Bare-metal machines. 3 different machines, all Intel processors.
And it fails on all 3 of them.

On the distribution side, all 3 of them run Debian Unstable, with Linux
5.10.13

> > 
> 
> The code here is:
> 
> static inline u32 printk_caller_id(void)
> {
>         return in_task() ? task_pid_nr(current) :
>                 0x80000000 + raw_smp_processor_id();
> }
> 
> 
> That is something which should not bomb out unless we have memory
> corruption or something along those lines - current being invalid.
> 

Must be something different. Not all machines could have bad memory at
the same time.


-- 
Given the large number of mailing lists I follow, I request you to CC
me in replies for quicker response


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Bug#983379: linux uml segfault
  2021-03-03 10:45                       ` Bug#983379: " Ritesh Raj Sarraf
@ 2021-03-03 10:53                         ` Anton Ivanov
  0 siblings, 0 replies; 41+ messages in thread
From: Anton Ivanov @ 2021-03-03 10:53 UTC (permalink / raw)
  To: rrs, 983379, Christopher Obbard, linux-um


On 03/03/2021 10:45, Ritesh Raj Sarraf wrote:
> HI Anton,
>
> On Wed, 2021-03-03 at 09:30 +0000, Anton Ivanov wrote:
>>> OTOH, I have one more user (other than you) who's not been able to
>>> reproduce the issue.
>>>
>>>> I will do a dissect the moment I figure out how to reproduce it.
>>>> I
>>>> will try to do some more experiments on that tomorrow.
>> I tried to alter the userspace a bit, but it makes no difference.
>>
>> Out of curiosity, what are you running it on?
>>
> Bare-metal machines. 3 different machines, all Intel processors.
> And it fails on all 3 of them.

Hmmm...

All mine are AMD. I can try to boot up an Intel later today with Bullseye to see if it makes a difference.

> On the distribution side, all 3 of them run Debian Unstable, with Linux
> 5.10.13
>
>> The code here is:
>>
>> static inline u32 printk_caller_id(void)
>> {
>>          return in_task() ? task_pid_nr(current) :
>>                  0x80000000 + raw_smp_processor_id();
>> }
>>
>>
>> That is something which should not bomb out unless we have memory
>> corruption or something along those lines - current being invalid.
>>
> Must be something different. Not all machines could have bad memory at
> the same time.

I did not mean bad memory. I meant memory corruption as a result of a race, a buffer overrun, or anything else like that.

>
>
-- 
Anton R. Ivanov
https://www.kot-begemot.co.uk/



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: linux uml segfault
  2021-03-02 17:27                   ` Ritesh Raj Sarraf
  2021-03-03  9:30                     ` Anton Ivanov
@ 2021-03-03 22:40                     ` Johannes Berg
  2021-03-04  5:38                       ` Hajime Tazaki
                                         ` (2 more replies)
  1 sibling, 3 replies; 41+ messages in thread
From: Johannes Berg @ 2021-03-03 22:40 UTC (permalink / raw)
  To: rrs, Anton Ivanov, Christopher Obbard, linux-um; +Cc: 983379

I think the problem is here:

> #24 0x000000006080f234 in ipc_init_ids (ids=0x60c60de8 <init_ipc_ns+8>)
> at ipc/util.c:119
> #25 0x0000000060813c6d in sem_init_ns (ns=0x60d895bb <textbuf+91>) at
> ipc/sem.c:254
> #26 0x0000000060015b5d in sem_init () at ipc/sem.c:268
> #27 0x00007f89906d92f7 in ?? () from /lib/x86_64-linux-
> gnu/libcom_err.so.2

You're in the init of libcom_err.so.2, which is loaded by

> "libnss_nis.so.2"

which is loaded by normal NSS code (getgrnam):

> #40 0x00007f89909bf3a6 in nss_load_library (ni=ni@entry=0x61497db0) at
> nsswitch.c:359
> #41 0x00007f89909bfc39 in __GI___nss_lookup_function (ni=0x61497db0,
> fct_name=<optimized out>, fct_name@entry=0x7f899089b020 "setgrent") at
> nsswitch.c:467
> #42 0x00007f899089554b in init_nss_interface () at nss_compat/compat-
> grp.c:83
> #43 init_nss_interface () at nss_compat/compat-grp.c:79
> #44 0x00007f8990895e35 in _nss_compat_getgrnam_r (name=0x7f8990a2a1e0
> "tty", grp=0x7ffe3e7a2910, buffer=0x7ffe3e7a24e0 "", buflen=1024,
> errnop=0x7f899089eb00) at nss_compat/compat-grp.c:486
> #45 0x00007f8990968b85 in __getgrnam_r (name=name@entry=0x7f8990a2a1e0
> "tty", resbuf=resbuf@entry=0x7ffe3e7a2910,
> buffer=buffer@entry=0x7ffe3e7a24e0 "", buflen=1024,
> result=result@entry=0x7ffe3e7a2908)
>     at ../nss/getXXbyYY_r.c:315


You have a strange nsswitch configuration that causes all of this
(libnss_nis.so.2 -> libcom_err.so.2) to get loaded.

Now libcom_err.so.2 is trying to call sem_init(), and that gets ... tada
... Linux's sem_init() instead of libpthread's.

And then the crash.

Now, I don't know how to fix it (short of changing your nsswitch
configuration) - maybe we could somehow rename sem_init()? Or maybe we
can somehow give the kernel binary a lower symbol resolution than the
libc/libpthread.
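
A toy model of that lookup, as a sketch rather than how the dynamic linker is actually implemented: ELF symbol lookup walks the global scope in load order, executable first, and the first object exporting the name wins. All type names, function names and export lists below are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One loaded object in the model: a name and a NULL-terminated export list. */
struct object_model {
    const char *name;
    const char *const *exports;
};

/*
 * Return the name of the first object in `scope` (load order, executable
 * first) that exports `symbol`, or NULL if no object does.
 */
const char *resolve_model(const struct object_model *scope, size_t count,
                          const char *symbol)
{
    assert(scope != NULL && symbol != NULL);
    for (size_t i = 0; i < count; i++) {
        for (const char *const *s = scope[i].exports; *s != NULL; s++) {
            if (strcmp(*s, symbol) == 0)
                return scope[i].name;
        }
    }
    return NULL;
}

/* Hypothetical export lists, for illustration only. */
const char *const uml_exports_all[] = { "sem_init", "memcpy", NULL };
const char *const uml_exports_none[] = { NULL };
const char *const pthread_exports[] = { "sem_init", NULL };

/* The UML binary exports its kernel symbols, so it shadows libpthread. */
const struct object_model scope_broken[] = {
    { "linux (UML)", uml_exports_all },
    { "libpthread", pthread_exports },
};

/* The UML binary keeps its kernel symbols local, so libpthread is found. */
const struct object_model scope_fixed[] = {
    { "linux (UML)", uml_exports_none },
    { "libpthread", pthread_exports },
};
```

In this model resolve_model(scope_broken, 2, "sem_init") returns "linux (UML)", which is the situation in the backtrace; with scope_fixed the same lookup returns "libpthread".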


johannes




^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: linux uml segfault
  2021-03-03 22:40                     ` Johannes Berg
@ 2021-03-04  5:38                       ` Hajime Tazaki
  2021-03-04  7:45                         ` Anton Ivanov
                                           ` (2 more replies)
  2021-03-04  7:28                       ` Anton Ivanov
  2021-03-05 19:54                       ` Johannes Berg
  2 siblings, 3 replies; 41+ messages in thread
From: Hajime Tazaki @ 2021-03-04  5:38 UTC (permalink / raw)
  To: johannes; +Cc: rrs, anton.ivanov, chris.obbard, linux-um, 983379


On Thu, 04 Mar 2021 07:40:00 +0900,
Johannes Berg wrote:
> 
> I think the problem is here:
> 
> > #24 0x000000006080f234 in ipc_init_ids (ids=0x60c60de8 <init_ipc_ns+8>)
> > at ipc/util.c:119
> > #25 0x0000000060813c6d in sem_init_ns (ns=0x60d895bb <textbuf+91>) at
> > ipc/sem.c:254
> > #26 0x0000000060015b5d in sem_init () at ipc/sem.c:268
> > #27 0x00007f89906d92f7 in ?? () from /lib/x86_64-linux-
> > gnu/libcom_err.so.2
> 
> You're in the init of libcom_err.so.2, which is loaded by
> 
> > "libnss_nis.so.2"
> 
> which is loaded by normal NSS code (getgrnam):
> 
> > #40 0x00007f89909bf3a6 in nss_load_library (ni=ni@entry=0x61497db0) at
> > nsswitch.c:359
> > #41 0x00007f89909bfc39 in __GI___nss_lookup_function (ni=0x61497db0,
> > fct_name=<optimized out>, fct_name@entry=0x7f899089b020 "setgrent") at
> > nsswitch.c:467
> > #42 0x00007f899089554b in init_nss_interface () at nss_compat/compat-
> > grp.c:83
> > #43 init_nss_interface () at nss_compat/compat-grp.c:79
> > #44 0x00007f8990895e35 in _nss_compat_getgrnam_r (name=0x7f8990a2a1e0
> > "tty", grp=0x7ffe3e7a2910, buffer=0x7ffe3e7a24e0 "", buflen=1024,
> > errnop=0x7f899089eb00) at nss_compat/compat-grp.c:486
> > #45 0x00007f8990968b85 in __getgrnam_r (name=name@entry=0x7f8990a2a1e0
> > "tty", resbuf=resbuf@entry=0x7ffe3e7a2910,
> > buffer=buffer@entry=0x7ffe3e7a24e0 "", buflen=1024,
> > result=result@entry=0x7ffe3e7a2908)
> >     at ../nss/getXXbyYY_r.c:315
> 
> 
> You have a strange nsswitch configuration that causes all of this
> (libnss_nis.so.2 -> libcom_err.so.2) to get loaded.
> 
> Now libcom_err.so.2 is trying to call sem_init(), and that gets ... tada
> ... Linux's sem_init() instead of libpthread's.
> 
> And then the crash.
> 
> Now, I don't know how to fix it (short of changing your nsswitch
> configuration) - maybe we could somehow rename sem_init()? Or maybe we
> can somehow give the kernel binary a lower symbol resolution than the
> libc/libpthread.

objcopy (from binutils) can localize symbols (i.e., objcopy -L
sem_init $orig_file $new_file).  It can also rename symbols.  But
I am not sure this is the ideal solution.

How does UML handle symbol conflicts between userspace code and the
Linux kernel (like sem_init in this case)?  AFAIK, libnl has the same
symbol as the Linux kernel (genlmsg_put), and others possibly do as well.


-- Hajime



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: linux uml segfault
  2021-03-03 22:40                     ` Johannes Berg
  2021-03-04  5:38                       ` Hajime Tazaki
@ 2021-03-04  7:28                       ` Anton Ivanov
  2021-03-04  7:43                         ` Johannes Berg
  2021-03-05 19:54                       ` Johannes Berg
  2 siblings, 1 reply; 41+ messages in thread
From: Anton Ivanov @ 2021-03-04  7:28 UTC (permalink / raw)
  To: Johannes Berg, rrs, Christopher Obbard, linux-um; +Cc: 983379

On 03/03/2021 22:40, Johannes Berg wrote:
> I think the problem is here:
> 
>> #24 0x000000006080f234 in ipc_init_ids (ids=0x60c60de8 <init_ipc_ns+8>)
>> at ipc/util.c:119
>> #25 0x0000000060813c6d in sem_init_ns (ns=0x60d895bb <textbuf+91>) at
>> ipc/sem.c:254
>> #26 0x0000000060015b5d in sem_init () at ipc/sem.c:268
>> #27 0x00007f89906d92f7 in ?? () from /lib/x86_64-linux-
>> gnu/libcom_err.so.2
> 
> You're in the init of libcom_err.so.2, which is loaded by
> 
>> "libnss_nis.so.2"
> 
> which is loaded by normal NSS code (getgrnam):
> 
>> #40 0x00007f89909bf3a6 in nss_load_library (ni=ni@entry=0x61497db0) at
>> nsswitch.c:359
>> #41 0x00007f89909bfc39 in __GI___nss_lookup_function (ni=0x61497db0,
>> fct_name=<optimized out>, fct_name@entry=0x7f899089b020 "setgrent") at
>> nsswitch.c:467
>> #42 0x00007f899089554b in init_nss_interface () at nss_compat/compat-
>> grp.c:83
>> #43 init_nss_interface () at nss_compat/compat-grp.c:79
>> #44 0x00007f8990895e35 in _nss_compat_getgrnam_r (name=0x7f8990a2a1e0
>> "tty", grp=0x7ffe3e7a2910, buffer=0x7ffe3e7a24e0 "", buflen=1024,
>> errnop=0x7f899089eb00) at nss_compat/compat-grp.c:486
>> #45 0x00007f8990968b85 in __getgrnam_r (name=name@entry=0x7f8990a2a1e0
>> "tty", resbuf=resbuf@entry=0x7ffe3e7a2910,
>> buffer=buffer@entry=0x7ffe3e7a24e0 "", buflen=1024,
>> result=result@entry=0x7ffe3e7a2908)
>>      at ../nss/getXXbyYY_r.c:315
> 
> 
> You have a strange nsswitch configuration that causes all of this
> (libnss_nis.so.2 -> libcom_err.so.2) to get loaded.
> 
> Now libcom_err.so.2 is trying to call sem_init(), and that gets ... tada
> ... Linux's sem_init() instead of libpthread's.
> 
> And then the crash.
> 
> Now, I don't know how to fix it (short of changing your nsswitch
> configuration) - maybe we could somehow rename sem_init()? Or maybe we
> can somehow give the kernel binary a lower symbol resolution than the
> libc/libpthread.

I have not looked in depth into how the linking process works, but it
should have picked up the sem_init from the kernel library, not libc.

We are already supposed to do that regarding kernel vs libc string.h
functions - memcpy, etc.

Though for all of them libc does the same thing, so invoking the wrong
one does not kill you. This may have been broken for a while without us
noticing.

> 
> 
> johannes
> 
> 


-- 
Anton R. Ivanov
https://www.kot-begemot.co.uk/



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: linux uml segfault
  2021-03-04  7:28                       ` Anton Ivanov
@ 2021-03-04  7:43                         ` Johannes Berg
  0 siblings, 0 replies; 41+ messages in thread
From: Johannes Berg @ 2021-03-04  7:43 UTC (permalink / raw)
  To: Anton Ivanov, rrs, Christopher Obbard, linux-um; +Cc: 983379

On Thu, 2021-03-04 at 07:28 +0000, Anton Ivanov wrote:
> 
> > Now, I don't know how to fix it (short of changing your nsswitch
> > configuration) - maybe we could somehow rename sem_init()? Or maybe we
> > can somehow give the kernel binary a lower symbol resolution than the
> > libc/libpthread.
> 
> I have not looked in depth in how the linking process works, but it 
> should have picked up the sem_init from the kernel library, not libc.

Well, no, other way around? libnss/libcom_err should have gotten (should
get) the one from libpthread, not the one from the kernel.

> We are already supposed to do that regarding kernel vs libc string.h 
> functions - memcpy, etc.
> 
> Though for all of them the libc does the same so invoking the wrong one 
> does not kill you so this may have been broken for a while and we were 
> simply not noticing it.

Indeed.

johannes




^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: linux uml segfault
  2021-03-04  5:38                       ` Hajime Tazaki
@ 2021-03-04  7:45                         ` Anton Ivanov
  2021-03-04  7:47                         ` Johannes Berg
  2021-03-05 20:22                         ` Johannes Berg
  2 siblings, 0 replies; 41+ messages in thread
From: Anton Ivanov @ 2021-03-04  7:45 UTC (permalink / raw)
  To: Hajime Tazaki, johannes; +Cc: rrs, chris.obbard, linux-um, 983379

On 04/03/2021 05:38, Hajime Tazaki wrote:
> 
> On Thu, 04 Mar 2021 07:40:00 +0900,
> Johannes Berg wrote:
>>
>> I think the problem is here:
>>
>>> #24 0x000000006080f234 in ipc_init_ids (ids=0x60c60de8 <init_ipc_ns+8>)
>>> at ipc/util.c:119
>>> #25 0x0000000060813c6d in sem_init_ns (ns=0x60d895bb <textbuf+91>) at
>>> ipc/sem.c:254
>>> #26 0x0000000060015b5d in sem_init () at ipc/sem.c:268
>>> #27 0x00007f89906d92f7 in ?? () from /lib/x86_64-linux-
>>> gnu/libcom_err.so.2
>>
>> You're in the init of libcom_err.so.2, which is loaded by
>>
>>> "libnss_nis.so.2"
>>
>> which is loaded by normal NSS code (getgrnam):
>>
>>> #40 0x00007f89909bf3a6 in nss_load_library (ni=ni@entry=0x61497db0) at
>>> nsswitch.c:359
>>> #41 0x00007f89909bfc39 in __GI___nss_lookup_function (ni=0x61497db0,
>>> fct_name=<optimized out>, fct_name@entry=0x7f899089b020 "setgrent") at
>>> nsswitch.c:467
>>> #42 0x00007f899089554b in init_nss_interface () at nss_compat/compat-
>>> grp.c:83
>>> #43 init_nss_interface () at nss_compat/compat-grp.c:79
>>> #44 0x00007f8990895e35 in _nss_compat_getgrnam_r (name=0x7f8990a2a1e0
>>> "tty", grp=0x7ffe3e7a2910, buffer=0x7ffe3e7a24e0 "", buflen=1024,
>>> errnop=0x7f899089eb00) at nss_compat/compat-grp.c:486
>>> #45 0x00007f8990968b85 in __getgrnam_r (name=name@entry=0x7f8990a2a1e0
>>> "tty", resbuf=resbuf@entry=0x7ffe3e7a2910,
>>> buffer=buffer@entry=0x7ffe3e7a24e0 "", buflen=1024,
>>> result=result@entry=0x7ffe3e7a2908)
>>>      at ../nss/getXXbyYY_r.c:315
>>
>>
>> You have a strange nsswitch configuration that causes all of this
>> (libnss_nis.so.2 -> libcom_err.so.2) to get loaded.
>>
>> Now libcom_err.so.2 is trying to call sem_init(), and that gets ... tada
>> ... Linux's sem_init() instead of libpthread's.
>>
>> And then the crash.
>>
>> Now, I don't know how to fix it (short of changing your nsswitch
>> configuration) - maybe we could somehow rename sem_init()? Or maybe we
>> can somehow give the kernel binary a lower symbol resolution than the
>> libc/libpthread.
> 
> objcopy (from binutils) can localize symbols (i.e., objcopy -L
> sem_init $orig_file $new_file).  It also does renaming symbols.  But
> not sure this is the ideal solution.
> 
> How does UML handle symbol conflicts between userspace code and Linux
> kernel (like this case sem_init) ?  AFAIK, libnl has a same symbol as
> Linux kernel (genlmsg_put) and others can possibly do as well.

It used to handle them. I do not think it does now - something broke and 
it's fairly recent.

I actually have something which confirms this.

I worked on a patch around 5.8-5.9 which would give the option to pick
up libc equivalents for the functions from string.h, and there was a
clear performance difference of ~20%+. This is because UML has no means
of optimizing them and picks up the worst case scenario x86 version.

I parked that for a while, because I had to look at other stuff at work.

I restarted working on it after 5.10. My first observation was that
despite not changing anything in the patches, the gain was no longer
there. The unpatched kernel performed the same as if it had picked up
the libc equivalents.

I can either try to reproduce the nss config which causes the sem_init
issue or use my own libc patchset to try to bisect. The problem commit
will be roughly around the time the performance difference from applying
the "switch to libc" patch goes away.

Brgds,

A.
> 
> 
> -- Hajime
> 


-- 
Anton R. Ivanov
https://www.kot-begemot.co.uk/



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: linux uml segfault
  2021-03-04  5:38                       ` Hajime Tazaki
  2021-03-04  7:45                         ` Anton Ivanov
@ 2021-03-04  7:47                         ` Johannes Berg
  2021-03-04  8:05                           ` Benjamin Berg
  2021-03-05 17:39                           ` Anton Ivanov
  2021-03-05 20:22                         ` Johannes Berg
  2 siblings, 2 replies; 41+ messages in thread
From: Johannes Berg @ 2021-03-04  7:47 UTC (permalink / raw)
  To: Hajime Tazaki; +Cc: rrs, anton.ivanov, chris.obbard, linux-um, 983379

On Thu, 2021-03-04 at 14:38 +0900, Hajime Tazaki wrote:

> > Now, I don't know how to fix it (short of changing your nsswitch
> > configuration) - maybe we could somehow rename sem_init()? Or maybe we
> > can somehow give the kernel binary a lower symbol resolution than the
> > libc/libpthread.
> 
> objcopy (from binutils) can localize symbols (i.e., objcopy -L
> sem_init $orig_file $new_file).  It also does renaming symbols.  But
> not sure this is the ideal solution.

Yes, we started thinking about it but it was too late at night when I
replied ...

I think there's basically a way to have an external list of symbols to
export, for symbol versioning, that we could/should use to basically not
export any of the kernel symbols out to libs.

> How does UML handle symbol conflicts between userspace code and Linux
> kernel (like this case sem_init) ?  AFAIK, libnl has a same symbol as
> Linux kernel (genlmsg_put) and others can possibly do as well.

I fear it doesn't?

johannes




^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: linux uml segfault
  2021-03-04  7:47                         ` Johannes Berg
@ 2021-03-04  8:05                           ` Benjamin Berg
  2021-03-04 18:41                             ` Anton Ivanov
  2021-03-05 17:39                           ` Anton Ivanov
  1 sibling, 1 reply; 41+ messages in thread
From: Benjamin Berg @ 2021-03-04  8:05 UTC (permalink / raw)
  To: Johannes Berg, Hajime Tazaki
  Cc: rrs, anton.ivanov, chris.obbard, linux-um, 983379


[-- Attachment #1.1: Type: text/plain, Size: 1416 bytes --]

On Thu, 2021-03-04 at 08:47 +0100, Johannes Berg wrote:
> On Thu, 2021-03-04 at 14:38 +0900, Hajime Tazaki wrote:
> 
> > > Now, I don't know how to fix it (short of changing your nsswitch
> > > configuration) - maybe we could somehow rename sem_init()? Or maybe we
> > > can somehow give the kernel binary a lower symbol resolution than the
> > > libc/libpthread.
> > 
> > objcopy (from binutils) can localize symbols (i.e., objcopy -L
> > sem_init $orig_file $new_file).  It also does renaming symbols.  But
> > not sure this is the ideal solution.
> 
> Yes, we started thinking about it but it was too late at night when I
> replied ...
> 
> I think there's basically a way to have an external list of symbols to
> export, for symbol versioning, that we could/should use to basically not
> export any of the kernel symbols out to libs.

Maybe using the ld --version-script= option here works to mark all
kernel symbols as being "local" and prevent them from being picked up
by libraries.

Benjamin

> > How does UML handle symbol conflicts between userspace code and Linux
> > kernel (like this case sem_init) ?  AFAIK, libnl has a same symbol as
> > Linux kernel (genlmsg_put) and others can possibly do as well.
> 
> I fear it doesn't?
> 
> johannes



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: linux uml segfault
  2021-03-04  8:05                           ` Benjamin Berg
@ 2021-03-04 18:41                             ` Anton Ivanov
  2021-03-05  9:59                               ` Anton Ivanov
  0 siblings, 1 reply; 41+ messages in thread
From: Anton Ivanov @ 2021-03-04 18:41 UTC (permalink / raw)
  To: Benjamin Berg, Johannes Berg, Hajime Tazaki
  Cc: rrs, chris.obbard, linux-um, 983379



On 04/03/2021 08:05, Benjamin Berg wrote:
> On Thu, 2021-03-04 at 08:47 +0100, Johannes Berg wrote:
> On Thu, 2021-03-04 at 14:38 +0900, Hajime Tazaki wrote:
> 
>>> Now, I don't know how to fix it (short of changing your nsswitch
>>> configuration) - maybe we could somehow rename sem_init()? Or maybe
>>> we
>>> can somehow give the kernel binary a lower symbol resolution than
>>> the
>>> libc/libpthread.
>>
>> objcopy (from binutils) can localize symbols (i.e., objcopy -L
>> sem_init $orig_file $new_file).  It also does renaming symbols.  But
>> not sure this is the ideal solution.
> 
> Yes, we started thinking about it but it was too late at night when I
> replied ...
> 
> I think there's basically a way to have an external list of symbols to
> export, for symbol versioning, that we could/should use to basically
> not
> export any of the kernel symbols out to libs.
> 
> Maybe using the ld --version-script= option here works to mark all
> kernel symbols as being "local" and prevent them from being picked up
> by libraries.
> 
> Benjamin
> 
>> How does UML handle symbol conflicts between userspace code and Linux
>> kernel (like this case sem_init) ?  AFAIK, libnl has a same symbol as
>> Linux kernel (genlmsg_put) and others can possibly do as well.
> 
> I fear it doesn't?

I can confirm that it did and this bug is bisect-able.

with 5.7

# dd if=/dev/ubda of=/dev/null bs=1M
16384+1 records in
16384+1 records out
17179869696 bytes (17 GB, 16 GiB) copied, 10.6973 s, 1.6 GB/s

With 5.10 the speed is 2.2 GB/s.
With 5.7 plus the "strings from glibc" patch the speed is also 2.2 GB/s.

As we did not do anything else in this timeframe to jack up the speed from 1.6 GB/s to 2.2 GB/s, and as it is identical to the speed you get with the "use glibc strings.h" patch, this looks like a good criterion to bisect on.

I am going to do a bisect with 5.7 "good" and 5.10 "bad" using the speed test as a working hypothesis.
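
The numbers behind that hypothesis can be checked with a little arithmetic. The sketch below recomputes the dd throughput and the relative gain, and adds a hypothetical good/bad predicate for the bisect. The function names and the 1.9 GB/s threshold (the midpoint of the two measurements) are assumptions, not part of the measurements themselves.

```c
#include <assert.h>
#include <stdbool.h>

/* Throughput in decimal GB/s (1 GB = 1e9 bytes, as dd reports it). */
double throughput_gbps(double bytes, double seconds)
{
    assert(seconds > 0.0);
    return bytes / seconds / 1e9;
}

/* Relative gain of `after` over `before`, in percent. */
double gain_percent(double before, double after)
{
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}

/*
 * Hypothetical bisect predicate: call a build "bad" (libc string
 * functions picked up) when it is faster than the midpoint of the two
 * observed speeds. The 1.9 GB/s threshold is an assumption.
 */
bool looks_bad(double gbps)
{
    return gbps > 1.9;
}
```

With the dd figures above, throughput_gbps(17179869696.0, 10.6973) comes out at about 1.6 GB/s, and gain_percent(1.6, 2.2) at about 37.5%, a gap large enough that a single midpoint threshold should separate the two cases.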

A.


> 
> johannes
> 
> 

-- 
Anton R. Ivanov
https://www.kot-begemot.co.uk/


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: linux uml segfault
  2021-03-04 18:41                             ` Anton Ivanov
@ 2021-03-05  9:59                               ` Anton Ivanov
  2021-03-05 10:07                                 ` Johannes Berg
  0 siblings, 1 reply; 41+ messages in thread
From: Anton Ivanov @ 2021-03-05  9:59 UTC (permalink / raw)
  To: Benjamin Berg, Johannes Berg, Hajime Tazaki
  Cc: rrs, chris.obbard, linux-um, 983379


On 04/03/2021 18:41, Anton Ivanov wrote:
>
>
> On 04/03/2021 08:05, Benjamin Berg wrote:
>> On Thu, 2021-03-04 at 08:47 +0100, Johannes Berg wrote:
>> On Thu, 2021-03-04 at 14:38 +0900, Hajime Tazaki wrote:
>>
>>>> Now, I don't know how to fix it (short of changing your nsswitch
>>>> configuration) - maybe we could somehow rename sem_init()? Or maybe
>>>> we
>>>> can somehow give the kernel binary a lower symbol resolution than
>>>> the
>>>> libc/libpthread.
>>>
>>> objcopy (from binutils) can localize symbols (i.e., objcopy -L
>>> sem_init $orig_file $new_file).  It also does renaming symbols.  But
>>> not sure this is the ideal solution.
>>
>> Yes, we started thinking about it but it was too late at night when I
>> replied ...
>>
>> I think there's basically a way to have an external list of symbols to
>> export, for symbol versioning, that we could/should use to basically
>> not
>> export any of the kernel symbols out to libs.
>>
>> Maybe using the ld --version-script= option here works to mark all
>> kernel symbols as being "local" and prevent them from being picked up
>> by libraries.
>>
>> Benjamin
>>
>>> How does UML handle symbol conflicts between userspace code and Linux
>>> kernel (like this case sem_init) ?  AFAIK, libnl has a same symbol as
>>> Linux kernel (genlmsg_put) and others can possibly do as well.
>>
>> I fear it doesn't?
>
> I can confirm that it did and this bug is bisect-able.
>
> with 5.7
>
> # dd if=/dev/ubda of=/dev/null bs=1M
> 16384+1 records in
> 16384+1 records out
> 17179869696 bytes (17 GB, 16 GiB) copied, 10.6973 s, 1.6 GB/s
>
> with 5.10 the speed is 2.2 GB/s
> 5.7 with the "strings from glibc" patch: the speed is also 2.2 GB/s
>
> As we did not do anything else in this timeframe to raise the speed from 1.6 GB/s to 2.2 GB/s, and as it is identical to the speed you get with the "use glibc strings.h" patch, this looks like a good criterion to bisect on.
>
> I am going to do a bisect with 5.7 "good" and 5.10 "bad" using the speed test as a working hypothesis.

This is proving very "interesting" to try to chase down, because the "picking the wrong library" does not happen every time.

For example, yesterday my 5.10 builds were picking glibc memcpy and friends. Today, with the same config and everything else the same, they are picking the built-ins.

I need to find a better way to reproduce this.

A.


>
> A.
>
>
>>
>> johannes
>>
>>
>
-- 
Anton R. Ivanov
https://www.kot-begemot.co.uk/



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: linux uml segfault
  2021-03-05  9:59                               ` Anton Ivanov
@ 2021-03-05 10:07                                 ` Johannes Berg
  0 siblings, 0 replies; 41+ messages in thread
From: Johannes Berg @ 2021-03-05 10:07 UTC (permalink / raw)
  To: Anton Ivanov, Benjamin Berg, Hajime Tazaki
  Cc: rrs, chris.obbard, linux-um, 983379

On Fri, 2021-03-05 at 09:59 +0000, Anton Ivanov wrote:
> 
> This is proving very "interesting" to try to chase down, because the
> "picking the wrong library" does not happen every time.
> 
> For example, yesterday my 5.10 builds were picking glibc memcpy and
> friends. Today, with the same config and everything else the same, they
> are picking the built-ins.

Ouch.

> I need to find a better way to reproduce this.

Maybe something like the original report? That caused sem_init() to be
called, so we know libc will/may call something there.

You and I probably don't have the nss setup to cause sem_init() to get
called, but maybe simply putting

void init_nss_interface(void)
{
  panic("how did we get here");
}

somewhere in the kernel image might already reproduce it?

johannes




^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: linux uml segfault
  2021-03-04  7:47                         ` Johannes Berg
  2021-03-04  8:05                           ` Benjamin Berg
@ 2021-03-05 17:39                           ` Anton Ivanov
  2021-03-05 18:32                             ` Johannes Berg
  2021-03-05 20:07                             ` Johannes Berg
  1 sibling, 2 replies; 41+ messages in thread
From: Anton Ivanov @ 2021-03-05 17:39 UTC (permalink / raw)
  To: Johannes Berg, Hajime Tazaki; +Cc: rrs, chris.obbard, linux-um, 983379



On 04/03/2021 07:47, Johannes Berg wrote:
> On Thu, 2021-03-04 at 14:38 +0900, Hajime Tazaki wrote:
> 
>>> Now, I don't know how to fix it (short of changing your nsswitch
>>> configuration) - maybe we could somehow rename sem_init()? Or maybe we
>>> can somehow give the kernel binary a lower symbol resolution than the
>>> libc/libpthread.
>>
>> objcopy (from binutils) can localize symbols (i.e., objcopy -L
>> sem_init $orig_file $new_file).  It also does renaming symbols.  But
>> not sure this is the ideal solution.
> 
> Yes, we started thinking about it but it was too late at night when I
> replied ...
> 
> I think there's basically a way to have an external list of symbols to
> export, for symbol versioning, that we could/should use to basically not
> export any of the kernel symbols out to libs.
> 
>> How does UML handle symbol conflicts between userspace code and Linux
>> kernel (like this case sem_init) ?  AFAIK, libnl has a same symbol as
>> Linux kernel (genlmsg_put) and others can possibly do as well.
> 
> I fear it doesn't?

Let's assume it does not, and try to fix this by de-conflicting the symbol.
For the time being, let's also aim for a Debian-specific patch that just goes into their "patches" dir for the build, so that UML is not dropped from the release.

This should turn all internal uses of sem_init into um_sem_init in the actual object files. I will chase the issue of it picking up glibc memcpy separately.
On closer inspection it looks like a different issue - it goes in the other direction (picking a dynamic symbol instead of the one from the tree). I spent all day chasing it today and I cannot reproduce it, even though it was reproducible yesterday without any problems :(

Ritesh, can you give the following a spin - it renames sem_init as um_sem_init for UML only?

diff --git a/ipc/sem.c b/ipc/sem.c
index f6c30a85dadf..5157796daf54 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -263,7 +263,11 @@ void sem_exit_ns(struct ipc_namespace *ns)
  }
  #endif

+#ifdef CONFIG_UML
+void __init um_sem_init(void)
+#else
  void __init sem_init(void)
+#endif
  {
         sem_init_ns(&init_ipc_ns);
         ipc_init_proc_interface("sysvipc/sem",
diff --git a/ipc/util.h b/ipc/util.h
index 5766c61aed0e..b3356efb3c96 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -47,7 +47,12 @@ extern int ipc_min_cycle;
  #define IPCMNI_IDX_MASK                ((1 << IPCMNI_SHIFT) - 1)
  #endif /* CONFIG_SYSVIPC_SYSCTL */

+#ifdef CONFIG_UML
+void um_sem_init(void);
+#define sem_init() um_sem_init()
+#else
  void sem_init(void);
+#endif
  void msg_init(void);
  void shm_init(void);



> 
> johannes
> 
> 

-- 
Anton R. Ivanov
https://www.kot-begemot.co.uk/



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: linux uml segfault
  2021-03-05 17:39                           ` Anton Ivanov
@ 2021-03-05 18:32                             ` Johannes Berg
  2021-03-05 19:03                               ` Anton Ivanov
  2021-03-05 20:07                             ` Johannes Berg
  1 sibling, 1 reply; 41+ messages in thread
From: Johannes Berg @ 2021-03-05 18:32 UTC (permalink / raw)
  To: Anton Ivanov, Hajime Tazaki; +Cc: rrs, chris.obbard, linux-um, 983379



On 5 March 2021 18:39:42 CET, Anton Ivanov <anton.ivanov@kot-begemot.co.uk> wrote:
>
>
>On 04/03/2021 07:47, Johannes Berg wrote:
>> On Thu, 2021-03-04 at 14:38 +0900, Hajime Tazaki wrote:
>> 
>>>> Now, I don't know how to fix it (short of changing your nsswitch
>>>> configuration) - maybe we could somehow rename sem_init()? Or maybe we
>>>> can somehow give the kernel binary a lower symbol resolution than the
>>>> libc/libpthread.
>>>
>>> objcopy (from binutils) can localize symbols (i.e., objcopy -L
>>> sem_init $orig_file $new_file).  It also does renaming symbols.  But
>>> not sure this is the ideal solution.
>> 
>> Yes, we started thinking about it but it was too late at night when I
>> replied ...
>> 
>> I think there's basically a way to have an external list of symbols to
>> export, for symbol versioning, that we could/should use to basically not
>> export any of the kernel symbols out to libs.
>> 
>>> How does UML handle symbol conflicts between userspace code and Linux
>>> kernel (like this case sem_init) ?  AFAIK, libnl has a same symbol as
>>> Linux kernel (genlmsg_put) and others can possibly do as well.
>> 
>> I fear it doesn't?
>
>Let's assume it does not, and try to fix this by de-conflicting the
>symbol.
>For the time being, also, let's aim for a Debian specific patch just to
>go into their "patches" dir for build so that UML is not dropped out of
>the release.
>
>This should make all internal uses of sem_init be um_sem_init in the
>actual object files. I will chase the issue of it picking up glibc
>memcpy separately.
>Upon close inspection it looks like a different issue - it is in the
>other direction (picking a dynamic symbol instead of the one from the
>tree). I spent all day chasing it today and I cannot reproduce it. At
>the same time it was reproducible yesterday without any problems :(

>+#ifdef CONFIG_UML
>+void __init um_sem_init(void)
>+#else
>  void __init sem_init(void)
>+#endif

Might be easier to just

#define sem_init um_sem_init

in an appropriate header file, perhaps even in arch/um/? 


johannes
-- 
Sent from my phone. 



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: linux uml segfault
  2021-03-05 18:32                             ` Johannes Berg
@ 2021-03-05 19:03                               ` Anton Ivanov
  2021-03-05 20:06                                 ` Johannes Berg
  0 siblings, 1 reply; 41+ messages in thread
From: Anton Ivanov @ 2021-03-05 19:03 UTC (permalink / raw)
  To: Johannes Berg, Hajime Tazaki; +Cc: rrs, chris.obbard, linux-um, 983379



On 05/03/2021 18:32, Johannes Berg wrote:
> 
> 
> On 5 March 2021 18:39:42 CET, Anton Ivanov <anton.ivanov@kot-begemot.co.uk> wrote:
>>
>>
>> On 04/03/2021 07:47, Johannes Berg wrote:
>>> On Thu, 2021-03-04 at 14:38 +0900, Hajime Tazaki wrote:
>>>
>>>>> Now, I don't know how to fix it (short of changing your nsswitch
>>>>> configuration) - maybe we could somehow rename sem_init()? Or maybe we
>>>>> can somehow give the kernel binary a lower symbol resolution than the
>>>>> libc/libpthread.
>>>>
>>>> objcopy (from binutils) can localize symbols (i.e., objcopy -L
>>>> sem_init $orig_file $new_file).  It also does renaming symbols.  But
>>>> not sure this is the ideal solution.
>>>
>>> Yes, we started thinking about it but it was too late at night when I
>>> replied ...
>>>
>>> I think there's basically a way to have an external list of symbols to
>>> export, for symbol versioning, that we could/should use to basically not
>>> export any of the kernel symbols out to libs.
>>>
>>>> How does UML handle symbol conflicts between userspace code and Linux
>>>> kernel (like this case sem_init) ?  AFAIK, libnl has a same symbol as
>>>> Linux kernel (genlmsg_put) and others can possibly do as well.
>>>
>>> I fear it doesn't?
>>
>> Let's assume it does not, and try to fix this by de-conflicting the
>> symbol.
>> For the time being, also, let's aim for a Debian specific patch just to
>> go into their "patches" dir for build so that UML is not dropped out of
>> the release.
>>
>> This should make all internal uses of sem_init be um_sem_init in the
>> actual object files. I will chase the issue of it picking up glibc
>> memcpy separately.
>> Upon close inspection it looks like a different issue - it is in the
>> other direction (picking a dynamic symbol instead of the one from the
>> tree). I spent all day chasing it today and I cannot reproduce it. At
>> the same time it was reproducible yesterday without any problems :(
> 
>> +#ifdef CONFIG_UML
>> +void __init um_sem_init(void)
>> +#else
>>   void __init sem_init(void)
>> +#endif
> 
> Might be easier to just
> 
> #define sem_init um_sem_init
> 
> in an appropriate header file, perhaps even in arch/um/?

I thought of that, but surrendered to the "dark side" of the quick and ugly fix.

We can do that for ipc/sem.c - it brings in uaccess.h, which ultimately pulls uaccess from our asm tree. So if we do it there, it will end up in sem.c.

However, that function is also referenced in, and invoked from, ipc/util.c, which does not pull in that include.

I am going to dig through the rest of our includes to see if we can find a suitable one which will be picked up by both sem.c and util.c. I hope there is a place which we can use for a "proper" fix.

By the way, I actually remember seeing a couple of includes like that somewhere dealing with other um symbol conflicts, just can't remember where I saw it.

> 
> 
> johannes
> 

-- 
Anton R. Ivanov
https://www.kot-begemot.co.uk/



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: linux uml segfault
  2021-03-03 22:40                     ` Johannes Berg
  2021-03-04  5:38                       ` Hajime Tazaki
  2021-03-04  7:28                       ` Anton Ivanov
@ 2021-03-05 19:54                       ` Johannes Berg
  2 siblings, 0 replies; 41+ messages in thread
From: Johannes Berg @ 2021-03-05 19:54 UTC (permalink / raw)
  To: rrs, Anton Ivanov, Christopher Obbard, linux-um; +Cc: 983379

On Wed, 2021-03-03 at 23:40 +0100, Johannes Berg wrote:

> Now libcom_err.so.2 is trying to call sem_init(), and that gets ... tada
> ... Linux's sem_init() instead of libpthread's.
> 
> And then the crash.

FWIW, I can trivially reproduce this by simply force-loading
libcom_err.so:


diff --git a/arch/um/Makefile b/arch/um/Makefile
index 1cea46ff9bb7..a16b411154fb 100644
--- a/arch/um/Makefile
+++ b/arch/um/Makefile
@@ -134,7 +134,7 @@ LINK_WRAPS = -Wl,--wrap,malloc -Wl,--wrap,free -Wl,--wrap,calloc
 LD_FLAGS_CMDLINE = $(foreach opt,$(KBUILD_LDFLAGS),-Wl,$(opt))
 
 # Used by link-vmlinux.sh which has special support for um link
-export CFLAGS_vmlinux := $(LINK-y) $(LINK_WRAPS) $(LD_FLAGS_CMDLINE)
+export CFLAGS_vmlinux := $(LINK-y) $(LINK_WRAPS) $(LD_FLAGS_CMDLINE) -ldl
 
 # When cleaning we don't include .config, so we don't include
 # TT or skas makefiles and don't clean skas_ptregs.h.
diff --git a/arch/um/os-Linux/main.c b/arch/um/os-Linux/main.c
index c8a42ecbd7a2..873dc4c40cb7 100644
--- a/arch/um/os-Linux/main.c
+++ b/arch/um/os-Linux/main.c
@@ -16,6 +16,7 @@
 #include <kern_util.h>
 #include <os.h>
 #include <um_malloc.h>
+#include <dlfcn.h>
 
 #define PGD_BOUND (4 * 1024 * 1024)
 #define STACKSIZE (8 * 1024 * 1024)
@@ -115,6 +116,8 @@ int __init main(int argc, char **argv, char **envp)
 
 	setsid();
 
+dlopen("/usr/lib64/libcom_err.so.2", RTLD_NOW|RTLD_GLOBAL);
+
 	new_argv = malloc((argc + 1) * sizeof(char *));
 	if (new_argv == NULL) {
 		perror("Mallocing argv");


johannes




^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: linux uml segfault
  2021-03-05 19:03                               ` Anton Ivanov
@ 2021-03-05 20:06                                 ` Johannes Berg
  0 siblings, 0 replies; 41+ messages in thread
From: Johannes Berg @ 2021-03-05 20:06 UTC (permalink / raw)
  To: Anton Ivanov, Hajime Tazaki; +Cc: rrs, chris.obbard, linux-um, 983379

On Fri, 2021-03-05 at 19:03 +0000, Anton Ivanov wrote:
> 
> I thought of that, but surrendered to the "dark side" of the quick and ugly fix.

:)

> We can do that for the ipc/sem.c - it brings in uaccess.h which
> ultimately pulls uaccess from our asm tree. So if we do it there, it
> will end up in sem.c

Well, most easily you could do it in ipc/util.h, where it's declared. Or
any place that is pulled in by it, e.g. even asm/errno.h.

All ugly though.
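
A minimal standalone sketch of the header approach, in plain userspace C rather than kernel code. The header contents are inlined here, and `demo_rename` and the `calls` counter are made-up names used only for illustration. It shows that once the macro is visible to both the declaration and every caller, all of them refer to `um_sem_init`, so this translation unit emits no symbol named `sem_init`:

```c
/* Hypothetical stand-in for a header seen by both ipc/sem.c and ipc/util.c. */
#define sem_init um_sem_init
void sem_init(void);            /* after preprocessing: void um_sem_init(void); */

static int calls;               /* illustration only: counts invocations */

/* Stand-in for the definition in ipc/sem.c; the emitted symbol is um_sem_init. */
void sem_init(void)
{
	calls++;
}

/* Stand-in for a caller such as the one in ipc/util.c. */
int demo_rename(void)
{
	calls = 0;
	sem_init();             /* expands to um_sem_init() */
	um_sem_init();          /* the same function, spelled out */
	return calls;
}
```

A file that does not see the macro would still reference plain `sem_init`, which is why the define has to live somewhere both files include.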

johannes




^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: linux uml segfault
  2021-03-05 17:39                           ` Anton Ivanov
  2021-03-05 18:32                             ` Johannes Berg
@ 2021-03-05 20:07                             ` Johannes Berg
  1 sibling, 0 replies; 41+ messages in thread
From: Johannes Berg @ 2021-03-05 20:07 UTC (permalink / raw)
  To: Anton Ivanov, Hajime Tazaki; +Cc: rrs, chris.obbard, linux-um, 983379


> Ritesh, can you give the following a spin - it renames sem_init as um_sem_init for UML only?

FWIW, this fixes the issue in my reproducer, so should work here too:

diff --git a/ipc/util.h b/ipc/util.h
index 5766c61aed0e..cfed40ba983c 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -14,6 +14,7 @@
 #include <linux/unistd.h>
 #include <linux/err.h>
 #include <linux/ipc_namespace.h>
+#define sem_init uml_sem_init
 
 /*
  * The IPC ID contains 2 separate numbers - index and sequence number.

johannes




^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: linux uml segfault
  2021-03-04  5:38                       ` Hajime Tazaki
  2021-03-04  7:45                         ` Anton Ivanov
  2021-03-04  7:47                         ` Johannes Berg
@ 2021-03-05 20:22                         ` Johannes Berg
  2021-03-05 22:25                           ` Hajime Tazaki
  2021-03-07 12:22                           ` Hajime Tazaki
  2 siblings, 2 replies; 41+ messages in thread
From: Johannes Berg @ 2021-03-05 20:22 UTC (permalink / raw)
  To: Hajime Tazaki; +Cc: rrs, anton.ivanov, chris.obbard, linux-um, 983379

On Thu, 2021-03-04 at 14:38 +0900, Hajime Tazaki wrote:
> 
> objcopy (from binutils) can localize symbols (i.e., objcopy -L
> sem_init $orig_file $new_file).

This doesn't seem to be sufficient.

> It also does renaming symbols.  But
> not sure this is the ideal solution.

Even that doesn't seem to actually work/help? I still get libcom_err
trying to call UML's sem_init, even after doing
 objcopy --redefine-sym sem_init=uml_sem_init


> How does UML handle symbol conflicts between userspace code and Linux
> kernel (like this case sem_init) ?  AFAIK, libnl has a same symbol as
> Linux kernel (genlmsg_put) and others can possibly do as well.

I think, like I said, it just doesn't, but since you don't have much
userspace code linked with UML it never really mattered?

We only link a 'linux' binary, after all. How does LKL handle this
though? It should be far more affected?


Despite the objcopy *not* fixing it, this does seem to:

diff --git a/arch/um/kernel/dyn.lds.S b/arch/um/kernel/dyn.lds.S
index dacbfabf66d8..2f2a8ce92f1e 100644
--- a/arch/um/kernel/dyn.lds.S
+++ b/arch/um/kernel/dyn.lds.S
@@ -6,6 +6,12 @@ OUTPUT_ARCH(ELF_ARCH)
 ENTRY(_start)
 jiffies = jiffies_64;
 
+VERSION {
+  {
+    local: *;
+  };
+}
+
 SECTIONS
 {
   PROVIDE (__executable_start = START);
diff --git a/arch/um/kernel/uml.lds.S b/arch/um/kernel/uml.lds.S
index 45d957d7004c..7a8e2b123e29 100644
--- a/arch/um/kernel/uml.lds.S
+++ b/arch/um/kernel/uml.lds.S
@@ -7,6 +7,12 @@ OUTPUT_ARCH(ELF_ARCH)
 ENTRY(_start)
 jiffies = jiffies_64;
 
+VERSION {
+  {
+    local: *;
+  };
+}
+
 SECTIONS
 {
   /* This must contain the right address - not quite the default ELF one.*/
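
For comparison, a per-symbol way to keep a definition out of the dynamic symbol table is the GCC/Clang `visibility("hidden")` attribute. This is a different technique from the VERSION block above, it has not been tried against UML in this thread, and it is only a sketch; `uml_demo_init` and `demo_hidden` are made-up names:

```c
/*
 * Hidden visibility: the function can be called from anywhere inside this
 * binary, but it is not placed in the dynamic symbol table, so a shared
 * library loaded later cannot bind to it by name.
 * Both function names are illustrative only.
 */
__attribute__((visibility("hidden")))
int uml_demo_init(void)
{
	return 42;
}

/* An in-binary caller still resolves the hidden symbol normally. */
int demo_hidden(void)
{
	return uml_demo_init();
}
```

The linker-script variant has the advantage of covering every kernel symbol at once, without annotating individual definitions.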

johannes




^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH] um: mark all kernel symbols as local
  2021-02-23  8:06 linux uml segfault Ritesh Raj Sarraf
  2021-02-23 10:50 ` Anton Ivanov
@ 2021-03-05 20:43 ` Johannes Berg
  2021-03-05 20:54   ` Anton Ivanov
  1 sibling, 1 reply; 41+ messages in thread
From: Johannes Berg @ 2021-03-05 20:43 UTC (permalink / raw)
  To: linux-um
  Cc: Ritesh Raj Sarraf, Anton Ivanov, 983379, Christopher Obbard,
	Johannes Berg

From: Johannes Berg <johannes.berg@intel.com>

Ritesh reported a bug [1] against UML, noting that it crashed on
startup. The backtrace shows the following (heavily redacted):

(gdb) bt
...
 #26 0x0000000060015b5d in sem_init () at ipc/sem.c:268
 #27 0x00007f89906d92f7 in ?? () from /lib/x86_64-linux-gnu/libcom_err.so.2
 #28 0x00007f8990ab8fb2 in call_init (...) at dl-init.c:72
...
 #40 0x00007f89909bf3a6 in nss_load_library (...) at nsswitch.c:359
...
 #44 0x00007f8990895e35 in _nss_compat_getgrnam_r (...) at nss_compat/compat-grp.c:486
 #45 0x00007f8990968b85 in __getgrnam_r [...]
 #46 0x00007f89909d6b77 in grantpt [...]
 #47 0x00007f8990a9394e in __GI_openpty [...]
 #48 0x00000000604a1f65 in openpty_cb (...) at arch/um/os-Linux/sigio.c:407
 #49 0x00000000604a58d0 in start_idle_thread (...) at arch/um/os-Linux/skas/process.c:598
 #50 0x0000000060004a3d in start_uml () at arch/um/kernel/skas/process.c:45
 #51 0x00000000600047b2 in linux_main (...) at arch/um/kernel/um_arch.c:334
 #52 0x000000006000574f in main (...) at arch/um/os-Linux/main.c:144

indicating that the UML function openpty_cb() calls openpty(),
which internally calls __getgrnam_r(), which causes the nsswitch
machinery to get started.

This loads, through lots of indirection that I snipped, the
libcom_err.so.2 library, which (in an unknown function, "??")
calls sem_init().

Now, of course it wants to get libpthread's sem_init(), since
it's linked against libpthread. However, the dynamic linker
looks up that symbol against the binary first, and gets the
kernel's sem_init().

Hajime Tazaki noted that "objcopy -L" can localize a symbol,
so the dynamic linker wouldn't do the lookup this way. I tried,
but for some reason that didn't seem to work.

Doing the same thing in the linker script instead does seem to
work, though I cannot entirely explain why - it *also* works if I
just add "VERSION { { global: *; }; }" instead, indicating that
something else is happening that I don't really understand. It
may be that explicitly doing that marks them with some kind of
empty version, and that's different from the default.

Explicitly marking them with a version breaks kallsyms, so that
doesn't seem to be possible.

Marking all the symbols as local seems correct, and does seem
to address the issue, so do that. Also do it for the static link, since
nsswitch libraries could still be loaded there.

[1] https://bugs.debian.org/983379

Reported-by: Ritesh Raj Sarraf <rrs@debian.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
 arch/um/kernel/dyn.lds.S | 6 ++++++
 arch/um/kernel/uml.lds.S | 6 ++++++
 2 files changed, 12 insertions(+)

diff --git a/arch/um/kernel/dyn.lds.S b/arch/um/kernel/dyn.lds.S
index dacbfabf66d8..2f2a8ce92f1e 100644
--- a/arch/um/kernel/dyn.lds.S
+++ b/arch/um/kernel/dyn.lds.S
@@ -6,6 +6,12 @@ OUTPUT_ARCH(ELF_ARCH)
 ENTRY(_start)
 jiffies = jiffies_64;
 
+VERSION {
+  {
+    local: *;
+  };
+}
+
 SECTIONS
 {
   PROVIDE (__executable_start = START);
diff --git a/arch/um/kernel/uml.lds.S b/arch/um/kernel/uml.lds.S
index 45d957d7004c..7a8e2b123e29 100644
--- a/arch/um/kernel/uml.lds.S
+++ b/arch/um/kernel/uml.lds.S
@@ -7,6 +7,12 @@ OUTPUT_ARCH(ELF_ARCH)
 ENTRY(_start)
 jiffies = jiffies_64;
 
+VERSION {
+  {
+    local: *;
+  };
+}
+
 SECTIONS
 {
   /* This must contain the right address - not quite the default ELF one.*/
-- 
2.26.2




^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH] um: mark all kernel symbols as local
  2021-03-05 20:43 ` [PATCH] um: mark all kernel symbols as local Johannes Berg
@ 2021-03-05 20:54   ` Anton Ivanov
  2021-03-06 10:51     ` Ritesh Raj Sarraf
  0 siblings, 1 reply; 41+ messages in thread
From: Anton Ivanov @ 2021-03-05 20:54 UTC (permalink / raw)
  To: Johannes Berg, linux-um
  Cc: Ritesh Raj Sarraf, 983379, Christopher Obbard, Johannes Berg

On 05/03/2021 20:43, Johannes Berg wrote:
> From: Johannes Berg <johannes.berg@intel.com>
> 
> Ritesh reported a bug [1] against UML, noting that it crashed on
> startup. The backtrace shows the following (heavily redacted):
> 
> (gdb) bt
> ...
>   #26 0x0000000060015b5d in sem_init () at ipc/sem.c:268
>   #27 0x00007f89906d92f7 in ?? () from /lib/x86_64-linux-gnu/libcom_err.so.2
>   #28 0x00007f8990ab8fb2 in call_init (...) at dl-init.c:72
> ...
>   #40 0x00007f89909bf3a6 in nss_load_library (...) at nsswitch.c:359
> ...
>   #44 0x00007f8990895e35 in _nss_compat_getgrnam_r (...) at nss_compat/compat-grp.c:486
>   #45 0x00007f8990968b85 in __getgrnam_r [...]
>   #46 0x00007f89909d6b77 in grantpt [...]
>   #47 0x00007f8990a9394e in __GI_openpty [...]
>   #48 0x00000000604a1f65 in openpty_cb (...) at arch/um/os-Linux/sigio.c:407
>   #49 0x00000000604a58d0 in start_idle_thread (...) at arch/um/os-Linux/skas/process.c:598
>   #50 0x0000000060004a3d in start_uml () at arch/um/kernel/skas/process.c:45
>   #51 0x00000000600047b2 in linux_main (...) at arch/um/kernel/um_arch.c:334
>   #52 0x000000006000574f in main (...) at arch/um/os-Linux/main.c:144
> 
> indicating that the UML function openpty_cb() calls openpty(),
> which internally calls __getgrnam_r(), which causes the nsswitch
> machinery to get started.
> 
> This loads, through lots of indirection that I snipped, the
> libcom_err.so.2 library, which (in an unknown function, "??")
> calls sem_init().
> 
> Now, of course it wants to get libpthread's sem_init(), since
> it's linked against libpthread. However, the dynamic linker
> looks up that symbol against the binary first, and gets the
> kernel's sem_init().
> 
> Hajime Tazaki noted that "objcopy -L" can localize a symbol,
> so the dynamic linker wouldn't do the lookup this way. I tried,
> but for some reason that didn't seem to work.
> 
> Doing the same thing in the linker script instead does seem to
> work, though I cannot entirely explain - it *also* works if I
> just add "VERSION { { global: *; }; }" instead, indicating that
> something else is happening that I don't really understand. It
> may be that explicitly doing that marks them with some kind of
> empty version, and that's different from the default.
> 
> Explicitly marking them with a version breaks kallsyms, so that
> doesn't seem to be possible.
> 
> Marking all the symbols as local seems correct, and does seem
> to address the issue, so do that. Also do it for static link,
> nsswitch libraries could still be loaded there.
> 
> [1] https://bugs.debian.org/983379
> 
> Reported-by: Ritesh Raj Sarraf <rrs@debian.org>
> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
> ---
>   arch/um/kernel/dyn.lds.S | 6 ++++++
>   arch/um/kernel/uml.lds.S | 6 ++++++
>   2 files changed, 12 insertions(+)
> 
> diff --git a/arch/um/kernel/dyn.lds.S b/arch/um/kernel/dyn.lds.S
> index dacbfabf66d8..2f2a8ce92f1e 100644
> --- a/arch/um/kernel/dyn.lds.S
> +++ b/arch/um/kernel/dyn.lds.S
> @@ -6,6 +6,12 @@ OUTPUT_ARCH(ELF_ARCH)
>   ENTRY(_start)
>   jiffies = jiffies_64;
>   
> +VERSION {
> +  {
> +    local: *;
> +  };
> +}
> +
>   SECTIONS
>   {
>     PROVIDE (__executable_start = START);
> diff --git a/arch/um/kernel/uml.lds.S b/arch/um/kernel/uml.lds.S
> index 45d957d7004c..7a8e2b123e29 100644
> --- a/arch/um/kernel/uml.lds.S
> +++ b/arch/um/kernel/uml.lds.S
> @@ -7,6 +7,12 @@ OUTPUT_ARCH(ELF_ARCH)
>   ENTRY(_start)
>   jiffies = jiffies_64;
>   
> +VERSION {
> +  {
> +    local: *;
> +  };
> +}
> +
>   SECTIONS
>   {
>     /* This must contain the right address - not quite the default ELF one.*/
> 

Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
-- 
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: linux uml segfault
  2021-03-05 20:22                         ` Johannes Berg
@ 2021-03-05 22:25                           ` Hajime Tazaki
  2021-03-07 12:22                           ` Hajime Tazaki
  1 sibling, 0 replies; 41+ messages in thread
From: Hajime Tazaki @ 2021-03-05 22:25 UTC (permalink / raw)
  To: johannes; +Cc: rrs, anton.ivanov, chris.obbard, linux-um, 983379


might be late, but I'll give it a try with your dlopen reproducer.

-- Hajime

On Sat, 06 Mar 2021 05:22:19 +0900,
Johannes Berg wrote:
> 
> On Thu, 2021-03-04 at 14:38 +0900, Hajime Tazaki wrote:
> > 
> > objcopy (from binutils) can localize symbols (i.e., objcopy -L
> > sem_init $orig_file $new_file).
> 
> This doesn't seem to be sufficient.
> 
> > It also does renaming symbols.  But
> > not sure this is the ideal solution.
> 
> Even that doesn't seem to actually work/help? I still get libcom_err
> trying to call UML's sem_init, even after doing
>  objcopy --redefine-sym sem_init=uml_sem_init



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] um: mark all kernel symbols as local
  2021-03-05 20:54   ` Anton Ivanov
@ 2021-03-06 10:51     ` Ritesh Raj Sarraf
  2021-03-08 10:29       ` Bug#983379: " Ritesh Raj Sarraf
  0 siblings, 1 reply; 41+ messages in thread
From: Ritesh Raj Sarraf @ 2021-03-06 10:51 UTC (permalink / raw)
  To: Anton Ivanov, Johannes Berg, linux-um
  Cc: 983379, Christopher Obbard, Johannes Berg


[-- Attachment #1.1: Type: text/plain, Size: 4334 bytes --]

On Fri, 2021-03-05 at 20:54 +0000, Anton Ivanov wrote:
> On 05/03/2021 20:43, Johannes Berg wrote:
> > From: Johannes Berg <johannes.berg@intel.com>
> > 
> > Ritesh reported a bug [1] against UML, noting that it crashed on
> > startup. The backtrace shows the following (heavily redacted):
> > 
> > (gdb) bt
> > ...
> > >   #26 0x0000000060015b5d in sem_init () at ipc/sem.c:268
> > >   #27 0x00007f89906d92f7 in ?? () from /lib/x86_64-linux-gnu/libcom_err.so.2
> > >   #28 0x00007f8990ab8fb2 in call_init (...) at dl-init.c:72
> > > ...
> > >   #40 0x00007f89909bf3a6 in nss_load_library (...) at nsswitch.c:359
> > > ...
> > >   #44 0x00007f8990895e35 in _nss_compat_getgrnam_r (...) at nss_compat/compat-grp.c:486
> > >   #45 0x00007f8990968b85 in __getgrnam_r [...]
> > >   #46 0x00007f89909d6b77 in grantpt [...]
> > >   #47 0x00007f8990a9394e in __GI_openpty [...]
> > >   #48 0x00000000604a1f65 in openpty_cb (...) at arch/um/os-Linux/sigio.c:407
> > >   #49 0x00000000604a58d0 in start_idle_thread (...) at arch/um/os-Linux/skas/process.c:598
> > >   #50 0x0000000060004a3d in start_uml () at arch/um/kernel/skas/process.c:45
> > >   #51 0x00000000600047b2 in linux_main (...) at arch/um/kernel/um_arch.c:334
> > >   #52 0x000000006000574f in main (...) at arch/um/os-Linux/main.c:144
> > 
> > indicating that the UML function openpty_cb() calls openpty(),
> > which internally calls __getgrnam_r(), which causes the nsswitch
> > machinery to get started.
> > 
> > This loads, through lots of indirection that I snipped, the
> > libcom_err.so.2 library, which (in an unknown function, "??")
> > calls sem_init().
> > 
> > Now, of course it wants to get libpthread's sem_init(), since
> > it's linked against libpthread. However, the dynamic linker
> > looks up that symbol against the binary first, and gets the
> > kernel's sem_init().
> > 
> > Hajime Tazaki noted that "objcopy -L" can localize a symbol,
> > so the dynamic linker wouldn't do the lookup this way. I tried,
> > but for some reason that didn't seem to work.
> > 
> > Doing the same thing in the linker script instead does seem to
> > work, though I cannot entirely explain - it *also* works if I
> > just add "VERSION { { global: *; }; }" instead, indicating that
> > something else is happening that I don't really understand. It
> > may be that explicitly doing that marks them with some kind of
> > empty version, and that's different from the default.
> > 
> > Explicitly marking them with a version breaks kallsyms, so that
> > doesn't seem to be possible.
> > 
> > Marking all the symbols as local seems correct, and does seem
> > to address the issue, so do that. Also do it for static link,
> > nsswitch libraries could still be loaded there.
> > 
> > [1] https://bugs.debian.org/983379
> > 
> > Reported-by: Ritesh Raj Sarraf <rrs@debian.org>
> > Signed-off-by: Johannes Berg <johannes.berg@intel.com>
> > ---
> >   arch/um/kernel/dyn.lds.S | 6 ++++++
> >   arch/um/kernel/uml.lds.S | 6 ++++++
> >   2 files changed, 12 insertions(+)
> > 
> > diff --git a/arch/um/kernel/dyn.lds.S b/arch/um/kernel/dyn.lds.S
> > index dacbfabf66d8..2f2a8ce92f1e 100644
> > --- a/arch/um/kernel/dyn.lds.S
> > +++ b/arch/um/kernel/dyn.lds.S
> > @@ -6,6 +6,12 @@ OUTPUT_ARCH(ELF_ARCH)
> >   ENTRY(_start)
> >   jiffies = jiffies_64;
> >   
> > +VERSION {
> > +  {
> > +    local: *;
> > +  };
> > +}
> > +
> >   SECTIONS
> >   {
> >     PROVIDE (__executable_start = START);
> > diff --git a/arch/um/kernel/uml.lds.S b/arch/um/kernel/uml.lds.S
> > index 45d957d7004c..7a8e2b123e29 100644
> > --- a/arch/um/kernel/uml.lds.S
> > +++ b/arch/um/kernel/uml.lds.S
> > @@ -7,6 +7,12 @@ OUTPUT_ARCH(ELF_ARCH)
> >   ENTRY(_start)
> >   jiffies = jiffies_64;
> >   
> > +VERSION {
> > +  {
> > +    local: *;
> > +  };
> > +}
> > +
> >   SECTIONS
> >   {
> >     /* This must contain the right address - not quite the default ELF one.*/
> > 

Tested on all 3 machines where the issue was seen before.


> 
> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>

Tested-By: Ritesh Raj Sarraf <rrs@debian.org>

-- 
Ritesh Raj Sarraf | http://people.debian.org/~rrs
Debian - The Universal Operating System

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 152 bytes --]

_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um


* Re: linux uml segfault
  2021-03-05 20:22                         ` Johannes Berg
  2021-03-05 22:25                           ` Hajime Tazaki
@ 2021-03-07 12:22                           ` Hajime Tazaki
  2021-03-07 12:56                             ` Johannes Berg
  1 sibling, 1 reply; 41+ messages in thread
From: Hajime Tazaki @ 2021-03-07 12:22 UTC (permalink / raw)
  To: johannes; +Cc: rrs, anton.ivanov, chris.obbard, linux-um, 983379


Sorry, this email is going to be long.  In summary, Johannes is right:
what objcopy does is not sufficient, whereas ld transforms the symbols
as we expected.

Details below.

On Sat, 06 Mar 2021 05:22:19 +0900,
Johannes Berg wrote:
> 
> On Thu, 2021-03-04 at 14:38 +0900, Hajime Tazaki wrote:
> > 
> > objcopy (from binutils) can localize symbols (i.e., objcopy -L
> > sem_init $orig_file $new_file).
> 
> This doesn't seem to be sufficient.
> 
> > It also does renaming symbols.  But
> > not sure this is the ideal solution.
> 
> Even that doesn't seem to actually work/help? I still get libcom_err
> trying to call UML's sem_init, even after doing
>  objcopy --redefine-sym sem_init=uml_sem_init
> 
> 
> > How does UML handle symbol conflicts between userspace code and Linux
> > kernel (like this case sem_init) ?  AFAIK, libnl has a same symbol as
> > Linux kernel (genlmsg_put) and others can possibly do as well.
> 
> I think like I said it just doesn't but since you don't have much
> userspace code linked with UML it never really mattered?
> 
> We only link a 'linux' binary, after all. How does LKL handle this
> though? It should be far more affected?
> 
> 
> Despite the objcopy *not* fixing it, this does seem to:

I tested with a slightly old version:
 - objcopy/ld version 2.29.1-23.fc28

I confirmed that objcopy (both --redefine-sym and --localize-symbol)
only changes symbols in the .symtab table.  But there is another table,
.dynsym, which is the one the dynamic linker uses to resolve symbols.
The original file looks like this:


1) before objcopy (vmlinux)
% readelf -s obj-x86-um/vmlinux |grep -E "sem_init|Symbol table|Num:"
Symbol table '.dynsym' contains 179 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
   129: 0000000060011d38    72 FUNC    GLOBAL DEFAULT    2 sem_init
Symbol table '.symtab' contains 38474 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
 28515: 0000000060011d38    72 FUNC    GLOBAL DEFAULT    2 sem_init
 37798: 00000000601e30d5    62 FUNC    GLOBAL DEFAULT   13 sem_init_ns
 
The resulting object looks like this:

2) after objcopy (linux)
% readelf -s obj-x86-um/linux |grep -E "sem_init|Symbol table|Num:"
Symbol table '.dynsym' contains 179 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
   129: 0000000060011d38    72 FUNC    GLOBAL DEFAULT    2 sem_init
Symbol table '.symtab' contains 38474 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
 28455: 0000000060011d38    72 FUNC    LOCAL  DEFAULT    2 sem_init
 37798: 00000000601e30d5    62 FUNC    GLOBAL DEFAULT   13 sem_init_ns

Only the .symtab entry is changed to local, while the .dynsym entry is
unchanged.  So the sem_init call from libcom_err.so can still resolve
to the Linux kernel symbol.
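
As a rough illustration of the comparison above, here is a minimal Python
sketch that reads `readelf -s` text and reports, per symbol table, the
binding recorded for a given name.  The function name bindings_by_table and
the regular expressions are my own assumptions about the column layout shown
in this mail; the sample text is the "after objcopy" output from above.

```python
import re
from collections import defaultdict

# Matches the "Symbol table '.dynsym' contains N entries:" header lines.
TABLE_RE = re.compile(r"^Symbol table '([^']+)' contains")
# Columns: Num, Value, Size, Type, Bind, Vis, Ndx, Name (optional "(N)" suffix).
ROW_RE = re.compile(
    r"^\s*\d+:\s+[0-9a-fA-F]+\s+\S+\s+\S+\s+(\S+)\s+\S+\s+\S+\s+(\S+)"
    r"(?:\s+\(\d+\))?\s*$"
)


def bindings_by_table(readelf_output, name):
    """Map each symbol table name to the binding(s) it records for `name`."""
    result = defaultdict(list)
    table = None
    for line in readelf_output.splitlines():
        header = TABLE_RE.match(line)
        if header:
            table = header.group(1)
            continue
        row = ROW_RE.match(line)
        if row and table is not None:
            bind, sym = row.groups()
            # Versioned names such as "foo@VERSION" are compared by base name.
            if sym.split("@")[0] == name:
                result[table].append(bind)
    return dict(result)


# Output of "readelf -s obj-x86-um/linux" after objcopy, copied from this mail.
SAMPLE = """\
Symbol table '.dynsym' contains 179 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
   129: 0000000060011d38    72 FUNC    GLOBAL DEFAULT    2 sem_init
Symbol table '.symtab' contains 38474 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
 28455: 0000000060011d38    72 FUNC    LOCAL  DEFAULT    2 sem_init
 37798: 00000000601e30d5    62 FUNC    GLOBAL DEFAULT   13 sem_init_ns
"""

if __name__ == "__main__":
    print(bindings_by_table(SAMPLE, "sem_init"))
```

A GLOBAL binding left in .dynsym next to a LOCAL one in .symtab is the
mismatch described above.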


On the other hand, the ld version-script solution does what we want.

3) localized with ld
% readelf -s obj-x86-um/linux |grep -E "sem_init|Symbol table|Num:"
Symbol table '.dynsym' contains 142 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
Symbol table '.symtab' contains 38474 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
 28512: 0000000060011d38    72 FUNC    LOCAL  DEFAULT    2 sem_init
 37669: 00000000601e2b45    62 FUNC    LOCAL  DEFAULT   13 sem_init_ns

Only a .symtab entry is generated for the sem_init symbol, and it is
localized; .dynsym has no sem_init entry at all.
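
If someone wants to guard against a regression, a check along these lines
might help.  It is only a sketch in Python: exported_definitions is a
hypothetical helper name, and it assumes that anything still listed in
.dynsym with a GLOBAL or WEAK binding and a section index other than UND is
visible to the dynamic linker.  The two inputs are the "before objcopy" and
"localized with ld" outputs from this mail.

```python
import re

# Matches the "Symbol table '.dynsym' contains N entries:" header lines.
TABLE_RE = re.compile(r"^Symbol table '([^']+)' contains")
# Captures Bind, Ndx and Name from a row: Num Value Size Type Bind Vis Ndx Name.
ROW_RE = re.compile(
    r"^\s*\d+:\s+[0-9a-fA-F]+\s+\S+\s+\S+\s+(\S+)\s+\S+\s+(\S+)\s+(\S+)"
)


def exported_definitions(readelf_output):
    """Return names that .dynsym defines with GLOBAL or WEAK binding."""
    names = set()
    in_dynsym = False
    for line in readelf_output.splitlines():
        header = TABLE_RE.match(line)
        if header:
            in_dynsym = header.group(1) == ".dynsym"
            continue
        row = ROW_RE.match(line)
        if not (in_dynsym and row):
            continue
        bind, ndx, sym = row.groups()
        # UND entries are imports from shared libraries, not definitions.
        if bind in ("GLOBAL", "WEAK") and ndx != "UND":
            names.add(sym.split("@")[0])
    return names


# "readelf -s obj-x86-um/vmlinux" before objcopy, copied from this mail.
BEFORE = """\
Symbol table '.dynsym' contains 179 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
   129: 0000000060011d38    72 FUNC    GLOBAL DEFAULT    2 sem_init
Symbol table '.symtab' contains 38474 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
 28515: 0000000060011d38    72 FUNC    GLOBAL DEFAULT    2 sem_init
 37798: 00000000601e30d5    62 FUNC    GLOBAL DEFAULT   13 sem_init_ns
"""

# "readelf -s obj-x86-um/linux" localized with ld, copied from this mail.
AFTER = """\
Symbol table '.dynsym' contains 142 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
Symbol table '.symtab' contains 38474 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
 28512: 0000000060011d38    72 FUNC    LOCAL  DEFAULT    2 sem_init
 37669: 00000000601e2b45    62 FUNC    LOCAL  DEFAULT   13 sem_init_ns
"""

if __name__ == "__main__":
    print(sorted(exported_definitions(BEFORE)))
    print(sorted(exported_definitions(AFTER)))
```

Note that both inputs were filtered with grep, so this only says something
about the sem_init lines; a real check would need the full readelf output.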


From a quick check, LKL (and UML binaries) do not have this issue,
because they are built differently from what UML currently does.

LKL applies objcopy before generating the intermediate file (linux.o),
so the symbols of the final binary (linux) are localized and have no
.dynsym entries; thus there is no issue in this case.

refs:
https://stackoverflow.com/questions/54332797/binding-failure-with-objcopy-redefine-syms
https://sourceware.org/legacy-ml/binutils/2019-01/msg00254.html


-- Hajime


* Re: linux uml segfault
  2021-03-07 12:22                           ` Hajime Tazaki
@ 2021-03-07 12:56                             ` Johannes Berg
  0 siblings, 0 replies; 41+ messages in thread
From: Johannes Berg @ 2021-03-07 12:56 UTC (permalink / raw)
  To: Hajime Tazaki; +Cc: rrs, anton.ivanov, chris.obbard, linux-um, 983379

On Sun, 2021-03-07 at 21:22 +0900, Hajime Tazaki wrote:
> Sorry, this email is going to be long.  In summary, Johannes is
> right: what objcopy does is not sufficient, whereas ld transforms the
> symbols as we expected.
> 
> Details below.

[snip]

Interesting, thanks for looking into that!

johannes



* Re: Bug#983379: [PATCH] um: mark all kernel symbols as local
  2021-03-06 10:51     ` Ritesh Raj Sarraf
@ 2021-03-08 10:29       ` Ritesh Raj Sarraf
  2021-03-08 10:33         ` Johannes Berg
  0 siblings, 1 reply; 41+ messages in thread
From: Ritesh Raj Sarraf @ 2021-03-08 10:29 UTC (permalink / raw)
  To: 983379, Anton Ivanov, Johannes Berg, linux-um
  Cc: Christopher Obbard, Johannes Berg


[-- Attachment #1.1: Type: text/plain, Size: 2019 bytes --]

Hi,

Just a follow-up question on this fix.

Is it something that is a candidate for linux-stable ?


Thanks,
Ritesh

On Sat, 2021-03-06 at 16:21 +0530, Ritesh Raj Sarraf wrote:
> > > Marking all the symbols as local seems correct, and does seem
> > > to address the issue, so do that. Also do it for static link,
> > > nsswitch libraries could still be loaded there.
> > > 
> > > [1] https://bugs.debian.org/983379
> > > 
> > > Reported-by: Ritesh Raj Sarraf <rrs@debian.org>
> > > Signed-off-by: Johannes Berg <johannes.berg@intel.com>
> > > ---
> > >   arch/um/kernel/dyn.lds.S | 6 ++++++
> > >   arch/um/kernel/uml.lds.S | 6 ++++++
> > >   2 files changed, 12 insertions(+)
> > > 
> > > diff --git a/arch/um/kernel/dyn.lds.S b/arch/um/kernel/dyn.lds.S
> > > index dacbfabf66d8..2f2a8ce92f1e 100644
> > > --- a/arch/um/kernel/dyn.lds.S
> > > +++ b/arch/um/kernel/dyn.lds.S
> > > @@ -6,6 +6,12 @@ OUTPUT_ARCH(ELF_ARCH)
> > >   ENTRY(_start)
> > >   jiffies = jiffies_64;
> > >   
> > > +VERSION {
> > > +  {
> > > +    local: *;
> > > +  };
> > > +}
> > > +
> > >   SECTIONS
> > >   {
> > >     PROVIDE (__executable_start = START);
> > > diff --git a/arch/um/kernel/uml.lds.S b/arch/um/kernel/uml.lds.S
> > > index 45d957d7004c..7a8e2b123e29 100644
> > > --- a/arch/um/kernel/uml.lds.S
> > > +++ b/arch/um/kernel/uml.lds.S
> > > @@ -7,6 +7,12 @@ OUTPUT_ARCH(ELF_ARCH)
> > >   ENTRY(_start)
> > >   jiffies = jiffies_64;
> > >   
> > > +VERSION {
> > > +  {
> > > +    local: *;
> > > +  };
> > > +}
> > > +
> > >   SECTIONS
> > >   {
> > >     /* This must contain the right address - not quite the default ELF one.*/
> > > 
> 
> Tested on all 3 machines where the issue was seen before.
> 
> 
> > 
> > Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
> 
> Tested-By: Ritesh Raj Sarraf <rrs@debian.org>

-- 
Ritesh Raj Sarraf | http://people.debian.org/~rrs
Debian - The Universal Operating System

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: Bug#983379: [PATCH] um: mark all kernel symbols as local
  2021-03-08 10:29       ` Bug#983379: " Ritesh Raj Sarraf
@ 2021-03-08 10:33         ` Johannes Berg
  0 siblings, 0 replies; 41+ messages in thread
From: Johannes Berg @ 2021-03-08 10:33 UTC (permalink / raw)
  To: rrs, 983379, Anton Ivanov, linux-um; +Cc: Christopher Obbard

On Mon, 2021-03-08 at 15:59 +0530, Ritesh Raj Sarraf wrote:
> Hi,
> 
> Just a follow-up question on this fix.
> 
> Is it something that is a candidate for linux-stable ?

I guess that makes sense. Once it's in mainline you can also request
that yourself :)

johannes



end of thread, other threads:[~2021-03-08 10:33 UTC | newest]

Thread overview: 41+ messages
2021-02-23  8:06 linux uml segfault Ritesh Raj Sarraf
2021-02-23 10:50 ` Anton Ivanov
2021-02-23 12:12   ` Christopher Obbard
2021-02-23 12:24     ` Anton Ivanov
2021-02-23 17:19     ` Anton Ivanov
2021-02-23 17:26       ` Ritesh Raj Sarraf
2021-02-23 18:02         ` Anton Ivanov
2021-02-24 11:44         ` Anton Ivanov
2021-03-02  9:09           ` Ritesh Raj Sarraf
2021-03-02 11:34             ` Anton Ivanov
2021-03-02 14:23               ` Ritesh Raj Sarraf
2021-03-02 17:05                 ` Anton Ivanov
2021-03-02 17:27                   ` Ritesh Raj Sarraf
2021-03-03  9:30                     ` Anton Ivanov
2021-03-03 10:45                       ` Bug#983379: " Ritesh Raj Sarraf
2021-03-03 10:53                         ` Anton Ivanov
2021-03-03 22:40                     ` Johannes Berg
2021-03-04  5:38                       ` Hajime Tazaki
2021-03-04  7:45                         ` Anton Ivanov
2021-03-04  7:47                         ` Johannes Berg
2021-03-04  8:05                           ` Benjamin Berg
2021-03-04 18:41                             ` Anton Ivanov
2021-03-05  9:59                               ` Anton Ivanov
2021-03-05 10:07                                 ` Johannes Berg
2021-03-05 17:39                           ` Anton Ivanov
2021-03-05 18:32                             ` Johannes Berg
2021-03-05 19:03                               ` Anton Ivanov
2021-03-05 20:06                                 ` Johannes Berg
2021-03-05 20:07                             ` Johannes Berg
2021-03-05 20:22                         ` Johannes Berg
2021-03-05 22:25                           ` Hajime Tazaki
2021-03-07 12:22                           ` Hajime Tazaki
2021-03-07 12:56                             ` Johannes Berg
2021-03-04  7:28                       ` Anton Ivanov
2021-03-04  7:43                         ` Johannes Berg
2021-03-05 19:54                       ` Johannes Berg
2021-03-05 20:43 ` [PATCH] um: mark all kernel symbols as local Johannes Berg
2021-03-05 20:54   ` Anton Ivanov
2021-03-06 10:51     ` Ritesh Raj Sarraf
2021-03-08 10:29       ` Bug#983379: " Ritesh Raj Sarraf
2021-03-08 10:33         ` Johannes Berg
