Linux-rt-users Archive on lore.kernel.org
 help / color / Atom feed
From: John Mathew <john.mathew@unikie.com>
To: linux-rt-users@vger.kernel.org
Subject: complete_all() with x waiters in swake_up_all_locked
Date: Wed, 11 Dec 2019 09:17:49 +0200
Message-ID: <CAJz2qXnHnw7Z9ftb6Oha5_0-R4YfUL+FhVZPNX1KKHyUkP6KNA@mail.gmail.com> (raw)

Hi,

This is a follow up to a previous discussion on this topic:
https://www.spinics.net/lists/linux-rt-users/msg19592.html
https://lkml.org/lkml/2016/3/14/214

We are seeing the waring still appearing in the v5.0.3-rt1 tag of the
rt linux kernel.

The test environment was :QEMU emulator version 2.11.1(Debian
1:2.11+dfsg-1ubuntu7.20)
Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers


The stack trace is as follows:

553.403374] 001: ------------[ cut here ]------------
[ 553.403826] 001: complete_all() with 4 waiters
[ 553.404205] 001: WARNING: CPU: 1 PID: 10418 at swake_up_all_locked+0x146/0x1e0
[ 553.404806] 001: Modules linked in:
[ 553.405101] 001:
[ 553.405269] 001: CPU: 1 PID: 10418 Comm: cryptomgr_probe Tainted: G
W 5.0.3-rt1-eb-corbos-preempt-rt #5
[ 553.406146] 001: Hardware name: QEMU Standard PC (i440FX + PIIX,
1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 553.406914] 001: RIP: 0010:swake_up_all_locked+0x146/0x1e0
[ 553.407373] 001: Code: 0f 85 47 ff ff ff 83 7d cc 02 7f 0f 48 83 c4
18 5b 41 5c 41 5d 41 5e 41 5f 5d c3 8b 75 cc 48 c7 c7 c0 8b a5 94 e8
1a 39 f1 ff <0f> 0b 48 83 c4 18 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 8b
7d d0 e8
[ 553.408934] 001: RSP: 0018:ffff888063c37e40 EFLAGS: 00010082
[ 553.409404] 001:
[ 553.409571] 001: RAX: 0000000000000000 RBX: ffff888062b87760 RCX:
ffffffff92c6deb2
[ 553.410195] 001: RDX: 0000000000000000 RSI: 0000000000000000 RDI:
ffffffff950c9128
[ 553.410816] 001: RBP: ffff888063c37e80 R08: fffffbfff2a19227 R09:
fffffbfff2a19227
[ 553.411450] 001: R10: 0000000000000001 R11: fffffbfff2a19226 R12:
dffffc0000000000
[ 553.412084] 001: R13: ffff8880612ec1a8 R14: ffff8880612ec1a0 R15:
ffffed100c25d835
[ 553.412709] 001: FS: 0000000000000000(0000)
GS:ffff88806d500000(0000) knlGS:0000000000000000
[ 553.413413] 001: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 553.413925] 001: CR2: 00007fffab7d4ca0 CR3: 000000000ac12000 CR4:
00000000000006e0
[ 553.414549] 001: DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 553.415172] 001: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ 553.415798] 001: Call Trace:
[ 553.416043] 001: ? kfree+0xc3/0x210
[ 553.416346] 001: complete_all+0x4c/0x70
[ 553.416676] 001: cryptomgr_probe+0x1af/0x250
[ 553.417041] 001: kthread+0x313/0x3d0
[ 553.417350] 001: ? crypto_alg_put+0x70/0x70
[ 553.417708] 001: ? kthread_park+0x170/0x170
[ 553.418067] 001: ret_from_fork+0x35/0x40
[ 553.418406] 001: ---[ end trace f939e9efc6356ffe ]---

The trigger is from cryptomgr_probe in crypto/algboss.c.

The issue is not seen in plain kernel.

When printing the modules which is waking up and using WARN_ON_ONCE
the output is as follows.

[   18.723634] 000: swake_up_all_locked: 4 wakes done on cpu 0:
cryptomgr_probe (pid: 325)
[   18.724488] 000: WARNING: CPU: 0 PID: 325 at swake_up_all_locked+0x199/0x230
[   18.725212] 000: Modules linked in:
[   18.725501] 000:
[   18.725687] 000: CPU: 0 PID: 325 Comm: cryptomgr_probe Not tainted
5.0.3-rt1-eb-corbos-preempt-rt+ #7
[   18.726436] 000: Hardware name: QEMU Standard PC (i440FX + PIIX,
1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[   18.727197] 000: RIP: 0010:swake_up_all_locked+0x199/0x230
[   18.727652] 000: Code: 0c 25 00 dd 01 00 65 8b 15 54 5a 3c 45 44 8b
83 f0 03 00 00 48 81 c1 98 05 00 00 48 c7 c7 c0 8b a5 bc 8b 75 cc e8
80 16 02 00 <0f> 0b eb 93 48 8b 7d d0 e8 da 55 33 00 e9 71 ff ff ff e8
b0 55 33
[   18.729196] 000: RSP: 0018:ffff888064fb7e40 EFLAGS: 00010082
[   18.729667] 000:
[   18.729833] 000: RAX: 0000000000000046 RBX: ffff8880667ec500 RCX:
ffffffffbac6df02
[   18.730449] 000: RDX: 0000000000000000 RSI: 0000000000000000 RDI:
ffffffffbd0c9128
[   18.731063] 000: RBP: ffff888064fb7e80 R08: fffffbfff7a19227 R09:
fffffbfff7a19227
[   18.731677] 000: R10: 0000000000000001 R11: fffffbfff7a19226 R12:
dffffc0000000000
[   18.732291] 000: R13: ffff8880614495b8 R14: ffff8880614495b0 R15:
ffffed100c2892b7
[   18.732906] 000: FS:  0000000000000000(0000)
GS:ffff88806d400000(0000) knlGS:0000000000000000
[   18.733607] 000: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   18.734111] 000: CR2: 00007ffc43757d9c CR3: 000000006101e000 CR4:
00000000000006f0
[   18.734725] 000: DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[   18.735336] 000: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[   18.735948] 000: Call Trace:
[   18.736190] 000:  complete_all+0x4c/0x70
[   18.736516] 000:  cryptomgr_probe+0x1af/0x250
[   18.736877] 000:  kthread+0x313/0x3d0
[   18.737181] 000:  ? crypto_alg_put+0x70/0x70
[   18.737533] 000:  ? kthread_park+0x170/0x170
[   18.737890] 000:  ret_from_fork+0x35/0x40
[   18.738226] 000: ---[ end trace 739b32bdb669a874 ]---
[   18.770538] 000: swake_up_all_locked: 3 wakes done on cpu 0:
cryptomgr_probe (pid: 373)
[   18.782207] 001: swake_up_all_locked: 3 wakes done on cpu 1:
cryptomgr_test (pid: 396)
[   18.786154] 001: swake_up_all_locked: 3 wakes done on cpu 1:
cryptomgr_probe (pid: 395)
[   18.798816] 000: swake_up_all_locked: 8 wakes done on cpu 0:
cryptomgr_probe (pid: 407)
[   18.813058] 001: swake_up_all_locked: 3 wakes done on cpu 1:
cryptomgr_probe (pid: 430)
[   18.823554] 000: swake_up_all_locked: 3 wakes done on cpu 0:
cryptomgr_test (pid: 452)
[   18.836808] 001: swake_up_all_locked: 3 wakes done on cpu 1:
cryptomgr_test (pid: 472)
[   18.840767] 001: swake_up_all_locked: 4 wakes done on cpu 1:
cryptomgr_probe (pid: 469)
[   18.852147] 000: swake_up_all_locked: 6 wakes done on cpu 0:
cryptomgr_test (pid: 491)
[   18.871421] 000: swake_up_all_locked: 8 wakes done on cpu 0:
cryptomgr_probe (pid: 508)
[   18.884491] 000: swake_up_all_locked: 4 wakes done on cpu 0:
cryptomgr_test (pid: 529)
[   18.888355] 001: swake_up_all_locked: 4 wakes done on cpu 1:
cryptomgr_probe (pid: 527)
[   18.903979] 001: swake_up_all_locked: 8 wakes done on cpu 1:
cryptomgr_probe (pid: 542)
[   18.920284] 001: swake_up_all_locked: 8 wakes done on cpu 1:
cryptomgr_probe (pid: 566)
[   18.932919] 000: swake_up_all_locked: 6 wakes done on cpu 0:
cryptomgr_probe (pid: 584)
[   18.954440] 001: swake_up_all_locked: 8 wakes done on cpu 1:
cryptomgr_probe (pid: 602)
[   19.003511] 001: swake_up_all_locked: 3 wakes done on cpu 1:
cryptomgr_probe (pid: 652)
[   19.012186] 001: swake_up_all_locked: 3 wakes done on cpu 1:
cryptomgr_probe (pid: 662)

.......

We have 8 syzcaller reproducers with exactly the same warning.

One reproducer is as follows:

// autogenerated by syzkaller (https://github.com/google/syzkaller)

#define _GNU_SOURCE

#include <dirent.h>
#include <endian.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mount.h>
#include <sys/prctl.h>
#include <sys/resource.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#include <linux/capability.h>

unsigned long long procid;

static void sleep_ms(uint64_t ms)
{
usleep(ms * 1000);
}

static uint64_t current_time_ms(void)
{
struct timespec ts;
if (clock_gettime(CLOCK_MONOTONIC, &ts))
exit(1);
return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
}

static bool write_file(const char* file, const char* what, ...)
{
char buf[1024];
va_list args;
va_start(args, what);
vsnprintf(buf, sizeof(buf), what, args);
va_end(args);
buf[sizeof(buf) - 1] = 0;
int len = strlen(buf);
int fd = open(file, O_WRONLY | O_CLOEXEC);
if (fd == -1)
return false;
if (write(fd, buf, len) != len) {
int err = errno;
close(fd);
errno = err;
return false;
}
close(fd);
return true;
}

static void setup_common()
{
if (mount(0, "/sys/fs/fuse/connections", "fusectl", 0, 0)) {
}
}

static void loop();

static void sandbox_common()
{
prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
setpgrp();
setsid();
struct rlimit rlim;
rlim.rlim_cur = rlim.rlim_max = (200 << 20);
setrlimit(RLIMIT_AS, &rlim);
rlim.rlim_cur = rlim.rlim_max = 32 << 20;
setrlimit(RLIMIT_MEMLOCK, &rlim);
rlim.rlim_cur = rlim.rlim_max = 136 << 20;
setrlimit(RLIMIT_FSIZE, &rlim);
rlim.rlim_cur = rlim.rlim_max = 1 << 20;
setrlimit(RLIMIT_STACK, &rlim);
rlim.rlim_cur = rlim.rlim_max = 0;
setrlimit(RLIMIT_CORE, &rlim);
rlim.rlim_cur = rlim.rlim_max = 256;
setrlimit(RLIMIT_NOFILE, &rlim);
if (unshare(CLONE_NEWNS)) {
}
if (unshare(CLONE_NEWIPC)) {
}
if (unshare(0x02000000)) {
}
if (unshare(CLONE_NEWUTS)) {
}
if (unshare(CLONE_SYSVSEM)) {
}
typedef struct {
const char* name;
const char* value;
} sysctl_t;
static const sysctl_t sysctls[] = {
   {"/proc/sys/kernel/shmmax", "16777216"},
   {"/proc/sys/kernel/shmall", "536870912"},
   {"/proc/sys/kernel/shmmni", "1024"},
   {"/proc/sys/kernel/msgmax", "8192"},
   {"/proc/sys/kernel/msgmni", "1024"},
   {"/proc/sys/kernel/msgmnb", "1024"},
   {"/proc/sys/kernel/sem", "1024 1048576 500 1024"},
};
unsigned i;
for (i = 0; i < sizeof(sysctls) / sizeof(sysctls[0]); i++)
write_file(sysctls[i].name, sysctls[i].value);
}

int wait_for_loop(int pid)
{
if (pid < 0)
exit(1);
int status = 0;
while (waitpid(-1, &status, __WALL) != pid) {
}
return WEXITSTATUS(status);
}

static void drop_caps(void)
{
struct __user_cap_header_struct cap_hdr = {};
struct __user_cap_data_struct cap_data[2] = {};
cap_hdr.version = _LINUX_CAPABILITY_VERSION_3;
cap_hdr.pid = getpid();
if (syscall(SYS_capget, &cap_hdr, &cap_data))
exit(1);
const int drop = (1 << CAP_SYS_PTRACE) | (1 << CAP_SYS_NICE);
cap_data[0].effective &= ~drop;
cap_data[0].permitted &= ~drop;
cap_data[0].inheritable &= ~drop;
if (syscall(SYS_capset, &cap_hdr, &cap_data))
exit(1);
}

static int do_sandbox_none(void)
{
if (unshare(CLONE_NEWPID)) {
}
int pid = fork();
if (pid != 0)
return wait_for_loop(pid);
setup_common();
sandbox_common();
drop_caps();
if (unshare(CLONE_NEWNET)) {
}
loop();
exit(1);
}

static void kill_and_wait(int pid, int* status)
{
kill(-pid, SIGKILL);
kill(pid, SIGKILL);
int i;
for (i = 0; i < 100; i++) {
if (waitpid(-1, status, WNOHANG | __WALL) == pid)
return;
usleep(1000);
}
DIR* dir = opendir("/sys/fs/fuse/connections");
if (dir) {
for (;;) {
struct dirent* ent = readdir(dir);
if (!ent)
break;
if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
continue;
char abort[300];
snprintf(abort, sizeof(abort), "/sys/fs/fuse/connections/%s/abort",
ent->d_name);
int fd = open(abort, O_WRONLY);
if (fd == -1) {
continue;
}
if (write(fd, abort, 1) < 0) {
}
close(fd);
}
closedir(dir);
} else {
}
while (waitpid(-1, status, __WALL) != pid) {
}
}

static void setup_test()
{
prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
setpgrp();
write_file("/proc/self/oom_score_adj", "1000");
}

static void close_fds()
{
int fd;
for (fd = 3; fd < 30; fd++)
close(fd);
}

static void execute_one(void);

#define WAIT_FLAGS __WALL

static void loop(void)
{
int iter;
for (iter = 0;; iter++) {
int pid = fork();
if (pid < 0)
exit(1);
if (pid == 0) {
setup_test();
execute_one();
close_fds();
exit(0);
}
int status = 0;
uint64_t start = current_time_ms();
for (;;) {
if (waitpid(-1, &status, WNOHANG | WAIT_FLAGS) == pid)
break;
sleep_ms(1);
if (current_time_ms() - start < 5 * 1000)
continue;
kill_and_wait(pid, &status);
break;
}
}
}

uint64_t r[1] = {0xffffffffffffffff};

void execute_one(void)
{
intptr_t res = 0;
res = syscall(__NR_socket, 0xf, 3, 2);
if (res != -1)
r[0] = res;
*(uint64_t*)0x20000040 = 0;
*(uint32_t*)0x20000048 = 0;
*(uint64_t*)0x20000050 = 0x20000140;
*(uint64_t*)0x20000140 = 0x200001c0;
*(uint8_t*)0x200001c0 = 2;
*(uint8_t*)0x200001c1 = 7;
*(uint8_t*)0x200001c2 = 0;
*(uint8_t*)0x200001c3 = 0;
*(uint16_t*)0x200001c4 = 2;
*(uint16_t*)0x200001c6 = 0;
*(uint32_t*)0x200001c8 = 0;
*(uint32_t*)0x200001cc = 0;
*(uint64_t*)0x20000148 = 0x10;
*(uint64_t*)0x20000058 = 1;
*(uint64_t*)0x20000060 = 0;
*(uint64_t*)0x20000068 = 0;
*(uint32_t*)0x20000070 = 0;
syscall(__NR_sendmsg, r[0], 0x20000040, 0);

}
int main(void)
{
syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0);
for (procid = 0; procid < 8; procid++) {
if (fork() == 0) {
do_sandbox_none();
}
}
sleep(1000000);
return 0;
}

Has there been any further investigation in to this warning following
the previous discussion?
-John

             reply index

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-11  7:17 John Mathew [this message]
2019-12-12 17:12 ` Sebastian Andrzej Siewior
2019-12-13  9:08   ` John Mathew
2019-12-18 16:53     ` Sebastian Andrzej Siewior
2019-12-31 10:00       ` [PATCH] crypto: only wait for completion once John Mathew
2020-01-14 12:47         ` Sebastian Andrzej Siewior
2020-02-11 11:48           ` John Mathew
2020-02-11 12:04             ` Sebastian Andrzej Siewior
2020-02-11 15:52               ` John Mathew
2020-02-11 15:59                 ` Sebastian Andrzej Siewior
2020-02-12 12:55                   ` John Mathew
2020-02-12 17:02                     ` Sebastian Andrzej Siewior
2020-02-13  6:58                       ` John Mathew
2020-02-13  8:31                         ` Sebastian Andrzej Siewior
2020-02-18  7:18                           ` [PATCH] swait: Increase number of waiters for warning John Mathew
2020-03-20 16:54                             ` [PATCH RT] swait: Remove the warning with more than two waiters Sebastian Andrzej Siewior

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJz2qXnHnw7Z9ftb6Oha5_0-R4YfUL+FhVZPNX1KKHyUkP6KNA@mail.gmail.com \
    --to=john.mathew@unikie.com \
    --cc=linux-rt-users@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-rt-users Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-rt-users/0 linux-rt-users/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-rt-users linux-rt-users/ https://lore.kernel.org/linux-rt-users \
		linux-rt-users@vger.kernel.org
	public-inbox-index linux-rt-users

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-rt-users


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git