From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pg1-x531.google.com ([2607:f8b0:4864:20::531]) by bombadil.infradead.org with esmtps (Exim 4.94 #2 (Red Hat Linux)) id 1lZBQC-00Cqh0-4C for linux-um@lists.infradead.org; Wed, 21 Apr 2021 11:53:36 +0000 Received: by mail-pg1-x531.google.com with SMTP id m12so8841221pgr.9 for ; Wed, 21 Apr 2021 04:53:31 -0700 (PDT) MIME-Version: 1.0 From: YiFei Zhu Date: Wed, 21 Apr 2021 06:53:19 -0500 Message-ID: Subject: Race between SIGIO and epoll from SMP host List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-um" Errors-To: linux-um-bounces+geert=linux-m68k.org@lists.infradead.org To: linux-um@lists.infradead.org Hi I was playing around with xxd large files when I noticed that the terminal output often stalls for a few seconds. It's a heisenbug so the root cause was mostly found by debugging-by-printing and trial-and-error. This bug I can reliably reproduce by running `xxd /dev/zero` inside UML that has its tty as `fd:0,fd:1`. Behavior: the terminal output stalls in the middle of a line, stalling for seconds and possibly a very long time. Keyboard input de-stalls the output but will stall again quickly. AFAICT, the way UML receives CONSOLE_WRITE_IRQ from stdout is that, it sets O_ASYNC on stdout, then whenever a SIGIO is received by UML, it checks for the origin of the SIGIO via epoll_wait in os_waiting_for_events_epoll. However, this is racy: A SIGIO can be generated prior to epoll receiving the event, so epoll_wait with zero timeout may be called before epoll receives the event, and the event can be missed. As a minimal reproducible example, this C code running on the host often assert failure in `res == 1`: [snip] #define _GNU_SOURCE 1 #include #include #include #include #include #include #include #define MAX_EPOLL_EVENTS 64 volatile sig_atomic_t state; static struct epoll_event epoll_events[MAX_EPOLL_EVENTS]; static char msg[] = "00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................\n"; static void sigio_handler(int ignored) { assert(state == 1 || state == 0); state = 2; } int main(int argc, char **argv) { int epollfd, flags, res; struct epoll_event event; state = 0; signal(SIGIO, sigio_handler); epollfd = epoll_create(MAX_EPOLL_EVENTS); assert(epollfd >= 0); event.data.ptr = NULL; event.events = EPOLLOUT | EPOLLET; assert(!epoll_ctl(epollfd, EPOLL_CTL_ADD, 1, &event)); flags = fcntl(1, F_GETFL); assert(flags >= 0); flags |= O_ASYNC | O_NONBLOCK; assert(!fcntl(1, F_SETFL, flags)); assert(!fcntl(1, F_SETSIG, SIGIO)); assert(!fcntl(1, F_SETOWN, getpid())); for (;;) { if (state == 0) { res = write(1, msg, sizeof(msg)); if (res < 0 && errno == EAGAIN) state = 1; } else if (state == 1) { pause(); } else if (state == 2) { res = epoll_wait(epollfd, epoll_events, MAX_EPOLL_EVENTS, 1); assert(res == 1); state = 0; } } } [/snip] strace -e epoll_wait shows, the last few lines: --- SIGIO {si_signo=SIGIO, si_code=POLL_OUT, si_band=772} --- epoll_wait(3, [{events=EPOLLOUT, data={u32=0, u64=0}}], 64, 1) = 1 --- SIGIO {si_signo=SIGIO, si_code=POLL_OUT, si_band=772} --- epoll_wait(3, [{events=EPOLLOUT, data={u32=0, u64=0}}], 64, 1) = 1 --- SIGIO {si_signo=SIGIO, si_code=POLL_OUT, si_band=772} --- epoll_wait(3, [], 64, 1) = 0 --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=2033614, si_uid=1000} --- +++ killed by SIGABRT (core dumped) +++ I reproduced this assertion failure on: * a self-built 5.11.14 SMP vanilla kernel running on bare metal, with terminal as a graphical application. * a 4.19.0-0.bpo.14-amd64 Debian 4.19.171-2~deb9u1 SMP kernel with terminal as ssh terminal. Can still reproduce after setting timeout to 1 ms, in both in UML and in this minimal example. I looked into how SIGIO and epoll events are generated for a tty, and it seems that SIGIO is generated strictly before epoll events are generated. gdb backtrace looks like (yes, on UML; didn't bother to spin up qemu since I had a UML gdb open): #0 send_sigio (fown=0x674a4b98, fd=0x1, band=0x2) at ./fs/fcntl.c:795 #1 kill_fasync_rcu (band=, sig=, fa=0x672f90c0) at ./fs/fcntl.c:1019 #2 kill_fasync (fp=, sig=0x1d, band=0x2) at ./fs/fcntl.c:1033 #3 n_tty_write_wakeup (tty=) at ./drivers/tty/n_tty.c:243 #4 tty_wakeup (tty=0x67485800) at ./drivers/tty/tty_io.c:533 #5 tty_port_default_wakeup (port=) at ./drivers/tty/tty_port.c:50 #6 tty_port_tty_wakeup (port=0x615ded80 ) at ./drivers/tty/tty_port.c:388 #0 ep_poll_callback (wait=0x674d2ad0, mode=0x1, sync=0x0, key=0x4) at ./fs/eventpoll.c:361 #1 __wake_up_common (wq_head=0x67485ce8, mode=0x1, nr_exclusive=0x1, wake_flags=0x0, key=0x4, bookmark=0x61396f70) at ./kernel/sched/wait.c:108 #2 __wake_up_common_lock (wq_head=0x67485ce8, mode=0x1, nr_exclusive=0x1, wake_flags=0x0, key=0x4) at ./kernel/sched/wait.c:138 #3 __wake_up (wq_head=0x67485ce8, mode=0x1, nr_exclusive=0x1, key=0x4) at ./kernel/sched/wait.c:157 #4 tty_wakeup (tty=0x67485800) at ./drivers/tty/tty_io.c:537 #5 tty_port_default_wakeup (port=) at ./drivers/tty/tty_port.c:50 #6 tty_port_tty_wakeup (port=0x615ded80 ) at ./drivers/tty/tty_port.c:388 The last shared frame is in tty_wakeup, and n_tty_write_wakeup is called by `ld->ops->write_wakeup` whereas __wake_up is in the macro wake_up_interruptible_poll, so send_sigio is called strictly before ep_poll_callback for tty_wakeup. I have not tested on a non-SMP host, but on an SMP host, SIGIO could cause the waiting task to wake up on a different core, and the task could invoke epoll_wait before the interrupt handler generates the epoll event in ep_poll_callback. If this happens to UML, the interrupt for UML to make the tty ready will be missed until the next SIGIO. Considering that this is a race on the host, what would be the best way to fix this? YiFei Zhu _______________________________________________ linux-um mailing list linux-um@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-um