All of lore.kernel.org
 help / color / mirror / Atom feed
From: Anton Ivanov <anton.ivanov@kot-begemot.co.uk>
To: YiFei Zhu <zhuyifei1999@gmail.com>, linux-um@lists.infradead.org
Subject: Re: Race between SIGIO and epoll from SMP host
Date: Wed, 21 Apr 2021 13:45:05 +0100	[thread overview]
Message-ID: <9dd2fe8d-71cd-a5e6-15e3-0375568c2b40@kot-begemot.co.uk> (raw)
In-Reply-To: <d26d976f-2ef0-aeec-e2f4-51b171e0a069@kot-begemot.co.uk>



On 21/04/2021 13:32, Anton Ivanov wrote:
> 
> 
> On 21/04/2021 12:53, YiFei Zhu wrote:
>> Hi
>>
>> I was playing around with xxd large files when I noticed that the
>> terminal output often stalls for a few seconds. It's a heisenbug so
>> the root cause was mostly found by debugging-by-printing and
>> trial-and-error.
>>
>> This bug I can reliably reproduce by running `xxd /dev/zero` inside
>> UML that has its tty as `fd:0,fd:1`. Behavior: the terminal output
>> stalls in the middle of a line, stalling for seconds and possibly a
>> very long time. Keyboard input de-stalls the output but will stall
>> again quickly.
>>
>> AFAICT, the way UML receives CONSOLE_WRITE_IRQ from stdout is that, it
>> sets O_ASYNC on stdout, then whenever a SIGIO is received by UML, it
>> checks for the origin of the SIGIO via epoll_wait in
>> os_waiting_for_events_epoll. However, this is racy: A SIGIO can be
>> generated prior to epoll receiving the event, so epoll_wait with zero
>> timeout may be called before epoll receives the event, and the event
>> can be missed.
>>
>> As a minimal reproducible example, this C code running on the host
>> often assert failure in `res == 1`:
>>
>> [snip]
>> #define _GNU_SOURCE 1
>>
>> #include <assert.h>
>> #include <errno.h>
>> #include <stdio.h>
>> #include <signal.h>
>> #include <sys/epoll.h>
>> #include <sys/fcntl.h>
>> #include <unistd.h>
>>
>> #define MAX_EPOLL_EVENTS 64
>> volatile sig_atomic_t state;
>> static struct epoll_event epoll_events[MAX_EPOLL_EVENTS];
>>
>> static char msg[] =
>>      "00000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................\n";
>>
>> static void sigio_handler(int ignored)
>> {
>>      assert(state == 1 || state == 0);
>>      state = 2;
>> }
>>
>> int main(int argc, char **argv)
>> {
>>      int epollfd, flags, res;
>>      struct epoll_event event;
>>
>>      state = 0;
>>
>>      signal(SIGIO, sigio_handler);
>>
>>      epollfd = epoll_create(MAX_EPOLL_EVENTS);
>>      assert(epollfd >= 0);
>>
>>      event.data.ptr = NULL;
>>      event.events = EPOLLOUT | EPOLLET;
>>      assert(!epoll_ctl(epollfd, EPOLL_CTL_ADD, 1, &event));
>>
>>      flags = fcntl(1, F_GETFL);
>>      assert(flags >= 0);
>>      flags |= O_ASYNC | O_NONBLOCK;
>>
>>      assert(!fcntl(1, F_SETFL, flags));
>>      assert(!fcntl(1, F_SETSIG, SIGIO));
>>      assert(!fcntl(1, F_SETOWN, getpid()));
>>
>>      for (;;) {
>>          if (state == 0) {
>>              res = write(1, msg, sizeof(msg));
>>              if (res < 0 && errno == EAGAIN)
>>                  state = 1;
>>          } else if (state == 1) {
>>              pause();
>>          } else if (state == 2) {
>>              res = epoll_wait(epollfd, epoll_events, MAX_EPOLL_EVENTS, 1);
>>              assert(res == 1);
>>              state = 0;
>>          }
>>      }
>> }
>> [/snip]
>>
>> strace -e epoll_wait shows, the last few lines:
>>
>>    --- SIGIO {si_signo=SIGIO, si_code=POLL_OUT, si_band=772} ---
>>    epoll_wait(3, [{events=EPOLLOUT, data={u32=0, u64=0}}], 64, 1) = 1
>>    --- SIGIO {si_signo=SIGIO, si_code=POLL_OUT, si_band=772} ---
>>    epoll_wait(3, [{events=EPOLLOUT, data={u32=0, u64=0}}], 64, 1) = 1
>>    --- SIGIO {si_signo=SIGIO, si_code=POLL_OUT, si_band=772} ---
>>    epoll_wait(3, [], 64, 1)                = 0
>>    --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=2033614,
>> si_uid=1000} ---
>>    +++ killed by SIGABRT (core dumped) +++
>>
>> I reproduced this assertion failure on:
>> * a self-built 5.11.14 SMP vanilla kernel running on bare metal, with
>> terminal as a graphical application.
>> * a 4.19.0-0.bpo.14-amd64 Debian 4.19.171-2~deb9u1 SMP kernel with
>> terminal as ssh terminal.
>>
>> Can still reproduce after setting timeout to 1 ms, in both in UML and
>> in this minimal example.
>>
>> I looked into how SIGIO and epoll events are generated for a tty, and
>> it seems that SIGIO is generated strictly before epoll events are
>> generated. gdb backtrace looks like (yes, on UML; didn't bother to
>> spin up qemu since I had a UML gdb open):
>>
>>    #0  send_sigio (fown=0x674a4b98, fd=0x1, band=0x2) at ./fs/fcntl.c:795
>>    #1  kill_fasync_rcu (band=<optimized out>, sig=<optimized out>,
>> fa=0x672f90c0) at ./fs/fcntl.c:1019
>>    #2  kill_fasync (fp=<optimized out>, sig=0x1d, band=0x2) at ./fs/fcntl.c:1033
>>    #3  n_tty_write_wakeup (tty=<optimized out>) at ./drivers/tty/n_tty.c:243
>>    #4  tty_wakeup (tty=0x67485800) at ./drivers/tty/tty_io.c:533
>>    #5  tty_port_default_wakeup (port=<optimized out>) at
>> ./drivers/tty/tty_port.c:50
>>    #6  tty_port_tty_wakeup (port=0x615ded80 <vts>) at
>> ./drivers/tty/tty_port.c:388
>>
>>    #0  ep_poll_callback (wait=0x674d2ad0, mode=0x1, sync=0x0, key=0x4)
>> at ./fs/eventpoll.c:361
>>    #1  __wake_up_common (wq_head=0x67485ce8, mode=0x1,
>> nr_exclusive=0x1, wake_flags=0x0, key=0x4, bookmark=0x61396f70) at
>> ./kernel/sched/wait.c:108
>>    #2  __wake_up_common_lock (wq_head=0x67485ce8, mode=0x1,
>> nr_exclusive=0x1, wake_flags=0x0, key=0x4) at
>> ./kernel/sched/wait.c:138
>>    #3  __wake_up (wq_head=0x67485ce8, mode=0x1, nr_exclusive=0x1,
>> key=0x4) at ./kernel/sched/wait.c:157
>>    #4  tty_wakeup (tty=0x67485800) at ./drivers/tty/tty_io.c:537
>>    #5  tty_port_default_wakeup (port=<optimized out>) at
>> ./drivers/tty/tty_port.c:50
>>    #6  tty_port_tty_wakeup (port=0x615ded80 <vts>) at
>> ./drivers/tty/tty_port.c:388
>>
>> The last shared frame is in tty_wakeup, and n_tty_write_wakeup is
>> called by `ld->ops->write_wakeup` whereas __wake_up is in the macro
>> wake_up_interruptible_poll, so send_sigio is called strictly before
>> ep_poll_callback for tty_wakeup.
>>
>> I have not tested on a non-SMP host, but on an SMP host, SIGIO could
>> cause the waiting task to wake up on a different core, and the task
>> could invoke epoll_wait before the interrupt handler generates the
>> epoll event in ep_poll_callback. If this happens to UML, the interrupt
>> for UML to make the tty ready will be missed until the next SIGIO.
>>
>> Considering that this is a race on the host, what would be the best
>> way to fix this?
> 
> Interesting one. I need to think.
> 
> One option would be to wait for epoll events with a timeout which is larger than zero - f.e. HZ.
> 
> If we have received a SIGIO there is an epoll event on the way. The fact that it is not in the queue right now means that we are due to process it shortly.

No, this does not work. I will think of other approaches to this.

> 
> A.
> 
>>
>> YiFei Zhu
>>
>> _______________________________________________
>> linux-um mailing list
>> linux-um@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-um
>>
> 

-- 
Anton R. Ivanov
https://www.kot-begemot.co.uk/

_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um

  reply	other threads:[~2021-04-21 12:45 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-21 11:53 Race between SIGIO and epoll from SMP host YiFei Zhu
2021-04-21 12:32 ` Anton Ivanov
2021-04-21 12:45   ` Anton Ivanov [this message]
2021-04-21 13:35   ` YiFei Zhu
2021-04-21 14:21     ` Anton Ivanov
2021-04-21 15:45     ` Anton Ivanov
2021-04-22  7:32       ` Anton Ivanov
2021-04-22  7:50         ` Anton Ivanov
2021-04-22  8:46         ` YiFei Zhu
2021-04-22 11:27           ` Anton Ivanov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9dd2fe8d-71cd-a5e6-15e3-0375568c2b40@kot-begemot.co.uk \
    --to=anton.ivanov@kot-begemot.co.uk \
    --cc=linux-um@lists.infradead.org \
    --cc=zhuyifei1999@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.