All of lore.kernel.org
 help / color / mirror / Atom feed
* Race between SIGIO and epoll from SMP host
@ 2021-04-21 11:53 YiFei Zhu
  2021-04-21 12:32 ` Anton Ivanov
  0 siblings, 1 reply; 10+ messages in thread
From: YiFei Zhu @ 2021-04-21 11:53 UTC (permalink / raw)
  To: linux-um

Hi

I was playing around with xxd large files when I noticed that the
terminal output often stalls for a few seconds. It's a heisenbug so
the root cause was mostly found by debugging-by-printing and
trial-and-error.

This bug I can reliably reproduce by running `xxd /dev/zero` inside
UML that has its tty as `fd:0,fd:1`. Behavior: the terminal output
stalls in the middle of a line, stalling for seconds and possibly a
very long time. Keyboard input de-stalls the output but will stall
again quickly.

AFAICT, the way UML receives CONSOLE_WRITE_IRQ from stdout is that, it
sets O_ASYNC on stdout, then whenever a SIGIO is received by UML, it
checks for the origin of the SIGIO via epoll_wait in
os_waiting_for_events_epoll. However, this is racy: A SIGIO can be
generated prior to epoll receiving the event, so epoll_wait with zero
timeout may be called before epoll receives the event, and the event
can be missed.

As a minimal reproducible example, this C code running on the host
often assert failure in `res == 1`:

[snip]
#define _GNU_SOURCE 1

#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <signal.h>
#include <sys/epoll.h>
#include <sys/fcntl.h>
#include <unistd.h>

#define MAX_EPOLL_EVENTS 64
volatile sig_atomic_t state;
static struct epoll_event epoll_events[MAX_EPOLL_EVENTS];

static char msg[] =
    "00000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................\n";

static void sigio_handler(int ignored)
{
    assert(state == 1 || state == 0);
    state = 2;
}

int main(int argc, char **argv)
{
    int epollfd, flags, res;
    struct epoll_event event;

    state = 0;

    signal(SIGIO, sigio_handler);

    epollfd = epoll_create(MAX_EPOLL_EVENTS);
    assert(epollfd >= 0);

    event.data.ptr = NULL;
    event.events = EPOLLOUT | EPOLLET;
    assert(!epoll_ctl(epollfd, EPOLL_CTL_ADD, 1, &event));

    flags = fcntl(1, F_GETFL);
    assert(flags >= 0);
    flags |= O_ASYNC | O_NONBLOCK;

    assert(!fcntl(1, F_SETFL, flags));
    assert(!fcntl(1, F_SETSIG, SIGIO));
    assert(!fcntl(1, F_SETOWN, getpid()));

    for (;;) {
        if (state == 0) {
            res = write(1, msg, sizeof(msg));
            if (res < 0 && errno == EAGAIN)
                state = 1;
        } else if (state == 1) {
            pause();
        } else if (state == 2) {
            res = epoll_wait(epollfd, epoll_events, MAX_EPOLL_EVENTS, 1);
            assert(res == 1);
            state = 0;
        }
    }
}
[/snip]

strace -e epoll_wait shows, the last few lines:

  --- SIGIO {si_signo=SIGIO, si_code=POLL_OUT, si_band=772} ---
  epoll_wait(3, [{events=EPOLLOUT, data={u32=0, u64=0}}], 64, 1) = 1
  --- SIGIO {si_signo=SIGIO, si_code=POLL_OUT, si_band=772} ---
  epoll_wait(3, [{events=EPOLLOUT, data={u32=0, u64=0}}], 64, 1) = 1
  --- SIGIO {si_signo=SIGIO, si_code=POLL_OUT, si_band=772} ---
  epoll_wait(3, [], 64, 1)                = 0
  --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=2033614,
si_uid=1000} ---
  +++ killed by SIGABRT (core dumped) +++

I reproduced this assertion failure on:
* a self-built 5.11.14 SMP vanilla kernel running on bare metal, with
terminal as a graphical application.
* a 4.19.0-0.bpo.14-amd64 Debian 4.19.171-2~deb9u1 SMP kernel with
terminal as ssh terminal.

Can still reproduce after setting timeout to 1 ms, in both in UML and
in this minimal example.

I looked into how SIGIO and epoll events are generated for a tty, and
it seems that SIGIO is generated strictly before epoll events are
generated. gdb backtrace looks like (yes, on UML; didn't bother to
spin up qemu since I had a UML gdb open):

  #0  send_sigio (fown=0x674a4b98, fd=0x1, band=0x2) at ./fs/fcntl.c:795
  #1  kill_fasync_rcu (band=<optimized out>, sig=<optimized out>,
fa=0x672f90c0) at ./fs/fcntl.c:1019
  #2  kill_fasync (fp=<optimized out>, sig=0x1d, band=0x2) at ./fs/fcntl.c:1033
  #3  n_tty_write_wakeup (tty=<optimized out>) at ./drivers/tty/n_tty.c:243
  #4  tty_wakeup (tty=0x67485800) at ./drivers/tty/tty_io.c:533
  #5  tty_port_default_wakeup (port=<optimized out>) at
./drivers/tty/tty_port.c:50
  #6  tty_port_tty_wakeup (port=0x615ded80 <vts>) at
./drivers/tty/tty_port.c:388

  #0  ep_poll_callback (wait=0x674d2ad0, mode=0x1, sync=0x0, key=0x4)
at ./fs/eventpoll.c:361
  #1  __wake_up_common (wq_head=0x67485ce8, mode=0x1,
nr_exclusive=0x1, wake_flags=0x0, key=0x4, bookmark=0x61396f70) at
./kernel/sched/wait.c:108
  #2  __wake_up_common_lock (wq_head=0x67485ce8, mode=0x1,
nr_exclusive=0x1, wake_flags=0x0, key=0x4) at
./kernel/sched/wait.c:138
  #3  __wake_up (wq_head=0x67485ce8, mode=0x1, nr_exclusive=0x1,
key=0x4) at ./kernel/sched/wait.c:157
  #4  tty_wakeup (tty=0x67485800) at ./drivers/tty/tty_io.c:537
  #5  tty_port_default_wakeup (port=<optimized out>) at
./drivers/tty/tty_port.c:50
  #6  tty_port_tty_wakeup (port=0x615ded80 <vts>) at
./drivers/tty/tty_port.c:388

The last shared frame is in tty_wakeup, and n_tty_write_wakeup is
called by `ld->ops->write_wakeup` whereas __wake_up is in the macro
wake_up_interruptible_poll, so send_sigio is called strictly before
ep_poll_callback for tty_wakeup.

I have not tested on a non-SMP host, but on an SMP host, SIGIO could
cause the waiting task to wake up on a different core, and the task
could invoke epoll_wait before the interrupt handler generates the
epoll event in ep_poll_callback. If this happens to UML, the interrupt
for UML to make the tty ready will be missed until the next SIGIO.

Considering that this is a race on the host, what would be the best
way to fix this?

YiFei Zhu

_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Race between SIGIO and epoll from SMP host
  2021-04-21 11:53 Race between SIGIO and epoll from SMP host YiFei Zhu
@ 2021-04-21 12:32 ` Anton Ivanov
  2021-04-21 12:45   ` Anton Ivanov
  2021-04-21 13:35   ` YiFei Zhu
  0 siblings, 2 replies; 10+ messages in thread
From: Anton Ivanov @ 2021-04-21 12:32 UTC (permalink / raw)
  To: YiFei Zhu, linux-um



On 21/04/2021 12:53, YiFei Zhu wrote:
> Hi
> 
> I was playing around with xxd large files when I noticed that the
> terminal output often stalls for a few seconds. It's a heisenbug so
> the root cause was mostly found by debugging-by-printing and
> trial-and-error.
> 
> This bug I can reliably reproduce by running `xxd /dev/zero` inside
> UML that has its tty as `fd:0,fd:1`. Behavior: the terminal output
> stalls in the middle of a line, stalling for seconds and possibly a
> very long time. Keyboard input de-stalls the output but will stall
> again quickly.
> 
> AFAICT, the way UML receives CONSOLE_WRITE_IRQ from stdout is that, it
> sets O_ASYNC on stdout, then whenever a SIGIO is received by UML, it
> checks for the origin of the SIGIO via epoll_wait in
> os_waiting_for_events_epoll. However, this is racy: A SIGIO can be
> generated prior to epoll receiving the event, so epoll_wait with zero
> timeout may be called before epoll receives the event, and the event
> can be missed.
> 
> As a minimal reproducible example, this C code running on the host
> often assert failure in `res == 1`:
> 
> [snip]
> #define _GNU_SOURCE 1
> 
> #include <assert.h>
> #include <errno.h>
> #include <stdio.h>
> #include <signal.h>
> #include <sys/epoll.h>
> #include <sys/fcntl.h>
> #include <unistd.h>
> 
> #define MAX_EPOLL_EVENTS 64
> volatile sig_atomic_t state;
> static struct epoll_event epoll_events[MAX_EPOLL_EVENTS];
> 
> static char msg[] =
>      "00000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................\n";
> 
> static void sigio_handler(int ignored)
> {
>      assert(state == 1 || state == 0);
>      state = 2;
> }
> 
> int main(int argc, char **argv)
> {
>      int epollfd, flags, res;
>      struct epoll_event event;
> 
>      state = 0;
> 
>      signal(SIGIO, sigio_handler);
> 
>      epollfd = epoll_create(MAX_EPOLL_EVENTS);
>      assert(epollfd >= 0);
> 
>      event.data.ptr = NULL;
>      event.events = EPOLLOUT | EPOLLET;
>      assert(!epoll_ctl(epollfd, EPOLL_CTL_ADD, 1, &event));
> 
>      flags = fcntl(1, F_GETFL);
>      assert(flags >= 0);
>      flags |= O_ASYNC | O_NONBLOCK;
> 
>      assert(!fcntl(1, F_SETFL, flags));
>      assert(!fcntl(1, F_SETSIG, SIGIO));
>      assert(!fcntl(1, F_SETOWN, getpid()));
> 
>      for (;;) {
>          if (state == 0) {
>              res = write(1, msg, sizeof(msg));
>              if (res < 0 && errno == EAGAIN)
>                  state = 1;
>          } else if (state == 1) {
>              pause();
>          } else if (state == 2) {
>              res = epoll_wait(epollfd, epoll_events, MAX_EPOLL_EVENTS, 1);
>              assert(res == 1);
>              state = 0;
>          }
>      }
> }
> [/snip]
> 
> strace -e epoll_wait shows, the last few lines:
> 
>    --- SIGIO {si_signo=SIGIO, si_code=POLL_OUT, si_band=772} ---
>    epoll_wait(3, [{events=EPOLLOUT, data={u32=0, u64=0}}], 64, 1) = 1
>    --- SIGIO {si_signo=SIGIO, si_code=POLL_OUT, si_band=772} ---
>    epoll_wait(3, [{events=EPOLLOUT, data={u32=0, u64=0}}], 64, 1) = 1
>    --- SIGIO {si_signo=SIGIO, si_code=POLL_OUT, si_band=772} ---
>    epoll_wait(3, [], 64, 1)                = 0
>    --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=2033614,
> si_uid=1000} ---
>    +++ killed by SIGABRT (core dumped) +++
> 
> I reproduced this assertion failure on:
> * a self-built 5.11.14 SMP vanilla kernel running on bare metal, with
> terminal as a graphical application.
> * a 4.19.0-0.bpo.14-amd64 Debian 4.19.171-2~deb9u1 SMP kernel with
> terminal as ssh terminal.
> 
> Can still reproduce after setting timeout to 1 ms, in both in UML and
> in this minimal example.
> 
> I looked into how SIGIO and epoll events are generated for a tty, and
> it seems that SIGIO is generated strictly before epoll events are
> generated. gdb backtrace looks like (yes, on UML; didn't bother to
> spin up qemu since I had a UML gdb open):
> 
>    #0  send_sigio (fown=0x674a4b98, fd=0x1, band=0x2) at ./fs/fcntl.c:795
>    #1  kill_fasync_rcu (band=<optimized out>, sig=<optimized out>,
> fa=0x672f90c0) at ./fs/fcntl.c:1019
>    #2  kill_fasync (fp=<optimized out>, sig=0x1d, band=0x2) at ./fs/fcntl.c:1033
>    #3  n_tty_write_wakeup (tty=<optimized out>) at ./drivers/tty/n_tty.c:243
>    #4  tty_wakeup (tty=0x67485800) at ./drivers/tty/tty_io.c:533
>    #5  tty_port_default_wakeup (port=<optimized out>) at
> ./drivers/tty/tty_port.c:50
>    #6  tty_port_tty_wakeup (port=0x615ded80 <vts>) at
> ./drivers/tty/tty_port.c:388
> 
>    #0  ep_poll_callback (wait=0x674d2ad0, mode=0x1, sync=0x0, key=0x4)
> at ./fs/eventpoll.c:361
>    #1  __wake_up_common (wq_head=0x67485ce8, mode=0x1,
> nr_exclusive=0x1, wake_flags=0x0, key=0x4, bookmark=0x61396f70) at
> ./kernel/sched/wait.c:108
>    #2  __wake_up_common_lock (wq_head=0x67485ce8, mode=0x1,
> nr_exclusive=0x1, wake_flags=0x0, key=0x4) at
> ./kernel/sched/wait.c:138
>    #3  __wake_up (wq_head=0x67485ce8, mode=0x1, nr_exclusive=0x1,
> key=0x4) at ./kernel/sched/wait.c:157
>    #4  tty_wakeup (tty=0x67485800) at ./drivers/tty/tty_io.c:537
>    #5  tty_port_default_wakeup (port=<optimized out>) at
> ./drivers/tty/tty_port.c:50
>    #6  tty_port_tty_wakeup (port=0x615ded80 <vts>) at
> ./drivers/tty/tty_port.c:388
> 
> The last shared frame is in tty_wakeup, and n_tty_write_wakeup is
> called by `ld->ops->write_wakeup` whereas __wake_up is in the macro
> wake_up_interruptible_poll, so send_sigio is called strictly before
> ep_poll_callback for tty_wakeup.
> 
> I have not tested on a non-SMP host, but on an SMP host, SIGIO could
> cause the waiting task to wake up on a different core, and the task
> could invoke epoll_wait before the interrupt handler generates the
> epoll event in ep_poll_callback. If this happens to UML, the interrupt
> for UML to make the tty ready will be missed until the next SIGIO.
> 
> Considering that this is a race on the host, what would be the best
> way to fix this?

Interesting one. I need to think.

One option would be to wait for epoll events with a timeout which is larger than zero - f.e. HZ.

If we have received a SIGIO there is an epoll event on the way. The fact that it is not in the queue right now means that we are due to process it shortly.

A.

> 
> YiFei Zhu
> 
> _______________________________________________
> linux-um mailing list
> linux-um@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-um
> 

-- 
Anton R. Ivanov
https://www.kot-begemot.co.uk/

_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Race between SIGIO and epoll from SMP host
  2021-04-21 12:32 ` Anton Ivanov
@ 2021-04-21 12:45   ` Anton Ivanov
  2021-04-21 13:35   ` YiFei Zhu
  1 sibling, 0 replies; 10+ messages in thread
From: Anton Ivanov @ 2021-04-21 12:45 UTC (permalink / raw)
  To: YiFei Zhu, linux-um



On 21/04/2021 13:32, Anton Ivanov wrote:
> 
> 
> On 21/04/2021 12:53, YiFei Zhu wrote:
>> Hi
>>
>> I was playing around with xxd large files when I noticed that the
>> terminal output often stalls for a few seconds. It's a heisenbug so
>> the root cause was mostly found by debugging-by-printing and
>> trial-and-error.
>>
>> This bug I can reliably reproduce by running `xxd /dev/zero` inside
>> UML that has its tty as `fd:0,fd:1`. Behavior: the terminal output
>> stalls in the middle of a line, stalling for seconds and possibly a
>> very long time. Keyboard input de-stalls the output but will stall
>> again quickly.
>>
>> AFAICT, the way UML receives CONSOLE_WRITE_IRQ from stdout is that, it
>> sets O_ASYNC on stdout, then whenever a SIGIO is received by UML, it
>> checks for the origin of the SIGIO via epoll_wait in
>> os_waiting_for_events_epoll. However, this is racy: A SIGIO can be
>> generated prior to epoll receiving the event, so epoll_wait with zero
>> timeout may be called before epoll receives the event, and the event
>> can be missed.
>>
>> As a minimal reproducible example, this C code running on the host
>> often assert failure in `res == 1`:
>>
>> [snip]
>> #define _GNU_SOURCE 1
>>
>> #include <assert.h>
>> #include <errno.h>
>> #include <stdio.h>
>> #include <signal.h>
>> #include <sys/epoll.h>
>> #include <sys/fcntl.h>
>> #include <unistd.h>
>>
>> #define MAX_EPOLL_EVENTS 64
>> volatile sig_atomic_t state;
>> static struct epoll_event epoll_events[MAX_EPOLL_EVENTS];
>>
>> static char msg[] =
>>      "00000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................\n";
>>
>> static void sigio_handler(int ignored)
>> {
>>      assert(state == 1 || state == 0);
>>      state = 2;
>> }
>>
>> int main(int argc, char **argv)
>> {
>>      int epollfd, flags, res;
>>      struct epoll_event event;
>>
>>      state = 0;
>>
>>      signal(SIGIO, sigio_handler);
>>
>>      epollfd = epoll_create(MAX_EPOLL_EVENTS);
>>      assert(epollfd >= 0);
>>
>>      event.data.ptr = NULL;
>>      event.events = EPOLLOUT | EPOLLET;
>>      assert(!epoll_ctl(epollfd, EPOLL_CTL_ADD, 1, &event));
>>
>>      flags = fcntl(1, F_GETFL);
>>      assert(flags >= 0);
>>      flags |= O_ASYNC | O_NONBLOCK;
>>
>>      assert(!fcntl(1, F_SETFL, flags));
>>      assert(!fcntl(1, F_SETSIG, SIGIO));
>>      assert(!fcntl(1, F_SETOWN, getpid()));
>>
>>      for (;;) {
>>          if (state == 0) {
>>              res = write(1, msg, sizeof(msg));
>>              if (res < 0 && errno == EAGAIN)
>>                  state = 1;
>>          } else if (state == 1) {
>>              pause();
>>          } else if (state == 2) {
>>              res = epoll_wait(epollfd, epoll_events, MAX_EPOLL_EVENTS, 1);
>>              assert(res == 1);
>>              state = 0;
>>          }
>>      }
>> }
>> [/snip]
>>
>> strace -e epoll_wait shows, the last few lines:
>>
>>    --- SIGIO {si_signo=SIGIO, si_code=POLL_OUT, si_band=772} ---
>>    epoll_wait(3, [{events=EPOLLOUT, data={u32=0, u64=0}}], 64, 1) = 1
>>    --- SIGIO {si_signo=SIGIO, si_code=POLL_OUT, si_band=772} ---
>>    epoll_wait(3, [{events=EPOLLOUT, data={u32=0, u64=0}}], 64, 1) = 1
>>    --- SIGIO {si_signo=SIGIO, si_code=POLL_OUT, si_band=772} ---
>>    epoll_wait(3, [], 64, 1)                = 0
>>    --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=2033614,
>> si_uid=1000} ---
>>    +++ killed by SIGABRT (core dumped) +++
>>
>> I reproduced this assertion failure on:
>> * a self-built 5.11.14 SMP vanilla kernel running on bare metal, with
>> terminal as a graphical application.
>> * a 4.19.0-0.bpo.14-amd64 Debian 4.19.171-2~deb9u1 SMP kernel with
>> terminal as ssh terminal.
>>
>> Can still reproduce after setting timeout to 1 ms, in both in UML and
>> in this minimal example.
>>
>> I looked into how SIGIO and epoll events are generated for a tty, and
>> it seems that SIGIO is generated strictly before epoll events are
>> generated. gdb backtrace looks like (yes, on UML; didn't bother to
>> spin up qemu since I had a UML gdb open):
>>
>>    #0  send_sigio (fown=0x674a4b98, fd=0x1, band=0x2) at ./fs/fcntl.c:795
>>    #1  kill_fasync_rcu (band=<optimized out>, sig=<optimized out>,
>> fa=0x672f90c0) at ./fs/fcntl.c:1019
>>    #2  kill_fasync (fp=<optimized out>, sig=0x1d, band=0x2) at ./fs/fcntl.c:1033
>>    #3  n_tty_write_wakeup (tty=<optimized out>) at ./drivers/tty/n_tty.c:243
>>    #4  tty_wakeup (tty=0x67485800) at ./drivers/tty/tty_io.c:533
>>    #5  tty_port_default_wakeup (port=<optimized out>) at
>> ./drivers/tty/tty_port.c:50
>>    #6  tty_port_tty_wakeup (port=0x615ded80 <vts>) at
>> ./drivers/tty/tty_port.c:388
>>
>>    #0  ep_poll_callback (wait=0x674d2ad0, mode=0x1, sync=0x0, key=0x4)
>> at ./fs/eventpoll.c:361
>>    #1  __wake_up_common (wq_head=0x67485ce8, mode=0x1,
>> nr_exclusive=0x1, wake_flags=0x0, key=0x4, bookmark=0x61396f70) at
>> ./kernel/sched/wait.c:108
>>    #2  __wake_up_common_lock (wq_head=0x67485ce8, mode=0x1,
>> nr_exclusive=0x1, wake_flags=0x0, key=0x4) at
>> ./kernel/sched/wait.c:138
>>    #3  __wake_up (wq_head=0x67485ce8, mode=0x1, nr_exclusive=0x1,
>> key=0x4) at ./kernel/sched/wait.c:157
>>    #4  tty_wakeup (tty=0x67485800) at ./drivers/tty/tty_io.c:537
>>    #5  tty_port_default_wakeup (port=<optimized out>) at
>> ./drivers/tty/tty_port.c:50
>>    #6  tty_port_tty_wakeup (port=0x615ded80 <vts>) at
>> ./drivers/tty/tty_port.c:388
>>
>> The last shared frame is in tty_wakeup, and n_tty_write_wakeup is
>> called by `ld->ops->write_wakeup` whereas __wake_up is in the macro
>> wake_up_interruptible_poll, so send_sigio is called strictly before
>> ep_poll_callback for tty_wakeup.
>>
>> I have not tested on a non-SMP host, but on an SMP host, SIGIO could
>> cause the waiting task to wake up on a different core, and the task
>> could invoke epoll_wait before the interrupt handler generates the
>> epoll event in ep_poll_callback. If this happens to UML, the interrupt
>> for UML to make the tty ready will be missed until the next SIGIO.
>>
>> Considering that this is a race on the host, what would be the best
>> way to fix this?
> 
> Interesting one. I need to think.
> 
> One option would be to wait for epoll events with a timeout which is larger than zero - f.e. HZ.
> 
> If we have received a SIGIO there is an epoll event on the way. The fact that it is not in the queue right now means that we are due to process it shortly.

No, this does not work. I will think of other approaches to this.

> 
> A.
> 
>>
>> YiFei Zhu
>>
>> _______________________________________________
>> linux-um mailing list
>> linux-um@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-um
>>
> 

-- 
Anton R. Ivanov
https://www.kot-begemot.co.uk/

_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Race between SIGIO and epoll from SMP host
  2021-04-21 12:32 ` Anton Ivanov
  2021-04-21 12:45   ` Anton Ivanov
@ 2021-04-21 13:35   ` YiFei Zhu
  2021-04-21 14:21     ` Anton Ivanov
  2021-04-21 15:45     ` Anton Ivanov
  1 sibling, 2 replies; 10+ messages in thread
From: YiFei Zhu @ 2021-04-21 13:35 UTC (permalink / raw)
  To: Anton Ivanov; +Cc: linux-um

On Wed, Apr 21, 2021 at 7:32 AM Anton Ivanov
<anton.ivanov@kot-begemot.co.uk> wrote:
> > Considering that this is a race on the host, what would be the best
> > way to fix this?
>
> Interesting one. I need to think.
>
> One option would be to wait for epoll events with a timeout which is larger than zero - f.e. HZ.

I was about to say I could reproduce it even with a timeout of 1ms,
then I realized that code I pasted above already used 1ms timeout.
Assertion failures using 1ms timeout seems much rarer than 0 timeout
however.

For reference my CONFIG_HZ on the host is 1000. I also use
CONFIG_NO_HZ_IDLE if that's relevant (I'm not too familiar with how
the kernel ticking works).

> If we have received a SIGIO there is an epoll event on the way. The fact that it is not in the queue right now means that we are due to process it shortly.
>
> A.

YiFei Zhu

_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Race between SIGIO and epoll from SMP host
  2021-04-21 13:35   ` YiFei Zhu
@ 2021-04-21 14:21     ` Anton Ivanov
  2021-04-21 15:45     ` Anton Ivanov
  1 sibling, 0 replies; 10+ messages in thread
From: Anton Ivanov @ 2021-04-21 14:21 UTC (permalink / raw)
  To: linux-um



On 21/04/2021 14:35, YiFei Zhu wrote:
> On Wed, Apr 21, 2021 at 7:32 AM Anton Ivanov
> <anton.ivanov@kot-begemot.co.uk> wrote:
>>> Considering that this is a race on the host, what would be the best
>>> way to fix this?
>>
>> Interesting one. I need to think.
>>
>> One option would be to wait for epoll events with a timeout which is larger than zero - f.e. HZ.
> 
> I was about to say I could reproduce it even with a timeout of 1ms,
> then I realized that code I pasted above already used 1ms timeout.
> Assertion failures using 1ms timeout seems much rarer than 0 timeout
> however.
> 
> For reference my CONFIG_HZ on the host is 1000. I also use
> CONFIG_NO_HZ_IDLE if that's relevant (I'm not too familiar with how
> the kernel ticking works).

Reading the code around the kernel the race always applies. What changes is the likelihood.

The usual pattern of doing things is:

1) wake up something - means differ depending on the actual place where this is. This may or may not end up on a different core.

2) send SIGIOs.

The only place which does things the other way around is actually ttys. I had a look there in a couple of places and there it first sends SIGIO, then wakes anyone pending for notifications.

That is why you are seeing it on ttys and that is why we are likely to see this more often on ttys compared to other IO. It will happen on other IO as well as there is no guarantee that the notification handling will be performed on the same core as UML.

As far as using time-out on the poll, there is a number of fds which have lots of spurious SIGIOs and it will end up waiting on them. The worst offender is random, but it is not alone on this one.

This will not be easy - we use epoll with a EPOLLET pattern so if we have missed an event after a SIGIO we will not see it until there is a rerun.

I need to think on this one.

A.


 >
 >
> 
>> If we have received a SIGIO there is an epoll event on the way. The fact that it is not in the queue right now means that we are due to process it shortly.
>>
>> A.
> 
> YiFei Zhu
> 
> _______________________________________________
> linux-um mailing list
> linux-um@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-um
> 

-- 
Anton R. Ivanov
https://www.kot-begemot.co.uk/

_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Race between SIGIO and epoll from SMP host
  2021-04-21 13:35   ` YiFei Zhu
  2021-04-21 14:21     ` Anton Ivanov
@ 2021-04-21 15:45     ` Anton Ivanov
  2021-04-22  7:32       ` Anton Ivanov
  1 sibling, 1 reply; 10+ messages in thread
From: Anton Ivanov @ 2021-04-21 15:45 UTC (permalink / raw)
  To: YiFei Zhu; +Cc: linux-um



On 21/04/2021 14:35, YiFei Zhu wrote:
> On Wed, Apr 21, 2021 at 7:32 AM Anton Ivanov
> <anton.ivanov@kot-begemot.co.uk> wrote:
>>> Considering that this is a race on the host, what would be the best
>>> way to fix this?
>>
>> Interesting one. I need to think.
>>
>> One option would be to wait for epoll events with a timeout which is larger than zero - f.e. HZ.
> 
> I was about to say I could reproduce it even with a timeout of 1ms,
> then I realized that code I pasted above already used 1ms timeout.
> Assertion failures using 1ms timeout seems much rarer than 0 timeout
> however.
> 
> For reference my CONFIG_HZ on the host is 1000. I also use
> CONFIG_NO_HZ_IDLE if that's relevant (I'm not too familiar with how
> the kernel ticking works).
> 
>> If we have received a SIGIO there is an epoll event on the way. The fact that it is not in the queue right now means that we are due to process it shortly.

This seems to be limited to ttys. Why - I need to figure it out.

If this ends up as tty specific, we can enable the work-around for ttys which was there when they were not producing sigio on write correctly.

This ends up disabled on most modern machines, because modern kernels produce sigio on write correctly for ttys.

With the workaround enabled there is an extra IO event which is produced after the notification appears on the poll loop in a helper thread. So the stall should never happen.

A.

>>
>> A.
> 
> YiFei Zhu
> 
> _______________________________________________
> linux-um mailing list
> linux-um@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-um
> 

-- 
Anton R. Ivanov
https://www.kot-begemot.co.uk/

_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Race between SIGIO and epoll from SMP host
  2021-04-21 15:45     ` Anton Ivanov
@ 2021-04-22  7:32       ` Anton Ivanov
  2021-04-22  7:50         ` Anton Ivanov
  2021-04-22  8:46         ` YiFei Zhu
  0 siblings, 2 replies; 10+ messages in thread
From: Anton Ivanov @ 2021-04-22  7:32 UTC (permalink / raw)
  To: YiFei Zhu; +Cc: linux-um

On 21/04/2021 16:45, Anton Ivanov wrote:
> 
> 
> On 21/04/2021 14:35, YiFei Zhu wrote:
>> On Wed, Apr 21, 2021 at 7:32 AM Anton Ivanov
>> <anton.ivanov@kot-begemot.co.uk> wrote:
>>>> Considering that this is a race on the host, what would be the best
>>>> way to fix this?
>>>
>>> Interesting one. I need to think.
>>>
>>> One option would be to wait for epoll events with a timeout which is 
>>> larger than zero - f.e. HZ.
>>
>> I was about to say I could reproduce it even with a timeout of 1ms,
>> then I realized that code I pasted above already used 1ms timeout.
>> Assertion failures using 1ms timeout seems much rarer than 0 timeout
>> however.
>>
>> For reference my CONFIG_HZ on the host is 1000. I also use
>> CONFIG_NO_HZ_IDLE if that's relevant (I'm not too familiar with how
>> the kernel ticking works).
>>
>>> If we have received a SIGIO there is an epoll event on the way. The 
>>> fact that it is not in the queue right now means that we are due to 
>>> process it shortly.
> 
> This seems to be limited to ttys. Why - I need to figure it out.
> 
> If this ends up as tty specific, we can enable the work-around for ttys 
> which was there when they were not producing sigio on write correctly.
> 
> This ends up disabled on most modern machines, because modern kernels 
> produce sigio on write correctly for ttys.
> 
> With the workaround enabled there is an extra IO event which is produced 
> after the notification appears on the poll loop in a helper thread. So 
> the stall should never happen.


I now have an idea why we see this on ttys.

TTY IO wake-up in addition to doing SIGIO before poll notifications, 
also does poll notifications using a wake-up which will reschedule.

Compared to that, let's say socket does a sync wake-up which does not 
reschedule and does it before SIGIO.

In either case, we stand a chance of missing an interrupt. Just in the 
second case it is extremely small. So small that I have never seen it in 
practice.

The real way of dealing with it will be to do to do a helper thread 
which (e)polls the epoll fd and generates a SIGIO if there is an 
outstanding EPOLL notification which has been missed. This would also 
take care of the range of conditions which are currently handled by the 
SIGIO fd helper so that would become surplus to requirements.

I think that just polling the epoll fd should do the job here. So this 
will also get rid of all the motions needed to register fds with the 
async helper.

Brgds,


> 
> A.
> 
>>>
>>> A.
>>
>> YiFei Zhu
>>
>> _______________________________________________
>> linux-um mailing list
>> linux-um@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-um
>>
> 


-- 
Anton R. Ivanov
https://www.kot-begemot.co.uk/

_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Race between SIGIO and epoll from SMP host
  2021-04-22  7:32       ` Anton Ivanov
@ 2021-04-22  7:50         ` Anton Ivanov
  2021-04-22  8:46         ` YiFei Zhu
  1 sibling, 0 replies; 10+ messages in thread
From: Anton Ivanov @ 2021-04-22  7:50 UTC (permalink / raw)
  To: YiFei Zhu; +Cc: linux-um

On 22/04/2021 08:32, Anton Ivanov wrote:
> On 21/04/2021 16:45, Anton Ivanov wrote:
>>
>>
>> On 21/04/2021 14:35, YiFei Zhu wrote:
>>> On Wed, Apr 21, 2021 at 7:32 AM Anton Ivanov
>>> <anton.ivanov@kot-begemot.co.uk> wrote:
>>>>> Considering that this is a race on the host, what would be the best
>>>>> way to fix this?
>>>>
>>>> Interesting one. I need to think.
>>>>
>>>> One option would be to wait for epoll events with a timeout which is 
>>>> larger than zero - f.e. HZ.
>>>
>>> I was about to say I could reproduce it even with a timeout of 1ms,
>>> then I realized that code I pasted above already used 1ms timeout.
>>> Assertion failures using 1ms timeout seems much rarer than 0 timeout
>>> however.
>>>
>>> For reference my CONFIG_HZ on the host is 1000. I also use
>>> CONFIG_NO_HZ_IDLE if that's relevant (I'm not too familiar with how
>>> the kernel ticking works).
>>>
>>>> If we have received a SIGIO there is an epoll event on the way. The 
>>>> fact that it is not in the queue right now means that we are due to 
>>>> process it shortly.
>>
>> This seems to be limited to ttys. Why - I need to figure it out.
>>
>> If this ends up as tty specific, we can enable the work-around for 
>> ttys which was there when they were not producing sigio on write 
>> correctly.
>>
>> This ends up disabled on most modern machines, because modern kernels 
>> produce sigio on write correctly for ttys.
>>
>> With the workaround enabled there is an extra IO event which is 
>> produced after the notification appears on the poll loop in a helper 
>> thread. So the stall should never happen.
> 
> 
> I now have an idea why we see this on ttys.
> 
> TTY IO wake-up in addition to doing SIGIO before poll notifications, 
> also does poll notifications using a wake-up which will reschedule.
> 
> Compared to that, let's say socket does a sync wake-up which does not 
> reschedule and does it before SIGIO.
> 
> In either case, we stand a chance of missing an interrupt. Just in the 
> second case it is extremely small. So small that I have never seen it in 
> practice.
> 
> The real way of dealing with it will be to do to do a helper thread 
> which (e)polls the epoll fd and generates a SIGIO if there is an 
> outstanding EPOLL notification which has been missed. This would also 
> take care of the range of conditions which are currently handled by the 
> SIGIO fd helper so that would become surplus to requirements.
> 
> I think that just polling the epoll fd should do the job here. So this 
> will also get rid of all the motions needed to register fds with the 
> async helper.

In fact, we can kill the registration of fds for SIGIO too. The helper 
does the same job, so why bother?

A

> 
> Brgds,
> 
> 
>>
>> A.
>>
>>>>
>>>> A.
>>>
>>> YiFei Zhu
>>>
>>> _______________________________________________
>>> linux-um mailing list
>>> linux-um@lists.infradead.org
>>> http://lists.infradead.org/mailman/listinfo/linux-um
>>>
>>
> 
> 


-- 
Anton R. Ivanov
https://www.kot-begemot.co.uk/

_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Race between SIGIO and epoll from SMP host
  2021-04-22  7:32       ` Anton Ivanov
  2021-04-22  7:50         ` Anton Ivanov
@ 2021-04-22  8:46         ` YiFei Zhu
  2021-04-22 11:27           ` Anton Ivanov
  1 sibling, 1 reply; 10+ messages in thread
From: YiFei Zhu @ 2021-04-22  8:46 UTC (permalink / raw)
  To: Anton Ivanov; +Cc: linux-um

On Thu, Apr 22, 2021 at 2:32 AM Anton Ivanov
<anton.ivanov@kot-begemot.co.uk> wrote:
> I now have an idea why we see this on ttys.
>
> TTY IO wake-up in addition to doing SIGIO before poll notifications,
> also does poll notifications using a wake-up which will reschedule.
>
> Compared to that, let's say socket does a sync wake-up which does not
> reschedule and does it before SIGIO.
>
> In either case, we stand a chance of missing an interrupt. Just in the
> second case it is extremely small. So small that I have never seen it in
> practice.
>
> The real way of dealing with it will be to do to do a helper thread
> which (e)polls the epoll fd and generates a SIGIO if there is an
> outstanding EPOLL notification which has been missed. This would also
> take care of the range of conditions which are currently handled by the
> SIGIO fd helper so that would become surplus to requirements.
>
> I think that just polling the epoll fd should do the job here. So this
> will also get rid of all the motions needed to register fds with the
> async helper.
>
> Brgds,

By "async helper" do you mean the sigio helper? Is what you are
suggesting to, in the sigio helper, use epoll instead of using poll,
and then send SIGIO to notify the kernel thread once epoll receives an
event? It sounds like a fix, although no idea how difficult it would
be to efficiently send 'which events have been epolled' back to the
kernel efficiently without running into races.

YiFei Zhu

_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Race between SIGIO and epoll from SMP host
  2021-04-22  8:46         ` YiFei Zhu
@ 2021-04-22 11:27           ` Anton Ivanov
  0 siblings, 0 replies; 10+ messages in thread
From: Anton Ivanov @ 2021-04-22 11:27 UTC (permalink / raw)
  To: YiFei Zhu; +Cc: linux-um

On 22/04/2021 09:46, YiFei Zhu wrote:
> On Thu, Apr 22, 2021 at 2:32 AM Anton Ivanov
> <anton.ivanov@kot-begemot.co.uk> wrote:
>> I now have an idea why we see this on ttys.
>>
>> TTY IO wake-up in addition to doing SIGIO before poll notifications,
>> also does poll notifications using a wake-up which will reschedule.
>>
>> Compared to that, let's say socket does a sync wake-up which does not
>> reschedule and does it before SIGIO.
>>
>> In either case, we stand a chance of missing an interrupt. Just in the
>> second case it is extremely small. So small that I have never seen it in
>> practice.
>>
>> The real way of dealing with it will be to do to do a helper thread
>> which (e)polls the epoll fd and generates a SIGIO if there is an
>> outstanding EPOLL notification which has been missed. This would also
>> take care of the range of conditions which are currently handled by the
>> SIGIO fd helper so that would become surplus to requirements.
>>
>> I think that just polling the epoll fd should do the job here. So this
>> will also get rid of all the motions needed to register fds with the
>> async helper.
>>
>> Brgds,
> 
> By "async helper" do you mean the sigio helper? Is what you are
> suggesting to, in the sigio helper, use epoll instead of using poll,
> and then send SIGIO to notify the kernel thread once epoll receives an
> event? It sounds like a fix, although no idea how difficult it would
> be to efficiently send 'which events have been epolled' back to the
> kernel efficiently without running into races.

The kernel gets the same SIGIO for all devices. The differentiation 
which device received an event is performed by epoll. So this should be 
functionally equivalent to hitting the kernel with SIGIO out of a a 
helper which polls on the epoll fd. If the epoll fd is active 
kill(uml_pid, SIGIO);

This looks like a drop-in fix.

1. Pass the fd for epoll to the helper when the interrupt controller is 
initialized.

2. Remove the ASYNC registration for all FDs.

3. Remove the check for "SIGIO fix in tty".

I will try to get around to PoC this next week. It does not look 
particularly hard.

This will also reduce a lot of spurious interrupts. Some devices (random 
being the worst offender) produce SIGIO for events which are of no 
interest and concern to UML - there is no IO pending.

Will it be faster than enabling the fds for SIGIO - no idea. Need to 
test it.

A.

> 
> YiFei Zhu
> 
> _______________________________________________
> linux-um mailing list
> linux-um@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-um
> 


-- 
Anton R. Ivanov
https://www.kot-begemot.co.uk/

_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-04-22 11:27 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-21 11:53 Race between SIGIO and epoll from SMP host YiFei Zhu
2021-04-21 12:32 ` Anton Ivanov
2021-04-21 12:45   ` Anton Ivanov
2021-04-21 13:35   ` YiFei Zhu
2021-04-21 14:21     ` Anton Ivanov
2021-04-21 15:45     ` Anton Ivanov
2021-04-22  7:32       ` Anton Ivanov
2021-04-22  7:50         ` Anton Ivanov
2021-04-22  8:46         ` YiFei Zhu
2021-04-22 11:27           ` Anton Ivanov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.