All of lore.kernel.org
 help / color / mirror / Atom feed
* [uml-devel] IRQ handler reentrancy
@ 2015-11-20 12:05 Anton Ivanov
  2015-11-20 12:16 ` Richard Weinberger
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Anton Ivanov @ 2015-11-20 12:05 UTC (permalink / raw)
  To: user-mode-linux-devel

I have gotten to the bottom of this.

1. The IRQ handler re-entrancy issue predates the timer patch. Adding a 
simple guard with a WARN_ON_ONCE around the device loop in the 
sig_io_handler catches it in plain 4.3

diff --git a/arch/um/kernel/irq.c b/arch/um/kernel/irq.c
index 23cb935..ac0bbce 100644
--- a/arch/um/kernel/irq.c
+++ b/arch/um/kernel/irq.c
@@ -30,12 +30,17 @@ static struct irq_fd **last_irq_ptr = &active_fds;

  extern void free_irqs(void);

+static int in_poll_handler = 0;
+
  void sigio_handler(int sig, struct siginfo *unused_si, struct 
uml_pt_regs *regs)
  {
         struct irq_fd *irq_fd;
         int n;

+    WARN_ON_ONCE(in_poll_handler == 1);
+
         while (1) {
+        in_poll_handler = 1;
                 n = os_waiting_for_events(active_fds);
                 if (n <= 0) {
                         if (n == -EINTR)
@@ -51,6 +56,7 @@ void sigio_handler(int sig, struct siginfo *unused_si, 
struct uml_pt_regs *regs)
                         }
                 }
         }
+    in_poll_handler = 0;

         free_irqs();
  }

This is dangerously broken - you can under heavy IO exhaust the stack, 
you can get packets out of order, etc. Most IO is reasonably atomic so 
corruption is not likely, but not impossible (especially if one or more 
drivers are optimized to use multi-read/multi-write).

2. I cannot catch what is wrong with the current code in signal.c. When 
I read it, it should not produce re-entrancy. But it does.

3. I found 2-3 minor issues with signal handling and the timer patch 
which I will submit a hot-fix for, including a proper fix for the 
hang-in-sleep issue.

4. While I can propose a brutal patch for signal.c which sets guards 
against reentrancy which works fine, I suggest we actually get to the 
bottom of this. Why the code in unblock_signals() does not guard 
correctly against that?

A.

------------------------------------------------------------------------------
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [uml-devel] IRQ handler reentrancy
  2015-11-20 12:05 [uml-devel] IRQ handler reentrancy Anton Ivanov
@ 2015-11-20 12:16 ` Richard Weinberger
  2015-11-20 12:26   ` stian
  2015-11-20 12:45   ` Anton Ivanov
  2015-11-24 17:00 ` Anton Ivanov
  2015-12-10 22:40 ` Richard Weinberger
  2 siblings, 2 replies; 19+ messages in thread
From: Richard Weinberger @ 2015-11-20 12:16 UTC (permalink / raw)
  To: Anton Ivanov; +Cc: Al Viro, user-mode-linux-devel

On Fri, Nov 20, 2015 at 1:05 PM, Anton Ivanov
<anton.ivanov@kot-begemot.co.uk> wrote:
> I have gotten to the bottom of this.
>
> 1. The IRQ handler re-entrancy issue predates the timer patch. Adding a
> simple guard with a WARN_ON_ONCE around the device loop in the
> sig_io_handler catches it in plain 4.3
>
> diff --git a/arch/um/kernel/irq.c b/arch/um/kernel/irq.c
> index 23cb935..ac0bbce 100644
> --- a/arch/um/kernel/irq.c
> +++ b/arch/um/kernel/irq.c
> @@ -30,12 +30,17 @@ static struct irq_fd **last_irq_ptr = &active_fds;
>
>   extern void free_irqs(void);
>
> +static int in_poll_handler = 0;
> +
>   void sigio_handler(int sig, struct siginfo *unused_si, struct
> uml_pt_regs *regs)
>   {
>          struct irq_fd *irq_fd;
>          int n;
>
> +    WARN_ON_ONCE(in_poll_handler == 1);
> +
>          while (1) {
> +        in_poll_handler = 1;
>                  n = os_waiting_for_events(active_fds);
>                  if (n <= 0) {
>                          if (n == -EINTR)
> @@ -51,6 +56,7 @@ void sigio_handler(int sig, struct siginfo *unused_si,
> struct uml_pt_regs *regs)
>                          }
>                  }
>          }
> +    in_poll_handler = 0;
>
>          free_irqs();
>   }
>
> This is dangerously broken - you can under heavy IO exhaust the stack,
> you can get packets out of order, etc. Most IO is reasonably atomic so
> corruption is not likely, but not impossible (especially if one or more
> drivers are optimized to use multi-read/multi-write).
>
> 2. I cannot catch what is wrong with the current code in signal.c. When
> I read it, it should not produce re-entrancy. But it does.
>
> 3. I found 2-3 minor issues with signal handling and the timer patch
> which I will submit a hot-fix for, including a proper fix for the
> hang-in-sleep issue.
>
> 4. While I can propose a brutal patch for signal.c which sets guards
> against reentrancy which works fine, I suggest we actually get to the
> bottom of this. Why the code in unblock_signals() does not guard
> correctly against that?

Thanks for hunting this issue.
I fear I'll have to grab my speleologist's hat to figure out why UML
works this way.
Cc'ing Al, do you have an idea?

-- 
Thanks,
//richard

------------------------------------------------------------------------------
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [uml-devel] IRQ handler reentrancy
  2015-11-20 12:16 ` Richard Weinberger
@ 2015-11-20 12:26   ` stian
  2015-11-20 12:50     ` Anton Ivanov
  2015-11-20 12:45   ` Anton Ivanov
  1 sibling, 1 reply; 19+ messages in thread
From: stian @ 2015-11-20 12:26 UTC (permalink / raw)
  To: user-mode-linux-devel

>> 4. While I can propose a brutal patch for signal.c which sets guards
>> against reentrancy which works fine, I suggest we actually get to 
>> the
>> bottom of this. Why the code in unblock_signals() does not guard
>> correctly against that?
>
> Thanks for hunting this issue.
> I fear I'll have to grab my speleologist's hat to figure out why UML
> works this way.
> Cc'ing Al, do you have an idea?

In the few stack-traces that I have seen posted here, I could see 
multiple calls to unlocking of signals (with a signal occurred directly 
after). That probably should not happen. Do we count the number of 
timers of time we try to block/unblock signals and only actual perform 
the action when the counter reaches/leaves 0?

if this series of calls happens:
  block()
   foo()
    block()
    bar()
    unblock()  <- this should be a no-op
   foobar()
  unblock() <- first here the signals should be unblocked again



Stian

------------------------------------------------------------------------------
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [uml-devel] IRQ handler reentrancy
  2015-11-20 12:16 ` Richard Weinberger
  2015-11-20 12:26   ` stian
@ 2015-11-20 12:45   ` Anton Ivanov
  1 sibling, 0 replies; 19+ messages in thread
From: Anton Ivanov @ 2015-11-20 12:45 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: Al Viro, user-mode-linux-devel

It works correctly if I insert a guard around the interrupt handlers as 
well as into unblock_signals which prevents re-entrancy.

I can clean that and send it in as well as the various irq/signal 
erratas I have dug out while hunting this one.

A

On 20/11/15 12:16, Richard Weinberger wrote:
> On Fri, Nov 20, 2015 at 1:05 PM, Anton Ivanov
> <anton.ivanov@kot-begemot.co.uk> wrote:
>> I have gotten to the bottom of this.
>>
>> 1. The IRQ handler re-entrancy issue predates the timer patch. Adding a
>> simple guard with a WARN_ON_ONCE around the device loop in the
>> sig_io_handler catches it in plain 4.3
>>
>> diff --git a/arch/um/kernel/irq.c b/arch/um/kernel/irq.c
>> index 23cb935..ac0bbce 100644
>> --- a/arch/um/kernel/irq.c
>> +++ b/arch/um/kernel/irq.c
>> @@ -30,12 +30,17 @@ static struct irq_fd **last_irq_ptr = &active_fds;
>>
>>    extern void free_irqs(void);
>>
>> +static int in_poll_handler = 0;
>> +
>>    void sigio_handler(int sig, struct siginfo *unused_si, struct
>> uml_pt_regs *regs)
>>    {
>>           struct irq_fd *irq_fd;
>>           int n;
>>
>> +    WARN_ON_ONCE(in_poll_handler == 1);
>> +
>>           while (1) {
>> +        in_poll_handler = 1;
>>                   n = os_waiting_for_events(active_fds);
>>                   if (n <= 0) {
>>                           if (n == -EINTR)
>> @@ -51,6 +56,7 @@ void sigio_handler(int sig, struct siginfo *unused_si,
>> struct uml_pt_regs *regs)
>>                           }
>>                   }
>>           }
>> +    in_poll_handler = 0;
>>
>>           free_irqs();
>>    }
>>
>> This is dangerously broken - you can under heavy IO exhaust the stack,
>> you can get packets out of order, etc. Most IO is reasonably atomic so
>> corruption is not likely, but not impossible (especially if one or more
>> drivers are optimized to use multi-read/multi-write).
>>
>> 2. I cannot catch what is wrong with the current code in signal.c. When
>> I read it, it should not produce re-entrancy. But it does.
>>
>> 3. I found 2-3 minor issues with signal handling and the timer patch
>> which I will submit a hot-fix for, including a proper fix for the
>> hang-in-sleep issue.
>>
>> 4. While I can propose a brutal patch for signal.c which sets guards
>> against reentrancy which works fine, I suggest we actually get to the
>> bottom of this. Why the code in unblock_signals() does not guard
>> correctly against that?
> Thanks for hunting this issue.
> I fear I'll have to grab my speleologist's hat to figure out why UML
> works this way.
> Cc'ing Al, do you have an idea?
>


------------------------------------------------------------------------------
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [uml-devel] IRQ handler reentrancy
  2015-11-20 12:26   ` stian
@ 2015-11-20 12:50     ` Anton Ivanov
  2015-11-20 13:48       ` stian
  0 siblings, 1 reply; 19+ messages in thread
From: Anton Ivanov @ 2015-11-20 12:50 UTC (permalink / raw)
  To: user-mode-linux-devel

On 20/11/15 12:26, stian@nixia.no wrote:
>>> 4. While I can propose a brutal patch for signal.c which sets guards
>>> against reentrancy which works fine, I suggest we actually get to
>>> the
>>> bottom of this. Why the code in unblock_signals() does not guard
>>> correctly against that?
>> Thanks for hunting this issue.
>> I fear I'll have to grab my speleologist's hat to figure out why UML
>> works this way.
>> Cc'ing Al, do you have an idea?
> In the few stack-traces that I have seen posted here, I could see
> multiple calls to unlocking of signals (with a signal occurred directly
> after). That probably should not happen. Do we count the number of
> timers of time we try to block/unblock signals and only actual perform
> the action when the counter reaches/leaves 0?
>
> if this series of calls happens:
>    block()
>     foo()
>      block()
>      bar()
>      unblock()  <- this should be a no-op
>     foobar()
>    unblock() <- first here the signals should be unblocked again

Block/unblock are not counting the number of enable/disable at present. 
It is either on or off.

Any unblock will immediately re-trigger all pending interrupts.

Some of the errata patches I have out of investigating this do exactly 
that - change:

block to flags = set_signals(0); bar() ; set_signal(flags);

This, if nested should  be a NOP.

However, even after fixing all of them (and their corresponding kernel 
side counterparts), I still get reentrancy, so there is something else 
at play too.

In any case, the errata should be fixed, I will sort it out, organize it 
into a patch set and send it out by Monday.

A.

>
>
>
> Stian
>
> ------------------------------------------------------------------------------
> _______________________________________________
> User-mode-linux-devel mailing list
> User-mode-linux-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
>


------------------------------------------------------------------------------
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [uml-devel] IRQ handler reentrancy
  2015-11-20 12:50     ` Anton Ivanov
@ 2015-11-20 13:48       ` stian
  2015-11-20 14:08         ` Anton Ivanov
  0 siblings, 1 reply; 19+ messages in thread
From: stian @ 2015-11-20 13:48 UTC (permalink / raw)
  To: user-mode-linux-devel

Den 2015-11-20 13:50, skrev Anton Ivanov:
> On 20/11/15 12:26, stian@nixia.no wrote:
>>>> 4. While I can propose a brutal patch for signal.c which sets 
>>>> guards
>>>> against reentrancy which works fine, I suggest we actually get to
>>>> the
>>>> bottom of this. Why the code in unblock_signals() does not guard
>>>> correctly against that?
>>> Thanks for hunting this issue.
>>> I fear I'll have to grab my speleologist's hat to figure out why 
>>> UML
>>> works this way.
>>> Cc'ing Al, do you have an idea?
>> In the few stack-traces that I have seen posted here, I could see
>> multiple calls to unlocking of signals (with a signal occurred 
>> directly
>> after). That probably should not happen. Do we count the number of
>> timers of time we try to block/unblock signals and only actual 
>> perform
>> the action when the counter reaches/leaves 0?
>>
>> if this series of calls happens:
>>    block()
>>     foo()
>>      block()
>>      bar()
>>      unblock()  <- this should be a no-op
>>     foobar()
>>    unblock() <- first here the signals should be unblocked again
>
> Block/unblock are not counting the number of enable/disable at 
> present.
> It is either on or off.
>
> Any unblock will immediately re-trigger all pending interrupts.
>
> Some of the errata patches I have out of investigating this do 
> exactly
> that - change:
>
> block to flags = set_signals(0); bar() ; set_signal(flags);
>
> This, if nested should  be a NOP.
>
> However, even after fixing all of them (and their corresponding 
> kernel
> side counterparts), I still get reentrancy, so there is something 
> else
> at play too.

Please, share a stack-trace if possible.



As a side-note:
The small issue with the code example above I can see is that what if 
flags should have change during bar(). And code inside bar can do 
set_signals() magic.

I am not linux kernel ABI expert.

To me, it seems to be a more safe to have a ABI that tracks each signal 
blocked mask individually, and have a ref-counted block-all/unblock-all 
call. This would be like how you normally program on a CPU. You have a 
interrupt controller that you setup (masks), and a master interrupt 
enable/disable flag.



--

Stian

------------------------------------------------------------------------------
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [uml-devel] IRQ handler reentrancy
  2015-11-20 13:48       ` stian
@ 2015-11-20 14:08         ` Anton Ivanov
  2015-11-20 15:21           ` Thomas Meyer
  0 siblings, 1 reply; 19+ messages in thread
From: Anton Ivanov @ 2015-11-20 14:08 UTC (permalink / raw)
  To: user-mode-linux-devel

On 20/11/15 13:48, stian@nixia.no wrote:
> Den 2015-11-20 13:50, skrev Anton Ivanov:
>> On 20/11/15 12:26, stian@nixia.no wrote:
>>>>> 4. While I can propose a brutal patch for signal.c which sets
>>>>> guards
>>>>> against reentrancy which works fine, I suggest we actually get to
>>>>> the
>>>>> bottom of this. Why the code in unblock_signals() does not guard
>>>>> correctly against that?
>>>> Thanks for hunting this issue.
>>>> I fear I'll have to grab my speleologist's hat to figure out why
>>>> UML
>>>> works this way.
>>>> Cc'ing Al, do you have an idea?
>>> In the few stack-traces that I have seen posted here, I could see
>>> multiple calls to unlocking of signals (with a signal occurred
>>> directly
>>> after). That probably should not happen. Do we count the number of
>>> timers of time we try to block/unblock signals and only actual
>>> perform
>>> the action when the counter reaches/leaves 0?
>>>
>>> if this series of calls happens:
>>>     block()
>>>      foo()
>>>       block()
>>>       bar()
>>>       unblock()  <- this should be a no-op
>>>      foobar()
>>>     unblock() <- first here the signals should be unblocked again
>> Block/unblock are not counting the number of enable/disable at
>> present.
>> It is either on or off.
>>
>> Any unblock will immediately re-trigger all pending interrupts.
>>
>> Some of the errata patches I have out of investigating this do
>> exactly
>> that - change:
>>
>> block to flags = set_signals(0); bar() ; set_signal(flags);
>>
>> This, if nested should  be a NOP.
>>
>> However, even after fixing all of them (and their corresponding
>> kernel
>> side counterparts), I still get reentrancy, so there is something
>> else
>> at play too.
> Please, share a stack-trace if possible.
>
>
>
> As a side-note:
> The small issue with the code example above I can see is that what if
> flags should have change during bar().

I see it too, but I have not figured out how to deal with it.


>   And code inside bar can do
> set_signals() magic.

Correct, which is to some extent our issue.

>
> I am not linux kernel ABI expert.
>
> To me, it seems to be a more safe to have a ABI that tracks each signal
> blocked mask individually, and have a ref-counted block-all/unblock-all
> call. This would be like how you normally program on a CPU. You have a
> interrupt controller that you setup (masks), and a master interrupt
> enable/disable flag.

That is what signal.c is trying to simulate - you have a mask for ALRM 
(or VTALRM with the older timers) and SIGIO and a global on/off.

What that fails to emulate, however, is that an IRQ is usually blocked 
until it is fully serviced. This, depending on IRQ controller design may 
block all IRQs, all lower priority IRQs or none.

The current code in uml tries to block all while processing an IRQ, but 
for some reason fails.

I will submit a patch to put some ducktape over this for the time being, 
we should understand what is the root cause.

A.



>
>
>
> --
>
> Stian
>
> ------------------------------------------------------------------------------
> _______________________________________________
> User-mode-linux-devel mailing list
> User-mode-linux-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
>


------------------------------------------------------------------------------
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [uml-devel] IRQ handler reentrancy
  2015-11-20 14:08         ` Anton Ivanov
@ 2015-11-20 15:21           ` Thomas Meyer
  2015-11-20 16:22             ` Anton Ivanov
  0 siblings, 1 reply; 19+ messages in thread
From: Thomas Meyer @ 2015-11-20 15:21 UTC (permalink / raw)
  To: Anton Ivanov; +Cc: user-mode-linux-devel

Am 20.11.2015 3:08 nachm. schrieb Anton Ivanov <anton.ivanov@kot-begemot.co.uk>:
>
> On 20/11/15 13:48, stian@nixia.no wrote: 
> > Den 2015-11-20 13:50, skrev Anton Ivanov: 
> >> On 20/11/15 12:26, stian@nixia.no wrote: 
> >>>>> 4. While I can propose a brutal patch for signal.c which sets 
> >>>>> guards 
> >>>>> against reentrancy which works fine, I suggest we actually get to 
> >>>>> the 
> >>>>> bottom of this. Why the code in unblock_signals() does not guard 
> >>>>> correctly against that? 
> >>>> Thanks for hunting this issue. 
> >>>> I fear I'll have to grab my speleologist's hat to figure out why 
> >>>> UML 
> >>>> works this way. 
> >>>> Cc'ing Al, do you have an idea? 
> >>> In the few stack-traces that I have seen posted here, I could see 
> >>> multiple calls to unlocking of signals (with a signal occurred 
> >>> directly 
> >>> after). That probably should not happen. Do we count the number of 
> >>> timers of time we try to block/unblock signals and only actual 
> >>> perform 
> >>> the action when the counter reaches/leaves 0? 
> >>> 
> >>> if this series of calls happens: 
> >>>     block() 
> >>>      foo() 
> >>>       block() 
> >>>       bar() 
> >>>       unblock()  <- this should be a no-op 
> >>>      foobar() 
> >>>     unblock() <- first here the signals should be unblocked again 
> >> Block/unblock are not counting the number of enable/disable at 
> >> present. 
> >> It is either on or off. 
> >> 
> >> Any unblock will immediately re-trigger all pending interrupts. 
> >> 
> >> Some of the errata patches I have out of investigating this do 
> >> exactly 
> >> that - change: 
> >> 
> >> block to flags = set_signals(0); bar() ; set_signal(flags); 
> >> 
> >> This, if nested should  be a NOP. 
> >> 
> >> However, even after fixing all of them (and their corresponding 
> >> kernel 
> >> side counterparts), I still get reentrancy, so there is something 
> >> else 
> >> at play too. 
> > Please, share a stack-trace if possible. 
> > 
> > 
> > 
> > As a side-note: 
> > The small issue with the code example above I can see is that what if 
> > flags should have change during bar(). 
>
> I see it too, but I have not figured out how to deal with it. 
>
>
> >   And code inside bar can do 
> > set_signals() magic. 
>
> Correct, which is to some extent our issue. 
>
> > 
> > I am not linux kernel ABI expert. 
> > 
> > To me, it seems to be a more safe to have a ABI that tracks each signal 
> > blocked mask individually, and have a ref-counted block-all/unblock-all 
> > call. This would be like how you normally program on a CPU. You have a 
> > interrupt controller that you setup (masks), and a master interrupt 
> > enable/disable flag. 
>
> That is what signal.c is trying to simulate - you have a mask for ALRM 
> (or VTALRM with the older timers) and SIGIO and a global on/off. 
>
> What that fails to emulate, however, is that an IRQ is usually blocked 
> until it is fully serviced. This, depending on IRQ controller design may 
> block all IRQs, all lower priority IRQs or none. 
>
> The current code in uml tries to block all while processing an IRQ, but 
> for some reason fails. 
>
> I will submit a patch to put some ducktape over this for the time being, 
> we should understand what is the root cause. 

I hope the change in arch/um/kernel/skas/MMU.c isn't the cause of all this trouble!

I wanted to disable interrupt processing and so the forwarding of timer interrupts to the user space process when the user space is currently in a critical section of forking itself, and no signal handler is installed yet!

>
> A. 
>
> > 
> > Stian 

------------------------------------------------------------------------------
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [uml-devel] IRQ handler reentrancy
  2015-11-20 15:21           ` Thomas Meyer
@ 2015-11-20 16:22             ` Anton Ivanov
  2015-11-20 16:43               ` Anton Ivanov
  0 siblings, 1 reply; 19+ messages in thread
From: Anton Ivanov @ 2015-11-20 16:22 UTC (permalink / raw)
  To: Thomas Meyer; +Cc: user-mode-linux-devel

On 20/11/15 15:21, Thomas Meyer wrote:
> Am 20.11.2015 3:08 nachm. schrieb Anton Ivanov <anton.ivanov@kot-begemot.co.uk>:
>> On 20/11/15 13:48, stian@nixia.no wrote:
>>> Den 2015-11-20 13:50, skrev Anton Ivanov:
>>>> On 20/11/15 12:26, stian@nixia.no wrote:
>>>>>>> 4. While I can propose a brutal patch for signal.c which sets
>>>>>>> guards
>>>>>>> against reentrancy which works fine, I suggest we actually get to
>>>>>>> the
>>>>>>> bottom of this. Why the code in unblock_signals() does not guard
>>>>>>> correctly against that?
>>>>>> Thanks for hunting this issue.
>>>>>> I fear I'll have to grab my speleologist's hat to figure out why
>>>>>> UML
>>>>>> works this way.
>>>>>> Cc'ing Al, do you have an idea?
>>>>> In the few stack-traces that I have seen posted here, I could see
>>>>> multiple calls to unlocking of signals (with a signal occurred
>>>>> directly
>>>>> after). That probably should not happen. Do we count the number of
>>>>> timers of time we try to block/unblock signals and only actual
>>>>> perform
>>>>> the action when the counter reaches/leaves 0?
>>>>>
>>>>> if this series of calls happens:
>>>>>       block()
>>>>>        foo()
>>>>>         block()
>>>>>         bar()
>>>>>         unblock()  <- this should be a no-op
>>>>>        foobar()
>>>>>       unblock() <- first here the signals should be unblocked again
>>>> Block/unblock are not counting the number of enable/disable at
>>>> present.
>>>> It is either on or off.
>>>>
>>>> Any unblock will immediately re-trigger all pending interrupts.
>>>>
>>>> Some of the errata patches I have out of investigating this do
>>>> exactly
>>>> that - change:
>>>>
>>>> block to flags = set_signals(0); bar() ; set_signal(flags);
>>>>
>>>> This, if nested should  be a NOP.
>>>>
>>>> However, even after fixing all of them (and their corresponding
>>>> kernel
>>>> side counterparts), I still get reentrancy, so there is something
>>>> else
>>>> at play too.
>>> Please, share a stack-trace if possible.
>>>
>>>
>>>
>>> As a side-note:
>>> The small issue with the code example above I can see is that what if
>>> flags should have change during bar().
>> I see it too, but I have not figured out how to deal with it.
>>
>>
>>>     And code inside bar can do
>>> set_signals() magic.
>> Correct, which is to some extent our issue.
>>
>>> I am not linux kernel ABI expert.
>>>
>>> To me, it seems to be a more safe to have a ABI that tracks each signal
>>> blocked mask individually, and have a ref-counted block-all/unblock-all
>>> call. This would be like how you normally program on a CPU. You have a
>>> interrupt controller that you setup (masks), and a master interrupt
>>> enable/disable flag.
>> That is what signal.c is trying to simulate - you have a mask for ALRM
>> (or VTALRM with the older timers) and SIGIO and a global on/off.
>>
>> What that fails to emulate, however, is that an IRQ is usually blocked
>> until it is fully serviced. This, depending on IRQ controller design may
>> block all IRQs, all lower priority IRQs or none.
>>
>> The current code in uml tries to block all while processing an IRQ, but
>> for some reason fails.
>>
>> I will submit a patch to put some ducktape over this for the time being,
>> we should understand what is the root cause.
> I hope the change in arch/um/kernel/skas/MMU.c isn't the cause of all this trouble!

No it is not, but it is one that needs refinement. Fix coming up in 10 
minutes or so, I was just cleaning up the patches for submission.

The issue is reentrancy of the core interrupt logic which the code in 
unblock_signals is supposed to prevent, but it fails (for some reason) 
to do so.

A.

>
> I wanted to disable interrupt processing and so the forwarding of timer interrupts to the user space process when the user space is currently in a critical section of forking itself, and no signal handler is installed yet!
>
>> A.
>>
>>> Stian


------------------------------------------------------------------------------
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [uml-devel] IRQ handler reentrancy
  2015-11-20 16:22             ` Anton Ivanov
@ 2015-11-20 16:43               ` Anton Ivanov
  0 siblings, 0 replies; 19+ messages in thread
From: Anton Ivanov @ 2015-11-20 16:43 UTC (permalink / raw)
  To: user-mode-linux-devel


[-- Attachment #1.1: Type: text/plain, Size: 993 bytes --]

[snip]

>> I hope the change in arch/um/kernel/skas/MMU.c isn't the cause of all this trouble!
> No it is not, but it is one that needs refinement. Fix coming up in 10
> minutes or so, I was just cleaning up the patches for submission.
>
> The issue is reentrancy of the core interrupt logic which the code in
> unblock_signals is supposed to prevent, but it fails (for some reason)
> to do so.

Some of it looks like a heisenbug from debugging by the way.

If you look in softirq.c you see:

#ifdef CONFIG_TRACE_IRQFLAGS
local_irq_enable <http://lxr.free-electrons.com/ident?i=local_irq_enable>()
#endif

That in UML translates to unblock_signals(). That in turn will 
re-execute hardware interrupts out of a software IRQ bottom half context.

There is still an inherent reentrancy issue, but I suspect that it is 
less with the debugging off.

Patchset coming up in next email. It throws bh warnings if you have 
CONFIG_TRACE_IRQFLAGS, but that is not something I can fix in first pass.

A.

[-- Attachment #1.2: Type: text/html, Size: 1588 bytes --]

[-- Attachment #2: Type: text/plain, Size: 79 bytes --]

------------------------------------------------------------------------------

[-- Attachment #3: Type: text/plain, Size: 194 bytes --]

_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [uml-devel] IRQ handler reentrancy
  2015-11-20 12:05 [uml-devel] IRQ handler reentrancy Anton Ivanov
  2015-11-20 12:16 ` Richard Weinberger
@ 2015-11-24 17:00 ` Anton Ivanov
  2015-12-10 22:40 ` Richard Weinberger
  2 siblings, 0 replies; 19+ messages in thread
From: Anton Ivanov @ 2015-11-24 17:00 UTC (permalink / raw)
  To: user-mode-linux-devel; +Cc: Richard Weinberger

I got to the bottom of this now:

1. This has been around for a long time.

2. It was not showing up because:

     2.1. The deactivate_fd/reactivate_fd in the poll based IRQ 
controller double up as per-fd reentrancy guard for SIGIO. So while 
SIGIO handler was reentrant (potentially bad), the actual device reading 
and writing were not. This is why this was not blowing up regularly so 
far. Even if there are potential races in modifying the poll structures, 
etc, they are very difficult to trigger.

     2.2. The VTALRM based timer system produced timer interrupts only 
when UML was running or out of idle. This ensured that there is only one 
timer interrupt in flight at any given time.

The new timer subsystem + me trying to move to EPOLL brought the 
gremlins out of the woodwork (which is good, we know that they are there 
now).

I am going to reissue the timer errata + IRQ guard setting a guard only 
around the timer. The per-fd disable/reenable behaviour provides a 
sufficient guard for SIGIO so while you can crudely guard for both (as I 
have in the first version of the reentrancy patch), you do not need to 
do that. A guard for the timer is sufficient.

I now have a fixed version for the EPOLL patch which replicates the 
per-fd old reentrancy guard behavior and has a timer guard. That has 
killed all WARN() and BUG() in my testing so far.

I am going to give all of these some testing for 1-2 days and if I am 
happy with the results resubmit the errata patchset and the revised 
EPOLL IRQ controller patch (as an incremental on top of the errata).

The Epoll controller with the fixes still manages ~ 10%+ better 
performance than the old POLL based one and it will also allow further 
optimizations later on.

A.


On 20/11/15 12:05, Anton Ivanov wrote:
> I have gotten to the bottom of this.
>
> 1. The IRQ handler re-entrancy issue predates the timer patch. Adding a
> simple guard with a WARN_ON_ONCE around the device loop in the
> sig_io_handler catches it in plain 4.3
>
> diff --git a/arch/um/kernel/irq.c b/arch/um/kernel/irq.c
> index 23cb935..ac0bbce 100644
> --- a/arch/um/kernel/irq.c
> +++ b/arch/um/kernel/irq.c
> @@ -30,12 +30,17 @@ static struct irq_fd **last_irq_ptr = &active_fds;
>
>    extern void free_irqs(void);
>
> +static int in_poll_handler = 0;
> +
>    void sigio_handler(int sig, struct siginfo *unused_si, struct
> uml_pt_regs *regs)
>    {
>           struct irq_fd *irq_fd;
>           int n;
>
> +    WARN_ON_ONCE(in_poll_handler == 1);
> +
>           while (1) {
> +        in_poll_handler = 1;
>                   n = os_waiting_for_events(active_fds);
>                   if (n <= 0) {
>                           if (n == -EINTR)
> @@ -51,6 +56,7 @@ void sigio_handler(int sig, struct siginfo *unused_si,
> struct uml_pt_regs *regs)
>                           }
>                   }
>           }
> +    in_poll_handler = 0;
>
>           free_irqs();
>    }
>
> This is dangerously broken - you can under heavy IO exhaust the stack,
> you can get packets out of order, etc. Most IO is reasonably atomic so
> corruption is not likely, but not impossible (especially if one or more
> drivers are optimized to use multi-read/multi-write).
>
> 2. I cannot catch what is wrong with the current code in signal.c. When
> I read it, it should not produce re-entrancy. But it does.
>
> 3. I found 2-3 minor issues with signal handling and the timer patch
> which I will submit a hot-fix for, including a proper fix for the
> hang-in-sleep issue.
>
> 4. While I can propose a brutal patch for signal.c which sets guards
> against reentrancy which works fine, I suggest we actually get to the
> bottom of this. Why the code in unblock_signals() does not guard
> correctly against that?
>
> A.
>
> ------------------------------------------------------------------------------
> _______________________________________________
> User-mode-linux-devel mailing list
> User-mode-linux-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
>


------------------------------------------------------------------------------
Go from Idea to Many App Stores Faster with Intel(R) XDK
Give your users amazing mobile app experiences with Intel(R) XDK.
Use one codebase in this all-in-one HTML5 development environment.
Design, debug & build mobile apps & 2D/3D high-impact games for multiple OSs.
http://pubads.g.doubleclick.net/gampad/clk?id=254741551&iu=/4140
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [uml-devel] IRQ handler reentrancy
  2015-11-20 12:05 [uml-devel] IRQ handler reentrancy Anton Ivanov
  2015-11-20 12:16 ` Richard Weinberger
  2015-11-24 17:00 ` Anton Ivanov
@ 2015-12-10 22:40 ` Richard Weinberger
  2015-12-11  6:58   ` Anton Ivanov
  2 siblings, 1 reply; 19+ messages in thread
From: Richard Weinberger @ 2015-12-10 22:40 UTC (permalink / raw)
  To: Anton Ivanov; +Cc: user-mode-linux-devel

On Fri, Nov 20, 2015 at 1:05 PM, Anton Ivanov
<anton.ivanov@kot-begemot.co.uk> wrote:
> I have gotten to the bottom of this.
>
> 1. The IRQ handler re-entrancy issue predates the timer patch. Adding a
> simple guard with a WARN_ON_ONCE around the device loop in the
> sig_io_handler catches it in plain 4.3
>
> diff --git a/arch/um/kernel/irq.c b/arch/um/kernel/irq.c
> index 23cb935..ac0bbce 100644
> --- a/arch/um/kernel/irq.c
> +++ b/arch/um/kernel/irq.c
> @@ -30,12 +30,17 @@ static struct irq_fd **last_irq_ptr = &active_fds;
>
>   extern void free_irqs(void);
>
> +static int in_poll_handler = 0;
> +
>   void sigio_handler(int sig, struct siginfo *unused_si, struct
> uml_pt_regs *regs)
>   {
>          struct irq_fd *irq_fd;
>          int n;
>
> +    WARN_ON_ONCE(in_poll_handler == 1);
> +
>          while (1) {
> +        in_poll_handler = 1;
>                  n = os_waiting_for_events(active_fds);
>                  if (n <= 0) {
>                          if (n == -EINTR)
> @@ -51,6 +56,7 @@ void sigio_handler(int sig, struct siginfo *unused_si,
> struct uml_pt_regs *regs)
>                          }
>                  }
>          }
> +    in_poll_handler = 0;
>
>          free_irqs();
>   }
>
> This is dangerously broken - you can under heavy IO exhaust the stack,
> you can get packets out of order, etc. Most IO is reasonably atomic so
> corruption is not likely, but not impossible (especially if one or more
> drivers are optimized to use multi-read/multi-write).
>
> 2. I cannot catch what is wrong with the current code in signal.c. When
> I read it, it should not produce re-entrancy. But it does.

Sorry for the delay. Until now I did not find the time to dig into that.
Did you find the offending code in signal.c?
I'm also winding my head how to fix this properly (and to verify
whether your patches are correct).
This UML code is very very old and a dark corner.

-- 
Thanks,
//richard

------------------------------------------------------------------------------
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [uml-devel] IRQ handler reentrancy
  2015-12-10 22:40 ` Richard Weinberger
@ 2015-12-11  6:58   ` Anton Ivanov
  2015-12-11  8:16     ` Richard Weinberger
  0 siblings, 1 reply; 19+ messages in thread
From: Anton Ivanov @ 2015-12-11  6:58 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: user-mode-linux-devel

On 10/12/15 22:40, Richard Weinberger wrote:
> On Fri, Nov 20, 2015 at 1:05 PM, Anton Ivanov
> <anton.ivanov@kot-begemot.co.uk> wrote:
>> I have gotten to the bottom of this.
>>
>> 1. The IRQ handler re-entrancy issue predates the timer patch. Adding a
>> simple guard with a WARN_ON_ONCE around the device loop in the
>> sig_io_handler catches it in plain 4.3
>>
>> diff --git a/arch/um/kernel/irq.c b/arch/um/kernel/irq.c
>> index 23cb935..ac0bbce 100644
>> --- a/arch/um/kernel/irq.c
>> +++ b/arch/um/kernel/irq.c
>> @@ -30,12 +30,17 @@ static struct irq_fd **last_irq_ptr = &active_fds;
>>
>>   extern void free_irqs(void);
>>
>> +static int in_poll_handler = 0;
>> +
>>   void sigio_handler(int sig, struct siginfo *unused_si, struct
>> uml_pt_regs *regs)
>>   {
>>          struct irq_fd *irq_fd;
>>          int n;
>>
>> +    WARN_ON_ONCE(in_poll_handler == 1);
>> +
>>          while (1) {
>> +        in_poll_handler = 1;
>>                  n = os_waiting_for_events(active_fds);
>>                  if (n <= 0) {
>>                          if (n == -EINTR)
>> @@ -51,6 +56,7 @@ void sigio_handler(int sig, struct siginfo *unused_si,
>> struct uml_pt_regs *regs)
>>                          }
>>                  }
>>          }
>> +    in_poll_handler = 0;
>>
>>          free_irqs();
>>   }
>>
>> This is dangerously broken - you can under heavy IO exhaust the stack,
>> you can get packets out of order, etc. Most IO is reasonably atomic so
>> corruption is not likely, but not impossible (especially if one or more
>> drivers are optimized to use multi-read/multi-write).
>>
>> 2. I cannot catch what is wrong with the current code in signal.c. When
>> I read it, it should not produce re-entrancy. But it does.
> Sorry for the delay. Until now I did not find the time to dig into that.
> Did you find the offending code in signal.c?

Yes.

Unblock signals is logically incorrect - it will re-trigger an
interrupts even if there is an interrupt in flight whose processing has
not been finished.

I tried several approaches both with the original poll() controller and
with my epoll() based version, some show promise.

I had to put it aside until next Friday as I have some stuff due at work
so I cannot spare time to work on it until then. Once I get that out of
the way I should be able to spare it a day or two which should be enough
to finish it.

Ditto for the UBD improvements.

A.

> I'm also winding my head how to fix this properly (and to verify
> whether your patches are correct).
> This UML code is very very old and a dark corner.
>


------------------------------------------------------------------------------
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [uml-devel] IRQ handler reentrancy
  2015-12-11  6:58   ` Anton Ivanov
@ 2015-12-11  8:16     ` Richard Weinberger
  2015-12-11 11:24       ` Anton Ivanov
  0 siblings, 1 reply; 19+ messages in thread
From: Richard Weinberger @ 2015-12-11  8:16 UTC (permalink / raw)
  To: Anton Ivanov; +Cc: user-mode-linux-devel

Am 11.12.2015 um 07:58 schrieb Anton Ivanov:
>>> 2. I cannot catch what is wrong with the current code in signal.c. When
>>> I read it, it should not produce re-entrancy. But it does.
>> Sorry for the delay. Until now I did not find the time to dig into that.
>> Did you find the offending code in signal.c?
> 
> Yes.
> 
> Unblock signals is logically incorrect - it will re-trigger an
> interrupts even if there is an interrupt in flight whose processing has
> not been finished.
> 
> I tried several approaches both with the original poll() controller and
> with my epoll() based version, some show promise.
> 
> I had to put it aside until next Friday as I have some stuff due at work
> so I cannot spare time to work on it until then. Once I get that out of
> the way I should be able to spare it a day or two which should be enough
> to finish it.
> 
> Ditto for the UBD improvements.

One thing we have to consider is that's legit to have SIGIO nested.
I'm currently investigating whether we use do_IRQ() correctly.

Thanks,
//richard

------------------------------------------------------------------------------
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [uml-devel] IRQ handler reentrancy
  2015-12-11  8:16     ` Richard Weinberger
@ 2015-12-11 11:24       ` Anton Ivanov
  2015-12-11 18:38         ` Richard Weinberger
  0 siblings, 1 reply; 19+ messages in thread
From: Anton Ivanov @ 2015-12-11 11:24 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: user-mode-linux-devel

On 11/12/15 08:16, Richard Weinberger wrote:
> Am 11.12.2015 um 07:58 schrieb Anton Ivanov:
>>>> 2. I cannot catch what is wrong with the current code in signal.c. When
>>>> I read it, it should not produce re-entrancy. But it does.
>>> Sorry for the delay. Until now I did not find the time to dig into that.
>>> Did you find the offending code in signal.c?
>> Yes.
>>
>> Unblock signals is logically incorrect - it will re-trigger an
>> interrupts even if there is an interrupt in flight whose processing has
>> not been finished.
>>
>> I tried several approaches both with the original poll() controller and
>> with my epoll() based version, some show promise.
>>
>> I had to put it aside until next Friday as I have some stuff due at work
>> so I cannot spare time to work on it until then. Once I get that out of
>> the way I should be able to spare it a day or two which should be enough
>> to finish it.
>>
>> Ditto for the UBD improvements.
> One thing we have to consider is that's legit to have SIGIO nested.

Correct. That is considered :)

Both when looking at poll() and epoll()

However, it is not legit to have sigio on a specific fd nested. That is 
mostly safe for the poll() version, but will need to be accounted for in 
any surgery on the irq controller.

A.

> I'm currently investigating whether we use do_IRQ() correctly.
>
> Thanks,
> //richard
>


------------------------------------------------------------------------------
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [uml-devel] IRQ handler reentrancy
  2015-12-11 11:24       ` Anton Ivanov
@ 2015-12-11 18:38         ` Richard Weinberger
  2015-12-11 19:12           ` Anton Ivanov
  2015-12-21 11:55           ` Anton Ivanov
  0 siblings, 2 replies; 19+ messages in thread
From: Richard Weinberger @ 2015-12-11 18:38 UTC (permalink / raw)
  To: Anton Ivanov; +Cc: user-mode-linux-devel

Am 11.12.2015 um 12:24 schrieb Anton Ivanov:
> On 11/12/15 08:16, Richard Weinberger wrote:
>> Am 11.12.2015 um 07:58 schrieb Anton Ivanov:
>>>>> 2. I cannot catch what is wrong with the current code in signal.c. When
>>>>> I read it, it should not produce re-entrancy. But it does.
>>>> Sorry for the delay. Until now I did not find the time to dig into that.
>>>> Did you find the offending code in signal.c?
>>> Yes.
>>>
>>> Unblock signals is logically incorrect - it will re-trigger an
>>> interrupts even if there is an interrupt in flight whose processing has
>>> not been finished.
>>>
>>> I tried several approaches both with the original poll() controller and
>>> with my epoll() based version, some show promise.
>>>
>>> I had to put it aside until next Friday as I have some stuff due at work
>>> so I cannot spare time to work on it until then. Once I get that out of
>>> the way I should be able to spare it a day or two which should be enough
>>> to finish it.
>>>
>>> Ditto for the UBD improvements.
>> One thing we have to consider is that's legit to have SIGIO nested.
> 
> Correct. That is considered :)
> 
> Both when looking at poll() and epoll()
> 
> However, it is not legit to have sigio on a specific fd nested. That is mostly safe for the poll() version, but will need to be accounted for in any surgery on the irq controller.

So, the current code is fine unless you switch to epoll()?
Is it because you use epoll() in edge-triggered mode?

Thanks,
//richard

------------------------------------------------------------------------------
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [uml-devel] IRQ handler reentrancy
  2015-12-11 18:38         ` Richard Weinberger
@ 2015-12-11 19:12           ` Anton Ivanov
  2015-12-21 11:55           ` Anton Ivanov
  1 sibling, 0 replies; 19+ messages in thread
From: Anton Ivanov @ 2015-12-11 19:12 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: user-mode-linux-devel

On 11/12/15 18:38, Richard Weinberger wrote:
> Am 11.12.2015 um 12:24 schrieb Anton Ivanov:
>> On 11/12/15 08:16, Richard Weinberger wrote:
>>> Am 11.12.2015 um 07:58 schrieb Anton Ivanov:
>>>>>> 2. I cannot catch what is wrong with the current code in signal.c. When
>>>>>> I read it, it should not produce re-entrancy. But it does.
>>>>> Sorry for the delay. Until now I did not find the time to dig into that.
>>>>> Did you find the offending code in signal.c?
>>>> Yes.
>>>>
>>>> Unblock signals is logically incorrect - it will re-trigger an
>>>> interrupts even if there is an interrupt in flight whose processing has
>>>> not been finished.
>>>>
>>>> I tried several approaches both with the original poll() controller and
>>>> with my epoll() based version, some show promise.
>>>>
>>>> I had to put it aside until next Friday as I have some stuff due at work
>>>> so I cannot spare time to work on it until then. Once I get that out of
>>>> the way I should be able to spare it a day or two which should be enough
>>>> to finish it.
>>>>
>>>> Ditto for the UBD improvements.
>>> One thing we have to consider is that's legit to have SIGIO nested.
>> Correct. That is considered :)
>>
>> Both when looking at poll() and epoll()
>>
>> However, it is not legit to have sigio on a specific fd nested. That is mostly safe for the poll() version, but will need to be accounted for in any surgery on the irq controller.
> So, the current code is fine unless you switch to epoll()?

Correct. It looks OK, though I have not looked at it in depth to make 
sure there is no race somewhere.

> Is it because you use epoll() in edge-triggered mode?

For now it is level which means it should be identical to poll().

I want to be able (long term) to use either - have the user set what 
they want for a particular IRQ. There are a couple of use cases like 
packet mmap and can-write IRQs where edge allows for better code.

I have a working POC which sets per - fd flag to avoid reentrancy on a 
per-fd basis so it behaves identically to the poll one - similar to the 
"reactivate fd" semantics.

What I see with it however is some weird stalls on ubd and I am not 100% 
sure that they are epoll specific, it just tends to make them easily 
reproducible as it is faster.

I will pick it Monday-Tuesday the week before Xmas after I get some work 
stuff out of the way.

A.

>
> Thanks,
> //richard
>


------------------------------------------------------------------------------
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [uml-devel] IRQ handler reentrancy
  2015-12-11 18:38         ` Richard Weinberger
  2015-12-11 19:12           ` Anton Ivanov
@ 2015-12-21 11:55           ` Anton Ivanov
  2016-01-10 15:53             ` Richard Weinberger
  1 sibling, 1 reply; 19+ messages in thread
From: Anton Ivanov @ 2015-12-21 11:55 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: user-mode-linux-devel

Patch in your queue, current code looks OK.

I will try to assemble the full ubd acceleration sequence of patches 
this afternoon to submit that as well.

A.

On 11/12/15 18:38, Richard Weinberger wrote:
> Am 11.12.2015 um 12:24 schrieb Anton Ivanov:
>> On 11/12/15 08:16, Richard Weinberger wrote:
>>> Am 11.12.2015 um 07:58 schrieb Anton Ivanov:
>>>>>> 2. I cannot catch what is wrong with the current code in signal.c. When
>>>>>> I read it, it should not produce re-entrancy. But it does.
>>>>> Sorry for the delay. Until now I did not find the time to dig into that.
>>>>> Did you find the offending code in signal.c?
>>>> Yes.
>>>>
>>>> Unblock signals is logically incorrect - it will re-trigger an
>>>> interrupts even if there is an interrupt in flight whose processing has
>>>> not been finished.
>>>>
>>>> I tried several approaches both with the original poll() controller and
>>>> with my epoll() based version, some show promise.
>>>>
>>>> I had to put it aside until next Friday as I have some stuff due at work
>>>> so I cannot spare time to work on it until then. Once I get that out of
>>>> the way I should be able to spare it a day or two which should be enough
>>>> to finish it.
>>>>
>>>> Ditto for the UBD improvements.
>>> One thing we have to consider is that's legit to have SIGIO nested.
>> Correct. That is considered :)
>>
>> Both when looking at poll() and epoll()
>>
>> However, it is not legit to have sigio on a specific fd nested. That is mostly safe for the poll() version, but will need to be accounted for in any surgery on the irq controller.
> So, the current code is fine unless you switch to epoll()?
> Is it because you use epoll() in edge-triggered mode?
>
> Thanks,
> //richard
>


------------------------------------------------------------------------------
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [uml-devel] IRQ handler reentrancy
  2015-12-21 11:55           ` Anton Ivanov
@ 2016-01-10 15:53             ` Richard Weinberger
  0 siblings, 0 replies; 19+ messages in thread
From: Richard Weinberger @ 2016-01-10 15:53 UTC (permalink / raw)
  To: Anton Ivanov; +Cc: Richard Weinberger, user-mode-linux-devel

On Mon, Dec 21, 2015 at 12:55 PM, Anton Ivanov
<anton.ivanov@kot-begemot.co.uk> wrote:
> Patch in your queue, current code looks OK.
>
> I will try to assemble the full ubd acceleration sequence of patches
> this afternoon to submit that as well.

Applied. Let's see how much will explode. :-)

-- 
Thanks,
//richard

------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2016-01-10 15:53 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-20 12:05 [uml-devel] IRQ handler reentrancy Anton Ivanov
2015-11-20 12:16 ` Richard Weinberger
2015-11-20 12:26   ` stian
2015-11-20 12:50     ` Anton Ivanov
2015-11-20 13:48       ` stian
2015-11-20 14:08         ` Anton Ivanov
2015-11-20 15:21           ` Thomas Meyer
2015-11-20 16:22             ` Anton Ivanov
2015-11-20 16:43               ` Anton Ivanov
2015-11-20 12:45   ` Anton Ivanov
2015-11-24 17:00 ` Anton Ivanov
2015-12-10 22:40 ` Richard Weinberger
2015-12-11  6:58   ` Anton Ivanov
2015-12-11  8:16     ` Richard Weinberger
2015-12-11 11:24       ` Anton Ivanov
2015-12-11 18:38         ` Richard Weinberger
2015-12-11 19:12           ` Anton Ivanov
2015-12-21 11:55           ` Anton Ivanov
2016-01-10 15:53             ` Richard Weinberger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.