* [libgpiod] gpiomon loses events
@ 2020-09-14 10:38 Maxim Devaev
  2020-09-14 15:12 ` Andy Shevchenko
  0 siblings, 1 reply; 16+ messages in thread
From: Maxim Devaev @ 2020-09-14 10:38 UTC (permalink / raw)
  To: linux-gpio

Hi. I noticed a strange behavior of gpiomon and gpiomon.py. It seems
that in some cases, if the signal arrives at the GPIO pin too quickly,
the last event on it may be rising, despite the fact that the actual
signal is already set to 0. A cursory study of the sources showed that
both of these utilities read only one (the first?) event from the
line. I changed gpiomon.py by replacing event_read() with
event_read_multiple() and iterating over all events, and it looks like
the lost falling events were there.
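The change amounts to something like the following (a sketch against the libgpiod 1.x Python binding; the `line` object and the one-second timeout are illustrative):

```python
def watch(line):
    """Drain and report every queued edge event, not just the first.

    Each wakeup can have several events queued behind it, so
    event_read_multiple() is used to read the whole batch; the last
    event of the final batch reflects the most recent edge.
    """
    events = []
    while line.event_wait(sec=1):
        # event_read_multiple() returns all currently queued events
        # (up to the kernel's 16-event buffer), not only the first.
        batch = line.event_read_multiple()
        events.extend(batch)
        for event in batch:
            print(event)
    return events
```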

So, I have a few questions.

1) Is this really a bug in gpiomon, or is it intended to be?

2) If I use event_read_multiple(), can it skip some events? I noticed
that sometimes I get several falling and rising events in a row. Why
is this happening? Shouldn't they be paired? Can the state transition,
i.e. the final falling or rising edge, be lost?

3) It seems a line can hold only 16 events. What happens if more
events occur in one iteration of the loop, say 20? Will the last 4
events be lost, will they become available in the next iteration of
event_wait(), or will the first 4 events in the current iteration be
discarded?

Thank you for your attention.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [libgpiod] gpiomon loses events
  2020-09-14 10:38 [libgpiod] gpiomon loses events Maxim Devaev
@ 2020-09-14 15:12 ` Andy Shevchenko
  2020-09-14 15:54   ` Andy Shevchenko
  0 siblings, 1 reply; 16+ messages in thread
From: Andy Shevchenko @ 2020-09-14 15:12 UTC (permalink / raw)
  To: Maxim Devaev; +Cc: open list:GPIO SUBSYSTEM

On Mon, Sep 14, 2020 at 1:40 PM Maxim Devaev <mdevaev@gmail.com> wrote:
>
> Hi. I noticed a strange behavior of gpiomon and gpiomon.py. It seems
> that in some cases, if the signal arrives at the GPIO pin too quickly,
> the last event on it may be rising, despite the fact that the actual
> signal is already set to 0.

I'm not sure I understand what's wrong here. An 'event' by definition
is something that has already happened. If the pin goes from 0 to 1 to
0, you will get one rising and one falling event.

> A cursory study of the sources showed that
> both of these utilities read only one (the first?) event from the
> line. I changed gpiomon.py by replacing event_read() with
> event_read_multiple() and iterating over all events, and it looks like
> the lost falling events were there.
>
> So, I have a few questions.
>
> 1) Is this really a bug in gpiomon, or is it intended to be?
>
> 2) If I use event_read_multiple(), can it skip some events? I noticed
> that sometimes I get several falling and rising events in a row.
> Why is
> this happening?

I assume the IRQ handler is reentrant and, since it runs in a thread,
we get the ordering messed up. The timestamp of the event (recorded
when it happens) should be used to serialize events. But if my
assumption is correct, we have to record it in hard IRQ context.

> Shouldn't they be paired? Can the state transition,
> i.e. the final falling or rising, be lost?
>
> 3) It seems a line can hold only 16 events. What happens if more
> events occur in one iteration of the loop, say 20? Will the last 4
> events be lost, will they become available in the next iteration of
> event_wait(), or will the first 4 events in the current iteration be
> discarded?

That's how kfifo works; AFAIU it should overwrite the first (older) ones.

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [libgpiod] gpiomon loses events
  2020-09-14 15:12 ` Andy Shevchenko
@ 2020-09-14 15:54   ` Andy Shevchenko
  2020-09-14 15:55     ` Andy Shevchenko
  0 siblings, 1 reply; 16+ messages in thread
From: Andy Shevchenko @ 2020-09-14 15:54 UTC (permalink / raw)
  To: Maxim Devaev; +Cc: open list:GPIO SUBSYSTEM

On Mon, Sep 14, 2020 at 6:12 PM Andy Shevchenko
<andy.shevchenko@gmail.com> wrote:
>
> On Mon, Sep 14, 2020 at 1:40 PM Maxim Devaev <mdevaev@gmail.com> wrote:
> >
> > Hi. I noticed a strange behavior of gpiomon and gpiomon.py. It seems
> > that in some cases, if the signal arrives at the GPIO pin too quickly,
> > the last event on it may be rising, despite the fact that the actual
> > signal is already set to 0.
>
> I'm not sure I understand what's wrong here. An 'event' by definition
> is something that has already happened. If the pin goes from 0 to 1 to
> 0, you will get one rising and one falling event.
>
> > A cursory study of the sources showed that
> > both of these utilities read only one (the first?) event from the
> > line. I changed gpiomon.py by replacing event_read() with
> > event_read_multiple() and iterating over all events, and it looks like
> > the lost falling events were there.
> >
> > So, I have a few questions.
> >
> > 1) Is this really a bug in gpiomon, or is it intended to be?
> >
> > 2) If I use event_read_multiple(), can it skip some events? I noticed
> > that sometimes I get several falling and rising events in a row.
> > Why is
> > this happening?
>
> I assume the IRQ handler is reentrant and, since it runs in a thread,
> we get the ordering messed up. The timestamp of the event (recorded
> when it happens) should be used to serialize events. But if my
> assumption is correct, we have to record it in hard IRQ context.

...but this is exactly what we do in the latest code (I didn't check
from which kernel this became the default approach).

So, do you have timestamps that are not paired?

> > Shouldn't they be paired? Can the state transition,
> > i.e. the final falling or rising, be lost?
> >
> > 3) It seems a line can hold only 16 events. What happens if more
> > events occur in one iteration of the loop, say 20? Will the last 4
> > events be lost, will they become available in the next iteration of
> > event_wait(), or will the first 4 events in the current iteration be
> > discarded?
>
> That's how kfifo works; AFAIU it should overwrite the first (older) ones.
>
> --
> With Best Regards,
> Andy Shevchenko



-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [libgpiod] gpiomon loses events
  2020-09-14 15:54   ` Andy Shevchenko
@ 2020-09-14 15:55     ` Andy Shevchenko
  2020-09-14 16:38       ` Maxim Devaev
  2020-09-15  0:45       ` Kent Gibson
  0 siblings, 2 replies; 16+ messages in thread
From: Andy Shevchenko @ 2020-09-14 15:55 UTC (permalink / raw)
  To: Maxim Devaev, Kent Gibson, Bartosz Golaszewski; +Cc: open list:GPIO SUBSYSTEM

+Cc: libgpiod maintainers

On Mon, Sep 14, 2020 at 6:54 PM Andy Shevchenko
<andy.shevchenko@gmail.com> wrote:
>
> On Mon, Sep 14, 2020 at 6:12 PM Andy Shevchenko
> <andy.shevchenko@gmail.com> wrote:
> >
> > On Mon, Sep 14, 2020 at 1:40 PM Maxim Devaev <mdevaev@gmail.com> wrote:
> > >
> > > Hi. I noticed a strange behavior of gpiomon and gpiomon.py. It seems
> > > that in some cases, if the signal arrives at the GPIO pin too quickly,
> > > the last event on it may be rising, despite the fact that the actual
> > > signal is already set to 0.
> >
> > I'm not sure I understand what's wrong here. An 'event' by definition
> > is something that has already happened. If the pin goes from 0 to 1 to
> > 0, you will get one rising and one falling event.
> >
> > > A cursory study of the sources showed that
> > > both of these utilities read only one (the first?) event from the
> > > line. I changed gpiomon.py by replacing event_read() with
> > > event_read_multiple() and iterating over all events, and it looks like
> > > the lost falling events were there.
> > >
> > > So, I have a few questions.
> > >
> > > 1) Is this really a bug in gpiomon, or is it intended to be?
> > >
> > > 2) If I use event_read_multiple(), can it skip some events? I noticed
> > > that sometimes I get several falling and rising events in a row.
> > > Why is
> > > this happening?
> >
> > I assume the IRQ handler is reentrant and, since it runs in a thread,
> > we get the ordering messed up. The timestamp of the event (recorded
> > when it happens) should be used to serialize events. But if my
> > assumption is correct, we have to record it in hard IRQ context.
>
> ...but this is exactly what we do in the latest code (I didn't check
> from which kernel this became the default approach).
>
> So, do you have timestamps that are not paired?
>
> > > Shouldn't they be paired? Can the state transition,
> > > i.e. the final falling or rising, be lost?
> > >
> > > 3) It seems a line can hold only 16 events. What happens if more
> > > events occur in one iteration of the loop, say 20? Will the last 4
> > > events be lost, will they become available in the next iteration of
> > > event_wait(), or will the first 4 events in the current iteration be
> > > discarded?
> >
> > That's how kfifo works; AFAIU it should overwrite the first (older) ones.
> >
> > --
> > With Best Regards,
> > Andy Shevchenko
>
>
>
> --
> With Best Regards,
> Andy Shevchenko



-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [libgpiod] gpiomon loses events
  2020-09-14 15:55     ` Andy Shevchenko
@ 2020-09-14 16:38       ` Maxim Devaev
  2020-09-15  0:45       ` Kent Gibson
  1 sibling, 0 replies; 16+ messages in thread
From: Maxim Devaev @ 2020-09-14 16:38 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Kent Gibson, Bartosz Golaszewski, open list:GPIO SUBSYSTEM

> I'm not sure I understand what's wrong here. An 'event' by definition
> is something that has already happened. If the pin goes from 0 to 1 to
> 0, you will get one rising and one falling event.

Well, it looks like something's wrong. Look, I took a Raspberry Pi 4
and libgpiod 1.5.2. Pin 13 receives a non-periodic signal. And here is
the result: https://gist.github.com/mdevaev/bcd6bb8e9c73383c88e410645c4d1797

Despite the fact that all event timestamps are increasing, I get a
large number of duplicate transitions without a reverse transition.
But the most important thing is the last line. I interrupted the
signal, meaning the actual level at the end is 0. However, gpiomon did
not see this and still believes there is a 1. The reason for this is
that a line can have up to 16 transition events queued at once, yet
the gpiomon tool and all the Python examples read only the first one.
I think this is a problem. I used the Python sample code in my project
and was very surprised when my indicator showed the 1 state while the
actual state after the end of the series of events was 0.

For the next example, I took the Python implementation of gpiomon. The
git example behaves similarly to the C utility. I modified it as
follows: https://gist.github.com/mdevaev/70b9aa1505bd415167f3feb0e78e2f78
The result is: https://gist.github.com/mdevaev/776d8f8f600053fc42d019d92406641f

Here you can clearly see how several transition events are extracted
from a single line event. The timestamps are still consistently
increasing, yet you still see several rising or falling events without
a paired opposite event. Pay attention to the last 9 events. If I were
using the normal gpiomon at this point (which only reads the first
event on the line), I would miss the final falling edge. I see two
problems here:
- The absence of opposite event pairs. I expected that for every
rising there would be a falling, but I don't see that. If that's the
way it should be, then fine. I admit I didn't quite understand your
explanation about the IRQ, sorry :)
- The second problem is that using the Python example and the gpiomon
tool can lead to a desynchronization between the pin's actual state
and the user's view of it. Both utilities should use
gpiod_line_event_read_multiple() instead of gpiod_line_event_read(),
I think.

> That's how kfifo works; AFAIU it should overwrite the first (older) ones.

In other words, it is the last transition event in a series of events
on the line that reflects the current state of the pin, right?
Provided that there were no further events on the line.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [libgpiod] gpiomon loses events
  2020-09-14 15:55     ` Andy Shevchenko
  2020-09-14 16:38       ` Maxim Devaev
@ 2020-09-15  0:45       ` Kent Gibson
  2020-09-15  3:34         ` Kent Gibson
  1 sibling, 1 reply; 16+ messages in thread
From: Kent Gibson @ 2020-09-15  0:45 UTC (permalink / raw)
  To: Maxim Devaev
  Cc: Andy Shevchenko, Bartosz Golaszewski, open list:GPIO SUBSYSTEM

On Mon, Sep 14, 2020 at 06:55:20PM +0300, Andy Shevchenko wrote:
> +Cc: libgpiod maintainers
> 
> On Mon, Sep 14, 2020 at 6:54 PM Andy Shevchenko
> <andy.shevchenko@gmail.com> wrote:
> >
> > On Mon, Sep 14, 2020 at 6:12 PM Andy Shevchenko
> > <andy.shevchenko@gmail.com> wrote:
> > >
> > > On Mon, Sep 14, 2020 at 1:40 PM Maxim Devaev <mdevaev@gmail.com> wrote:
> > > >
> > > > Hi. I noticed a strange behavior of gpiomon and gpiomon.py. It seems
> > > > that in some cases, if the signal arrives at the GPIO pin too quickly,
> > > > the last event on it may be rising, despite the fact that the actual
> > > > signal is already set to 0.
> > >

Hi Maxim,

It looks like you have run across a bug in libgpiod that I recently
submitted a patch for, whereby reading events can discard events from
the kfifo. The discarded events are the most recent ones.
The problem is particularly bad when reading single events, as gpiomon
does - any other events in the kfifo are lost.

The bug was introduced in libgpiod v1.5 so, depending on your
circumstances, I would revert to an earlier libgpiod or apply my patch.
(let me know if you can't find it and I'll send you a copy).

A workaround with the unpatched v1.5.x is to only read events using
gpiod_line_event_read_fd_multiple(), or one of the functions that wrap
it, with num_lines=16.
For the python binding that would be event_read_multiple() - as you have
discovered.
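In code, the workaround reduces to always draining the queue and treating only the newest event as the line's current state (a sketch; it works for any object exposing a 1.x-style `event_read_multiple()`):

```python
def last_edge(line):
    """Drain all queued events and return the newest one (or None).

    Only the newest event can reflect the line's present level; acting
    on the first event of a batch is what makes gpiomon appear stuck
    at a stale state.
    """
    events = line.event_read_multiple()
    return events[-1] if events else None
```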

> > > I'm not sure I understand what's wrong here. An 'event' by definition
> > > is something that has already happened. If the pin goes from 0 to 1 to
> > > 0, you will get one rising and one falling event.
> > >
> > > > A cursory study of the sources showed that
> > > > both of these utilities read only one (the first?) event from the
> > > > line. I changed gpiomon.py by replacing event_read() with
> > > > event_read_multiple() and iterating over all events, and it looks like
> > > > the lost falling events were there.
> > > >
> > > > So, I have a few questions.
> > > >
> > > > 1) Is this really a bug in gpiomon, or is it intended to be?
> > > >
> > > > 2) If I use event_read_multiple(), can it skip some events? I noticed
> > > > that sometimes I get several falling and rising events in a row.
> > > > Why is
> > > > this happening?
> > >

Generated events are buffered in the kernel - up to a point. Beyond
that, any new events are discarded. The current limit is hard-coded in
the kernel at 16.

Also, if you toggle the line faster than the kernel can service the
resulting interrupts then you can lose events.

> > > I assume the IRQ handler is reentrant and, since it runs in a thread,
> > > we get the ordering messed up. The timestamp of the event (recorded
> > > when it happens) should be used to serialize events. But if my
> > > assumption is correct, we have to record it in hard IRQ context.
> >
> > ...but this is exactly what we do in the latest code (I didn't check
> > from which kernel this became the default approach).
> >
> > So, do you have timestamps that are not paired?
> >
> > > > Shouldn't they be paired? Can the state transition,
> > > > i.e. the final falling or rising, be lost?
> > > >
> > > > 3) It seems a line can hold only 16 events. What happens if more
> > > > events occur in one iteration of the loop, say 20? Will the last 4
> > > > events be lost, will they become available in the next iteration of
> > > > event_wait(), or will the first 4 events in the current iteration be
> > > > discarded?
> > >
> > > That's how kfifo works; AFAIU it should overwrite the first (older) ones.
> > >

The edge detection in the kernel only writes to the kfifo if it is NOT
full, so it actually discards the most recent - you will only get the
first 16 events. The last 4 events of your 20 will be lost.
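That behaviour can be modelled in a few lines (a toy user-space model of the bounded event buffer, not actual kernel code; the depth of 16 matches the hard-coded limit):

```python
def record_events(edges, depth=16):
    """Model a buffer that only appends while not full: once the 16
    slots are used, newer edges are silently dropped."""
    fifo = []
    for edge in edges:
        if len(fifo) < depth:
            fifo.append(edge)
    return fifo

# 17 alternating edges starting with rising: the line ends high ("R"),
# but the 17th edge overflows the buffer and is dropped, so a reader
# draining the buffer concludes the last edge was falling ("F").
edges = ["R" if i % 2 == 0 else "F" for i in range(17)]
kept = record_events(edges)
```

This is exactly the desynchronization described earlier in the thread: the buffered history ends on a falling edge even though the line is actually high.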

I would actually prefer it worked the way Andy thinks it does, as the
more recent events are generally more valuable.
Bart, do you have any opinion on that?
(this would be for uAPI v2, not lineevent, so as not to alter existing
behaviour)

We are working on changes that will make debugging cases like this
easier, as well as allowing some userspace control over the event
buffer size, but those won't be available until libgpiod v2.

Cheers,
Kent.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [libgpiod] gpiomon loses events
  2020-09-15  0:45       ` Kent Gibson
@ 2020-09-15  3:34         ` Kent Gibson
  2020-09-15  7:34           ` Maxim Devaev
  0 siblings, 1 reply; 16+ messages in thread
From: Kent Gibson @ 2020-09-15  3:34 UTC (permalink / raw)
  To: Maxim Devaev
  Cc: Andy Shevchenko, Bartosz Golaszewski, open list:GPIO SUBSYSTEM

On Tue, Sep 15, 2020 at 08:45:41AM +0800, Kent Gibson wrote:
> On Mon, Sep 14, 2020 at 06:55:20PM +0300, Andy Shevchenko wrote:
> > +Cc: libgpiod maintainers
> > 

[snip]

> A workaround with the unpatched v1.5.x is to only read events using
> gpiod_line_event_read_fd_multiple(), or one of the functions that wrap
> it, with num_lines=16.

Oops - that should be num_events.

Cheers,
Kent.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [libgpiod] gpiomon loses events
  2020-09-15  3:34         ` Kent Gibson
@ 2020-09-15  7:34           ` Maxim Devaev
  2020-09-15  7:46             ` Bartosz Golaszewski
  2020-09-15 13:57             ` Kent Gibson
  0 siblings, 2 replies; 16+ messages in thread
From: Maxim Devaev @ 2020-09-15  7:34 UTC (permalink / raw)
  To: Kent Gibson
  Cc: Andy Shevchenko, Bartosz Golaszewski, open list:GPIO SUBSYSTEM

> The bug was introduced in libgpiod v1.5 so, depending on your
> circumstances, I would revert to an earlier libgpiod or apply my patch.
> ...
> For the python binding that would be event_read_multiple() - as you have
> discovered.

Hello, and thanks for the info. In my case the workaround is an
acceptable solution, although I think I will also patch libgpiod. I
would appreciate it if you could send the patch to me.

> The edge detection in the kernel only writes to the kfifo if it is NOT
> full, so it actually discards the most recent - you will only get the
> first 16 events. The last 4 events of your 20 will be lost.

Is this behavior documented anywhere? It's a complete surprise to me
that this is how it works; I expected to lose the old events. It seems
to me that for software that watches edges, the loss of new events is
a serious problem, since it can lead to a desynchronization between
the physical state of the pin and the user's information about it. For
example, if event 16 was falling and event 17 was rising, and the
signal then stopped changing and remains at 1, the kernel will tell us
only about the falling edge (i.e. 0), while the real state will be 1.

If we lose events in any case, then in my opinion it is much more
important to keep the current state than the past. I can't think of a
case where losing old events leads to problems, but a
desynchronization of the current state means that the software can
make the wrong decision in its logic based on the driver's lies. Yes,
this would be a breaking change, but it seems to me that the current
behavior is what's incorrect. Don't get me wrong, I don't insist on
it; if this decision was made for particular reasons, I would like to
understand where I am wrong.

I also see a workaround for this behavior: when a read timeout occurs,
I can re-read the state of all the lines to check whether it has
changed. But that partly defeats the purpose of waiting for events - I
still have to check manually whether anything was lost or whether the
driver has started lying to me. Here is an example:
https://github.com/pikvm/kvmd/blob/7cdf597/kvmd/aiogp.py#L102

The fact is that after reading the presentation by Bartosz Golaszewski
and seeing the line "Events never get lost!", I was impressed and
satisfied, but the situation turned out to be not so happy:
https://ostconf.com/system/attachments/files/000/001/532/original/Linux_Piter_2018_-_New_GPIO_interface_for_linux_userspace.pdf?1541021776

BTW, what about the unpaired falling/rising events? Is this how it should be?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [libgpiod] gpiomon loses events
  2020-09-15  7:34           ` Maxim Devaev
@ 2020-09-15  7:46             ` Bartosz Golaszewski
  2020-09-15 13:57             ` Kent Gibson
  1 sibling, 0 replies; 16+ messages in thread
From: Bartosz Golaszewski @ 2020-09-15  7:46 UTC (permalink / raw)
  To: Maxim Devaev
  Cc: Kent Gibson, Andy Shevchenko, Bartosz Golaszewski,
	open list:GPIO SUBSYSTEM

On Tue, Sep 15, 2020 at 9:34 AM Maxim Devaev <mdevaev@gmail.com> wrote:
>
> > The bug was introduced in libgpiod v1.5 so, depending on your
> > circumstances, I would revert to an earlier libgpiod or apply my patch.
> > ...
> > For the python binding that would be event_read_multiple() - as you have
> > discovered.
>
> Hello, and thanks for the info. In my case the workaround is an
> acceptable solution, although I think I will also patch libgpiod. I
> would appreciate it if you could send the patch to me.
>

Hi Maxim!

I have already applied the patch to the master branch and backported
it to v1.5.x [1]. Please give it a try. I will make a bugfix release
soon, too.

Bartosz

[snip]

[1] https://git.kernel.org/pub/scm/libs/libgpiod/libgpiod.git/commit/?h=v1.5.x&id=23df3feb6a1b05f2bafa0b4bde58423daa0cb03b

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [libgpiod] gpiomon loses events
  2020-09-15  7:34           ` Maxim Devaev
  2020-09-15  7:46             ` Bartosz Golaszewski
@ 2020-09-15 13:57             ` Kent Gibson
  2020-09-16  9:29               ` Bartosz Golaszewski
  1 sibling, 1 reply; 16+ messages in thread
From: Kent Gibson @ 2020-09-15 13:57 UTC (permalink / raw)
  To: Maxim Devaev
  Cc: Andy Shevchenko, Bartosz Golaszewski, open list:GPIO SUBSYSTEM

On Tue, Sep 15, 2020 at 10:34:31AM +0300, Maxim Devaev wrote:
> > The bug was introduced in libgpiod v1.5 so, depending on your
> > circumstances, I would revert to an earlier libgpiod or apply my patch.
> > ...
> 
> Is this behavior documented anywhere? It's a complete surprise to me
> that this is how it works; I expected to lose the old events. It seems
> to me that for software that watches edges, the loss of new events is
> a serious problem, since it can lead to a desynchronization between
> the physical state of the pin and the user's information about it. For
> example, if event 16 was falling and event 17 was rising, and the
> signal then stopped changing and remains at 1, the kernel will tell us
> only about the falling edge (i.e. 0), while the real state will be 1.
> 
> If we lose events in any case, then in my opinion it is much more
> important to keep the current state than the past. I can't think of a
> case where losing old events leads to problems, but a
> desynchronization of the current state means that the software can
> make the wrong decision in its logic based on the driver's lies. Yes,
> this would be a breaking change, but it seems to me that the current
> behavior is what's incorrect. Don't get me wrong, I don't insist on
> it; if this decision was made for particular reasons, I would like to
> understand where I am wrong.
> 

I agree - it makes more sense to discard the older events.
The existing behaviour pre-dates me, so I'm not sure whether it is
intentional and, if so, what the rationale for it is.

And I'm still trying to think of a case where it would be harmful to
change this behaviour - what could it break?

> I also see a workaround for this behavior: when a read timeout occurs,
> I can re-read the state of all the lines to check whether it has
> changed. But that partly defeats the purpose of waiting for events - I
> still have to check manually whether anything was lost or whether the
> driver has started lying to me. Here is an example:
> https://github.com/pikvm/kvmd/blob/7cdf597/kvmd/aiogp.py#L102
> 
> The fact is that after reading the presentation by Bartosz Golaszewski
> and seeing the line "Events never get lost!", I was impressed and
> satisfied, but the situation turned out to be not so happy:
> https://ostconf.com/system/attachments/files/000/001/532/original/Linux_Piter_2018_-_New_GPIO_interface_for_linux_userspace.pdf?1541021776
> 

To be fair, the slide in question is comparing SYSFS with CDEV.
With the SYSFS API it is impossible to queue events in the kernel.
CDEV can provide a queue - it is even on the slide - but all queues
are finite and so can only iron out bursts; they aren't magic.
On average you need to be able to service events at the rate they
arrive, or the queue will eventually overflow.
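A back-of-envelope illustration of that point (made-up, steady rates): with the 16-deep buffer, events arriving at 1000 per second and a consumer draining 900 per second, the backlog grows by 100 events per second, so the buffer overflows after roughly 160 ms.

```python
depth = 16             # kernel event buffer size (hard-coded)
arrival_rate = 1000.0  # events produced per second (assumed)
service_rate = 900.0   # events consumed per second (assumed)

backlog_growth = arrival_rate - service_rate   # net 100 events/s
seconds_to_overflow = depth / backlog_growth   # about 0.16 s
```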

> BTW what about unpaired falling-rising events? Is this how it should be?
> 

If you are losing events, then this is what you will get.
No attempt is made in the kernel or in libgpiod to keep rising and
falling events paired (until debounce support arrives in GPIO CDEV
uAPI v2).

What is your use case? Are you seeing these problems in practice, or
only because you are generating events faster than you can service
them for testing?

Cheers,
Kent.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [libgpiod] gpiomon loses events
  2020-09-15 13:57             ` Kent Gibson
@ 2020-09-16  9:29               ` Bartosz Golaszewski
  2020-09-16  9:57                 ` Kent Gibson
  0 siblings, 1 reply; 16+ messages in thread
From: Bartosz Golaszewski @ 2020-09-16  9:29 UTC (permalink / raw)
  To: Kent Gibson, Linus Walleij
  Cc: Maxim Devaev, Andy Shevchenko, Bartosz Golaszewski,
	open list:GPIO SUBSYSTEM

On Wed, Sep 16, 2020 at 2:27 AM Kent Gibson <warthog618@gmail.com> wrote:
>
> On Tue, Sep 15, 2020 at 10:34:31AM +0300, Maxim Devaev wrote:
> > > The bug was introduced in libgpiod v1.5 so, depending on your
> > > circumstances, I would revert to an earlier libgpiod or apply my patch.
> > > ...
> >
> > Is this behavior documented anywhere? It's a complete surprise to me
> > that this is how it works; I expected to lose the old events. It seems
> > to me that for software that watches edges, the loss of new events is
> > a serious problem, since it can lead to a desynchronization between
> > the physical state of the pin and the user's information about it. For
> > example, if event 16 was falling and event 17 was rising, and the
> > signal then stopped changing and remains at 1, the kernel will tell us
> > only about the falling edge (i.e. 0), while the real state will be 1.
> >
> > If we lose events in any case, then in my opinion it is much more
> > important to keep the current state than the past. I can't think of a
> > case where losing old events leads to problems, but a
> > desynchronization of the current state means that the software can
> > make the wrong decision in its logic based on the driver's lies. Yes,
> > this would be a breaking change, but it seems to me that the current
> > behavior is what's incorrect. Don't get me wrong, I don't insist on
> > it; if this decision was made for particular reasons, I would like to
> > understand where I am wrong.
> >
>
> I agree - it makes more sense to discard the older events.
> The existing behaviour pre-dates me, so I'm not sure if it is
> intentional and if so what the rationale for it is.
>

While it predates me too (Linus: any particular reason for doing it
like this?), I think that requesting events from user-space is a
contract in which the user-space program commits to reading the events
fast enough to avoid this kind of overflow. In V2 we can make the
queue bigger if the process isn't capable of consuming the data as it
comes.

> And I'm still trying to think of a case where it would be harmful to
> change this behaviour - what could it break?
>

Well, I wouldn't change it in V1, but since V2 is a new thing, I think
it should be relatively straightforward, right? If we see that the
kfifo is full, we simply consume the oldest event on the kernel side,
drop it, and add the new one. Maybe worth considering for v9? I don't
see any downsides, and this behavior is quite reasonable.
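The drop-oldest policy described above is essentially a bounded deque (a toy user-space model; the real change would live in the kernel's kfifo handling):

```python
from collections import deque

# maxlen makes the deque evict its oldest entry when a new one is
# appended to a full buffer - the policy proposed here for uAPI v2.
fifo = deque(maxlen=16)
for edge_number in range(20):
    fifo.append(edge_number)

# The four oldest edges (0-3) were evicted; the newest edge (19),
# the one reflecting the line's current state, survives.
```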

[snip]

Bartosz

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [libgpiod] gpiomon loses events
  2020-09-16  9:29               ` Bartosz Golaszewski
@ 2020-09-16  9:57                 ` Kent Gibson
  2020-09-17 10:36                   ` Maxim Devaev
  0 siblings, 1 reply; 16+ messages in thread
From: Kent Gibson @ 2020-09-16  9:57 UTC (permalink / raw)
  To: Bartosz Golaszewski
  Cc: Linus Walleij, Maxim Devaev, Andy Shevchenko,
	Bartosz Golaszewski, open list:GPIO SUBSYSTEM

On Wed, Sep 16, 2020 at 11:29:00AM +0200, Bartosz Golaszewski wrote:
> On Wed, Sep 16, 2020 at 2:27 AM Kent Gibson <warthog618@gmail.com> wrote:
> >
> > On Tue, Sep 15, 2020 at 10:34:31AM +0300, Maxim Devaev wrote:
> > > > The bug was introduced in libgpiod v1.5 so, depending on your
> > > > circumstances, I would revert to an earlier libgpiod or apply my patch.
> > > > ...
> > >
> > > Is this behavior documented somewhere? It's a complete surprise to me
> > > that this is how it works. I expected to lose the old events. It seems
> > > to me that for software that catches edge, the loss of new events is a
> > > serious problem, since it can lead to a desynchronization of the
> > > physical state of the pin and the user's information about it. For
> > > example, if event 16 was falling and event 17 was rising, and the
> > > signal stopped changing and remains at 1, the kernel will tell us that
> > > it was only falling (i.e. 0), while the real state will be 1.
> > >
> > > If we lose events in any case, then in my opinion it is much more
> > > important to keep the current state, not the past. I can't think of a
> > > case where the loss of old events can lead to problems, but the
> > > desynchronization of the current state actually means that the
> > > software can make the wrong decision in its logic based on the
> > > driver's lies. Yes, this would be a breaking change, but it seems to
> > > me that it is the current behavior that is incorrect. Don't get me
> > > wrong, I don't insist on it. If this decision was made for certain
> > > reasons, I would like to understand where I am wrong.
> > >
> >
> > I agree - it makes more sense to discard the older events.
> > The existing behaviour pre-dates me, so I'm not sure if it is
> > intentional and if so what the rationale for it is.
> >
> 
> While it predates me too (Linus: any particular reason to do it like
> this?) I think that requesting events from user-space is a contract
> where the user-space program commits to reading the events fast enough
> to avoid this kind of overflow. In V2 we can adjust the size of the
> queue to make it bigger if the process isn't capable of consuming all
> the data as they come.
> 

For sure, but if there is an overflow for whatever reason - maybe they
need to debounce ;-) - then it would be preferable for the final event
to correspond to the current state.

> > And I'm still trying to think of a case where it would be harmful to
> > change this behaviour - what could it break?
> >
> 
> Well, I wouldn't change it in V1 but since V2 is a new thing - I think
> it should be relatively straightforward right? If we see the kfifo is
> full, we should simply consume the oldest event on the kernel side,
> drop it and add in the new one. Maybe worth considering for v9? I
> don't see any cons of this and this behavior is quite reasonable.
> 

It is pretty straightforward - my current patch for this looks like:

@@ -537,9 +537,15 @@ static irqreturn_t edge_irq_thread(int irq, void *p)
        le.seqno = (lr->num_lines == 1) ? le.line_seqno : line->req_seqno;
        le.offset = gpio_chip_hwgpio(line->desc);

-       ret = kfifo_in_spinlocked_noirqsave(&lr->events, &le,
-                                           1, &lr->wait.lock);
-       if (ret)
+       overflow = false;
+       spin_lock(&lr->wait.lock);
+       if (kfifo_is_full(&lr->events)) {
+               overflow = true;
+               kfifo_skip(&lr->events);
+       }
+       kfifo_in(&lr->events, &le, 1);
+       spin_unlock(&lr->wait.lock);
+       if (!overflow)
                wake_up_poll(&lr->wait, EPOLLIN);

I'll incorporate that into v9.

Cheers,
Kent.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [libgpiod] gpiomon loses events
  2020-09-16  9:57                 ` Kent Gibson
@ 2020-09-17 10:36                   ` Maxim Devaev
  2020-09-17 13:41                     ` Andy Shevchenko
  0 siblings, 1 reply; 16+ messages in thread
From: Maxim Devaev @ 2020-09-17 10:36 UTC (permalink / raw)
  To: Bartosz Golaszewski
  Cc: Kent Gibson, Linus Walleij, Andy Shevchenko, open list:GPIO SUBSYSTEM

Hi again, and sorry for the delay.

> I already applied the patch to the master branch and backported it to
> v1.5.x[1]. Please give it a try. I will make a bugfix release soon
> too.

Everything seems to be working fine now. In any case, I plan to use
event_read_multiple(), just out of paranoia (and for older
versions) :)
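For reference, the drain-everything approach can be reduced to a small pure helper; the RISING/FALLING markers below are illustrative stand-ins for the event types returned by the v1 Python bindings' event_read_multiple(), not actual binding constants:

```python
# Illustrative sketch: collapse a drained batch of edge events into the
# line's logical state. RISING/FALLING are placeholders for the real
# libgpiod event types.
RISING, FALLING = "rising", "falling"

def resolve_state(events, previous=None):
    # An empty batch carries no new information, so keep the last
    # known state; otherwise the final event in the batch wins.
    if not events:
        return previous
    return 1 if events[-1] == RISING else 0
```

In a real loop, each batch returned by event_read_multiple() would be mapped onto these markers and fed through resolve_state(), so a burst of edges can never leave the tracked state pointing at a stale intermediate event.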

> If you are losing events then this is what you will get.
> No attempt is made in the kernel or libgpiod to keep rising and falling
> events paired (until debounce support in GPIO CDEV uAPI v2 arrives).

Okay, thanks. I will take this into account.

> What is your use case?  Are you seeing these problems in practice or
> only because you are generating events faster than you service them
> for testing?

I'm building an IP-KVM and use GPIO to read the status of the HDD LED
on the managed server's motherboard. This LED operates in PWM mode,
and if the frequency is high enough and activity then stops, there is
a chance that the actual state of the LED will be 0 while the last
state received from the library will be 1.

On Wed, 16 Sep 2020 at 12:57, Kent Gibson <warthog618@gmail.com> wrote:
>
> On Wed, Sep 16, 2020 at 11:29:00AM +0200, Bartosz Golaszewski wrote:
> > On Wed, Sep 16, 2020 at 2:27 AM Kent Gibson <warthog618@gmail.com> wrote:
> > >
> > > On Tue, Sep 15, 2020 at 10:34:31AM +0300, Maxim Devaev wrote:
> > > > > The bug was introduced in libgpiod v1.5 so, depending on your
> > > > > circumstances, I would revert to an earlier libgpiod or apply my patch.
> > > > > ...
> > > >
> > > > Is this behavior documented somewhere? It's a complete surprise to me
> > > > that this is how it works. I expected to lose the old events. It seems
> > > > to me that for software that catches edge, the loss of new events is a
> > > > serious problem, since it can lead to a desynchronization of the
> > > > physical state of the pin and the user's information about it. For
> > > > example, if event 16 was falling and event 17 was rising, and the
> > > > signal stopped changing and remains at 1, the kernel will tell us that
> > > > it was only falling (i.e. 0), while the real state will be 1.
> > > >
> > > > If we lose events in any case, then in my opinion it is much more
> > > > important to keep the current state, not the past. I can't think of a
> > > > case where the loss of old events can lead to problems, but the
> > > > desynchronization of the current state actually means that the
> > > > software can make the wrong decision in its logic based on the
> > > > driver's lies. Yes, this would be a breaking change, but it seems to
> > > > me that it is the current behavior that is incorrect. Don't get me
> > > > wrong, I don't insist on it. If this decision was made for certain
> > > > reasons, I would like to understand where I am wrong.
> > > >
> > >
> > > I agree - it makes more sense to discard the older events.
> > > The existing behaviour pre-dates me, so I'm not sure if it is
> > > intentional and if so what the rationale for it is.
> > >
> >
> > While it predates me too (Linus: any particular reason to do it like
> > this?) I think that requesting events from user-space is a contract
> > where the user-space program commits to reading the events fast enough
> > to avoid this kind of overflow. In V2 we can adjust the size of the
> > queue to make it bigger if the process isn't capable of consuming all
> > the data as they come.
> >
>
> For sure, but if there is an overflow for whatever reason - maybe they
> need to debounce ;-) - then it would be preferable for the final event
> to correspond to the current state.
>
> > > And I'm still trying to think of a case where it would be harmful to
> > > change this behaviour - what could it break?
> > >
> >
> > Well, I wouldn't change it in V1 but since V2 is a new thing - I think
> > it should be relatively straightforward right? If we see the kfifo is
> > full, we should simply consume the oldest event on the kernel side,
> > drop it and add in the new one. Maybe worth considering for v9? I
> > don't see any cons of this and this behavior is quite reasonable.
> >
>
> It is pretty straightforward - my current patch for this looks like:
>
> @@ -537,9 +537,15 @@ static irqreturn_t edge_irq_thread(int irq, void *p)
>         le.seqno = (lr->num_lines == 1) ? le.line_seqno : line->req_seqno;
>         le.offset = gpio_chip_hwgpio(line->desc);
>
> -       ret = kfifo_in_spinlocked_noirqsave(&lr->events, &le,
> -                                           1, &lr->wait.lock);
> -       if (ret)
> +       overflow = false;
> +       spin_lock(&lr->wait.lock);
> +       if (kfifo_is_full(&lr->events)) {
> +               overflow = true;
> +               kfifo_skip(&lr->events);
> +       }
> +       kfifo_in(&lr->events, &le, 1);
> +       spin_unlock(&lr->wait.lock);
> +       if (!overflow)
>                 wake_up_poll(&lr->wait, EPOLLIN);
>
> I'll incorporate that into v9.
>
> Cheers,
> Kent.
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [libgpiod] gpiomon loses events
  2020-09-17 10:36                   ` Maxim Devaev
@ 2020-09-17 13:41                     ` Andy Shevchenko
  2020-09-17 13:45                       ` Maxim Devaev
  0 siblings, 1 reply; 16+ messages in thread
From: Andy Shevchenko @ 2020-09-17 13:41 UTC (permalink / raw)
  To: Maxim Devaev
  Cc: Bartosz Golaszewski, Kent Gibson, Linus Walleij,
	open list:GPIO SUBSYSTEM

On Thu, Sep 17, 2020 at 1:37 PM Maxim Devaev <mdevaev@gmail.com> wrote:

> > I already applied the patch to the master branch and backported it to
> > v1.5.x[1]. Please give it a try. I will make a bugfix release soon
> > too.
>
> Everything seems to be working fine now. In any case, I plan to use
> event_read_multiple(), just out of paranoia (and for older
> versions) :)
>
> > If you are losing events then this is what you will get.
> > No attempt is made in the kernel or libgpiod to keep rising and falling
> > events paired (until debounce support in GPIO CDEV uAPI v2 arrives).
>
> Okay, thanks. I will take this into account.
>
> > What is your use case?  Are you seeing these problems in practice or
> > only because you are generating events faster than you service them
> > for testing?
>
> I'm building an IP-KVM and use GPIO to read the status of the HDD LED
> on the managed server's motherboard. This LED operates in PWM mode,
> and if the frequency is high enough and activity then stops, there is
> a chance that the actual state of the LED will be 0 while the last
> state received from the library will be 1.

As a workaround, can you simply read the state separately (yes, I
understand that is not ideal, but it may help a bit to mitigate the
current situation)?
I.o.w., if you haven't got a new event during a certain period of
time, read the pin state to refresh it.
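The suggested timeout-and-refresh loop can be sketched with the gpiod calls injected as callables, so only the re-sync logic itself is shown; wait_fn, read_events_fn and get_value_fn are hypothetical stand-ins for line.event_wait(), draining the queued events, and line.get_value() respectively:

```python
# Sketch of the workaround: if no edge event arrives within the wait
# period, fall back to reading the pin level directly to re-sync.
# The three callables are hypothetical stand-ins for libgpiod calls.
def track_state(wait_fn, read_events_fn, get_value_fn, iterations):
    state = None
    for _ in range(iterations):
        if wait_fn():
            # Edge(s) arrived in time: the last one is the freshest.
            events = read_events_fn()
            if events:
                state = events[-1]
        else:
            # Timed out with no events: trust the hardware level.
            state = get_value_fn()
    return state
```

Even if the event queue overflowed earlier, the periodic get_value() read bounds how long the tracked state can stay out of sync with the physical pin.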


-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [libgpiod] gpiomon loses events
  2020-09-17 13:41                     ` Andy Shevchenko
@ 2020-09-17 13:45                       ` Maxim Devaev
  2020-09-17 13:51                         ` Andy Shevchenko
  0 siblings, 1 reply; 16+ messages in thread
From: Maxim Devaev @ 2020-09-17 13:45 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Bartosz Golaszewski, Kent Gibson, Linus Walleij,
	open list:GPIO SUBSYSTEM

> As a workaround, can you simply read the state separately (yes, I
> understand that is not ideal, but it may help a bit to mitigate the
> current situation)?
> I.o.w., if you haven't got a new event during a certain period of
> time, read the pin state to refresh it.

Yes, that's what I'm doing now. In principle, I am satisfied with
this solution. Not perfect, but okay. The only thing is that I think
this situation is worth documenting, in case someone else runs into
similar problems.

On Thu, 17 Sep 2020 at 16:41, Andy Shevchenko <andy.shevchenko@gmail.com> wrote:
>
> On Thu, Sep 17, 2020 at 1:37 PM Maxim Devaev <mdevaev@gmail.com> wrote:
>
> > > I already applied the patch to the master branch and backported it to
> > > v1.5.x[1]. Please give it a try. I will make a bugfix release soon
> > > too.
> >
> > Everything seems to be working fine now. In any case, I plan to use
> > event_read_multiple(), just out of paranoia (and for older
> > versions) :)
> >
> > > If you are losing events then this is what you will get.
> > > No attempt is made in the kernel or libgpiod to keep rising and falling
> > > events paired (until debounce support in GPIO CDEV uAPI v2 arrives).
> >
> > Okay, thanks. I will take this into account.
> >
> > > What is your use case?  Are you seeing these problems in practice or
> > > only because you are generating events faster than you service them
> > > for testing?
> >
> > I'm building an IP-KVM and use GPIO to read the status of the HDD LED
> > on the managed server's motherboard. This LED operates in PWM mode,
> > and if the frequency is high enough and activity then stops, there is
> > a chance that the actual state of the LED will be 0 while the last
> > state received from the library will be 1.
>
> As a workaround, can you simply read the state separately (yes, I
> understand that is not ideal, but it may help a bit to mitigate the
> current situation)?
> I.o.w., if you haven't got a new event during a certain period of
> time, read the pin state to refresh it.
>
>
> --
> With Best Regards,
> Andy Shevchenko

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [libgpiod] gpiomon loses events
  2020-09-17 13:45                       ` Maxim Devaev
@ 2020-09-17 13:51                         ` Andy Shevchenko
  0 siblings, 0 replies; 16+ messages in thread
From: Andy Shevchenko @ 2020-09-17 13:51 UTC (permalink / raw)
  To: Maxim Devaev
  Cc: Bartosz Golaszewski, Kent Gibson, Linus Walleij,
	open list:GPIO SUBSYSTEM

On Thu, Sep 17, 2020 at 4:45 PM Maxim Devaev <mdevaev@gmail.com> wrote:
>
> > As a workaround, can you simply read the state separately (yes, I
> > understand that is not ideal, but it may help a bit to mitigate the
> > current situation)?
> > I.o.w., if you haven't got a new event during a certain period of
> > time, read the pin state to refresh it.
>
> Yes, that's what I'm doing now. In principle, I am satisfied with
> this solution. Not perfect, but okay. The only thing is that I think
> this situation is worth documenting, in case someone else runs into
> similar problems.

Side note: please do avoid top postings.

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2020-09-17 14:29 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-14 10:38 [libgpiod] gpiomon loses events Maxim Devaev
2020-09-14 15:12 ` Andy Shevchenko
2020-09-14 15:54   ` Andy Shevchenko
2020-09-14 15:55     ` Andy Shevchenko
2020-09-14 16:38       ` Maxim Devaev
2020-09-15  0:45       ` Kent Gibson
2020-09-15  3:34         ` Kent Gibson
2020-09-15  7:34           ` Maxim Devaev
2020-09-15  7:46             ` Bartosz Golaszewski
2020-09-15 13:57             ` Kent Gibson
2020-09-16  9:29               ` Bartosz Golaszewski
2020-09-16  9:57                 ` Kent Gibson
2020-09-17 10:36                   ` Maxim Devaev
2020-09-17 13:41                     ` Andy Shevchenko
2020-09-17 13:45                       ` Maxim Devaev
2020-09-17 13:51                         ` Andy Shevchenko
