linux-media.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] media: v4l2-mem2mem: always call poll_wait() on queues
@ 2020-10-22 12:24 Alexandre Courbot
  2020-10-23 14:22 ` Alexandre Courbot
  2020-10-30 15:09 ` Hans Verkuil
  0 siblings, 2 replies; 14+ messages in thread
From: Alexandre Courbot @ 2020-10-22 12:24 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Hans Verkuil
  Cc: linux-media, linux-kernel, Alexandre Courbot

do_poll()/do_select() seem to set the _qproc member of poll_table to
NULL the first time they are called on a given table, making subsequent
calls of poll_wait() on that table no-ops. This is a problem for mem2mem
which calls poll_wait() on the V4L2 queues' waitqueues only when a
queue-related event is requested, which may not necessarily be the case
during the first poll.

For instance, a stateful decoder is typically only interested in
EPOLLPRI events when it starts, and will switch to listening to both
EPOLLPRI and EPOLLIN after receiving the initial resolution change event
and configuring the CAPTURE queue. However by the time that switch
happens and v4l2_m2m_poll_for_data() is called for the first time,
poll_wait() has become a no-op and the V4L2 queues waitqueues thus
cannot be registered.

Fix this by moving the registration to v4l2_m2m_poll() and do it whether
or not one of the queue-related events are requested.

Signed-off-by: Alexandre Courbot <gnurou@gmail.com>
---
I seem to be hitting all the polling corner cases recently! ^_^; This
time I was wondering why epoll_wait() never returned after I received
the first resolution change event from the vicodec stateful decoder.
This is why - please take a look!

 drivers/media/v4l2-core/v4l2-mem2mem.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
index b221b4e438a1..65476ef2879f 100644
--- a/drivers/media/v4l2-core/v4l2-mem2mem.c
+++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
@@ -887,9 +887,6 @@ static __poll_t v4l2_m2m_poll_for_data(struct file *file,
 	src_q = v4l2_m2m_get_src_vq(m2m_ctx);
 	dst_q = v4l2_m2m_get_dst_vq(m2m_ctx);
 
-	poll_wait(file, &src_q->done_wq, wait);
-	poll_wait(file, &dst_q->done_wq, wait);
-
 	/*
 	 * There has to be at least one buffer queued on each queued_list, which
 	 * means either in driver already or waiting for driver to claim it
@@ -922,9 +919,14 @@ __poll_t v4l2_m2m_poll(struct file *file, struct v4l2_m2m_ctx *m2m_ctx,
 		       struct poll_table_struct *wait)
 {
 	struct video_device *vfd = video_devdata(file);
+	struct vb2_queue *src_q = v4l2_m2m_get_src_vq(m2m_ctx);
+	struct vb2_queue *dst_q = v4l2_m2m_get_dst_vq(m2m_ctx);
 	__poll_t req_events = poll_requested_events(wait);
 	__poll_t rc = 0;
 
+	poll_wait(file, &src_q->done_wq, wait);
+	poll_wait(file, &dst_q->done_wq, wait);
+
 	if (req_events & (EPOLLOUT | EPOLLWRNORM | EPOLLIN | EPOLLRDNORM))
 		rc = v4l2_m2m_poll_for_data(file, m2m_ctx, wait);
 
-- 
2.29.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] media: v4l2-mem2mem: always call poll_wait() on queues
  2020-10-22 12:24 [PATCH] media: v4l2-mem2mem: always call poll_wait() on queues Alexandre Courbot
@ 2020-10-23 14:22 ` Alexandre Courbot
  2020-10-30 15:09 ` Hans Verkuil
  1 sibling, 0 replies; 14+ messages in thread
From: Alexandre Courbot @ 2020-10-23 14:22 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Hans Verkuil
  Cc: linux-media, Linux Kernel Mailing List

On Thu, Oct 22, 2020 at 9:24 PM Alexandre Courbot <gnurou@gmail.com> wrote:
>
> do_poll()/do_select() seem to set the _qproc member of poll_table to
> NULL the first time they are called on a given table, making subsequent
> calls of poll_wait() on that table no-ops. This is a problem for mem2mem
> which calls poll_wait() on the V4L2 queues' waitqueues only when a
> queue-related event is requested, which may not necessarily be the case
> during the first poll.
>
> For instance, a stateful decoder is typically only interested in
> EPOLLPRI events when it starts, and will switch to listening to both
> EPOLLPRI and EPOLLIN after receiving the initial resolution change event
> and configuring the CAPTURE queue. However by the time that switch
> happens and v4l2_m2m_poll_for_data() is called for the first time,
> poll_wait() has become a no-op and the V4L2 queues waitqueues thus
> cannot be registered.
>
> Fix this by moving the registration to v4l2_m2m_poll() and do it whether
> or not one of the queue-related events are requested.
>
> Signed-off-by: Alexandre Courbot <gnurou@gmail.com>
> ---
> I seem to be hitting all the polling corner cases recently! ^_^; This
> time I was wondering why epoll_wait() never returned after I received
> the first resolution change event from the vicodec stateful decoder.
> This is why - please take a look!
>
>  drivers/media/v4l2-core/v4l2-mem2mem.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
> index b221b4e438a1..65476ef2879f 100644
> --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
> +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
> @@ -887,9 +887,6 @@ static __poll_t v4l2_m2m_poll_for_data(struct file *file,
>         src_q = v4l2_m2m_get_src_vq(m2m_ctx);
>         dst_q = v4l2_m2m_get_dst_vq(m2m_ctx);
>
> -       poll_wait(file, &src_q->done_wq, wait);
> -       poll_wait(file, &dst_q->done_wq, wait);
> -
>         /*
>          * There has to be at least one buffer queued on each queued_list, which
>          * means either in driver already or waiting for driver to claim it
> @@ -922,9 +919,14 @@ __poll_t v4l2_m2m_poll(struct file *file, struct v4l2_m2m_ctx *m2m_ctx,
>                        struct poll_table_struct *wait)
>  {
>         struct video_device *vfd = video_devdata(file);
> +       struct vb2_queue *src_q = v4l2_m2m_get_src_vq(m2m_ctx);
> +       struct vb2_queue *dst_q = v4l2_m2m_get_dst_vq(m2m_ctx);
>         __poll_t req_events = poll_requested_events(wait);
>         __poll_t rc = 0;
>
> +       poll_wait(file, &src_q->done_wq, wait);
> +       poll_wait(file, &dst_q->done_wq, wait);

This should probably include a comment to not move this back to
v4l2_m2m_poll_for_data(). I'll add one and send a v2 unless someone
points out that the premise of this patch is a bad idea to begin with.


> +
>         if (req_events & (EPOLLOUT | EPOLLWRNORM | EPOLLIN | EPOLLRDNORM))
>                 rc = v4l2_m2m_poll_for_data(file, m2m_ctx, wait);
>
> --
> 2.29.0
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] media: v4l2-mem2mem: always call poll_wait() on queues
  2020-10-22 12:24 [PATCH] media: v4l2-mem2mem: always call poll_wait() on queues Alexandre Courbot
  2020-10-23 14:22 ` Alexandre Courbot
@ 2020-10-30 15:09 ` Hans Verkuil
  2020-11-03  8:51   ` Alexandre Courbot
  1 sibling, 1 reply; 14+ messages in thread
From: Hans Verkuil @ 2020-10-30 15:09 UTC (permalink / raw)
  To: Alexandre Courbot, Mauro Carvalho Chehab; +Cc: linux-media, linux-kernel

On 22/10/2020 14:24, Alexandre Courbot wrote:
> do_poll()/do_select() seem to set the _qproc member of poll_table to
> NULL the first time they are called on a given table, making subsequent
> calls of poll_wait() on that table no-ops. This is a problem for mem2mem
> which calls poll_wait() on the V4L2 queues' waitqueues only when a
> queue-related event is requested, which may not necessarily be the case
> during the first poll.
> 
> For instance, a stateful decoder is typically only interested in
> EPOLLPRI events when it starts, and will switch to listening to both
> EPOLLPRI and EPOLLIN after receiving the initial resolution change event
> and configuring the CAPTURE queue. However by the time that switch
> happens and v4l2_m2m_poll_for_data() is called for the first time,
> poll_wait() has become a no-op and the V4L2 queues waitqueues thus
> cannot be registered.
> 
> Fix this by moving the registration to v4l2_m2m_poll() and do it whether
> or not one of the queue-related events are requested.

This looks good, but would it be possible to add a test for this to
v4l2-compliance? (Look for POLL_MODE_EPOLL in v4l2-test-buffers.cpp)

If I understand this right, calling EPOLL_CTL_ADD for EPOLLPRI, then
calling EPOLL_CTL_ADD for EPOLLIN/OUT would trigger this? Or does there
have to be an epoll_wait call in between?

Another reason for adding this test is that I wonder if regular capture
or output V4L2 devices don't have the same issue.

It's a very subtle bug and so adding a test for this to v4l2-compliance
would be very useful.

Regards,

	Hans

> 
> Signed-off-by: Alexandre Courbot <gnurou@gmail.com>
> ---
> I seem to be hitting all the polling corner cases recently! ^_^; This
> time I was wondering why epoll_wait() never returned after I received
> the first resolution change event from the vicodec stateful decoder.
> This is why - please take a look!
> 
>  drivers/media/v4l2-core/v4l2-mem2mem.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
> index b221b4e438a1..65476ef2879f 100644
> --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
> +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
> @@ -887,9 +887,6 @@ static __poll_t v4l2_m2m_poll_for_data(struct file *file,
>  	src_q = v4l2_m2m_get_src_vq(m2m_ctx);
>  	dst_q = v4l2_m2m_get_dst_vq(m2m_ctx);
>  
> -	poll_wait(file, &src_q->done_wq, wait);
> -	poll_wait(file, &dst_q->done_wq, wait);
> -
>  	/*
>  	 * There has to be at least one buffer queued on each queued_list, which
>  	 * means either in driver already or waiting for driver to claim it
> @@ -922,9 +919,14 @@ __poll_t v4l2_m2m_poll(struct file *file, struct v4l2_m2m_ctx *m2m_ctx,
>  		       struct poll_table_struct *wait)
>  {
>  	struct video_device *vfd = video_devdata(file);
> +	struct vb2_queue *src_q = v4l2_m2m_get_src_vq(m2m_ctx);
> +	struct vb2_queue *dst_q = v4l2_m2m_get_dst_vq(m2m_ctx);
>  	__poll_t req_events = poll_requested_events(wait);
>  	__poll_t rc = 0;
>  
> +	poll_wait(file, &src_q->done_wq, wait);
> +	poll_wait(file, &dst_q->done_wq, wait);
> +
>  	if (req_events & (EPOLLOUT | EPOLLWRNORM | EPOLLIN | EPOLLRDNORM))
>  		rc = v4l2_m2m_poll_for_data(file, m2m_ctx, wait);
>  
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] media: v4l2-mem2mem: always call poll_wait() on queues
  2020-10-30 15:09 ` Hans Verkuil
@ 2020-11-03  8:51   ` Alexandre Courbot
  2020-11-03  9:48     ` Hans Verkuil
  0 siblings, 1 reply; 14+ messages in thread
From: Alexandre Courbot @ 2020-11-03  8:51 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Mauro Carvalho Chehab, linux-media, Linux Kernel Mailing List

Hi Hans,

On Sat, Oct 31, 2020 at 12:09 AM Hans Verkuil <hverkuil-cisco@xs4all.nl> wrote:
>
> On 22/10/2020 14:24, Alexandre Courbot wrote:
> > do_poll()/do_select() seem to set the _qproc member of poll_table to
> > NULL the first time they are called on a given table, making subsequent
> > calls of poll_wait() on that table no-ops. This is a problem for mem2mem
> > which calls poll_wait() on the V4L2 queues' waitqueues only when a
> > queue-related event is requested, which may not necessarily be the case
> > during the first poll.
> >
> > For instance, a stateful decoder is typically only interested in
> > EPOLLPRI events when it starts, and will switch to listening to both
> > EPOLLPRI and EPOLLIN after receiving the initial resolution change event
> > and configuring the CAPTURE queue. However by the time that switch
> > happens and v4l2_m2m_poll_for_data() is called for the first time,
> > poll_wait() has become a no-op and the V4L2 queues waitqueues thus
> > cannot be registered.
> >
> > Fix this by moving the registration to v4l2_m2m_poll() and do it whether
> > or not one of the queue-related events are requested.
>
> This looks good, but would it be possible to add a test for this to
> v4l2-compliance? (Look for POLL_MODE_EPOLL in v4l2-test-buffers.cpp)
>
> If I understand this right, calling EPOLL_CTL_ADD for EPOLLPRI, then
> calling EPOLL_CTL_ADD for EPOLLIN/OUT would trigger this? Or does there
> have to be an epoll_wait call in between?

Even without an epoll_wait() in between the behavior is visible.
v4l2_m2m_poll() will be called once during the initial EPOLL_CTL_ADD
and this will trigger the bug.

> Another reason for adding this test is that I wonder if regular capture
> or output V4L2 devices don't have the same issue.
>
> It's a very subtle bug and so adding a test for this to v4l2-compliance
> would be very useful.

I fully agree, this is very counter-intuitive since what basically
happens is that the kernel's poll_wait() function becomes a no-op
after the poll() hook of a driver is called for the first time. There
is no way one can expect this behavior just from browsing the code so
this is likely to affect other drivers.

As for the test itself, we can easily reproduce the conditions for
failure in v4l2-test-buffers.cpp's captureBufs() function, but doing
so will make the streaming tests fail without being specific about the
cause. Or maybe we should add another pollmode to specifically test
epoll in this setup? Can I get your thoughts?

Cheers,
Alex.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] media: v4l2-mem2mem: always call poll_wait() on queues
  2020-11-03  8:51   ` Alexandre Courbot
@ 2020-11-03  9:48     ` Hans Verkuil
  2020-11-05 12:21       ` Alexandre Courbot
  2020-11-23 15:18       ` Alexandre Courbot
  0 siblings, 2 replies; 14+ messages in thread
From: Hans Verkuil @ 2020-11-03  9:48 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Mauro Carvalho Chehab, linux-media, Linux Kernel Mailing List

On 03/11/2020 09:51, Alexandre Courbot wrote:
> Hi Hans,
> 
> On Sat, Oct 31, 2020 at 12:09 AM Hans Verkuil <hverkuil-cisco@xs4all.nl> wrote:
>>
>> On 22/10/2020 14:24, Alexandre Courbot wrote:
>>> do_poll()/do_select() seem to set the _qproc member of poll_table to
>>> NULL the first time they are called on a given table, making subsequent
>>> calls of poll_wait() on that table no-ops. This is a problem for mem2mem
>>> which calls poll_wait() on the V4L2 queues' waitqueues only when a
>>> queue-related event is requested, which may not necessarily be the case
>>> during the first poll.
>>>
>>> For instance, a stateful decoder is typically only interested in
>>> EPOLLPRI events when it starts, and will switch to listening to both
>>> EPOLLPRI and EPOLLIN after receiving the initial resolution change event
>>> and configuring the CAPTURE queue. However by the time that switch
>>> happens and v4l2_m2m_poll_for_data() is called for the first time,
>>> poll_wait() has become a no-op and the V4L2 queues waitqueues thus
>>> cannot be registered.
>>>
>>> Fix this by moving the registration to v4l2_m2m_poll() and do it whether
>>> or not one of the queue-related events are requested.
>>
>> This looks good, but would it be possible to add a test for this to
>> v4l2-compliance? (Look for POLL_MODE_EPOLL in v4l2-test-buffers.cpp)
>>
>> If I understand this right, calling EPOLL_CTL_ADD for EPOLLPRI, then
>> calling EPOLL_CTL_ADD for EPOLLIN/OUT would trigger this? Or does there
>> have to be an epoll_wait call in between?
> 
> Even without an epoll_wait() in between the behavior is visible.
> v4l2_m2m_poll() will be called once during the initial EPOLL_CTL_ADD
> and this will trigger the bug.
> 
>> Another reason for adding this test is that I wonder if regular capture
>> or output V4L2 devices don't have the same issue.
>>
>> It's a very subtle bug and so adding a test for this to v4l2-compliance
>> would be very useful.
> 
> I fully agree, this is very counter-intuitive since what basically
> happens is that the kernel's poll_wait() function becomes a no-op
> after the poll() hook of a driver is called for the first time. There
> is no way one can expect this behavior just from browsing the code so
> this is likely to affect other drivers.
> 
> As for the test itself, we can easily reproduce the conditions for
> failure in v4l2-test-buffers.cpp's captureBufs() function, but doing
> so will make the streaming tests fail without being specific about the
> cause. Or maybe we should add another pollmode to specifically test
> epoll in this setup? Can I get your thoughts?

No, just keep it as part of the poll test. Just add comments at the place
where it fails describing this error.

After all, it *is* a poll() bug, so it is only fair that it is tested as
part of the epoll test.

Can you call EPOLL_CTL_ADD with ev.events set to 0? And then call it again
with the actual value that you need? If that triggers this issue as well,
then that is a nice test (but perhaps EPOLL_CTL_ADD won't call poll() if
ev.events is 0, but perhaps EPOLLERR would work instead of 0).

The epoll_wait() will fail when this issue hits, so that's a good place
to add comments explaining this problem.

There is one other place where this needs to be tested: testEvents() in
v4l2-test-controls.cpp: currently this only tests select(), but there
should be a second epoll test here as well that just tests EPOLLPRI.

This would catch drivers that do not stream (i.e. no EPOLLIN/OUT) but
that do have controls (so support EPOLLPRI).

Congratulation, you found a really nasty corner case! :-)

Regards,

	Hans

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] media: v4l2-mem2mem: always call poll_wait() on queues
  2020-11-03  9:48     ` Hans Verkuil
@ 2020-11-05 12:21       ` Alexandre Courbot
  2020-11-05 12:36         ` Hans Verkuil
  2020-11-23 15:18       ` Alexandre Courbot
  1 sibling, 1 reply; 14+ messages in thread
From: Alexandre Courbot @ 2020-11-05 12:21 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Mauro Carvalho Chehab, linux-media, Linux Kernel Mailing List

On Tue, Nov 3, 2020 at 6:48 PM Hans Verkuil <hverkuil-cisco@xs4all.nl> wrote:
>
> On 03/11/2020 09:51, Alexandre Courbot wrote:
> > Hi Hans,
> >
> > On Sat, Oct 31, 2020 at 12:09 AM Hans Verkuil <hverkuil-cisco@xs4all.nl> wrote:
> >>
> >> On 22/10/2020 14:24, Alexandre Courbot wrote:
> >>> do_poll()/do_select() seem to set the _qproc member of poll_table to
> >>> NULL the first time they are called on a given table, making subsequent
> >>> calls of poll_wait() on that table no-ops. This is a problem for mem2mem
> >>> which calls poll_wait() on the V4L2 queues' waitqueues only when a
> >>> queue-related event is requested, which may not necessarily be the case
> >>> during the first poll.
> >>>
> >>> For instance, a stateful decoder is typically only interested in
> >>> EPOLLPRI events when it starts, and will switch to listening to both
> >>> EPOLLPRI and EPOLLIN after receiving the initial resolution change event
> >>> and configuring the CAPTURE queue. However by the time that switch
> >>> happens and v4l2_m2m_poll_for_data() is called for the first time,
> >>> poll_wait() has become a no-op and the V4L2 queues waitqueues thus
> >>> cannot be registered.
> >>>
> >>> Fix this by moving the registration to v4l2_m2m_poll() and do it whether
> >>> or not one of the queue-related events are requested.
> >>
> >> This looks good, but would it be possible to add a test for this to
> >> v4l2-compliance? (Look for POLL_MODE_EPOLL in v4l2-test-buffers.cpp)
> >>
> >> If I understand this right, calling EPOLL_CTL_ADD for EPOLLPRI, then
> >> calling EPOLL_CTL_ADD for EPOLLIN/OUT would trigger this? Or does there
> >> have to be an epoll_wait call in between?
> >
> > Even without an epoll_wait() in between the behavior is visible.
> > v4l2_m2m_poll() will be called once during the initial EPOLL_CTL_ADD
> > and this will trigger the bug.
> >
> >> Another reason for adding this test is that I wonder if regular capture
> >> or output V4L2 devices don't have the same issue.
> >>
> >> It's a very subtle bug and so adding a test for this to v4l2-compliance
> >> would be very useful.
> >
> > I fully agree, this is very counter-intuitive since what basically
> > happens is that the kernel's poll_wait() function becomes a no-op
> > after the poll() hook of a driver is called for the first time. There
> > is no way one can expect this behavior just from browsing the code so
> > this is likely to affect other drivers.
> >
> > As for the test itself, we can easily reproduce the conditions for
> > failure in v4l2-test-buffers.cpp's captureBufs() function, but doing
> > so will make the streaming tests fail without being specific about the
> > cause. Or maybe we should add another pollmode to specifically test
> > epoll in this setup? Can I get your thoughts?
>
> No, just keep it as part of the poll test. Just add comments at the place
> where it fails describing this error.
>
> After all, it *is* a poll() bug, so it is only fair that it is tested as
> part of the epoll test.
>
> Can you call EPOLL_CTL_ADD with ev.events set to 0? And then call it again
> with the actual value that you need? If that triggers this issue as well,
> then that is a nice test (but perhaps EPOLL_CTL_ADD won't call poll() if
> ev.events is 0, but perhaps EPOLLERR would work instead of 0).

Yup, actually the following is enough to make v4l2-compliance -s fail
with vicodec:

diff --git a/utils/v4l2-compliance/v4l2-test-buffers.cpp
b/utils/v4l2-compliance/v4l2-test-buffers.cpp
index 8000db23..b63326cd 100644
--- a/utils/v4l2-compliance/v4l2-test-buffers.cpp
+++ b/utils/v4l2-compliance/v4l2-test-buffers.cpp
@@ -903,6 +903,10 @@ static int captureBufs(struct node *node, struct
node *node_m2m_cap, const cv4l_
                epollfd = epoll_create1(0);

                fail_on_test(epollfd < 0);
+
+               ev.events = 0;
+               fail_on_test(epoll_ctl(epollfd, EPOLL_CTL_ADD,
node->g_fd(), &ev));
+
                if (node->is_m2m)
                        ev.events = EPOLLIN | EPOLLOUT | EPOLLPRI;
                else if (v4l_type_is_output(q.g_type()))
@@ -910,7 +914,7 @@ static int captureBufs(struct node *node, struct
node *node_m2m_cap, const cv4l_
                else
                        ev.events = EPOLLIN;
                ev.data.fd = node->g_fd();
-               fail_on_test(epoll_ctl(epollfd, EPOLL_CTL_ADD,
node->g_fd(), &ev));
+               fail_on_test(epoll_ctl(epollfd, EPOLL_CTL_MOD,
node->g_fd(), &ev));
        }

        if (pollmode)

>
> The epoll_wait() will fail when this issue hits, so that's a good place
> to add comments explaining this problem.
>
> There is one other place where this needs to be tested: testEvents() in
> v4l2-test-controls.cpp: currently this only tests select(), but there
> should be a second epoll test here as well that just tests EPOLLPRI.
>
> This would catch drivers that do not stream (i.e. no EPOLLIN/OUT) but
> that do have controls (so support EPOLLPRI).

I'll take a look there as well, and think about a proper comment
before sending a patch towards you.

Cheers,
Alex.

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] media: v4l2-mem2mem: always call poll_wait() on queues
  2020-11-05 12:21       ` Alexandre Courbot
@ 2020-11-05 12:36         ` Hans Verkuil
  2020-11-05 12:52           ` Alexandre Courbot
  0 siblings, 1 reply; 14+ messages in thread
From: Hans Verkuil @ 2020-11-05 12:36 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Mauro Carvalho Chehab, linux-media, Linux Kernel Mailing List

On 05/11/2020 13:21, Alexandre Courbot wrote:
> On Tue, Nov 3, 2020 at 6:48 PM Hans Verkuil <hverkuil-cisco@xs4all.nl> wrote:
>>
>> On 03/11/2020 09:51, Alexandre Courbot wrote:
>>> Hi Hans,
>>>
>>> On Sat, Oct 31, 2020 at 12:09 AM Hans Verkuil <hverkuil-cisco@xs4all.nl> wrote:
>>>>
>>>> On 22/10/2020 14:24, Alexandre Courbot wrote:
>>>>> do_poll()/do_select() seem to set the _qproc member of poll_table to
>>>>> NULL the first time they are called on a given table, making subsequent
>>>>> calls of poll_wait() on that table no-ops. This is a problem for mem2mem
>>>>> which calls poll_wait() on the V4L2 queues' waitqueues only when a
>>>>> queue-related event is requested, which may not necessarily be the case
>>>>> during the first poll.
>>>>>
>>>>> For instance, a stateful decoder is typically only interested in
>>>>> EPOLLPRI events when it starts, and will switch to listening to both
>>>>> EPOLLPRI and EPOLLIN after receiving the initial resolution change event
>>>>> and configuring the CAPTURE queue. However by the time that switch
>>>>> happens and v4l2_m2m_poll_for_data() is called for the first time,
>>>>> poll_wait() has become a no-op and the V4L2 queues waitqueues thus
>>>>> cannot be registered.
>>>>>
>>>>> Fix this by moving the registration to v4l2_m2m_poll() and do it whether
>>>>> or not one of the queue-related events are requested.
>>>>
>>>> This looks good, but would it be possible to add a test for this to
>>>> v4l2-compliance? (Look for POLL_MODE_EPOLL in v4l2-test-buffers.cpp)
>>>>
>>>> If I understand this right, calling EPOLL_CTL_ADD for EPOLLPRI, then
>>>> calling EPOLL_CTL_ADD for EPOLLIN/OUT would trigger this? Or does there
>>>> have to be an epoll_wait call in between?
>>>
>>> Even without an epoll_wait() in between the behavior is visible.
>>> v4l2_m2m_poll() will be called once during the initial EPOLL_CTL_ADD
>>> and this will trigger the bug.
>>>
>>>> Another reason for adding this test is that I wonder if regular capture
>>>> or output V4L2 devices don't have the same issue.
>>>>
>>>> It's a very subtle bug and so adding a test for this to v4l2-compliance
>>>> would be very useful.
>>>
>>> I fully agree, this is very counter-intuitive since what basically
>>> happens is that the kernel's poll_wait() function becomes a no-op
>>> after the poll() hook of a driver is called for the first time. There
>>> is no way one can expect this behavior just from browsing the code so
>>> this is likely to affect other drivers.
>>>
>>> As for the test itself, we can easily reproduce the conditions for
>>> failure in v4l2-test-buffers.cpp's captureBufs() function, but doing
>>> so will make the streaming tests fail without being specific about the
>>> cause. Or maybe we should add another pollmode to specifically test
>>> epoll in this setup? Can I get your thoughts?
>>
>> No, just keep it as part of the poll test. Just add comments at the place
>> where it fails describing this error.
>>
>> After all, it *is* a poll() bug, so it is only fair that it is tested as
>> part of the epoll test.
>>
>> Can you call EPOLL_CTL_ADD with ev.events set to 0? And then call it again
>> with the actual value that you need? If that triggers this issue as well,
>> then that is a nice test (but perhaps EPOLL_CTL_ADD won't call poll() if
>> ev.events is 0, but perhaps EPOLLERR would work instead of 0).
> 
> Yup, actually the following is enough to make v4l2-compliance -s fail
> with vicodec:

Does it also fail with vivid? I am curious to know whether this issue is
m2m specific or a more general problem.

Regards,

	Hans

> 
> diff --git a/utils/v4l2-compliance/v4l2-test-buffers.cpp
> b/utils/v4l2-compliance/v4l2-test-buffers.cpp
> index 8000db23..b63326cd 100644
> --- a/utils/v4l2-compliance/v4l2-test-buffers.cpp
> +++ b/utils/v4l2-compliance/v4l2-test-buffers.cpp
> @@ -903,6 +903,10 @@ static int captureBufs(struct node *node, struct
> node *node_m2m_cap, const cv4l_
>                 epollfd = epoll_create1(0);
> 
>                 fail_on_test(epollfd < 0);
> +
> +               ev.events = 0;
> +               fail_on_test(epoll_ctl(epollfd, EPOLL_CTL_ADD,
> node->g_fd(), &ev));
> +
>                 if (node->is_m2m)
>                         ev.events = EPOLLIN | EPOLLOUT | EPOLLPRI;
>                 else if (v4l_type_is_output(q.g_type()))
> @@ -910,7 +914,7 @@ static int captureBufs(struct node *node, struct
> node *node_m2m_cap, const cv4l_
>                 else
>                         ev.events = EPOLLIN;
>                 ev.data.fd = node->g_fd();
> -               fail_on_test(epoll_ctl(epollfd, EPOLL_CTL_ADD,
> node->g_fd(), &ev));
> +               fail_on_test(epoll_ctl(epollfd, EPOLL_CTL_MOD,
> node->g_fd(), &ev));
>         }
> 
>         if (pollmode)
> 
>>
>> The epoll_wait() will fail when this issue hits, so that's a good place
>> to add comments explaining this problem.
>>
>> There is one other place where this needs to be tested: testEvents() in
>> v4l2-test-controls.cpp: currently this only tests select(), but there
>> should be a second epoll test here as well that just tests EPOLLPRI.
>>
>> This would catch drivers that do not stream (i.e. no EPOLLIN/OUT) but
>> that do have controls (so support EPOLLPRI).
> 
> I'll take a look there as well, and think about a proper comment
> before sending a patch towards you.
> 
> Cheers,
> Alex.
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] media: v4l2-mem2mem: always call poll_wait() on queues
  2020-11-05 12:36         ` Hans Verkuil
@ 2020-11-05 12:52           ` Alexandre Courbot
  2020-11-05 13:12             ` Hans Verkuil
  0 siblings, 1 reply; 14+ messages in thread
From: Alexandre Courbot @ 2020-11-05 12:52 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Mauro Carvalho Chehab, linux-media, Linux Kernel Mailing List

On Thu, Nov 5, 2020 at 9:36 PM Hans Verkuil <hverkuil-cisco@xs4all.nl> wrote:
>
> On 05/11/2020 13:21, Alexandre Courbot wrote:
> > On Tue, Nov 3, 2020 at 6:48 PM Hans Verkuil <hverkuil-cisco@xs4all.nl> wrote:
> >>
> >> On 03/11/2020 09:51, Alexandre Courbot wrote:
> >>> Hi Hans,
> >>>
> >>> On Sat, Oct 31, 2020 at 12:09 AM Hans Verkuil <hverkuil-cisco@xs4all.nl> wrote:
> >>>>
> >>>> On 22/10/2020 14:24, Alexandre Courbot wrote:
> >>>>> do_poll()/do_select() seem to set the _qproc member of poll_table to
> >>>>> NULL the first time they are called on a given table, making subsequent
> >>>>> calls of poll_wait() on that table no-ops. This is a problem for mem2mem
> >>>>> which calls poll_wait() on the V4L2 queues' waitqueues only when a
> >>>>> queue-related event is requested, which may not necessarily be the case
> >>>>> during the first poll.
> >>>>>
> >>>>> For instance, a stateful decoder is typically only interested in
> >>>>> EPOLLPRI events when it starts, and will switch to listening to both
> >>>>> EPOLLPRI and EPOLLIN after receiving the initial resolution change event
> >>>>> and configuring the CAPTURE queue. However by the time that switch
> >>>>> happens and v4l2_m2m_poll_for_data() is called for the first time,
> >>>>> poll_wait() has become a no-op and the V4L2 queues waitqueues thus
> >>>>> cannot be registered.
> >>>>>
> >>>>> Fix this by moving the registration to v4l2_m2m_poll() and do it whether
> >>>>> or not one of the queue-related events are requested.
> >>>>
> >>>> This looks good, but would it be possible to add a test for this to
> >>>> v4l2-compliance? (Look for POLL_MODE_EPOLL in v4l2-test-buffers.cpp)
> >>>>
> >>>> If I understand this right, calling EPOLL_CTL_ADD for EPOLLPRI, then
> >>>> calling EPOLL_CTL_ADD for EPOLLIN/OUT would trigger this? Or does there
> >>>> have to be an epoll_wait call in between?
> >>>
> >>> Even without an epoll_wait() in between the behavior is visible.
> >>> v4l2_m2m_poll() will be called once during the initial EPOLL_CTL_ADD
> >>> and this will trigger the bug.
> >>>
> >>>> Another reason for adding this test is that I wonder if regular capture
> >>>> or output V4L2 devices don't have the same issue.
> >>>>
> >>>> It's a very subtle bug and so adding a test for this to v4l2-compliance
> >>>> would be very useful.
> >>>
> >>> I fully agree, this is very counter-intuitive since what basically
> >>> happens is that the kernel's poll_wait() function becomes a no-op
> >>> after the poll() hook of a driver is called for the first time. There
> >>> is no way one can expect this behavior just from browsing the code so
> >>> this is likely to affect other drivers.
> >>>
> >>> As for the test itself, we can easily reproduce the conditions for
> >>> failure in v4l2-test-buffers.cpp's captureBufs() function, but doing
> >>> so will make the streaming tests fail without being specific about the
> >>> cause. Or maybe we should add another pollmode to specifically test
> >>> epoll in this setup? Can I get your thoughts?
> >>
> >> No, just keep it as part of the poll test. Just add comments at the place
> >> where it fails describing this error.
> >>
> >> After all, it *is* a poll() bug, so it is only fair that it is tested as
> >> part of the epoll test.
> >>
> >> Can you call EPOLL_CTL_ADD with ev.events set to 0? And then call it again
> >> with the actual value that you need? If that triggers this issue as well,
> >> then that is a nice test (but perhaps EPOLL_CTL_ADD won't call poll() if
> >> ev.events is 0, but perhaps EPOLLERR would work instead of 0).
> >
> > Yup, actually the following is enough to make v4l2-compliance -s fail
> > with vicodec:
>
> Does it also fail with vivid? I am curious to know whether this issue is
> m2m specific or a more general problem.

It does fail actually! And that made me notice that vb2_poll() uses
the same pattern as v4l2_m2m_poll() (probably because the latter is
inspired by the former?) and needs to be fixed similarly. I will send
another patch to fix vb2_poll() as well, thanks for pointing it out!

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] media: v4l2-mem2mem: always call poll_wait() on queues
  2020-11-05 12:52           ` Alexandre Courbot
@ 2020-11-05 13:12             ` Hans Verkuil
  2020-11-05 14:05               ` Alexandre Courbot
  0 siblings, 1 reply; 14+ messages in thread
From: Hans Verkuil @ 2020-11-05 13:12 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Mauro Carvalho Chehab, linux-media, Linux Kernel Mailing List

On 05/11/2020 13:52, Alexandre Courbot wrote:
> On Thu, Nov 5, 2020 at 9:36 PM Hans Verkuil <hverkuil-cisco@xs4all.nl> wrote:
>>
>> On 05/11/2020 13:21, Alexandre Courbot wrote:
>>> On Tue, Nov 3, 2020 at 6:48 PM Hans Verkuil <hverkuil-cisco@xs4all.nl> wrote:
>>>>
>>>> On 03/11/2020 09:51, Alexandre Courbot wrote:
>>>>> Hi Hans,
>>>>>
>>>>> On Sat, Oct 31, 2020 at 12:09 AM Hans Verkuil <hverkuil-cisco@xs4all.nl> wrote:
>>>>>>
>>>>>> On 22/10/2020 14:24, Alexandre Courbot wrote:
>>>>>>> do_poll()/do_select() seem to set the _qproc member of poll_table to
>>>>>>> NULL the first time they are called on a given table, making subsequent
>>>>>>> calls of poll_wait() on that table no-ops. This is a problem for mem2mem
>>>>>>> which calls poll_wait() on the V4L2 queues' waitqueues only when a
>>>>>>> queue-related event is requested, which may not necessarily be the case
>>>>>>> during the first poll.
>>>>>>>
>>>>>>> For instance, a stateful decoder is typically only interested in
>>>>>>> EPOLLPRI events when it starts, and will switch to listening to both
>>>>>>> EPOLLPRI and EPOLLIN after receiving the initial resolution change event
>>>>>>> and configuring the CAPTURE queue. However by the time that switch
>>>>>>> happens and v4l2_m2m_poll_for_data() is called for the first time,
>>>>>>> poll_wait() has become a no-op and the V4L2 queues waitqueues thus
>>>>>>> cannot be registered.
>>>>>>>
>>>>>>> Fix this by moving the registration to v4l2_m2m_poll() and do it whether
>>>>>>> or not one of the queue-related events are requested.
>>>>>>
>>>>>> This looks good, but would it be possible to add a test for this to
>>>>>> v4l2-compliance? (Look for POLL_MODE_EPOLL in v4l2-test-buffers.cpp)
>>>>>>
>>>>>> If I understand this right, calling EPOLL_CTL_ADD for EPOLLPRI, then
>>>>>> calling EPOLL_CTL_ADD for EPOLLIN/OUT would trigger this? Or does there
>>>>>> have to be an epoll_wait call in between?
>>>>>
>>>>> Even without an epoll_wait() in between the behavior is visible.
>>>>> v4l2_m2m_poll() will be called once during the initial EPOLL_CTL_ADD
>>>>> and this will trigger the bug.
>>>>>
>>>>>> Another reason for adding this test is that I wonder if regular capture
>>>>>> or output V4L2 devices don't have the same issue.
>>>>>>
>>>>>> It's a very subtle bug and so adding a test for this to v4l2-compliance
>>>>>> would be very useful.
>>>>>
>>>>> I fully agree, this is very counter-intuitive since what basically
>>>>> happens is that the kernel's poll_wait() function becomes a no-op
>>>>> after the poll() hook of a driver is called for the first time. There
>>>>> is no way one can expect this behavior just from browsing the code so
>>>>> this is likely to affect other drivers.
>>>>>
>>>>> As for the test itself, we can easily reproduce the conditions for
>>>>> failure in v4l2-test-buffers.cpp's captureBufs() function, but doing
>>>>> so will make the streaming tests fail without being specific about the
>>>>> cause. Or maybe we should add another pollmode to specifically test
>>>>> epoll in this setup? Can I get your thoughts?
>>>>
>>>> No, just keep it as part of the poll test. Just add comments at the place
>>>> where it fails describing this error.
>>>>
>>>> After all, it *is* a poll() bug, so it is only fair that it is tested as
>>>> part of the epoll test.
>>>>
>>>> Can you call EPOLL_CTL_ADD with ev.events set to 0? And then call it again
>>>> with the actual value that you need? If that triggers this issue as well,
>>>> then that is a nice test (but perhaps EPOLL_CTL_ADD won't call poll() if
>>>> ev.events is 0, but perhaps EPOLLERR would work instead of 0).
>>>
>>> Yup, actually the following is enough to make v4l2-compliance -s fail
>>> with vicodec:
>>
>> Does it also fail with vivid? I am curious to know whether this issue is
>> m2m specific or a more general problem.
> 
> It does fail actually! And that made me notice that vb2_poll() uses
> the same pattern as v4l2_m2m_poll() (probably because the latter is
> inspired by the former?) and needs to be fixed similarly. I will send
> another patch to fix vb2_poll() as well, thanks for pointing it out!

I was afraid of that.

Testing epoll for control events would be interesting as well. The
vivid radio device is an example of a device that has controls, but
does not do streaming (so is not using vb2).

But from what I can see v4l2_ctrl_poll() does the right thing, so this
should be fine.

Regards,

	Hans

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] media: v4l2-mem2mem: always call poll_wait() on queues
  2020-11-05 13:12             ` Hans Verkuil
@ 2020-11-05 14:05               ` Alexandre Courbot
  2020-11-11 12:41                 ` Alexandre Courbot
  0 siblings, 1 reply; 14+ messages in thread
From: Alexandre Courbot @ 2020-11-05 14:05 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Mauro Carvalho Chehab, linux-media, Linux Kernel Mailing List

On Thu, Nov 5, 2020 at 10:12 PM Hans Verkuil <hverkuil-cisco@xs4all.nl> wrote:
>
> On 05/11/2020 13:52, Alexandre Courbot wrote:
> > On Thu, Nov 5, 2020 at 9:36 PM Hans Verkuil <hverkuil-cisco@xs4all.nl> wrote:
> >>
> >> On 05/11/2020 13:21, Alexandre Courbot wrote:
> >>> On Tue, Nov 3, 2020 at 6:48 PM Hans Verkuil <hverkuil-cisco@xs4all.nl> wrote:
> >>>>
> >>>> On 03/11/2020 09:51, Alexandre Courbot wrote:
> >>>>> Hi Hans,
> >>>>>
> >>>>> On Sat, Oct 31, 2020 at 12:09 AM Hans Verkuil <hverkuil-cisco@xs4all.nl> wrote:
> >>>>>>
> >>>>>> On 22/10/2020 14:24, Alexandre Courbot wrote:
> >>>>>>> do_poll()/do_select() seem to set the _qproc member of poll_table to
> >>>>>>> NULL the first time they are called on a given table, making subsequent
> >>>>>>> calls of poll_wait() on that table no-ops. This is a problem for mem2mem
> >>>>>>> which calls poll_wait() on the V4L2 queues' waitqueues only when a
> >>>>>>> queue-related event is requested, which may not necessarily be the case
> >>>>>>> during the first poll.
> >>>>>>>
> >>>>>>> For instance, a stateful decoder is typically only interested in
> >>>>>>> EPOLLPRI events when it starts, and will switch to listening to both
> >>>>>>> EPOLLPRI and EPOLLIN after receiving the initial resolution change event
> >>>>>>> and configuring the CAPTURE queue. However by the time that switch
> >>>>>>> happens and v4l2_m2m_poll_for_data() is called for the first time,
> >>>>>>> poll_wait() has become a no-op and the V4L2 queues waitqueues thus
> >>>>>>> cannot be registered.
> >>>>>>>
> >>>>>>> Fix this by moving the registration to v4l2_m2m_poll() and do it whether
> >>>>>>> or not one of the queue-related events are requested.
> >>>>>>
> >>>>>> This looks good, but would it be possible to add a test for this to
> >>>>>> v4l2-compliance? (Look for POLL_MODE_EPOLL in v4l2-test-buffers.cpp)
> >>>>>>
> >>>>>> If I understand this right, calling EPOLL_CTL_ADD for EPOLLPRI, then
> >>>>>> calling EPOLL_CTL_ADD for EPOLLIN/OUT would trigger this? Or does there
> >>>>>> have to be an epoll_wait call in between?
> >>>>>
> >>>>> Even without an epoll_wait() in between the behavior is visible.
> >>>>> v4l2_m2m_poll() will be called once during the initial EPOLL_CTL_ADD
> >>>>> and this will trigger the bug.
> >>>>>
> >>>>>> Another reason for adding this test is that I wonder if regular capture
> >>>>>> or output V4L2 devices don't have the same issue.
> >>>>>>
> >>>>>> It's a very subtle bug and so adding a test for this to v4l2-compliance
> >>>>>> would be very useful.
> >>>>>
> >>>>> I fully agree, this is very counter-intuitive since what basically
> >>>>> happens is that the kernel's poll_wait() function becomes a no-op
> >>>>> after the poll() hook of a driver is called for the first time. There
> >>>>> is no way one can expect this behavior just from browsing the code so
> >>>>> this is likely to affect other drivers.
> >>>>>
> >>>>> As for the test itself, we can easily reproduce the conditions for
> >>>>> failure in v4l2-test-buffers.cpp's captureBufs() function, but doing
> >>>>> so will make the streaming tests fail without being specific about the
> >>>>> cause. Or maybe we should add another pollmode to specifically test
> >>>>> epoll in this setup? Can I get your thoughts?
> >>>>
> >>>> No, just keep it as part of the poll test. Just add comments at the place
> >>>> where it fails describing this error.
> >>>>
> >>>> After all, it *is* a poll() bug, so it is only fair that it is tested as
> >>>> part of the epoll test.
> >>>>
> >>>> Can you call EPOLL_CTL_ADD with ev.events set to 0? And then call it again
> >>>> with the actual value that you need? If that triggers this issue as well,
> >>>> then that is a nice test (but perhaps EPOLL_CTL_ADD won't call poll() if
> >>>> ev.events is 0, but perhaps EPOLLERR would work instead of 0).
> >>>
> >>> Yup, actually the following is enough to make v4l2-compliance -s fail
> >>> with vicodec:
> >>
> >> Does it also fail with vivid? I am curious to know whether this issue is
> >> m2m specific or a more general problem.
> >
> > It does fail actually! And that made me notice that vb2_poll() uses
> > the same pattern as v4l2_m2m_poll() (probably because the latter is
> > inspired by the former?) and needs to be fixed similarly. I will send
> > another patch to fix vb2_poll() as well, thanks for pointing it out!
>
> I was afraid of that.
>
> Testing epoll for control events would be interesting as well. The
> vivid radio device is an example of a device that has controls, but
> does not do streaming (so is not using vb2).
>
> But from what I can see v4l2_ctrl_poll() does the right thing, so this
> should be fine.

Indeed, it unconditionally calls poll_wait() with all the wait queues
that may wake us up (that is, only one), so there is no problem there.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] media: v4l2-mem2mem: always call poll_wait() on queues
  2020-11-05 14:05               ` Alexandre Courbot
@ 2020-11-11 12:41                 ` Alexandre Courbot
  2020-11-11 12:47                   ` Hans Verkuil
  0 siblings, 1 reply; 14+ messages in thread
From: Alexandre Courbot @ 2020-11-11 12:41 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Mauro Carvalho Chehab, linux-media, Linux Kernel Mailing List

On Thu, Nov 5, 2020 at 11:05 PM Alexandre Courbot <gnurou@gmail.com> wrote:
>
> On Thu, Nov 5, 2020 at 10:12 PM Hans Verkuil <hverkuil-cisco@xs4all.nl> wrote:
> >
> > On 05/11/2020 13:52, Alexandre Courbot wrote:
> > > On Thu, Nov 5, 2020 at 9:36 PM Hans Verkuil <hverkuil-cisco@xs4all.nl> wrote:
> > >>
> > >> On 05/11/2020 13:21, Alexandre Courbot wrote:
> > >>> On Tue, Nov 3, 2020 at 6:48 PM Hans Verkuil <hverkuil-cisco@xs4all.nl> wrote:
> > >>>>
> > >>>> On 03/11/2020 09:51, Alexandre Courbot wrote:
> > >>>>> Hi Hans,
> > >>>>>
> > >>>>> On Sat, Oct 31, 2020 at 12:09 AM Hans Verkuil <hverkuil-cisco@xs4all.nl> wrote:
> > >>>>>>
> > >>>>>> On 22/10/2020 14:24, Alexandre Courbot wrote:
> > >>>>>>> do_poll()/do_select() seem to set the _qproc member of poll_table to
> > >>>>>>> NULL the first time they are called on a given table, making subsequent
> > >>>>>>> calls of poll_wait() on that table no-ops. This is a problem for mem2mem
> > >>>>>>> which calls poll_wait() on the V4L2 queues' waitqueues only when a
> > >>>>>>> queue-related event is requested, which may not necessarily be the case
> > >>>>>>> during the first poll.
> > >>>>>>>
> > >>>>>>> For instance, a stateful decoder is typically only interested in
> > >>>>>>> EPOLLPRI events when it starts, and will switch to listening to both
> > >>>>>>> EPOLLPRI and EPOLLIN after receiving the initial resolution change event
> > >>>>>>> and configuring the CAPTURE queue. However by the time that switch
> > >>>>>>> happens and v4l2_m2m_poll_for_data() is called for the first time,
> > >>>>>>> poll_wait() has become a no-op and the V4L2 queues waitqueues thus
> > >>>>>>> cannot be registered.
> > >>>>>>>
> > >>>>>>> Fix this by moving the registration to v4l2_m2m_poll() and do it whether
> > >>>>>>> or not one of the queue-related events are requested.
> > >>>>>>
> > >>>>>> This looks good, but would it be possible to add a test for this to
> > >>>>>> v4l2-compliance? (Look for POLL_MODE_EPOLL in v4l2-test-buffers.cpp)
> > >>>>>>
> > >>>>>> If I understand this right, calling EPOLL_CTL_ADD for EPOLLPRI, then
> > >>>>>> calling EPOLL_CTL_ADD for EPOLLIN/OUT would trigger this? Or does there
> > >>>>>> have to be an epoll_wait call in between?
> > >>>>>
> > >>>>> Even without an epoll_wait() in between the behavior is visible.
> > >>>>> v4l2_m2m_poll() will be called once during the initial EPOLL_CTL_ADD
> > >>>>> and this will trigger the bug.
> > >>>>>
> > >>>>>> Another reason for adding this test is that I wonder if regular capture
> > >>>>>> or output V4L2 devices don't have the same issue.
> > >>>>>>
> > >>>>>> It's a very subtle bug and so adding a test for this to v4l2-compliance
> > >>>>>> would be very useful.
> > >>>>>
> > >>>>> I fully agree, this is very counter-intuitive since what basically
> > >>>>> happens is that the kernel's poll_wait() function becomes a no-op
> > >>>>> after the poll() hook of a driver is called for the first time. There
> > >>>>> is no way one can expect this behavior just from browsing the code so
> > >>>>> this is likely to affect other drivers.
> > >>>>>
> > >>>>> As for the test itself, we can easily reproduce the conditions for
> > >>>>> failure in v4l2-test-buffers.cpp's captureBufs() function, but doing
> > >>>>> so will make the streaming tests fail without being specific about the
> > >>>>> cause. Or maybe we should add another pollmode to specifically test
> > >>>>> epoll in this setup? Can I get your thoughts?
> > >>>>
> > >>>> No, just keep it as part of the poll test. Just add comments at the place
> > >>>> where it fails describing this error.
> > >>>>
> > >>>> After all, it *is* a poll() bug, so it is only fair that it is tested as
> > >>>> part of the epoll test.
> > >>>>
> > >>>> Can you call EPOLL_CTL_ADD with ev.events set to 0? And then call it again
> > >>>> with the actual value that you need? If that triggers this issue as well,
> > >>>> then that is a nice test (but perhaps EPOLL_CTL_ADD won't call poll() if
> > >>>> ev.events is 0, but perhaps EPOLLERR would work instead of 0).
> > >>>
> > >>> Yup, actually the following is enough to make v4l2-compliance -s fail
> > >>> with vicodec:
> > >>
> > >> Does it also fail with vivid? I am curious to know whether this issue is
> > >> m2m specific or a more general problem.
> > >
> > > It does fail actually! And that made me notice that vb2_poll() uses
> > > the same pattern as v4l2_m2m_poll() (probably because the latter is
> > > inspired by the former?) and needs to be fixed similarly. I will send
> > > another patch to fix vb2_poll() as well, thanks for pointing it out!
> >
> > I was afraid of that.
> >
> > Testing epoll for control events would be interesting as well. The
> > vivid radio device is an example of a device that has controls, but
> > does not do streaming (so is not using vb2).
> >
> > But from what I can see v4l2_ctrl_poll() does the right thing, so this
> > should be fine.
>
> Indeed, it unconditionally calls poll_wait() with all the wait queues
> that may wake us up (that is, only one), so there is no problem there.

Sorry, I noticed that this patch was marked with "Changes Requested"
in patchwork, but isn't it valid as-is? We need a similar change to
VB2, but that should go as a separate patch IMHO. I'm fine with doing
both in one go if you prefer that though.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] media: v4l2-mem2mem: always call poll_wait() on queues
  2020-11-11 12:41                 ` Alexandre Courbot
@ 2020-11-11 12:47                   ` Hans Verkuil
  2020-11-11 12:58                     ` Alexandre Courbot
  0 siblings, 1 reply; 14+ messages in thread
From: Hans Verkuil @ 2020-11-11 12:47 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Mauro Carvalho Chehab, linux-media, Linux Kernel Mailing List

On 11/11/2020 13:41, Alexandre Courbot wrote:
> On Thu, Nov 5, 2020 at 11:05 PM Alexandre Courbot <gnurou@gmail.com> wrote:
>>
>> On Thu, Nov 5, 2020 at 10:12 PM Hans Verkuil <hverkuil-cisco@xs4all.nl> wrote:
>>>
>>> On 05/11/2020 13:52, Alexandre Courbot wrote:
>>>> On Thu, Nov 5, 2020 at 9:36 PM Hans Verkuil <hverkuil-cisco@xs4all.nl> wrote:
>>>>>
>>>>> On 05/11/2020 13:21, Alexandre Courbot wrote:
>>>>>> On Tue, Nov 3, 2020 at 6:48 PM Hans Verkuil <hverkuil-cisco@xs4all.nl> wrote:
>>>>>>>
>>>>>>> On 03/11/2020 09:51, Alexandre Courbot wrote:
>>>>>>>> Hi Hans,
>>>>>>>>
>>>>>>>> On Sat, Oct 31, 2020 at 12:09 AM Hans Verkuil <hverkuil-cisco@xs4all.nl> wrote:
>>>>>>>>>
>>>>>>>>> On 22/10/2020 14:24, Alexandre Courbot wrote:
>>>>>>>>>> do_poll()/do_select() seem to set the _qproc member of poll_table to
>>>>>>>>>> NULL the first time they are called on a given table, making subsequent
>>>>>>>>>> calls of poll_wait() on that table no-ops. This is a problem for mem2mem
>>>>>>>>>> which calls poll_wait() on the V4L2 queues' waitqueues only when a
>>>>>>>>>> queue-related event is requested, which may not necessarily be the case
>>>>>>>>>> during the first poll.
>>>>>>>>>>
>>>>>>>>>> For instance, a stateful decoder is typically only interested in
>>>>>>>>>> EPOLLPRI events when it starts, and will switch to listening to both
>>>>>>>>>> EPOLLPRI and EPOLLIN after receiving the initial resolution change event
>>>>>>>>>> and configuring the CAPTURE queue. However by the time that switch
>>>>>>>>>> happens and v4l2_m2m_poll_for_data() is called for the first time,
>>>>>>>>>> poll_wait() has become a no-op and the V4L2 queues waitqueues thus
>>>>>>>>>> cannot be registered.
>>>>>>>>>>
>>>>>>>>>> Fix this by moving the registration to v4l2_m2m_poll() and do it whether
>>>>>>>>>> or not one of the queue-related events are requested.
>>>>>>>>>
>>>>>>>>> This looks good, but would it be possible to add a test for this to
>>>>>>>>> v4l2-compliance? (Look for POLL_MODE_EPOLL in v4l2-test-buffers.cpp)
>>>>>>>>>
>>>>>>>>> If I understand this right, calling EPOLL_CTL_ADD for EPOLLPRI, then
>>>>>>>>> calling EPOLL_CTL_ADD for EPOLLIN/OUT would trigger this? Or does there
>>>>>>>>> have to be an epoll_wait call in between?
>>>>>>>>
>>>>>>>> Even without an epoll_wait() in between the behavior is visible.
>>>>>>>> v4l2_m2m_poll() will be called once during the initial EPOLL_CTL_ADD
>>>>>>>> and this will trigger the bug.
>>>>>>>>
>>>>>>>>> Another reason for adding this test is that I wonder if regular capture
>>>>>>>>> or output V4L2 devices don't have the same issue.
>>>>>>>>>
>>>>>>>>> It's a very subtle bug and so adding a test for this to v4l2-compliance
>>>>>>>>> would be very useful.
>>>>>>>>
>>>>>>>> I fully agree, this is very counter-intuitive since what basically
>>>>>>>> happens is that the kernel's poll_wait() function becomes a no-op
>>>>>>>> after the poll() hook of a driver is called for the first time. There
>>>>>>>> is no way one can expect this behavior just from browsing the code so
>>>>>>>> this is likely to affect other drivers.
>>>>>>>>
>>>>>>>> As for the test itself, we can easily reproduce the conditions for
>>>>>>>> failure in v4l2-test-buffers.cpp's captureBufs() function, but doing
>>>>>>>> so will make the streaming tests fail without being specific about the
>>>>>>>> cause. Or maybe we should add another pollmode to specifically test
>>>>>>>> epoll in this setup? Can I get your thoughts?
>>>>>>>
>>>>>>> No, just keep it as part of the poll test. Just add comments at the place
>>>>>>> where it fails describing this error.
>>>>>>>
>>>>>>> After all, it *is* a poll() bug, so it is only fair that it is tested as
>>>>>>> part of the epoll test.
>>>>>>>
>>>>>>> Can you call EPOLL_CTL_ADD with ev.events set to 0? And then call it again
>>>>>>> with the actual value that you need? If that triggers this issue as well,
>>>>>>> then that is a nice test (but perhaps EPOLL_CTL_ADD won't call poll() if
>>>>>>> ev.events is 0, but perhaps EPOLLERR would work instead of 0).
>>>>>>
>>>>>> Yup, actually the following is enough to make v4l2-compliance -s fail
>>>>>> with vicodec:
>>>>>
>>>>> Does it also fail with vivid? I am curious to know whether this issue is
>>>>> m2m specific or a more general problem.
>>>>
>>>> It does fail actually! And that made me notice that vb2_poll() uses
>>>> the same pattern as v4l2_m2m_poll() (probably because the latter is
>>>> inspired by the former?) and needs to be fixed similarly. I will send
>>>> another patch to fix vb2_poll() as well, thanks for pointing it out!
>>>
>>> I was afraid of that.
>>>
>>> Testing epoll for control events would be interesting as well. The
>>> vivid radio device is an example of a device that has controls, but
>>> does not do streaming (so is not using vb2).
>>>
>>> But from what I can see v4l2_ctrl_poll() does the right thing, so this
>>> should be fine.
>>
>> Indeed, it unconditionally calls poll_wait() with all the wait queues
>> that may wake us up (that is, only one), so there is no problem there.
> 
> Sorry, I noticed that this patch was marked with "Changes Requested"
> in patchwork, but isn't it valid as-is? We need a similar change to
> VB2, but that should go as a separate patch IMHO. I'm fine with doing
> both in one go if you prefer that though.
> 

In at least one reply you mentioned that you wanted to add a comment (reply
from 23 Oct). That's why I changed it to 'Changes Requested'.

Also, I prefer to fix both m2m and vb2 at the same time (separate patches,
but part of the same patch series). And together with a separate patch improving
v4l2-compliance.

Regards,

	Hans

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] media: v4l2-mem2mem: always call poll_wait() on queues
  2020-11-11 12:47                   ` Hans Verkuil
@ 2020-11-11 12:58                     ` Alexandre Courbot
  0 siblings, 0 replies; 14+ messages in thread
From: Alexandre Courbot @ 2020-11-11 12:58 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Mauro Carvalho Chehab, linux-media, Linux Kernel Mailing List

On Wed, Nov 11, 2020 at 9:47 PM Hans Verkuil <hverkuil-cisco@xs4all.nl> wrote:
>
> On 11/11/2020 13:41, Alexandre Courbot wrote:
> > On Thu, Nov 5, 2020 at 11:05 PM Alexandre Courbot <gnurou@gmail.com> wrote:
> >>
> >> On Thu, Nov 5, 2020 at 10:12 PM Hans Verkuil <hverkuil-cisco@xs4all.nl> wrote:
> >>>
> >>> On 05/11/2020 13:52, Alexandre Courbot wrote:
> >>>> On Thu, Nov 5, 2020 at 9:36 PM Hans Verkuil <hverkuil-cisco@xs4all.nl> wrote:
> >>>>>
> >>>>> On 05/11/2020 13:21, Alexandre Courbot wrote:
> >>>>>> On Tue, Nov 3, 2020 at 6:48 PM Hans Verkuil <hverkuil-cisco@xs4all.nl> wrote:
> >>>>>>>
> >>>>>>> On 03/11/2020 09:51, Alexandre Courbot wrote:
> >>>>>>>> Hi Hans,
> >>>>>>>>
> >>>>>>>> On Sat, Oct 31, 2020 at 12:09 AM Hans Verkuil <hverkuil-cisco@xs4all.nl> wrote:
> >>>>>>>>>
> >>>>>>>>> On 22/10/2020 14:24, Alexandre Courbot wrote:
> >>>>>>>>>> do_poll()/do_select() seem to set the _qproc member of poll_table to
> >>>>>>>>>> NULL the first time they are called on a given table, making subsequent
> >>>>>>>>>> calls of poll_wait() on that table no-ops. This is a problem for mem2mem
> >>>>>>>>>> which calls poll_wait() on the V4L2 queues' waitqueues only when a
> >>>>>>>>>> queue-related event is requested, which may not necessarily be the case
> >>>>>>>>>> during the first poll.
> >>>>>>>>>>
> >>>>>>>>>> For instance, a stateful decoder is typically only interested in
> >>>>>>>>>> EPOLLPRI events when it starts, and will switch to listening to both
> >>>>>>>>>> EPOLLPRI and EPOLLIN after receiving the initial resolution change event
> >>>>>>>>>> and configuring the CAPTURE queue. However by the time that switch
> >>>>>>>>>> happens and v4l2_m2m_poll_for_data() is called for the first time,
> >>>>>>>>>> poll_wait() has become a no-op and the V4L2 queues waitqueues thus
> >>>>>>>>>> cannot be registered.
> >>>>>>>>>>
> >>>>>>>>>> Fix this by moving the registration to v4l2_m2m_poll() and do it whether
> >>>>>>>>>> or not one of the queue-related events are requested.
> >>>>>>>>>
> >>>>>>>>> This looks good, but would it be possible to add a test for this to
> >>>>>>>>> v4l2-compliance? (Look for POLL_MODE_EPOLL in v4l2-test-buffers.cpp)
> >>>>>>>>>
> >>>>>>>>> If I understand this right, calling EPOLL_CTL_ADD for EPOLLPRI, then
> >>>>>>>>> calling EPOLL_CTL_ADD for EPOLLIN/OUT would trigger this? Or does there
> >>>>>>>>> have to be an epoll_wait call in between?
> >>>>>>>>
> >>>>>>>> Even without an epoll_wait() in between the behavior is visible.
> >>>>>>>> v4l2_m2m_poll() will be called once during the initial EPOLL_CTL_ADD
> >>>>>>>> and this will trigger the bug.
> >>>>>>>>
> >>>>>>>>> Another reason for adding this test is that I wonder if regular capture
> >>>>>>>>> or output V4L2 devices don't have the same issue.
> >>>>>>>>>
> >>>>>>>>> It's a very subtle bug and so adding a test for this to v4l2-compliance
> >>>>>>>>> would be very useful.
> >>>>>>>>
> >>>>>>>> I fully agree, this is very counter-intuitive since what basically
> >>>>>>>> happens is that the kernel's poll_wait() function becomes a no-op
> >>>>>>>> after the poll() hook of a driver is called for the first time. There
> >>>>>>>> is no way one can expect this behavior just from browsing the code so
> >>>>>>>> this is likely to affect other drivers.
> >>>>>>>>
> >>>>>>>> As for the test itself, we can easily reproduce the conditions for
> >>>>>>>> failure in v4l2-test-buffers.cpp's captureBufs() function, but doing
> >>>>>>>> so will make the streaming tests fail without being specific about the
> >>>>>>>> cause. Or maybe we should add another pollmode to specifically test
> >>>>>>>> epoll in this setup? Can I get your thoughts?
> >>>>>>>
> >>>>>>> No, just keep it as part of the poll test. Just add comments at the place
> >>>>>>> where it fails describing this error.
> >>>>>>>
> >>>>>>> After all, it *is* a poll() bug, so it is only fair that it is tested as
> >>>>>>> part of the epoll test.
> >>>>>>>
> >>>>>>> Can you call EPOLL_CTL_ADD with ev.events set to 0? And then call it again
> >>>>>>> with the actual value that you need? If that triggers this issue as well,
> >>>>>>> then that is a nice test (but perhaps EPOLL_CTL_ADD won't call poll() if
> >>>>>>> ev.events is 0, but perhaps EPOLLERR would work instead of 0).
> >>>>>>
> >>>>>> Yup, actually the following is enough to make v4l2-compliance -s fail
> >>>>>> with vicodec:
> >>>>>
> >>>>> Does it also fail with vivid? I am curious to know whether this issue is
> >>>>> m2m specific or a more general problem.
> >>>>
> >>>> It does fail actually! And that made me notice that vb2_poll() uses
> >>>> the same pattern as v4l2_m2m_poll() (probably because the latter is
> >>>> inspired by the former?) and needs to be fixed similarly. I will send
> >>>> another patch to fix vb2_poll() as well, thanks for pointing it out!
> >>>
> >>> I was afraid of that.
> >>>
> >>> Testing epoll for control events would be interesting as well. The
> >>> vivid radio device is an example of a device that has controls, but
> >>> does not do streaming (so is not using vb2).
> >>>
> >>> But from what I can see v4l2_ctrl_poll() does the right thing, so this
> >>> should be fine.
> >>
> >> Indeed, it unconditionally calls poll_wait() with all the wait queues
> >> that may wake us up (that is, only one), so there is no problem there.
> >
> > Sorry, I noticed that this patch was marked with "Changes Requested"
> > in patchwork, but isn't it valid as-is? We need a similar change to
> > VB2, but that should go as a separate patch IMHO. I'm fine with doing
> > both in one go if you prefer that though.
> >
>
> In at least one reply you mentioned that you wanted to add a comment (reply
> from 23 Oct). That's why I changed it to 'Changes Requested'.

Ah, that's correct. Thanks for clarifying.

> Also, I prefer to fix both m2m and vb2 at the same time (separate patches,
> but part of the same patch series). And together with a separate patch improving
> v4l2-compliance.

Ack - will submit a series and a separate patch for v4l-utils.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] media: v4l2-mem2mem: always call poll_wait() on queues
  2020-11-03  9:48     ` Hans Verkuil
  2020-11-05 12:21       ` Alexandre Courbot
@ 2020-11-23 15:18       ` Alexandre Courbot
  1 sibling, 0 replies; 14+ messages in thread
From: Alexandre Courbot @ 2020-11-23 15:18 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Mauro Carvalho Chehab, linux-media, Linux Kernel Mailing List

On Tue, Nov 3, 2020 at 6:48 PM Hans Verkuil <hverkuil-cisco@xs4all.nl> wrote:
> There is one other place where this needs to be tested: testEvents() in
> v4l2-test-controls.cpp: currently this only tests select(), but there
> should be a second epoll test here as well that just tests EPOLLPRI.
>
> This would catch drivers that do not stream (i.e. no EPOLLIN/OUT) but
> that do have controls (so support EPOLLPRI).

IIUC this part should not require fixing - EPOLLPRI is behaving
properly, the bug is only with EPOLLIN/EPOLLOUT.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-11-23 15:19 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-22 12:24 [PATCH] media: v4l2-mem2mem: always call poll_wait() on queues Alexandre Courbot
2020-10-23 14:22 ` Alexandre Courbot
2020-10-30 15:09 ` Hans Verkuil
2020-11-03  8:51   ` Alexandre Courbot
2020-11-03  9:48     ` Hans Verkuil
2020-11-05 12:21       ` Alexandre Courbot
2020-11-05 12:36         ` Hans Verkuil
2020-11-05 12:52           ` Alexandre Courbot
2020-11-05 13:12             ` Hans Verkuil
2020-11-05 14:05               ` Alexandre Courbot
2020-11-11 12:41                 ` Alexandre Courbot
2020-11-11 12:47                   ` Hans Verkuil
2020-11-11 12:58                     ` Alexandre Courbot
2020-11-23 15:18       ` Alexandre Courbot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).