All of lore.kernel.org
 help / color / mirror / Atom feed
* General protection fault on rmmod cx8800
@ 2009-02-15 20:41 Jean Delvare
  2009-03-02 12:39 ` Jean Delvare
  0 siblings, 1 reply; 15+ messages in thread
From: Jean Delvare @ 2009-02-15 20:41 UTC (permalink / raw)
  To: linux-media

Hi all,

Today I have hit the following general protection fault when removing
module cx8800:

Feb 15 18:30:09 hyperion kernel: cx88/2: unregistering cx8802 driver, type: dvb access: shared
Feb 15 18:30:09 hyperion kernel: cx88[0]/2: subsystem: 107d:665f, board: WinFast DTV1000-T [card=35]
Feb 15 18:30:09 hyperion kernel: cx88-mpeg driver manager 0000:02:04.2: PCI INT A disabled
Feb 15 18:30:09 hyperion kernel: cx8800 0000:02:04.0: PCI INT A disabled
Feb 15 18:30:09 hyperion kernel: general protection fault: 0000 [#1]
Feb 15 18:30:09 hyperion kernel: last sysfs file: /sys/devices/pnp0/00:08/i2c-adapter/i2c-3/3-004c/temp2_crit_alarm
Feb 15 18:30:09 hyperion kernel: CPU 0
Feb 15 18:30:09 hyperion kernel: Modules linked in: lm90 w83627ehf
  hwmon_vid i2c_parport isofs ip6t_LOG xt_tcpudp xt_pkttype ipt_LOG
  xt_limit binfmt_misc snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device
  nfs lockd sunrpc radeon drm nf_conntrack_ipv6 ip6t_REJECT xt_NOTRACK
  ipt_REJECT xt_state iptable_raw iptable_filter ip6table_mangle
  nf_conntrack_netbios_ns nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4
  ip_tables ip6table_filter ip6_tables x_tables ipv6 nls_iso8859_1
  nls_cp437 vfat fuse loop dm_mod dvb_pll cx22702 cx88_vp3054_i2c zr36060
  cx88xx snd_intel8x0 ir_common snd_ac97_codec saa7110 snd_pcsp tveeprom
  ac97_bus zr36067 videobuf_dvb i2c_algo_bit compat_ioctl32 dvb_core
  snd_pcm snd_timer v4l2_common videocodec parport_pc parport btcx_risc
  iTCO_wdt snd videobuf_dma_sg 8139too soundcore videodev videobuf_core
  v4l1_compat mii i2c_i801 snd_page_alloc sr_mod cdrom intel_agp button
  sg sd_mod ehci_hcd uhci_hcd usbcore edd ext3 mbcache jbd fan
  ide_pci_generic piix ide_core ata_generic ata_piix libata thermal
  processor thermal_sys hwmon [last unloaded: cx8800]
Feb 15 18:30:09 hyperion kernel: Pid: 5, comm: events/0 Not tainted 2.6.28.5 #5
Feb 15 18:30:09 hyperion kernel: RIP: 0010:[<ffffffffa0256c18>]  [<ffffffffa0256c18>] cx88_ir_work+0x32/0x236 [cx88xx]
Feb 15 18:30:09 hyperion kernel: RSP: 0000:ffff88003f86be20  EFLAGS: 00010202
Feb 15 18:30:09 hyperion kernel: RAX: 0000000000350010 RBX: ffff88003e1c0ec8 RCX: 0000000000000001
Feb 15 18:30:09 hyperion kernel: RDX: ffffffff80666ee0 RSI: ffffffff809fc750 RDI: ffff88003e1c0ec0
Feb 15 18:30:09 hyperion kernel: RBP: ffff88003f86be50 R08: 0000000000000001 R09: ffffffbff9b00198
Feb 15 18:30:09 hyperion kernel: R10: 0000000000000080 R11: 0000000000000001 R12: ffff88003e1c0ec0
Feb 15 18:30:09 hyperion kernel: R13: ffff88003e1c0c00 R14: 2f4065766f6d6572 R15: ffff88003f8201f0
Feb 15 18:30:09 hyperion kernel: FS:  0000000000000000(0000) GS:ffffffff805f3020(0000) knlGS:0000000000000000
Feb 15 18:30:09 hyperion kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Feb 15 18:30:09 hyperion kernel: CR2: 00000000004042e0 CR3: 000000003ee63000 CR4: 00000000000006e0
Feb 15 18:30:09 hyperion kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 15 18:30:09 hyperion kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Feb 15 18:30:10 hyperion kernel: Process events/0 (pid: 5, threadinfo ffff88003f86a000, task ffff88003f845140)
Feb 15 18:30:10 hyperion kernel: Stack:
Feb 15 18:30:10 hyperion kernel:  ffff88003f86be50 ffff88003e1c0ec8 ffff88003e1c0ec0 ffff88003f8201c0
Feb 15 18:30:10 hyperion kernel:  ffffffffa0256be6 ffff88003f8201f0 ffff88003f86bec0 ffffffff802426b0
Feb 15 18:30:10 hyperion kernel:  ffffffff8024268a ffffffff8042a910 ffffffffa0264444 0000000000000000
Feb 15 18:30:10 hyperion kernel: Call Trace:
Feb 15 18:30:10 hyperion kernel:  [<ffffffffa0256be6>] ? cx88_ir_work+0x0/0x236 [cx88xx]
Feb 15 18:30:10 hyperion kernel:  [<ffffffff802426b0>] run_workqueue+0xee/0x21f
Feb 15 18:30:10 hyperion kernel:  [<ffffffff8024268a>] ? run_workqueue+0xc8/0x21f
Feb 15 18:30:10 hyperion kernel:  [<ffffffff8042a910>] ? thread_return+0x30/0x1cd
Feb 15 18:30:10 hyperion kernel:  [<ffffffff80242c7e>] worker_thread+0xa6/0x111
Feb 15 18:30:10 hyperion kernel:  [<ffffffff802465b1>] ? autoremove_wake_function+0x0/0x3b
Feb 15 18:30:10 hyperion kernel:  [<ffffffff80242bd8>] ? worker_thread+0x0/0x111
Feb 15 18:30:10 hyperion kernel:  [<ffffffff80246306>] kthread+0x49/0x7b
Feb 15 18:30:10 hyperion kernel:  [<ffffffff8020c019>] child_rip+0xa/0x11
Feb 15 18:30:10 hyperion kernel:  [<ffffffff8020bb34>] ? restore_args+0x0/0x30
Feb 15 18:30:10 hyperion kernel:  [<ffffffff802462bd>] ? kthread+0x0/0x7b
Feb 15 18:30:10 hyperion kernel:  [<ffffffff8020c00f>] ? child_rip+0x0/0x11
Feb 15 18:30:10 hyperion kernel: Code: 56 41 55 41 54 53 48 83 ec 08 49 89 fc 4c 8d af 40 fd ff ff 4c 8b b7 40 fd ff ff 41 8b 85 48 03 00 00 c1 e8 02 89 c0 48 c1 e0 02 <49> 03 46 40 8b 18 41 8b 86 10 0a 00 00 83 f8 23 0f 84 fc 00 00
Feb 15 18:30:10 hyperion kernel: RIP  [<ffffffffa0256c18>] cx88_ir_work+0x32/0x236 [cx88xx]
Feb 15 18:30:10 hyperion kernel:  RSP <ffff88003f86be20>
Feb 15 18:30:10 hyperion kernel: ---[ end trace 45f7ffdbf475eafd ]---
Feb 15 18:30:10 hyperion kernel: pci 0000:02:01.0: PCI INT A disabled

This was with kernel 2.6.28.5 on x86-64. I then hit it a second time
using Hans Verkuil's v4l-dvb-zoran tree. In both case the keyboard
started acting weirdly, first time it didn't respond to any key, second
time it was like the return key was stuck. Both time I was able to
reboot the machine with some insistence (SysRq or remote logging,
killing X and reboot.)

It doesn't happen on each rmmod cx8800, but as I managed to reproduce
it once, I guess I should be able to reproduce it again if needed.
Please let me know if you need more information or want me to test
something.

-- 
Jean Delvare

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: General protection fault on rmmod cx8800
  2009-02-15 20:41 General protection fault on rmmod cx8800 Jean Delvare
@ 2009-03-02 12:39 ` Jean Delvare
  2009-03-02 14:16   ` Andy Walls
  0 siblings, 1 reply; 15+ messages in thread
From: Jean Delvare @ 2009-03-02 12:39 UTC (permalink / raw)
  To: linux-media

On Sun, 15 Feb 2009 21:41:08 +0100, Jean Delvare wrote:
> Hi all,
> 
> Today I have hit the following general protection fault when removing
> module cx8800:

This has just happened to me again today, with kernel 2.6.28.7. I have
opened a bug in bugzilla:

http://bugzilla.kernel.org/show_bug.cgi?id=12802

-- 
Jean Delvare

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: General protection fault on rmmod cx8800
  2009-03-02 12:39 ` Jean Delvare
@ 2009-03-02 14:16   ` Andy Walls
  2009-03-02 16:03     ` Jean Delvare
  0 siblings, 1 reply; 15+ messages in thread
From: Andy Walls @ 2009-03-02 14:16 UTC (permalink / raw)
  To: Jean Delvare; +Cc: linux-media

On Mon, 2009-03-02 at 13:39 +0100, Jean Delvare wrote:
> On Sun, 15 Feb 2009 21:41:08 +0100, Jean Delvare wrote:
> > Hi all,
> > 
> > Today I have hit the following general protection fault when removing
> > module cx8800:
> 
> This has just happened to me again today, with kernel 2.6.28.7. I have
> opened a bug in bugzilla:
> 
> http://bugzilla.kernel.org/show_bug.cgi?id=12802
> 

I'll try to look at it later today.  But right off the bat, I think
here's a problem:


void cx88_ir_stop(struct cx88_core *core, struct cx88_IR *ir)
{
[...]
        if (ir->polling) {
                del_timer_sync(&ir->timer);   <--- Wrong order?
                flush_scheduled_work();       <--- Wrong order?
        }
}

static void cx88_ir_work(struct work_struct *work)
{
        struct cx88_IR *ir = container_of(work, struct cx88_IR, work);

        cx88_ir_handle_key(ir);
        mod_timer(&ir->timer, jiffies + msecs_to_jiffies(ir->polling));
}


mod_timer() acts like del_timer(); mumble; add_timer();  If there was
any work flushed when stopping the IR, a new timer gets added.  That
seems wrong.

Regards,
Andy


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: General protection fault on rmmod cx8800
  2009-03-02 14:16   ` Andy Walls
@ 2009-03-02 16:03     ` Jean Delvare
  2009-03-02 19:05       ` Jean Delvare
  2009-03-02 20:09       ` Andy Walls
  0 siblings, 2 replies; 15+ messages in thread
From: Jean Delvare @ 2009-03-02 16:03 UTC (permalink / raw)
  To: Andy Walls; +Cc: linux-media

Hi Andy,

On Mon, 02 Mar 2009 09:16:05 -0500, Andy Walls wrote:
> On Mon, 2009-03-02 at 13:39 +0100, Jean Delvare wrote:
> > On Sun, 15 Feb 2009 21:41:08 +0100, Jean Delvare wrote:
> > > Hi all,
> > > 
> > > Today I have hit the following general protection fault when removing
> > > module cx8800:
> > 
> > This has just happened to me again today, with kernel 2.6.28.7. I have
> > opened a bug in bugzilla:
> > 
> > http://bugzilla.kernel.org/show_bug.cgi?id=12802
> > 
> 
> I'll try to look at it later today.  But right off the bat, I think
> here's a problem:

Thanks for your help looking into this!

> void cx88_ir_stop(struct cx88_core *core, struct cx88_IR *ir)
> {
> [...]
>         if (ir->polling) {
>                 del_timer_sync(&ir->timer);   <--- Wrong order?
>                 flush_scheduled_work();       <--- Wrong order?
>         }
> }

The order looks OK to me. If you flush the event workqueue before
deleting the timer, the timer could rearm before you delete it, and
you'd return before the workqueue is actually flushed. As a matter of
fact, both bttv-input and ir-kbd-i2c have it in the same order.

> static void cx88_ir_work(struct work_struct *work)
> {
>         struct cx88_IR *ir = container_of(work, struct cx88_IR, work);
> 
>         cx88_ir_handle_key(ir);
>         mod_timer(&ir->timer, jiffies + msecs_to_jiffies(ir->polling));
> }
> 
> 
> mod_timer() acts like del_timer(); mumble; add_timer();  If there was
> any work flushed when stopping the IR, a new timer gets added.  That
> seems wrong.

As far as I can see the key difference between bttv-input and
cx88-input is that bttv-input only uses a simple self-rearming timer,
while cx88-input uses a timer and a separate workqueue. The timer runs
the workqueue, which rearms the timer, etc. When you flush the timer,
the separate workqueue can be still active. I presume this is what
happens on my system. I guess the reason for the separate workqueue is
that the processing may take some time and we don't want to hurt the
system's performance?

So we need to flush both the event workqueue (with
flush_scheduled_work) and the separate workqueue (with
flush_workqueue), at the same time, otherwise the active one may rearm
the flushed one again. This looks tricky, as obviously we can't flush
both at the exact same time. Alternatively, if we could get rid of one
of the queues, we'd have only one that needs flushing, this would be a
lot easier...

-- 
Jean Delvare

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: General protection fault on rmmod cx8800
  2009-03-02 16:03     ` Jean Delvare
@ 2009-03-02 19:05       ` Jean Delvare
  2009-03-02 21:12         ` Trent Piepho
  2009-03-02 22:33         ` Andy Walls
  2009-03-02 20:09       ` Andy Walls
  1 sibling, 2 replies; 15+ messages in thread
From: Jean Delvare @ 2009-03-02 19:05 UTC (permalink / raw)
  To: Andy Walls; +Cc: linux-media

On Mon, 2 Mar 2009 17:03:49 +0100, Jean Delvare wrote:
> As far as I can see the key difference between bttv-input and
> cx88-input is that bttv-input only uses a simple self-rearming timer,
> while cx88-input uses a timer and a separate workqueue. The timer runs
> the workqueue, which rearms the timer, etc. When you flush the timer,
> the separate workqueue can be still active. I presume this is what
> happens on my system. I guess the reason for the separate workqueue is
> that the processing may take some time and we don't want to hurt the
> system's performance?
> 
> So we need to flush both the event workqueue (with
> flush_scheduled_work) and the separate workqueue (with
> flush_workqueue), at the same time, otherwise the active one may rearm
> the flushed one again. This looks tricky, as obviously we can't flush
> both at the exact same time. Alternatively, if we could get rid of one
> of the queues, we'd have only one that needs flushing, this would be a
> lot easier...

Switching to delayed_work seems to do the trick (note this is a 2.6.28
patch):

---
 drivers/media/video/cx88/cx88-input.c |   26 ++++++++------------------
 1 file changed, 8 insertions(+), 18 deletions(-)

--- linux-2.6.28.orig/drivers/media/video/cx88/cx88-input.c	2009-03-02 19:11:24.000000000 +0100
+++ linux-2.6.28/drivers/media/video/cx88/cx88-input.c	2009-03-02 19:49:31.000000000 +0100
@@ -48,8 +48,7 @@ struct cx88_IR {
 
 	/* poll external decoder */
 	int polling;
-	struct work_struct work;
-	struct timer_list timer;
+	struct delayed_work work;
 	u32 gpio_addr;
 	u32 last_gpio;
 	u32 mask_keycode;
@@ -143,27 +142,20 @@ static void cx88_ir_handle_key(struct cx
 	}
 }
 
-static void ir_timer(unsigned long data)
-{
-	struct cx88_IR *ir = (struct cx88_IR *)data;
-
-	schedule_work(&ir->work);
-}
-
 static void cx88_ir_work(struct work_struct *work)
 {
-	struct cx88_IR *ir = container_of(work, struct cx88_IR, work);
+	struct delayed_work *dwork = container_of(work, struct delayed_work, work);
+	struct cx88_IR *ir = container_of(dwork, struct cx88_IR, work);
 
 	cx88_ir_handle_key(ir);
-	mod_timer(&ir->timer, jiffies + msecs_to_jiffies(ir->polling));
+	schedule_delayed_work(dwork, msecs_to_jiffies(ir->polling));
 }
 
 void cx88_ir_start(struct cx88_core *core, struct cx88_IR *ir)
 {
 	if (ir->polling) {
-		setup_timer(&ir->timer, ir_timer, (unsigned long)ir);
-		INIT_WORK(&ir->work, cx88_ir_work);
-		schedule_work(&ir->work);
+		INIT_DELAYED_WORK(&ir->work, cx88_ir_work);
+		schedule_delayed_work(&ir->work, msecs_to_jiffies(ir->polling));
 	}
 	if (ir->sampling) {
 		core->pci_irqmask |= PCI_INT_IR_SMPINT;
@@ -179,10 +171,8 @@ void cx88_ir_stop(struct cx88_core *core
 		core->pci_irqmask &= ~PCI_INT_IR_SMPINT;
 	}
 
-	if (ir->polling) {
-		del_timer_sync(&ir->timer);
-		flush_scheduled_work();
-	}
+	if (ir->polling)
+		cancel_delayed_work_sync(&ir->work);
 }
 
 /* ---------------------------------------------------------------------- */

At least I didn't have any general protection fault with this patch
applied. Comments?

Thanks,
-- 
Jean Delvare

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: General protection fault on rmmod cx8800
  2009-03-02 16:03     ` Jean Delvare
  2009-03-02 19:05       ` Jean Delvare
@ 2009-03-02 20:09       ` Andy Walls
  2009-03-05  5:09         ` Trent Piepho
  1 sibling, 1 reply; 15+ messages in thread
From: Andy Walls @ 2009-03-02 20:09 UTC (permalink / raw)
  To: Jean Delvare; +Cc: linux-media

On Mon, 2009-03-02 at 17:03 +0100, Jean Delvare wrote:
> Hi Andy,
> 
> On Mon, 02 Mar 2009 09:16:05 -0500, Andy Walls wrote:
> > On Mon, 2009-03-02 at 13:39 +0100, Jean Delvare wrote:
> > > On Sun, 15 Feb 2009 21:41:08 +0100, Jean Delvare wrote:
> > > > Hi all,
> > > > 
> > > > Today I have hit the following general protection fault when removing
> > > > module cx8800:
> > > 
> > > This has just happened to me again today, with kernel 2.6.28.7. I have
> > > opened a bug in bugzilla:
> > > 
> > > http://bugzilla.kernel.org/show_bug.cgi?id=12802
> > > 
> > 
> > I'll try to look at it later today.  But right off the bat, I think
> > here's a problem:
> 
> Thanks for your help looking into this!
> 
> > void cx88_ir_stop(struct cx88_core *core, struct cx88_IR *ir)
> > {
> > [...]
> >         if (ir->polling) {
> >                 del_timer_sync(&ir->timer);   <--- Wrong order?
> >                 flush_scheduled_work();       <--- Wrong order?
> >         }
> > }
> 
> The order looks OK to me. If you flush the event workqueue before
> deleting the timer, the timer could rearm before you delete it, and
> you'd return before the workqueue is actually flushed. As a matter of
> fact, both bttv-input and ir-kbd-i2c have it in the same order.

flush_scheduled_work() causes any queued work to execute.  If queued
cx88_IR work exists, it *will* rearm the timer via mod_timer() - the
del_timer_sync() is nullified in this case.



> > static void cx88_ir_work(struct work_struct *work)
> > {
> >         struct cx88_IR *ir = container_of(work, struct cx88_IR, work);
> > 
> >         cx88_ir_handle_key(ir);
> >         mod_timer(&ir->timer, jiffies + msecs_to_jiffies(ir->polling));
> > }
> > 
> > 
> > mod_timer() acts like del_timer(); mumble; add_timer();  If there was
> > any work flushed when stopping the IR, a new timer gets added.  That
> > seems wrong.
> 
> As far as I can see the key difference between bttv-input and
> cx88-input is that bttv-input only uses a simple self-rearming timer,
> while cx88-input uses a timer and a separate workqueue. The timer runs
> the workqueue, which rearms the timer, etc. When you flush the timer,
> the separate workqueue can be still active. I presume this is what
> happens on my system. I guess the reason for the separate workqueue is
> that the processing may take some time and we don't want to hurt the
> system's performance?

It depends.  You use work_queus in the first place to have interrupts
disabled for as little time as possible on a processor; increasing
system performance.  If ordering of deferred work is important, you want
your own single threaded work handler.  If latency of deferred work is
important, you want your own normal (multithreaded) work handler.  If
neither is important, you just use the default kernel eventd. 

> So we need to flush both the event workqueue (with
> flush_scheduled_work) and the separate workqueue (with
> flush_workqueue), at the same time, otherwise the active one may rearm
> the flushed one again. This looks tricky, as obviously we can't flush
> both at the exact same time. Alternatively, if we could get rid of one
> of the queues, we'd have only one that needs flushing, this would be a
> lot easier...

I still need to look into what you just mentioned (I've be out shoveling
over 12 inches of snow off of my driveway for the past few hours).

Depending on how tangled the mess is I was thinking of a few things:

1. cancelling the work instead of flushing it (only good for newer
kernels)

2. a state variable that indicates we're removing the device and to not
call mod_timer() which rearms the timer.


But before all that, I still need to decode the oops to see exactly what
failed.


Regards,
Andy


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: General protection fault on rmmod cx8800
  2009-03-02 19:05       ` Jean Delvare
@ 2009-03-02 21:12         ` Trent Piepho
  2009-03-02 21:52           ` Jean Delvare
  2009-03-02 22:33         ` Andy Walls
  1 sibling, 1 reply; 15+ messages in thread
From: Trent Piepho @ 2009-03-02 21:12 UTC (permalink / raw)
  To: Jean Delvare; +Cc: Andy Walls, linux-media

On Mon, 2 Mar 2009, Jean Delvare wrote:
> On Mon, 2 Mar 2009 17:03:49 +0100, Jean Delvare wrote:
> > As far as I can see the key difference between bttv-input and
> > cx88-input is that bttv-input only uses a simple self-rearming timer,
> > while cx88-input uses a timer and a separate workqueue. The timer runs
> > the workqueue, which rearms the timer, etc. When you flush the timer,
> > the separate workqueue can be still active. I presume this is what
> > happens on my system. I guess the reason for the separate workqueue is
> > that the processing may take some time and we don't want to hurt the
> > system's performance?
> >
> > So we need to flush both the event workqueue (with
> > flush_scheduled_work) and the separate workqueue (with
> > flush_workqueue), at the same time, otherwise the active one may rearm

What are the two work queues are you talking about?  I don't see any actual
work queues created.  Just one work function that is scheduled on the
system work queue.  The timer is a softirq and doesn't run on a work queue.

> Switching to delayed_work seems to do the trick (note this is a 2.6.28
> patch):

Makes the most sense to me.  I was just about to make a patch to do the
same thing when I got your email.  Though I was going to patch the v4l-dvb
sources to avoid porting work.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: General protection fault on rmmod cx8800
  2009-03-02 21:12         ` Trent Piepho
@ 2009-03-02 21:52           ` Jean Delvare
  2009-03-03  9:40             ` Trent Piepho
  0 siblings, 1 reply; 15+ messages in thread
From: Jean Delvare @ 2009-03-02 21:52 UTC (permalink / raw)
  To: Trent Piepho; +Cc: Andy Walls, linux-media

Hi Trent,

On Mon, 2 Mar 2009 13:12:24 -0800 (PST), Trent Piepho wrote:
> On Mon, 2 Mar 2009, Jean Delvare wrote:
> > On Mon, 2 Mar 2009 17:03:49 +0100, Jean Delvare wrote:
> > > As far as I can see the key difference between bttv-input and
> > > cx88-input is that bttv-input only uses a simple self-rearming timer,
> > > while cx88-input uses a timer and a separate workqueue. The timer runs
> > > the workqueue, which rearms the timer, etc. When you flush the timer,
> > > the separate workqueue can be still active. I presume this is what
> > > happens on my system. I guess the reason for the separate workqueue is
> > > that the processing may take some time and we don't want to hurt the
> > > system's performance?
> > >
> > > So we need to flush both the event workqueue (with
> > > flush_scheduled_work) and the separate workqueue (with
> > > flush_workqueue), at the same time, otherwise the active one may rearm
> 
> What are the two work queues are you talking about?  I don't see any actual
> work queues created.  Just one work function that is scheduled on the
> system work queue.  The timer is a softirq and doesn't run on a work queue.

Sorry, I misread the code. There's only one work queue involved (the
system one). Reading the timer code again now, I admit I am curious how
I managed to misread it to that degree...

The key point remains though: we'd need to delete the timer and flush
the system workqueue at the exact same time, which is not possible, or
to add some sort of signaling between the work and the timer. Or use
delayed_work.

> > Switching to delayed_work seems to do the trick (note this is a 2.6.28
> > patch):
> 
> Makes the most sense to me.  I was just about to make a patch to do the
> same thing when I got your email.  Though I was going to patch the v4l-dvb
> sources to avoid porting work.

It was easier for me to test on an upstream kernel. The porting should
be fairly easy, I can take care of it. The difficult part will be to
handle the compatibility with kernels < 2.6.20 because delayed_work was
introduced in 2.6.20. Probably "compatibility" here will simply mean
that the bug I've hit will only be fixed for kernels >= 2.6.20. Which
once again raises the question of whether we really want to keep
supporting these old kernels.

-- 
Jean Delvare

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: General protection fault on rmmod cx8800
  2009-03-02 19:05       ` Jean Delvare
  2009-03-02 21:12         ` Trent Piepho
@ 2009-03-02 22:33         ` Andy Walls
  1 sibling, 0 replies; 15+ messages in thread
From: Andy Walls @ 2009-03-02 22:33 UTC (permalink / raw)
  To: Jean Delvare; +Cc: linux-media

On Mon, 2009-03-02 at 20:05 +0100, Jean Delvare wrote:
> On Mon, 2 Mar 2009 17:03:49 +0100, Jean Delvare wrote:

>From your original oops, the fault is in cx88_ir_work() but
cx88_ir_handle_key() gets inlined, so you'll need to look at both
functions:

   0:	56                   	push   %rsi
   1:	41 55                	push   %r13
   3:	41 54                	push   %r12
   5:	53                   	push   %rbx
   6:	48 83 ec 08          	sub    $0x8,%rsp
   a:	49 89 fc             	mov    %rdi,%r12
   d:	4c 8d af 40 fd ff ff 	lea    -0x2c0(%rdi),%r13   struct cx88_IR *ir = container_of(work, struct cx88_IR, work);
  14:	4c 8b b7 40 fd ff ff 	mov    -0x2c0(%rdi),%r14   struct cx88_core *core = ir->core;
  1b:	41 8b 85 48 03 00 00 	mov    0x348(%r13),%eax    fetch ir->gpio_addr [for gpio = cx_read(ir->gpio_addr);]
  22:	c1 e8 02             	shr    $0x2,%eax           ((reg)>>2)  [from cx_read() macro]
  25:	89 c0                	mov    %eax,%eax           [I don't know - maybe readl() related]
  27:	48 c1 e0 02          	shl    $0x2,%rax           [I don't know - maybe readl() related]
  2b:	49 03 46 40          	add    0x40(%r14),%rax   core->lmmio + ...  <--- Oops is here  
  2f:	8b 18                	mov    (%rax),%ebx       readl(core->lmmio + ((reg)>>2)) [cx_read() definition]
  31:	41 8b 86 10 0a 00 00 	mov    0xa10(%r14),%eax  switch (core->boardnr) {
  38:	83 f8 23             	cmp    $0x23,%eax        case CX88_BOARD_WINFAST_DTV1000:

"core" is invalid as %r14 holds junk:

R14: 2f4065766f6d6572

A valid address would start with  0xffff.....  In fact, %r14's value is
a magic cookie: "remove@/".

It's safe to assume that the work handler was called when the per device
instance of cx88_core was gone.



> > As far as I can see the key difference between bttv-input and
> > cx88-input is that bttv-input only uses a simple self-rearming timer,
> > while cx88-input uses a timer and a separate workqueue. The timer runs
> > the workqueue, which rearms the timer, etc. When you flush the timer,
> > the separate workqueue can be still active. I presume this is what
> > happens on my system. I guess the reason for the separate workqueue is
> > that the processing may take some time and we don't want to hurt the
> > system's performance?
 
> > So we need to flush both the event workqueue (with
> > flush_scheduled_work) and the separate workqueue (with
> > flush_workqueue), at the same time, otherwise the active one may rearm
> > the flushed one again. This looks tricky, as obviously we can't flush
> > both at the exact same time. Alternatively, if we could get rid of one
> > of the queues, we'd have only one that needs flushing, this would be a
> > lot easier...
> 
> Switching to delayed_work seems to do the trick (note this is a 2.6.28
> patch):
> 
> ---
>  drivers/media/video/cx88/cx88-input.c |   26 ++++++++------------------
>  1 file changed, 8 insertions(+), 18 deletions(-)
> 
> --- linux-2.6.28.orig/drivers/media/video/cx88/cx88-input.c	2009-03-02 19:11:24.000000000 +0100
> +++ linux-2.6.28/drivers/media/video/cx88/cx88-input.c	2009-03-02 19:49:31.000000000 +0100
> @@ -48,8 +48,7 @@ struct cx88_IR {
>  
>  	/* poll external decoder */
>  	int polling;
> -	struct work_struct work;
> -	struct timer_list timer;
> +	struct delayed_work work;
>  	u32 gpio_addr;
>  	u32 last_gpio;
>  	u32 mask_keycode;
> @@ -143,27 +142,20 @@ static void cx88_ir_handle_key(struct cx
>  	}
>  }
>  
> -static void ir_timer(unsigned long data)
> -{
> -	struct cx88_IR *ir = (struct cx88_IR *)data;
> -
> -	schedule_work(&ir->work);
> -}
> -
>  static void cx88_ir_work(struct work_struct *work)
>  {
> -	struct cx88_IR *ir = container_of(work, struct cx88_IR, work);
> +	struct delayed_work *dwork = container_of(work, struct delayed_work, work);
> +	struct cx88_IR *ir = container_of(dwork, struct cx88_IR, work);
>  
>  	cx88_ir_handle_key(ir);
> -	mod_timer(&ir->timer, jiffies + msecs_to_jiffies(ir->polling));
> +	schedule_delayed_work(dwork, msecs_to_jiffies(ir->polling));
>  }
>  
>  void cx88_ir_start(struct cx88_core *core, struct cx88_IR *ir)
>  {
>  	if (ir->polling) {
> -		setup_timer(&ir->timer, ir_timer, (unsigned long)ir);
> -		INIT_WORK(&ir->work, cx88_ir_work);
> -		schedule_work(&ir->work);
> +		INIT_DELAYED_WORK(&ir->work, cx88_ir_work);
> +		schedule_delayed_work(&ir->work, msecs_to_jiffies(ir->polling));
>  	}
>  	if (ir->sampling) {
>  		core->pci_irqmask |= PCI_INT_IR_SMPINT;
> @@ -179,10 +171,8 @@ void cx88_ir_stop(struct cx88_core *core
>  		core->pci_irqmask &= ~PCI_INT_IR_SMPINT;
>  	}
>  
> -	if (ir->polling) {
> -		del_timer_sync(&ir->timer);
> -		flush_scheduled_work();
> -	}
> +	if (ir->polling)
> +		cancel_delayed_work_sync(&ir->work);
>  }
>  
>  /* ---------------------------------------------------------------------- */
> 
> At least I didn't have any general protection fault with this patch
> applied. Comments?

Jean,

Reviewed-by: Andy Walls <awalls@radix.net>

I've done some research and this looks good.

1. It's a cleaner way to use the kernel event threads to perform a
periodic action.

2. No races with stopping the work, as cancel_delayed_work_sync()
reliably disarms even self-firing work handler functions like the one
here.  It even appears to make sure it is cancelled from every CPU for a
multithreaded handler, AFAICT.  (No flag variable is needed to signal
work is stopping to the work handler AFAICT.)



Not to start a flame war on supporting older kernels, but I must mention

3. Canceling of work is only supported on more recent kernels.

4. I'm not willing to waste brain cells on how to avoid work canceling
races for older kernels. :)

> Thanks,

Regards,
Andy


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: General protection fault on rmmod cx8800
  2009-03-02 21:52           ` Jean Delvare
@ 2009-03-03  9:40             ` Trent Piepho
  2009-03-03  9:49               ` Trent Piepho
  2009-03-03 12:16               ` Jean Delvare
  0 siblings, 2 replies; 15+ messages in thread
From: Trent Piepho @ 2009-03-03  9:40 UTC (permalink / raw)
  To: Jean Delvare; +Cc: Andy Walls, linux-media

On Mon, 2 Mar 2009, Jean Delvare wrote:
> > Makes the most sense to me.  I was just about to make a patch to do the
> > same thing when I got your email.  Though I was going to patch the v4l-dvb
> > sources to avoid porting work.
>
> It was easier for me to test on an upstream kernel. The porting should
> be fairly easy, I can take care of it. The difficult part will be to
> handle the compatibility with kernels < 2.6.20 because delayed_work was
> introduced in 2.6.20. Probably "compatibility" here will simply mean
> that the bug I've hit will only be fixed for kernels >= 2.6.20. Which
> once again raises the question of whether we really want to keep
> supporting these old kernels.

cancel_delayed_work_sync() was renamed from cancel_rearming_delayed_work()
in 2.6.23.  A compat.h patch can handle that one.

In 2.6.22, cancel_delayed_work_sync(work) was created from
cancel_rearming_delayed_workqueue(wq, work).  The kernel has a compat
function to turn cancel_rearming_delayed_workqueue() into the
cancel_delayed_work_sync() call.  cancel_rearming_delayed_workqueue() has
been around since 2.6.13.  Apparently it was un-exported for a while
because it had no users, see commit v2.6.12-rc2-8-g81ddef7.  Isn't it nice
that there a commit message other than "export
cancel_rearming_delayed_workqueue"?  Let me again express my dislike for
commit with no description.

In 2.6.20 delayed_work was split from work_struct.  The concept of delayed
work was already there and schedule_delayed_work() hasn't changed.  I think
this can also be handled with a compat.h change that defines delayed_work
to work_struct.  That will only be a problem on pre 2.6.20 kernels if some
code decides to define identifiers named work_struct and delayed_work in
the same scope.  There are currently no identifier named delayed_work in
any driver and one driver (sq905) has a structure member named
work_struct.  So I think it'll be ok.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: General protection fault on rmmod cx8800
  2009-03-03  9:40             ` Trent Piepho
@ 2009-03-03  9:49               ` Trent Piepho
  2009-03-03 12:16               ` Jean Delvare
  1 sibling, 0 replies; 15+ messages in thread
From: Trent Piepho @ 2009-03-03  9:49 UTC (permalink / raw)
  To: Jean Delvare; +Cc: Andy Walls, linux-media

On Tue, 3 Mar 2009, Trent Piepho wrote:
> On Mon, 2 Mar 2009, Jean Delvare wrote:
> > be fairly easy, I can take care of it. The difficult part will be to
> > handle the compatibility with kernels < 2.6.20 because delayed_work was
> > introduced in 2.6.20. Probably "compatibility" here will simply mean
> > that the bug I've hit will only be fixed for kernels >= 2.6.20. Which
> > once again raises the question of whether we really want to keep
> > supporting these old kernels.
>
> cancel_delayed_work_sync() was renamed from cancel_rearming_delayed_work()
> in 2.6.23.  A compat.h patch can handle that one.

compat.h has code to handle it froma year ago.

> In 2.6.22, cancel_delayed_work_sync(work) was created from
> cancel_rearming_delayed_workqueue(wq, work).  The kernel has a compat

But cancel_rearming_delayed_work() already existed, the real change was
that cancel_rearming_delayed_workqueue() was made obsolete by making
cancel_rearming_delayed_work() not care what workqueue the work was in.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: General protection fault on rmmod cx8800
  2009-03-03  9:40             ` Trent Piepho
  2009-03-03  9:49               ` Trent Piepho
@ 2009-03-03 12:16               ` Jean Delvare
  2009-03-03 20:14                 ` Trent Piepho
  1 sibling, 1 reply; 15+ messages in thread
From: Jean Delvare @ 2009-03-03 12:16 UTC (permalink / raw)
  To: Trent Piepho; +Cc: Andy Walls, linux-media

On Tue, 3 Mar 2009 01:40:00 -0800 (PST), Trent Piepho wrote:
> On Mon, 2 Mar 2009, Jean Delvare wrote:
> > > Makes the most sense to me.  I was just about to make a patch to do the
> > > same thing when I got your email.  Though I was going to patch the v4l-dvb
> > > sources to avoid porting work.
> >
> > It was easier for me to test on an upstream kernel. The porting should
> > be fairly easy, I can take care of it. The difficult part will be to
> > handle the compatibility with kernels < 2.6.20 because delayed_work was
> > introduced in 2.6.20. Probably "compatibility" here will simply mean
> > that the bug I've hit will only be fixed for kernels >= 2.6.20. Which
> > once again raises the question of whether we really want to keep
> > supporting these old kernels.
> 
> cancel_delayed_work_sync() was renamed from cancel_rearming_delayed_work()
> in 2.6.23.  A compat.h patch can handle that one.
> 
> In 2.6.22, cancel_delayed_work_sync(work) was created from
> cancel_rearming_delayed_workqueue(wq, work).  The kernel has a compat
> function to turn cancel_rearming_delayed_workqueue() into the
> cancel_delayed_work_sync() call.  cancel_rearming_delayed_workqueue() has
> been around since 2.6.13.  Apparently it was un-exported for a while
> because it had no users, see commit v2.6.12-rc2-8-g81ddef7.  Isn't it nice
> that there a commit message other than "export
> cancel_rearming_delayed_workqueue"?  Let me again express my dislike for
> commit with no description.
> 
> In 2.6.20 delayed_work was split from work_struct.  The concept of delayed
> work was already there and schedule_delayed_work() hasn't changed.  I think
> this can also be handled with a compat.h change that defines delayed_work
> to work_struct.  That will only be a problem on pre 2.6.20 kernels if some
> code decides to define identifiers named work_struct and delayed_work in
> the same scope.  There are currently no identifier named delayed_work in
> any driver and one driver (sq905) has a structure member named
> work_struct.  So I think it'll be ok.

Wow, I didn't expect that many different compatibility issues. This
goes beyond the time I am ready to spend on it, I'm afraid.

-- 
Jean Delvare

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: General protection fault on rmmod cx8800
  2009-03-03 12:16               ` Jean Delvare
@ 2009-03-03 20:14                 ` Trent Piepho
  2009-03-03 21:33                   ` Jean Delvare
  0 siblings, 1 reply; 15+ messages in thread
From: Trent Piepho @ 2009-03-03 20:14 UTC (permalink / raw)
  To: Jean Delvare; +Cc: Andy Walls, linux-media

On Tue, 3 Mar 2009, Jean Delvare wrote:
> On Tue, 3 Mar 2009 01:40:00 -0800 (PST), Trent Piepho wrote:
> > On Mon, 2 Mar 2009, Jean Delvare wrote:
> > In 2.6.20 delayed_work was split from work_struct.  The concept of delayed
> > work was already there and schedule_delayed_work() hasn't changed.  I think
> > this can also be handled with a compat.h change that defines delayed_work
> > to work_struct.  That will only be a problem on pre 2.6.20 kernels if some
> > code decides to define identifiers named work_struct and delayed_work in
> > the same scope.  There are currently no identifier named delayed_work in
> > any driver and one driver (sq905) has a structure member named
> > work_struct.  So I think it'll be ok.
>
> Wow, I didn't expect that many different compatibility issues. This
> goes beyond the time I am ready to spend on it, I'm afraid.

I already have a patch for compat.h that handles the last remaining issue.
You don't have to do anything.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: General protection fault on rmmod cx8800
  2009-03-03 20:14                 ` Trent Piepho
@ 2009-03-03 21:33                   ` Jean Delvare
  0 siblings, 0 replies; 15+ messages in thread
From: Jean Delvare @ 2009-03-03 21:33 UTC (permalink / raw)
  To: Trent Piepho; +Cc: Andy Walls, linux-media

On Tue, 3 Mar 2009 12:14:06 -0800 (PST), Trent Piepho wrote:
> On Tue, 3 Mar 2009, Jean Delvare wrote:
> > On Tue, 3 Mar 2009 01:40:00 -0800 (PST), Trent Piepho wrote:
> > > On Mon, 2 Mar 2009, Jean Delvare wrote:
> > > In 2.6.20 delayed_work was split from work_struct.  The concept of delayed
> > > work was already there and schedule_delayed_work() hasn't changed.  I think
> > > this can also be handled with a compat.h change that defines delayed_work
> > > to work_struct.  That will only be a problem on pre 2.6.20 kernels if some
> > > code decides to define identifiers named work_struct and delayed_work in
> > > the same scope.  There are currently no identifier named delayed_work in
> > > any driver and one driver (sq905) has a structure member named
> > > work_struct.  So I think it'll be ok.
> >
> > Wow, I didn't expect that many different compatibility issues. This
> > goes beyond the time I am ready to spend on it, I'm afraid.
> 
> I already have a patch for compat.h that handles the last remaining issue.
> You don't have to do anything.

Ah, very nice then. Please push it to the v4l-dvb repository so that I
can send my own patch. I will also fix the 3 other drivers that have
the same bug (ir-kbd-i2c, saa6588 and em28xx-input).

-- 
Jean Delvare

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: General protection fault on rmmod cx8800
  2009-03-02 20:09       ` Andy Walls
@ 2009-03-05  5:09         ` Trent Piepho
  0 siblings, 0 replies; 15+ messages in thread
From: Trent Piepho @ 2009-03-05  5:09 UTC (permalink / raw)
  To: Andy Walls; +Cc: Jean Delvare, linux-media

On Mon, 2 Mar 2009, Andy Walls wrote:
> On Mon, 2009-03-02 at 17:03 +0100, Jean Delvare wrote:
> > >         if (ir->polling) {
> > >                 del_timer_sync(&ir->timer);   <--- Wrong order?
> > >                 flush_scheduled_work();       <--- Wrong order?
> > >         }
> > > }
> >
> > The order looks OK to me. If you flush the event workqueue before
> > deleting the timer, the timer could rearm before you delete it, and
> > you'd return before the workqueue is actually flushed. As a matter of
> > fact, both bttv-input and ir-kbd-i2c have it in the same order.
>
> flush_scheduled_work() causes any queued work to execute.  If queued
> cx88_IR work exists, it *will* rearm the timer via mod_timer() - the
> del_timer_sync() is nullified in this case.

The timer could go off between the call to flush_s_w and del_t_s.  In that
case the work function could call mod_timer after del_t_s is called restart
the timer.

I think what you have to do is first disable the timer so that it won't run
the work fuction when it goes off.  Then flush the work, then delete the
timer.

Or make the work function not mod the timer (i.e., add "if (!ir->exiting)"
to the mod_timer() call), then flush the work, then delete the timer,
then I think you have to flush the work again.

Of course, the best method is to use delayed_work, which takes care of all
this trickiness.  I think it's able to use the work queue code on a lower
level and do it more efficiently than would be possible with just the
public work queue api.

> > the separate workqueue can be still active. I presume this is what
> > happens on my system. I guess the reason for the separate workqueue is
> > that the processing may take some time and we don't want to hurt the
> > system's performance?
>
> It depends.  You use work_queus in the first place to have interrupts
> disabled for as little time as possible on a processor; increasing
> system performance.  If ordering of deferred work is important, you want
> your own single threaded work handler.  If latency of deferred work is
> important, you want your own normal (multithreaded) work handler.  If
> neither is important, you just use the default kernel eventd.

For IR polling, latency might be important.  IIRC, there are also RT
priority work queues now that can be used.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2009-03-05  5:10 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-02-15 20:41 General protection fault on rmmod cx8800 Jean Delvare
2009-03-02 12:39 ` Jean Delvare
2009-03-02 14:16   ` Andy Walls
2009-03-02 16:03     ` Jean Delvare
2009-03-02 19:05       ` Jean Delvare
2009-03-02 21:12         ` Trent Piepho
2009-03-02 21:52           ` Jean Delvare
2009-03-03  9:40             ` Trent Piepho
2009-03-03  9:49               ` Trent Piepho
2009-03-03 12:16               ` Jean Delvare
2009-03-03 20:14                 ` Trent Piepho
2009-03-03 21:33                   ` Jean Delvare
2009-03-02 22:33         ` Andy Walls
2009-03-02 20:09       ` Andy Walls
2009-03-05  5:09         ` Trent Piepho

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.