netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* r8169: is the work queue is initialized at wrong place?
@ 2013-07-18  9:17 Peter Wu
  2013-07-18 21:53 ` Francois Romieu
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Wu @ 2013-07-18  9:17 UTC (permalink / raw)
  To: netdev; +Cc: Francois Romieu, Realtek linux nic maintainers

Hi,

While writing EEPROM dumping code for r8169, I encountered a lockdep warning 
after the following sequence:

(assume PCI device 03:00.0 to be bound to r8169)
sudo tee /sys/bus/pci/drivers/r8169/unbind <<<0000:03:00.0

The warning is:
[  809.907323] INFO: trying to register non-static key.
[  809.907336] the code is fine but needs lockdep annotation.
[  809.907339] turning off the locking correctness validator.
[  809.907345] CPU: 2 PID: 2207 Comm: tee Tainted: G        W    3.11.0-rc1-
cold-00021-g3aaf2fe-dirty #1
[  809.907349] Hardware name: Gigabyte Technology Co., Ltd. To be filled by 
O.E.M./Z68X-UD3H-B3, BIOS U1l 03/08/2013
[  809.907353]  ffff880601ba0000 ffff880600943b58 ffffffff8166d3e7 0000000000000000
[  809.907419]  ffff8806002cd8f8 ffff880600943bf8 ffffffff810ad6d6 ffff880600943bb8
[  809.907429]  ffffffff00000000 ffff880600000000 0000000600000000 ffff880600943ce0
[  809.907440] Call Trace:
[  809.907451]  [<ffffffff8166d3e7>] dump_stack+0x55/0x76
[  809.907459]  [<ffffffff810ad6d6>] __lock_acquire+0x9b6/0xab0
[  809.907465]  [<ffffffff810adf76>] ? lockdep_init_map.part.40+0x46/0x590
[  809.907473]  [<ffffffff81069f35>] ? flush_work+0x5/0xb0
[  809.907479]  [<ffffffff810ade80>] lock_acquire+0x90/0x140
[  809.907485]  [<ffffffff81069f35>] ? flush_work+0x5/0xb0
[  809.907491]  [<ffffffff81069f78>] flush_work+0x48/0xb0
[  809.907497]  [<ffffffff81069f35>] ? flush_work+0x5/0xb0
[  809.907502]  [<ffffffff810abbb4>] ? mark_held_locks+0x74/0x140
[  809.907508]  [<ffffffff8106afb1>] ? __cancel_work_timer+0x71/0x110
[  809.907514]  [<ffffffff810abe3d>] ? trace_hardirqs_on_caller+0x7d/0x150
[  809.907519]  [<ffffffff8106afbd>] __cancel_work_timer+0x7d/0x110
[  809.907526]  [<ffffffff8106b080>] cancel_work_sync+0x10/0x20
[  809.907537]  [<ffffffffa0142f73>] rtl_remove_one+0x63/0x150 [r8169]
[  809.907544]  [<ffffffff8134a626>] pci_device_remove+0x46/0xc0
[  809.907551]  [<ffffffff8141832f>] __device_release_driver+0x7f/0xf0
[  809.907556]  [<ffffffff814183ce>] device_release_driver+0x2e/0x40
[  809.907563]  [<ffffffff81417213>] driver_unbind+0xa3/0xc0
[  809.907569]  [<ffffffff814165f4>] drv_attr_store+0x24/0x40
[  809.907576]  [<ffffffff811fe0f6>] sysfs_write_file+0xe6/0x170
[  809.907582]  [<ffffffff81187f7e>] vfs_write+0xce/0x200
[  809.907588]  [<ffffffff81188485>] SyS_write+0x55/0xa0
[  809.907595]  [<ffffffff81682982>] system_call_fastpath+0x16/0x1b

(This is kernel v3.11-rc1-19-gc0d15cc with a66b2e5 and 2f7021a8 reverted due 
to another unrelated locking issue on suspend/resume [1])

rtl_remove_one calls:

    cancel_work_sync(&tp->wk.work);

However, this queue is only initialized in rtl_open:

6673         INIT_WORK(&tp->wk.work, rtl_task);
6674 
6675         smp_mb();

Shouldn't this INIT_WORK be done in rtl_init_one? I do not know what this 
smp_mb is used for in this context, so I leave an appropriate patch up to you.

Regards,
Peter

 [1]: https://lkml.org/lkml/2013/7/15/501

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: r8169: is the work queue is initialized at wrong place?
  2013-07-18  9:17 r8169: is the work queue is initialized at wrong place? Peter Wu
@ 2013-07-18 21:53 ` Francois Romieu
  2013-07-18 22:41   ` Peter Wu
  2013-07-20 23:22   ` Peter Wu
  0 siblings, 2 replies; 7+ messages in thread
From: Francois Romieu @ 2013-07-18 21:53 UTC (permalink / raw)
  To: Peter Wu; +Cc: netdev, Realtek linux nic maintainers

Peter Wu <lekensteyn@gmail.com> :
[...]
> However, this queue is only initialized in rtl_open:
> 
> 6673         INIT_WORK(&tp->wk.work, rtl_task);
> 6674 
> 6675         smp_mb();
> 
> Shouldn't this INIT_WORK be done in rtl_init_one ?

(or cancel_work_sync in rtl8169_close to reduce the scope)

I do not see how the current code could hurt but it's really ugly.

> I do not know what this smp_mb is used for in this context, so I leave
> an appropriate patch up to you.

It does not cost much and it enforces an initialized state. IRQ - real
or shared ones - and napi poll could happen really soon after this point.
There is probably some implicit memory barrier the driver could rely on.
It would be a bit hackish though.

I'll check the history and cook a patch.

Thanks.

-- 
Ueimor

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: r8169: is the work queue is initialized at wrong place?
  2013-07-18 21:53 ` Francois Romieu
@ 2013-07-18 22:41   ` Peter Wu
  2013-07-19  3:16     ` David Miller
  2013-07-19  6:10     ` Francois Romieu
  2013-07-20 23:22   ` Peter Wu
  1 sibling, 2 replies; 7+ messages in thread
From: Peter Wu @ 2013-07-18 22:41 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev, Realtek linux nic maintainers

On Thursday 18 July 2013 23:53:43 Francois Romieu wrote:
> Peter Wu <lekensteyn@gmail.com> :
> [...]
> 
> > However, this queue is only initialized in rtl_open:
> > 
> > 6673         INIT_WORK(&tp->wk.work, rtl_task);
> > 6674
> > 6675         smp_mb();
> > 
> > Shouldn't this INIT_WORK be done in rtl_init_one ?
> 
> (or cancel_work_sync in rtl8169_close to reduce the scope)
Will this also work with multiple adapters? I currently have an on-board chip 
using the r8169 driver and a separate PCI card.

> I do not see how the current code could hurt but it's really ugly.
When I googled for the warning, I found some hint about something not being 
initialized. From that I guessed that rtl_open is never called when the 
network interface is not brought up (which seems to match the documentation of 
struct net_device_ops[1]).

> > I do not know what this smp_mb is used for in this context, so I leave
> > an appropriate patch up to you.
> 
> It does not cost much and it enforces an initialized state. IRQ - real
> or shared ones - and napi poll could happen really soon after this point.
> There is probably some implicit memory barrier the driver could rely on.
> It would be a bit hackish though.
>
> I'll check the history and cook a patch.

Thanks.

Regards,
Peter

 [1]: http://lxr.free-electrons.com/source/include/linux/netdevice.h#L716

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: r8169: is the work queue is initialized at wrong place?
  2013-07-18 22:41   ` Peter Wu
@ 2013-07-19  3:16     ` David Miller
  2013-07-19  6:10     ` Francois Romieu
  1 sibling, 0 replies; 7+ messages in thread
From: David Miller @ 2013-07-19  3:16 UTC (permalink / raw)
  To: lekensteyn; +Cc: romieu, netdev, nic_swsd

From: Peter Wu <lekensteyn@gmail.com>
Date: Fri, 19 Jul 2013 00:41:57 +0200

> On Thursday 18 July 2013 23:53:43 Francois Romieu wrote:
>> Peter Wu <lekensteyn@gmail.com> :
>> [...]
>> 
>> > However, this queue is only initialized in rtl_open:
>> > 
>> > 6673         INIT_WORK(&tp->wk.work, rtl_task);
>> > 6674
>> > 6675         smp_mb();
>> > 
>> > Shouldn't this INIT_WORK be done in rtl_init_one ?
>> 
>> (or cancel_work_sync in rtl8169_close to reduce the scope)
> Will this also work with multiple adapters? I currently have an on-board chip 
> using the r8169 driver and a separate PCI card.

Francois's suggested fix will work just as equally, regardless of
number of adapters.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: r8169: is the work queue is initialized at wrong place?
  2013-07-18 22:41   ` Peter Wu
  2013-07-19  3:16     ` David Miller
@ 2013-07-19  6:10     ` Francois Romieu
  2013-07-19  9:23       ` Peter Wu
  1 sibling, 1 reply; 7+ messages in thread
From: Francois Romieu @ 2013-07-19  6:10 UTC (permalink / raw)
  To: Peter Wu; +Cc: netdev, Realtek linux nic maintainers

Peter Wu <lekensteyn@gmail.com> :
> On Thursday 18 July 2013 23:53:43 Francois Romieu wrote:
[...]
> > (or cancel_work_sync in rtl8169_close to reduce the scope)
> Will this also work with multiple adapters? I currently have an on-board chip 
> using the r8169 driver and a separate PCI card.

As stated by David, yes.

> > I do not see how the current code could hurt but it's really ugly.
> When I googled for the warning, I found some hint about something not being 
> initialized. From that I guessed that rtl_open is never called when the 
> network interface is not brought up (which seems to match the documentation of 
> struct net_device_ops[1]).

You guessed right. Please note that the relevant struct is zeroed during
initialization (see net/core/dev.c::alloc_netdev_mqs). While it's true
that the struct is not initialized for use by the device, cancel_work_sync
will cope with it (bail out thanks to a 0 bitflag).

You can ignore the warning and proceed with the driver as-is.

I'll fix it anyway. It's too ugly to be kept as is.

Side-note: if you can publish your eeprom reading code, it would be
interesting to know how it differs from what has already been tried
in the past.

-- 
Ueimor

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: r8169: is the work queue is initialized at wrong place?
  2013-07-19  6:10     ` Francois Romieu
@ 2013-07-19  9:23       ` Peter Wu
  0 siblings, 0 replies; 7+ messages in thread
From: Peter Wu @ 2013-07-19  9:23 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev, Realtek linux nic maintainers

On Friday 19 July 2013 08:10:17 Francois Romieu wrote:
> Peter Wu <lekensteyn@gmail.com> :
> > On Thursday 18 July 2013 23:53:43 Francois Romieu wrote:
> [...]
> 
> > > (or cancel_work_sync in rtl8169_close to reduce the scope)
> > 
> > Will this also work with multiple adapters? I currently have an on-board
> > chip using the r8169 driver and a separate PCI card.
> 
> As stated by David, yes.
Thanks for confirming.

> > > I do not see how the current code could hurt but it's really ugly.
> > 
> > When I googled for the warning, I found some hint about something not
> > being
> > initialized. From that I guessed that rtl_open is never called when the
> > network interface is not brought up (which seems to match the
> > documentation of struct net_device_ops[1]).
> 
> You guessed right. Please note that the relevant struct is zeroed during
> initialization (see net/core/dev.c::alloc_netdev_mqs). While it's true
> that the struct is not initialized for use by the device, cancel_work_sync
> will cope with it (bail out thanks to a 0 bitflag).
> 
> You can ignore the warning and proceed with the driver as-is.
Well, the driver appears to work, but due to this I also miss future lockdep 
warnings until I reboot.

> I'll fix it anyway. It's too ugly to be kept as is.
> 
> Side-note: if you can publish your eeprom reading code, it would be
> interesting to know how it differs from what has already been tried
> in the past.
I will certainly do this. Right not I am leveraging the existing 93cx6 driver 
in the kernel. I am having difficulties with making the reads reliable though, 
only after writing at least once to the EEPROM (which is logged as timeout, 
although the word is written), I am able to read it. Otherwise, I will only 
read 00s. This is on an external RTL8169SB Gbe PCI card (on the chip itself, I 
can read RTL8169SC though). (bought for 10$ on Ebay :P).

Initially, this chip could not detect a link (it reported itself with PCI 
VID:PID 10ec:8129), probably due to a bad driver that overwrote part of the 
EEPROM (maybe this one[1]?). After writing the EEPROM signature (29 81) (and 
making the card load from EEPROM by writing 0x40 to register 0x50)), the card 
MAC version does not get recognized and is detected as generic "RTL8169" 
(unbind/bind). After a reboot, the card got detected "properly" 
RTL8169sb/8110sb ("properly", because I really see "RTL8169SC" on the 
chip...).

On another onboard RTL8111E chip, I only read FFs. I haven't tried writing to 
that one as it is working perfectly.

Regards,
Peter

 [1]: http://www.gossamer-threads.com/lists/linux/kernel/990061

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: r8169: is the work queue is initialized at wrong place?
  2013-07-18 21:53 ` Francois Romieu
  2013-07-18 22:41   ` Peter Wu
@ 2013-07-20 23:22   ` Peter Wu
  1 sibling, 0 replies; 7+ messages in thread
From: Peter Wu @ 2013-07-20 23:22 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev, Realtek linux nic maintainers

On Thursday 18 July 2013 23:53:43 Francois Romieu wrote:
> Peter Wu <lekensteyn@gmail.com> :
> [...]
> 
> > However, this queue is only initialized in rtl_open:
> > 
> > 6673         INIT_WORK(&tp->wk.work, rtl_task);
> > 6674
> > 6675         smp_mb();
> > 
> > Shouldn't this INIT_WORK be done in rtl_init_one ?
> 
> (or cancel_work_sync in rtl8169_close to reduce the scope)
> 
> I do not see how the current code could hurt but it's really ugly.
> 
> > I do not know what this smp_mb is used for in this context, so I leave
> > an appropriate patch up to you.
> 
> It does not cost much and it enforces an initialized state. IRQ - real
> or shared ones - and napi poll could happen really soon after this point.
> There is probably some implicit memory barrier the driver could rely on.
> It would be a bit hackish though.
> 
> I'll check the history and cook a patch.

Just to let you know, I am working on a patch since I encountered a 
different(related?) issue that spams the syslog.The patch and partial analysis 
is ready but untested.

Regards,
Peter

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2013-07-20 23:22 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-18  9:17 r8169: is the work queue is initialized at wrong place? Peter Wu
2013-07-18 21:53 ` Francois Romieu
2013-07-18 22:41   ` Peter Wu
2013-07-19  3:16     ` David Miller
2013-07-19  6:10     ` Francois Romieu
2013-07-19  9:23       ` Peter Wu
2013-07-20 23:22   ` Peter Wu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).