* reproducible w1 oops on recent kernels (at least since 3.2.x)
@ 2013-01-10 18:44 Sven Geggus
  2013-01-16 14:16 ` Evgeniy Polyakov
  0 siblings, 1 reply; 9+ messages in thread
From: Sven Geggus @ 2013-01-10 18:44 UTC
  To: linux-kernel; +Cc: Evgeniy Polyakov

Hello,

I first thought this to be a Raspberry Pi thing, but it's not. It looks
like the w1 driver is broken in a platform- and bus-master-independent
way, at least since kernel 3.2.x (which the Raspberry Pi uses).

Here is what to do to reproduce the bug on x86:

Get owfs from owfs.org and compile it with w1 support, or just install
owserver from your favourite Linux distribution.  I'm using version
2.8p15-1 from Debian testing.

1. connect a 1-wire device to your computer and load the appropriate
   kernel module (I'm using a DS9490, so the module is ds2490.ko, but
   the bug also happens with other modules like w1-gpio)
2. run "owserver --error_print 2 --error_level 99 --foreground --w1"
3. run "owdir" on another terminal
4. system crashes with the following oops:

--cut--
Driver for 1-wire Dallas network protocol.
usbcore: registered new interface driver DS9490R
w1_master_driver w1_bus_master1: Family 81 for 81.000000247ca7.41 is not registered.
PGD 16ff067 PUD 1700067 PMD 0 
Oops: 0000 [#1] PREEMPT SMP 
Modules linked in: ds2490 wire cn sha256_generic bluetooth crc16 binfmt_misc nfsd coretemp kvm_intel kvm snd_hda_codec_hdmi snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep microcode i2c_i801 uhci_hcd
CPU 1 
Pid: 4631, comm: owserver Not tainted 3.7.1 #1                  /DG45ID
RIP: 0010:[<ffffffff8104baf0>]  [<ffffffff8104baf0>] kthread_should_stop+0x10/0x1b
RSP: 0018:ffff880223d79b00  EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff88022f144000 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000286 RDI: 0000000000000000
RBP: 00000000ffffffff R08: ffff880223d78000 R09: 0000000000000000
R10: 0000000000000001 R11: dead000000100100 R12: ffff88022f1440b0
R13: 0000000000000040 R14: ffffffffa006f7fa R15: 0000000000000000
FS:  00007fdf7fd80700(0000) GS:ffff88023bc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffffffffc8 CR3: 000000021c83c000 CR4: 00000000000407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process owserver (pid: 4631, threadinfo ffff880223d78000, task ffff880232f4e740)
Stack:
 ffffffffa006ee9c ffff880232cc60c0 0100000000000000 0000000000000000
 000000f000000001 ffffea000744b070 0000000000000001 ffff88021c8a3824
 ffff88022f144000 ffff88021c8a3810 ffff88022f144038 0000000000000000
Call Trace:
 [<ffffffffa006ee9c>] ? w1_search+0x11d/0x188 [wire]
 [<ffffffffa006ef3e>] ? w1_search_process_cb+0x37/0x91 [wire]
 [<ffffffffa006fbbc>] ? w1_cn_callback+0x2fd/0x42e [wire]
 [<ffffffffa0034585>] ? cn_rx_skb+0xb7/0xea [cn]
 [<ffffffff81458e29>] ? netlink_unicast+0x123/0x1ae
 [<ffffffff814591a7>] ? netlink_sendmsg+0x27d/0x2ed
 [<ffffffff81428229>] ? sock_sendmsg+0x98/0xb5
 [<ffffffff8142a7db>] ? sys_sendto+0xdb/0x104
 [<ffffffff810ef7cd>] ? vfs_write+0xfa/0x141
 [<ffffffff810efa27>] ? sys_write+0x60/0x77
 [<ffffffff8150e0a9>] ? system_call_fastpath+0x16/0x1b
Code: ff c6 05 93 71 73 00 01 eb 06 48 89 df 5b ff e0 48 c7 c0 ea ff ff ff 5b c3 90 90 65 48 8b 04 25 c0 b7 00 00 48 8b 80 88 02 00 00 <48> 8b 40 c8 48 d1 e8 83 e0 01 c3 f0 ff 47 10 48 8b 87 88 02 00 
 RSP <ffff880223d79b00>
CR2: ffffffffffffffc8
---[ end trace 3131d23f4378d60e ]---
--cut--
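
A quick arithmetic check of the register dump above, as a userspace C sketch (not kernel code). The bytes at the fault marker in the Code: line, `<48> 8b 40 c8`, decode to `mov -0x38(%rax),%rax`; RAX is 0 and CR2 is ffffffffffffffc8. The helper names below are made up for illustration. Reading this as a field access through a NULL pointer inside kthread_should_stop is an interpretation, not something the log states.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend an 8-bit displacement, as the CPU does for a disp8 operand. */
static int64_t sign_extend_disp8(uint8_t disp8)
{
    return disp8 >= 0x80 ? (int64_t)disp8 - 0x100 : (int64_t)disp8;
}

/* Effective address of `mov disp8(%rax),%rax` for a given RAX value. */
static uint64_t effective_address(uint64_t rax, uint8_t disp8)
{
    return rax + (uint64_t)sign_extend_disp8(disp8);
}

/* Values copied from the oops: RAX, the disp8 byte after `48 8b 40`, and CR2. */
static void check_oops_consistency(void)
{
    const uint64_t rax = 0x0000000000000000ULL;
    const uint8_t disp8 = 0xc8;
    const uint64_t cr2 = 0xffffffffffffffc8ULL;

    assert(sign_extend_disp8(disp8) == -0x38);
    assert(effective_address(rax, disp8) == cr2);
}
```

Calling check_oops_consistency() passes silently: a load at offset -0x38 from a zero base lands exactly on the CR2 value reported in the oops.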

Regards

Sven

P.S.: Looks like this is the same bug as the one reported at
https://bugzilla.redhat.com/show_bug.cgi?id=857954

-- 
"Those who do not understand Unix are condemned to reinvent it, poorly"
(Henry Spencer)

/me is giggls@ircnet, http://sven.gegg.us/ on the Web


* Re: reproducible w1 oops on recent kernels (at least since 3.2.x)
  2013-01-10 18:44 reproducible w1 oops on recent kernels (at least since 3.2.x) Sven Geggus
@ 2013-01-16 14:16 ` Evgeniy Polyakov
  2013-03-02  0:11   ` Marcin Jurkowski
  0 siblings, 1 reply; 9+ messages in thread
From: Evgeniy Polyakov @ 2013-01-16 14:16 UTC
  To: Sven Geggus; +Cc: linux-kernel

Hi

Sorry for the late reply.

On Thu, Jan 10, 2013 at 07:44:20PM +0100, Sven Geggus (lists@fuchsschwanzdomain.de) wrote:
> I first thought this to be a Raspberry Pi thing, but it's not. It looks
> like the w1 driver is broken in a platform- and bus-master-independent
> way, at least since kernel 3.2.x (which the Raspberry Pi uses).

> P.S.: Looks like this is the same bug as the one reported at
> https://bugzilla.redhat.com/show_bug.cgi?id=857954

It indeed looks the same.
Can you confirm that the bug still persists and that it doesn't exist in
3.1? Are you able to bisect the w1 bits down to the broken commit?

-- 
	Evgeniy Polyakov


* Re:  Re: reproducible w1 oops on recent kernels (at least since 3.2.x)
  2013-01-16 14:16 ` Evgeniy Polyakov
@ 2013-03-02  0:11   ` Marcin Jurkowski
  2013-03-02  9:45     ` Sven Geggus
  0 siblings, 1 reply; 9+ messages in thread
From: Marcin Jurkowski @ 2013-03-02  0:11 UTC
  To: Evgeniy Polyakov; +Cc: Sven Geggus, linux-kernel

On Wednesday, January 16, 2013 3:16:38 PM UTC+1, Evgeniy Polyakov wrote:
> Can you confirm that the bug still persists and that it doesn't exist in
> 3.1? Are you able to bisect the w1 bits down to the broken commit?

Hi

I can confirm that this bug persists in recent kernels. The W1_SEARCH
command of the onewire netlink interface must have been broken for a while.

The good news is that it seems to be easy to fix. I'll post an explanation
and a patch tomorrow.


Regards


* Re: reproducible w1 oops on recent kernels (at least since 3.2.x)
  2013-03-02  0:11   ` Marcin Jurkowski
@ 2013-03-02  9:45     ` Sven Geggus
  2013-03-02 13:50       ` [PATCH 1/1] w1: fix oops when w1_search is called from netlink connector Marcin Jurkowski
  0 siblings, 1 reply; 9+ messages in thread
From: Sven Geggus @ 2013-03-02  9:45 UTC
  To: Marcin Jurkowski; +Cc: Evgeniy Polyakov, linux-kernel

Marcin Jurkowski wrote on Saturday, 2 March at 01:11:

> I can confirm that this bug persists in recent kernels. The W1_SEARCH
> command of the onewire netlink interface must have been broken for a while.
> 
> The good news is that it seems to be easy to fix. I'll post an explanation
> and a patch tomorrow.

I did not send this to the kernel mailing list, only to Evgeniy.
This is the bad commit I found with git bisect:

04f482faf50535229a5a5c8d629cf963899f857c is the first bad commit
commit 04f482faf50535229a5a5c8d629cf963899f857c
Author: Patrick McHardy <kaber@trash.net>
Date:   Mon Mar 28 08:39:36 2011 +0000
                                      
    connector: convert to synchronous netlink message processing

    Commits 01a16b21 (netlink: kill eff_cap from struct netlink_skb_parms)
    and c53fa1ed (netlink: kill loginuid/sessionid/sid members from struct
    netlink_skb_parms) removed some members from struct netlink_skb_parms
    that depend on the current context, all netlink users are now required
    to do synchronous message processing.

    connector however queues received messages and processes them in a work
    queue, which is not valid anymore. This patch converts connector to do
    synchronous message processing by invoking the registered callback
    handler directly from the netlink receive function.

    In order to avoid invoking the callback with connector locks held, a
    reference count is added to struct cn_callback_entry, the reference
    is taken when finding a matching callback entry on the device's queue_list
    and released after the callback handler has been invoked.

    Signed-off-by: Patrick McHardy <kaber@trash.net>
    Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>
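
The last paragraph of that commit message describes a common pattern: take a reference while holding the lookup lock, drop the lock, invoke the callback, then release the reference. A minimal single-threaded userspace sketch of that ordering follows. Every name in it (sim_entry, find_and_get, dispatch, the boolean standing in for the lock) is hypothetical and none of it is the connector code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a registered callback entry. */
struct sim_entry {
    int id;
    int refcnt;
    void (*callback)(struct sim_entry *);
};

static bool queue_locked;            /* simulated queue lock */
static bool callback_saw_lock_held;  /* set if a callback ever runs under the lock */

static void sim_callback(struct sim_entry *e)
{
    (void)e;
    if (queue_locked)
        callback_saw_lock_held = true;
}

/* Look up an entry under the lock and take a reference before unlocking. */
static struct sim_entry *find_and_get(struct sim_entry *tbl, size_t n, int id)
{
    struct sim_entry *found = NULL;

    queue_locked = true;
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].id == id) {
            found = &tbl[i];
            found->refcnt++;
            break;
        }
    }
    queue_locked = false;
    return found;
}

/* Invoke the callback with the lock dropped, then release the reference. */
static bool dispatch(struct sim_entry *tbl, size_t n, int id)
{
    struct sim_entry *e = find_and_get(tbl, n, id);

    if (!e)
        return false;
    e->callback(e);
    e->refcnt--;
    return true;
}

static void demo_refcount_dispatch(void)
{
    struct sim_entry tbl[] = { { 3, 1, sim_callback }, { 7, 1, sim_callback } };

    assert(dispatch(tbl, 2, 7));
    assert(!dispatch(tbl, 2, 9));
    assert(!callback_saw_lock_held);
    assert(tbl[1].refcnt == 1);   /* reference taken and released again */
}
```

In real code the counter would be atomic and the lock a real one; the sketch only shows the order of operations.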

Sven

-- 
"C Is Quirky, Flawed, And An Enormous Success."
(Dennis M. Ritchie)

/me is giggls@ircnet, http://sven.gegg.us/ on the Web


* [PATCH 1/1] w1: fix oops when w1_search is called from netlink connector
  2013-03-02  9:45     ` Sven Geggus
@ 2013-03-02 13:50       ` Marcin Jurkowski
  2013-03-03 15:36         ` Sven Geggus
  2013-03-03 20:54         ` Evgeniy Polyakov
  0 siblings, 2 replies; 9+ messages in thread
From: Marcin Jurkowski @ 2013-03-02 13:50 UTC
  To: Sven Geggus; +Cc: Evgeniy Polyakov, linux-kernel

On Sat, Mar 02, 2013 at 10:45:10AM +0100, Sven Geggus wrote:
> This is the bad commit I found doing git bisect:
> 04f482faf50535229a5a5c8d629cf963899f857c is the first bad commit
> commit 04f482faf50535229a5a5c8d629cf963899f857c
> Author: Patrick McHardy <kaber@trash.net>
> Date:   Mon Mar 28 08:39:36 2011 +0000

Good job. I was too lazy to bisect for the bad commit ;)

Reading the code, I found a problematic kthread_should_stop call reached from
the netlink connector, which causes the oops. After applying a patch, I've been
testing an owfs+w1 setup for nearly two days and it seems to work very reliably
(no hangs, no memleaks etc.).
A more detailed description and a possible fix are given below:

Function w1_search can be called from either kthread or netlink callback.
While the former works fine, the latter causes oops due to kthread_should_stop
invocation.

This patch adds a check if w1_search is serving netlink command, skipping
kthread_should_stop invocation if so.

Signed-off-by: Marcin Jurkowski <marcin1j@gmail.com>
---
 drivers/w1/w1.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/w1/w1.c b/drivers/w1/w1.c
index 7994d933..7e2220d 100644
--- a/drivers/w1/w1.c
+++ b/drivers/w1/w1.c
@@ -924,7 +924,8 @@ void w1_search(struct w1_master *dev, u8 search_type, w1_slave_found_callback cb
 			tmp64 = (triplet_ret >> 2);
 			rn |= (tmp64 << i);
 
-			if (kthread_should_stop()) {
+			/* ensure we're called from kthread and not by netlink callback */
+			if (!dev->priv && kthread_should_stop()) {
 				mutex_unlock(&dev->bus_mutex);
 				dev_dbg(&dev->dev, "Abort w1_search\n");
 				return;
-- 
1.7.12.4
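
For readers who want to see why the one-line change is enough, here is a userspace C sketch of the guard. It relies on `&&` short-circuiting: when the left operand is false, the right-hand call is never made. All names are hypothetical stand-ins; sim_master models only the priv field, and the assumption that a non-NULL dev->priv marks a netlink-driven search is taken from the comment in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct w1_master: only the field the patch tests. */
struct sim_master {
    void *priv;   /* assumed non-NULL while serving a netlink command */
};

static bool in_kthread;       /* simulated caller context */
static bool stop_requested;   /* simulated stop request for the kthread */
static int unsafe_calls;      /* should-stop calls made outside a kthread */

/* Stand-in for kthread_should_stop(): only meaningful inside a kthread. */
static bool sim_kthread_should_stop(void)
{
    if (!in_kthread)
        unsafe_calls++;       /* this is where the real kernel oopsed */
    return in_kthread && stop_requested;
}

/* Check as it was before the patch: always asks the kthread machinery. */
static bool should_abort_unpatched(const struct sim_master *dev)
{
    (void)dev;
    return sim_kthread_should_stop();
}

/* Check as in the patch: skipped entirely when dev->priv is set. */
static bool should_abort_patched(const struct sim_master *dev)
{
    return !dev->priv && sim_kthread_should_stop();
}

static void demo_guard(void)
{
    struct sim_master netlink_dev = { .priv = &netlink_dev };
    struct sim_master kthread_dev = { .priv = NULL };

    in_kthread = false;
    stop_requested = false;
    unsafe_calls = 0;
    (void)should_abort_unpatched(&netlink_dev);
    assert(unsafe_calls == 1);            /* unsafe call reached */

    unsafe_calls = 0;
    assert(!should_abort_patched(&netlink_dev));
    assert(unsafe_calls == 0);            /* guard short-circuits */

    in_kthread = true;
    stop_requested = true;
    assert(should_abort_patched(&kthread_dev));   /* kthread path unchanged */
}
```

One consequence visible in the sketch: with the guard in place, a search started from netlink is never aborted by a stop request.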



* Re: [PATCH 1/1] w1: fix oops when w1_search is called from netlink connector
  2013-03-02 13:50       ` [PATCH 1/1] w1: fix oops when w1_search is called from netlink connector Marcin Jurkowski
@ 2013-03-03 15:36         ` Sven Geggus
  2013-03-03 20:54         ` Evgeniy Polyakov
  1 sibling, 0 replies; 9+ messages in thread
From: Sven Geggus @ 2013-03-03 15:36 UTC
  To: Marcin Jurkowski; +Cc: Evgeniy Polyakov, linux-kernel

Marcin Jurkowski wrote on Saturday, 2 March at 14:50:

> This patch adds a check if w1_search is serving netlink command, skipping
> kthread_should_stop invocation if so.

Works fine on my Raspberry Pi!

Any chance to get this fix into mainline?

Regards

Sven

-- 
# Turn on/off security.  Off is currently the default
(found in MongoDB default configfile)

/me is giggls@ircnet, http://sven.gegg.us/ on the Web


* Re: [PATCH 1/1] w1: fix oops when w1_search is called from netlink connector
  2013-03-02 13:50       ` [PATCH 1/1] w1: fix oops when w1_search is called from netlink connector Marcin Jurkowski
  2013-03-03 15:36         ` Sven Geggus
@ 2013-03-03 20:54         ` Evgeniy Polyakov
  2013-03-03 22:41           ` GregKH
  1 sibling, 1 reply; 9+ messages in thread
From: Evgeniy Polyakov @ 2013-03-03 20:54 UTC
  To: Marcin Jurkowski; +Cc: Sven Geggus, linux-kernel, GregKH

Hi

Marcin, thanks a lot for the fix. I'm sorry I hadn't gotten to this bug yet :(
Sven confirmed the patch fixes it; Greg, please pull it into your tree.

I believe this is stable material.
Thanks everyone.

Acked-by: Evgeniy Polyakov <zbr@ioremap.net>

On Sat, Mar 02, 2013 at 02:50:15PM +0100, Marcin Jurkowski (marcin1j@gmail.com) wrote:
> On Sat, Mar 02, 2013 at 10:45:10AM +0100, Sven Geggus wrote:
> > This is the bad commit I found doing git bisect:
> > 04f482faf50535229a5a5c8d629cf963899f857c is the first bad commit
> > commit 04f482faf50535229a5a5c8d629cf963899f857c
> > Author: Patrick McHardy <kaber@trash.net>
> > Date:   Mon Mar 28 08:39:36 2011 +0000
> 
> Good job. I was too lazy to bisect for the bad commit ;)
> 
> Reading the code, I found a problematic kthread_should_stop call reached from
> the netlink connector, which causes the oops. After applying a patch, I've been
> testing an owfs+w1 setup for nearly two days and it seems to work very reliably
> (no hangs, no memleaks etc.).
> A more detailed description and a possible fix are given below:
> 
> Function w1_search can be called from either kthread or netlink callback.
> While the former works fine, the latter causes oops due to kthread_should_stop
> invocation.
> 
> This patch adds a check if w1_search is serving netlink command, skipping
> kthread_should_stop invocation if so.
> 
> Signed-off-by: Marcin Jurkowski <marcin1j@gmail.com>
> ---
>  drivers/w1/w1.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/w1/w1.c b/drivers/w1/w1.c
> index 7994d933..7e2220d 100644
> --- a/drivers/w1/w1.c
> +++ b/drivers/w1/w1.c
> @@ -924,7 +924,8 @@ void w1_search(struct w1_master *dev, u8 search_type, w1_slave_found_callback cb
>  			tmp64 = (triplet_ret >> 2);
>  			rn |= (tmp64 << i);
>  
> -			if (kthread_should_stop()) {
> +			/* ensure we're called from kthread and not by netlink callback */
> +			if (!dev->priv && kthread_should_stop()) {
>  				mutex_unlock(&dev->bus_mutex);
>  				dev_dbg(&dev->dev, "Abort w1_search\n");
>  				return;
> -- 
> 1.7.12.4

-- 
	Evgeniy Polyakov


* Re: [PATCH 1/1] w1: fix oops when w1_search is called from netlink connector
  2013-03-03 20:54         ` Evgeniy Polyakov
@ 2013-03-03 22:41           ` GregKH
  2013-03-11 13:18             ` Josh Boyer
  0 siblings, 1 reply; 9+ messages in thread
From: GregKH @ 2013-03-03 22:41 UTC
  To: Evgeniy Polyakov; +Cc: Marcin Jurkowski, Sven Geggus, linux-kernel

On Mon, Mar 04, 2013 at 12:54:52AM +0400, Evgeniy Polyakov wrote:
> Hi
> 
> Marcin, thanks a lot for the fix. I'm sorry I hadn't gotten to this bug yet :(
> Sven confirmed the patch fixes it; Greg, please pull it into your tree.

Ok, will do once 3.9-rc1 is out.

thanks,

greg k-h


* Re: [PATCH 1/1] w1: fix oops when w1_search is called from netlink connector
  2013-03-03 22:41           ` GregKH
@ 2013-03-11 13:18             ` Josh Boyer
  0 siblings, 0 replies; 9+ messages in thread
From: Josh Boyer @ 2013-03-11 13:18 UTC
  To: GregKH; +Cc: Evgeniy Polyakov, Marcin Jurkowski, Sven Geggus, linux-kernel

On Sun, Mar 3, 2013 at 5:41 PM, GregKH <greg@kroah.com> wrote:
> On Mon, Mar 04, 2013 at 12:54:52AM +0400, Evgeniy Polyakov wrote:
>> Hi
>>
>> Marcin, thanks a lot for the fix. I'm sorry I hadn't gotten to this bug yet :(
>> Sven confirmed the patch fixes it; Greg, please pull it into your tree.
>
> Ok, will do once 3.9-rc1 is out.

3.9-rc2 is out now.  I don't see this in any of your trees.  Just a reminder.

josh

