All of lore.kernel.org
 help / color / mirror / Atom feed
* [ROSE] [AX25] possible circular locking
@ 2007-12-17 10:06 Bernard Pidoux F6BVP
  2007-12-18 13:52 ` Jarek Poplawski
  0 siblings, 1 reply; 21+ messages in thread
From: Bernard Pidoux F6BVP @ 2007-12-17 10:06 UTC (permalink / raw)
  To: Jarek Poplawski, Alexey Dobriyan, Ralf Baechle DL5RB, Linux Netdev List

Hi,


When I killall kissattach I can see the following message.

This happens on kernel 2.6.24-rc5 already patched with the 6 previously
patches I sent recently.


=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.23.9 #1
-------------------------------------------------------
kissattach/2906 is trying to acquire lock:
  (linkfail_lock){-+..}, at: [<d8bd4603>] ax25_link_failed+0x11/0x39 [ax25]

but task is already holding lock:
  (ax25_list_lock){-+..}, at: [<d8bd7c7c>] ax25_device_event+0x38/0x84
[ax25]

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #3 (ax25_list_lock){-+..}:
        [<c0130897>] __lock_acquire+0x9e9/0xbe6
        [<d8bd845c>] ax25_find_cb+0x18/0xc6 [ax25]
        [<c0130b02>] lock_acquire+0x6e/0x87
        [<d8bd845c>] ax25_find_cb+0x18/0xc6 [ax25]
        [<c02a399b>] _spin_lock_bh+0x2e/0x39
        [<d8bd845c>] ax25_find_cb+0x18/0xc6 [ax25]
        [<d8bd845c>] ax25_find_cb+0x18/0xc6 [ax25]
        [<d8bd5d57>] ax25_send_frame+0x40/0x131 [ax25]
        [<d8bed51a>] rose_send_frame+0x4a/0x5b [rose]
        [<d8bed946>] rose_link_rx_restart+0x135/0x157 [rose]
        [<c02a399b>] _spin_lock_bh+0x2e/0x39
        [<d8bee56a>] rose_route_frame+0xad/0x4f3 [rose]
        [<c0105215>] dump_trace+0x81/0x8b
        [<c012dea3>] save_trace+0x37/0x8c
        [<c012f73c>] mark_lock+0x337/0x44b
        [<c0130a4c>] __lock_acquire+0xb9e/0xbe6
        [<d8bd471e>] ax25_protocol_function+0x30/0x34 [ax25]
        [<d8bd46fb>] ax25_protocol_function+0xd/0x34 [ax25]
        [<d8bd5271>] ax25_rx_iframe+0x2e3/0x332 [ax25]
        [<c011f839>] __mod_timer+0x89/0x93
        [<d8bd6b95>] ax25_std_frame_in+0x5b1/0x638 [ax25]
        [<d8bd4c49>] ax25_kiss_rcv+0x3cd/0x712 [ax25]
        [<c012f889>] mark_held_locks+0x39/0x53
        [<c02a3d2a>] _spin_unlock_irqrestore+0x34/0x39
        [<c024a79b>] sock_queue_rcv_skb+0xd6/0xf3
        [<c02a3879>] _read_unlock+0x14/0x1c
        [<c024a79b>] sock_queue_rcv_skb+0xd6/0xf3
        [<c025033c>] netif_receive_skb+0x22d/0x289
        [<c012fa60>] trace_hardirqs_on+0x109/0x148
        [<c02521ff>] process_backlog+0x7b/0xeb
        [<c02522c6>] net_rx_action+0x57/0xfd
        [<c011c52d>] __do_softirq+0x40/0x90
        [<c011c5a4>] do_softirq+0x27/0x3d
        [<c0106768>] do_IRQ+0x58/0x6c
        [<c0104cee>] common_interrupt+0x2e/0x40
        [<ffffffff>] 0xffffffff

-> #2 (rose_route_list_lock){-+..}:
        [<c0130897>] __lock_acquire+0x9e9/0xbe6
        [<d8bee50a>] rose_route_frame+0x4d/0x4f3 [rose]
        [<c0130b02>] lock_acquire+0x6e/0x87
        [<d8bee50a>] rose_route_frame+0x4d/0x4f3 [rose]
        [<c02a399b>] _spin_lock_bh+0x2e/0x39
        [<d8bee50a>] rose_route_frame+0x4d/0x4f3 [rose]
        [<d8bee50a>] rose_route_frame+0x4d/0x4f3 [rose]
        [<c0105215>] dump_trace+0x81/0x8b
        [<c012dea3>] save_trace+0x37/0x8c
        [<c012f73c>] mark_lock+0x337/0x44b
        [<c0130a4c>] __lock_acquire+0xb9e/0xbe6
        [<d8bd471e>] ax25_protocol_function+0x30/0x34 [ax25]
        [<d8bd46fb>] ax25_protocol_function+0xd/0x34 [ax25]
        [<d8bd5271>] ax25_rx_iframe+0x2e3/0x332 [ax25]
        [<c011f839>] __mod_timer+0x89/0x93
        [<d8bd6b95>] ax25_std_frame_in+0x5b1/0x638 [ax25]
        [<d8bd4c49>] ax25_kiss_rcv+0x3cd/0x712 [ax25]
        [<c012f889>] mark_held_locks+0x39/0x53
        [<c02a3d2a>] _spin_unlock_irqrestore+0x34/0x39
        [<c024a79b>] sock_queue_rcv_skb+0xd6/0xf3
        [<c02a3879>] _read_unlock+0x14/0x1c
        [<c024a79b>] sock_queue_rcv_skb+0xd6/0xf3
        [<c025033c>] netif_receive_skb+0x22d/0x289
        [<c012fa60>] trace_hardirqs_on+0x109/0x148
        [<c02521ff>] process_backlog+0x7b/0xeb
        [<c02522c6>] net_rx_action+0x57/0xfd
        [<c011c52d>] __do_softirq+0x40/0x90
        [<c011c5a4>] do_softirq+0x27/0x3d
        [<c0106768>] do_IRQ+0x58/0x6c
        [<c0104cee>] common_interrupt+0x2e/0x40
        [<ffffffff>] 0xffffffff

-> #1 (rose_neigh_list_lock){-+..}:
        [<c0130897>] __lock_acquire+0x9e9/0xbe6
        [<d8bee31e>] rose_link_failed+0xe/0x44 [rose]
        [<c0130b02>] lock_acquire+0x6e/0x87
        [<d8bee31e>] rose_link_failed+0xe/0x44 [rose]
        [<d8bd7783>] ax25_t1timer_expiry+0x0/0x20 [ax25]
        [<c02a399b>] _spin_lock_bh+0x2e/0x39
        [<d8bee31e>] rose_link_failed+0xe/0x44 [rose]
        [<d8bee31e>] rose_link_failed+0xe/0x44 [rose]
        [<d8bd461a>] ax25_link_failed+0x28/0x39 [ax25]
        [<d8bd7300>] ax25_disconnect+0x34/0xbe [ax25]
        [<c011f4f3>] run_timer_softirq+0xee/0x14a
        [<c011c51e>] __do_softirq+0x31/0x90
        [<c012fa60>] trace_hardirqs_on+0x109/0x148
        [<c011c52d>] __do_softirq+0x40/0x90
        [<c011c5a4>] do_softirq+0x27/0x3d
        [<c0106768>] do_IRQ+0x58/0x6c
        [<c0104cee>] common_interrupt+0x2e/0x40
        [<d8a9163f>] acpi_processor_idle+0x262/0x3cf [processor]
        [<c0102342>] cpu_idle+0x3c/0x51
        [<c0382a0c>] start_kernel+0x272/0x277
        [<c0382323>] unknown_bootoption+0x0/0x195
        [<ffffffff>] 0xffffffff

-> #0 (linkfail_lock){-+..}:
        [<c0130780>] __lock_acquire+0x8d2/0xbe6
        [<c0130b02>] lock_acquire+0x6e/0x87
        [<d8bd4603>] ax25_link_failed+0x11/0x39 [ax25]
        [<c02a399b>] _spin_lock_bh+0x2e/0x39
        [<d8bd4603>] ax25_link_failed+0x11/0x39 [ax25]
        [<d8bd4603>] ax25_link_failed+0x11/0x39 [ax25]
        [<d8bd7300>] ax25_disconnect+0x34/0xbe [ax25]
        [<d8bd7c97>] ax25_device_event+0x53/0x84 [ax25]
        [<c0122670>] notifier_call_chain+0x2a/0x47
        [<c01226d3>] raw_notifier_call_chain+0x17/0x1a
        [<c0250a47>] dev_close+0x62/0x66
        [<c0250af1>] unregister_netdevice+0xa6/0x21f
        [<c0250c79>] unregister_netdev+0xf/0x15
        [<d8b4a7ad>] mkiss_close+0x63/0x7c [mkiss]
        [<c01fdd85>] release_dev+0x4f1/0x5a6
        [<c0130a4c>] __lock_acquire+0xb9e/0xbe6
        [<c01b2226>] _atomic_dec_and_lock+0x22/0x2c
        [<c0130a4c>] __lock_acquire+0xb9e/0xbe6
        [<c01fde41>] tty_release+0x7/0xa
        [<c015c7f7>] __fput+0xbc/0x172
        [<c015a2c6>] filp_close+0x51/0x58
        [<c0119dc3>] put_files_struct+0x5e/0xa6
        [<c011ae6e>] do_exit+0x22e/0x6d9
        [<c02a3c0d>] _spin_unlock_irq+0x20/0x23
        [<c012fa76>] trace_hardirqs_on+0x11f/0x148
        [<c011b384>] sys_exit_group+0x0/0xd
        [<c0121c36>] get_signal_to_deliver+0x3c6/0x3ea
        [<c0103475>] do_notify_resume+0x81/0x5fe
        [<c02a3d2a>] _spin_unlock_irqrestore+0x34/0x39
        [<c012fa76>] trace_hardirqs_on+0x11f/0x148
        [<c012b196>] getnstimeofday+0x2b/0xac
        [<c01b6d55>] copy_to_user+0x2f/0x46
        [<c0129ed8>] hrtimer_nanosleep+0x92/0xe5
        [<c0129b75>] hrtimer_wakeup+0x0/0x18
        [<c0129f74>] sys_nanosleep+0x49/0x59
        [<c0103ded>] work_notifysig+0x13/0x26
        [<ffffffff>] 0xffffffff

other info that might help us debug this:

2 locks held by kissattach/2906:
  #0:  (rtnl_mutex){--..}, at: [<c0250c72>] unregister_netdev+0x8/0x15
  #1:  (ax25_list_lock){-+..}, at: [<d8bd7c7c>]
ax25_device_event+0x38/0x84 [ax25]

stack backtrace:
  [<c012efac>] print_circular_bug_tail+0x5e/0x66
  [<c0130780>] __lock_acquire+0x8d2/0xbe6
  [<c0130b02>] lock_acquire+0x6e/0x87
  [<d8bd4603>] ax25_link_failed+0x11/0x39 [ax25]
  [<c02a399b>] _spin_lock_bh+0x2e/0x39
  [<d8bd4603>] ax25_link_failed+0x11/0x39 [ax25]
  [<d8bd4603>] ax25_link_failed+0x11/0x39 [ax25]
  [<d8bd7300>] ax25_disconnect+0x34/0xbe [ax25]
  [<d8bd7c97>] ax25_device_event+0x53/0x84 [ax25]
  [<c0122670>] notifier_call_chain+0x2a/0x47
  [<c01226d3>] raw_notifier_call_chain+0x17/0x1a
  [<c0250a47>] dev_close+0x62/0x66
  [<c0250af1>] unregister_netdevice+0xa6/0x21f
  [<c0250c79>] unregister_netdev+0xf/0x15
  [<d8b4a7ad>] mkiss_close+0x63/0x7c [mkiss]
  [<c01fdd85>] release_dev+0x4f1/0x5a6
  [<c0130a4c>] __lock_acquire+0xb9e/0xbe6
  [<c01b2226>] _atomic_dec_and_lock+0x22/0x2c
  [<c0130a4c>] __lock_acquire+0xb9e/0xbe6
  [<c01fde41>] tty_release+0x7/0xa
  [<c015c7f7>] __fput+0xbc/0x172
  [<c015a2c6>] filp_close+0x51/0x58
  [<c0119dc3>] put_files_struct+0x5e/0xa6
  [<c011ae6e>] do_exit+0x22e/0x6d9
  [<c02a3c0d>] _spin_unlock_irq+0x20/0x23
  [<c012fa76>] trace_hardirqs_on+0x11f/0x148
  [<c011b384>] sys_exit_group+0x0/0xd
  [<c0121c36>] get_signal_to_deliver+0x3c6/0x3ea
  [<c0103475>] do_notify_resume+0x81/0x5fe
  [<c02a3d2a>] _spin_unlock_irqrestore+0x34/0x39
  [<c012fa76>] trace_hardirqs_on+0x11f/0x148
  [<c012b196>] getnstimeofday+0x2b/0xac
  [<c01b6d55>] copy_to_user+0x2f/0x46
  [<c0129ed8>] hrtimer_nanosleep+0x92/0xe5
  [<c0129b75>] hrtimer_wakeup+0x0/0x18
  [<c0129f74>] sys_nanosleep+0x49/0x59
  [<c0103ded>] work_notifysig+0x13/0x26
  =======================

Bernard Pidoux



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [ROSE] [AX25] possible circular locking
  2007-12-17 10:06 [ROSE] [AX25] possible circular locking Bernard Pidoux F6BVP
@ 2007-12-18 13:52 ` Jarek Poplawski
       [not found]   ` <476837BF.3070207@free.fr>
                     ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: Jarek Poplawski @ 2007-12-18 13:52 UTC (permalink / raw)
  To: Bernard Pidoux F6BVP
  Cc: Alexey Dobriyan, Ralf Baechle DL5RB, Linux Netdev List

On Mon, Dec 17, 2007 at 11:06:04AM +0100, Bernard Pidoux F6BVP wrote:
> Hi,
>
>
> When I killall kissattach I can see the following message.
>
> This happens on kernel 2.6.24-rc5 already patched with the 6 previously
> patches I sent recently.
>
>
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.23.9 #1
> -------------------------------------------------------
> kissattach/2906 is trying to acquire lock:
>  (linkfail_lock){-+..}, at: [<d8bd4603>] ax25_link_failed+0x11/0x39 [ax25]
>
> but task is already holding lock:
>  (ax25_list_lock){-+..}, at: [<d8bd7c7c>] ax25_device_event+0x38/0x84
> [ax25]
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
...

It seems, lockdep is warried about the different order here:

#1 (rose_neigh_list_lock){-+..}:
#3 (ax25_list_lock){-+..}:

#0 (linkfail_lock){-+..}:
#1 (rose_neigh_list_lock){-+..}:

#3 (ax25_list_lock){-+..}:
#0 (linkfail_lock){-+..}:

So, ax25_list_lock could be taken before and after linkfail_lock. 
I don't know if this three-thread clutch is very probable (or
possible at all), but it seems this other bug nearby reported by
Bernard ("[...] system impossible to reboot with linux-2.6.24-rc5")
could have similar source - namely ax25_list_lock held by
ax25_kill_by_device() during ax25_disconnect(). It looks like the
only place which calls ax25_disconnect() this way, so I guess, it
isn't necessary. But, since I don't know AX25 & ROSE at all, this
should be necessarily verified by somebody who knows these things.

I attach here my very experimental proposal with breaking the lock
for ax25_disconnect(), with some failsafe and debugging because of
this, but, if in this special case the lock is required for some
other reasons, then this patch should be dumped, of course.

Regards,
Jarek P.

WARNING:
not tested, not even compiled, needs some ack before testing!

---

diff -Nurp linux-2.6.24-rc5-/net/ax25/af_ax25.c linux-2.6.24-rc5+/net/ax25/af_ax25.c
--- linux-2.6.24-rc5-/net/ax25/af_ax25.c	2007-12-17 13:29:19.000000000 +0100
+++ linux-2.6.24-rc5+/net/ax25/af_ax25.c	2007-12-18 13:36:05.000000000 +0100
@@ -87,10 +87,19 @@ static void ax25_kill_by_device(struct n
 		return;
 
 	spin_lock_bh(&ax25_list_lock);
+again:
 	ax25_for_each(s, node, &ax25_list) {
 		if (s->ax25_dev == ax25_dev) {
+			struct hlist_node *nn = node->next;
+
 			s->ax25_dev = NULL;
+			spin_unlock_bh(&ax25_list_lock);
 			ax25_disconnect(s, ENETUNREACH);
+			spin_lock_bh(&ax25_list_lock);
+			if (nn != node->next) {
+				WARN_ON_ONCE(1);
+				goto again;
+			}
 		}
 	}
 	spin_unlock_bh(&ax25_list_lock);

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [ROSE] [AX25] possible circular locking
       [not found]   ` <476837BF.3070207@free.fr>
@ 2007-12-18 22:04     ` Jarek Poplawski
  0 siblings, 0 replies; 21+ messages in thread
From: Jarek Poplawski @ 2007-12-18 22:04 UTC (permalink / raw)
  To: Pidoux; +Cc: Alexey Dobriyan, Ralf Baechle DL5RB, Linux Netdev List

On Tue, Dec 18, 2007 at 10:12:31PM +0100, Pidoux wrote:
> Hi,
>
> Thank you Jarek for the analysis of the circular locking dependency report.
> I applied the patch you proposed and it works well as soon as I am able to 
> reboot now without
> lock warning message and I can also killall kissattach.
>
> I tried also the patch without the loop, that is only spin_unlock_bh() 
> before calling
> ax25_disconnect() and spin_lock_bh() just after.
> It worked also well.
> However I must say that I only have one ax25 device running (ax0).
>
> Is the loop really necessary here ? in case there are more than one ax25 
> device ?
>
> Also, I will let my AX25 - ROSE application running in order to test the 
> system stability with the full patch.
> I will let you know the results in a while.

Thank you Bernard for the bravery!

The loop is only for debugging: I don't know exactly what is done
during this ax25_disconnect(), and how the list can change in the
meantime because of some other activities. So, it's e.g. against
possible list_del of this node - then some sockets would stay not
disconnected.

I think, it's up to Ralf or some other ax25 expert to judge if this
could be done like this. Since it's not used very often, I think
this should better stay during testing: if there are no warnings
- the loop isn't repeated; otherwise it could simply take a little
more time, but all sockets should be serviced, plus we know about
problems.

Thanks,
Jarek P.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [ROSE] [AX25] possible circular locking
  2007-12-18 13:52 ` Jarek Poplawski
       [not found]   ` <476837BF.3070207@free.fr>
@ 2007-12-28 21:30   ` Pidoux
       [not found]   ` <47755FDB.2070501@free.fr>
  2008-02-09 18:44   ` [PATCH][AX25] ax25_ds_timer: use mod_timer instead of add_timer Bernard Pidoux F6BVP
  3 siblings, 0 replies; 21+ messages in thread
From: Pidoux @ 2007-12-28 21:30 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Alexey Dobriyan, Ralf Baechle DL5RB, Linux Netdev List

Jarek Poplawski wrote :
> On Mon, Dec 17, 2007 at 11:06:04AM +0100, Bernard Pidoux F6BVP wrote:
>   
>> Hi,
>>
>>
>> When I killall kissattach I can see the following message.
>>
>> This happens on kernel 2.6.24-rc5 already patched with the 6 previously
>> patches I sent recently.
>>
>>
>> =======================================================
>> [ INFO: possible circular locking dependency detected ]
>> 2.6.23.9 #1
>> -------------------------------------------------------
>> kissattach/2906 is trying to acquire lock:
>>  (linkfail_lock){-+..}, at: [<d8bd4603>] ax25_link_failed+0x11/0x39 [ax25]
>>
>> but task is already holding lock:
>>  (ax25_list_lock){-+..}, at: [<d8bd7c7c>] ax25_device_event+0x38/0x84
>> [ax25]
>>
>> which lock already depends on the new lock.
>>
>>
>> the existing dependency chain (in reverse order) is:
>>     
> ...
>
> It seems, lockdep is warried about the different order here:
>
> #1 (rose_neigh_list_lock){-+..}:
> #3 (ax25_list_lock){-+..}:
>
> #0 (linkfail_lock){-+..}:
> #1 (rose_neigh_list_lock){-+..}:
>
> #3 (ax25_list_lock){-+..}:
> #0 (linkfail_lock){-+..}:
>
> So, ax25_list_lock could be taken before and after linkfail_lock. 
> I don't know if this three-thread clutch is very probable (or
> possible at all), but it seems this other bug nearby reported by
> Bernard ("[...] system impossible to reboot with linux-2.6.24-rc5")
> could have similar source - namely ax25_list_lock held by
> ax25_kill_by_device() during ax25_disconnect(). It looks like the
> only place which calls ax25_disconnect() this way, so I guess, it
> isn't necessary. But, since I don't know AX25 & ROSE at all, this
> should be necessarily verified by somebody who knows these things.
>
> I attach here my very experimental proposal with breaking the lock
> for ax25_disconnect(), with some failsafe and debugging because of
> this, but, if in this special case the lock is required for some
> other reasons, then this patch should be dumped, of course.
>
> Regards,
> Jarek P.
>
> WARNING:
> not tested, not even compiled, needs some ack before testing!
>
> ---
>
> diff -Nurp linux-2.6.24-rc5-/net/ax25/af_ax25.c linux-2.6.24-rc5+/net/ax25/af_ax25.c
> --- linux-2.6.24-rc5-/net/ax25/af_ax25.c	2007-12-17 13:29:19.000000000 +0100
> +++ linux-2.6.24-rc5+/net/ax25/af_ax25.c	2007-12-18 13:36:05.000000000 +0100
> @@ -87,10 +87,19 @@ static void ax25_kill_by_device(struct n
>  		return;
>  
>  	spin_lock_bh(&ax25_list_lock);
> +again:
>  	ax25_for_each(s, node, &ax25_list) {
>  		if (s->ax25_dev == ax25_dev) {
> +			struct hlist_node *nn = node->next;
> +
>  			s->ax25_dev = NULL;
> +			spin_unlock_bh(&ax25_list_lock);
>  			ax25_disconnect(s, ENETUNREACH);
> +			spin_lock_bh(&ax25_list_lock);
> +			if (nn != node->next) {
> +				WARN_ON_ONCE(1);
> +				goto again;
> +			}
>  		}
>  	}
>  	spin_unlock_bh(&ax25_list_lock);
>
>
>   
After a few days of observation and a number of reboot for test purpose, 
I confirm that your patch is doing very well.
I have no more problems rebooting and the AX25 applications are running 
fine.

I hope this patch, with or without warning, could be applied in next 
kernel release.

Thanks again Jarek.

Regards from Bernard P.
f6bvp


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH][ROSE][AX25] af_ax25: possible circular locking
       [not found]   ` <47755FDB.2070501@free.fr>
@ 2007-12-28 21:48     ` Jarek Poplawski
  2007-12-30  3:14       ` David Miller
  0 siblings, 1 reply; 21+ messages in thread
From: Jarek Poplawski @ 2007-12-28 21:48 UTC (permalink / raw)
  To: David Miller
  Cc: Pidoux, Ralf Baechle DL5RB, Alexey Dobriyan, Linux Netdev List

On Fri, Dec 28, 2007 at 09:43:07PM +0100, Pidoux wrote:
...
> After a few days of observation and a number of reboot for test purpose, I 
> confirm that your patch is doing very well.
> I have no more problems rebooting and the AX25 applications are running 
> fine.
>
> I hope this patch, with or without warning, could be applied in next kernel 
> release.
>
> Thanks again Jarek.
>
> Regards from Bernard P.
> f6bvp

Thanks again Bernard.
Jarek P. too

---------------->

Subject: [ROSE][AX25] af_ax25: possible circular locking

Bernard Pidoux F6BVP reported:
> When I killall kissattach I can see the following message.
>
> This happens on kernel 2.6.24-rc5 already patched with the 6 previously
> patches I sent recently.
>
>
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.23.9 #1
> -------------------------------------------------------
> kissattach/2906 is trying to acquire lock:
>  (linkfail_lock){-+..}, at: [<d8bd4603>] ax25_link_failed+0x11/0x39 [ax25]
>
> but task is already holding lock:
>  (ax25_list_lock){-+..}, at: [<d8bd7c7c>] ax25_device_event+0x38/0x84
> [ax25]
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
...

lockdep is worried about the different order here:

#1 (rose_neigh_list_lock){-+..}:
#3 (ax25_list_lock){-+..}:

#0 (linkfail_lock){-+..}:
#1 (rose_neigh_list_lock){-+..}:

#3 (ax25_list_lock){-+..}:
#0 (linkfail_lock){-+..}:

So, ax25_list_lock could be taken before and after linkfail_lock. 
I don't know if this three-thread clutch is very probable (or
possible at all), but it seems another bug reported by Bernard
("[...] system impossible to reboot with linux-2.6.24-rc5")
could have similar source - namely ax25_list_lock held by
ax25_kill_by_device() during ax25_disconnect(). It looks like the
only place which calls ax25_disconnect() this way, so I guess, it
isn't necessary.

This patch is breaking the lock for ax25_disconnect(), with some
failsafe and debugging added to detect unforeseen problems.


Reported-and-tested-by: Bernard Pidoux <f6bvp@free.fr>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>

---

diff -Nurp linux-2.6.24-rc5-/net/ax25/af_ax25.c linux-2.6.24-rc5+/net/ax25/af_ax25.c
--- linux-2.6.24-rc5-/net/ax25/af_ax25.c	2007-12-17 13:29:19.000000000 +0100
+++ linux-2.6.24-rc5+/net/ax25/af_ax25.c	2007-12-18 13:36:05.000000000 +0100
@@ -87,10 +87,19 @@ static void ax25_kill_by_device(struct n
 		return;
 
 	spin_lock_bh(&ax25_list_lock);
+again:
 	ax25_for_each(s, node, &ax25_list) {
 		if (s->ax25_dev == ax25_dev) {
+			struct hlist_node *nn = node->next;
+
 			s->ax25_dev = NULL;
+			spin_unlock_bh(&ax25_list_lock);
 			ax25_disconnect(s, ENETUNREACH);
+			spin_lock_bh(&ax25_list_lock);
+			if (nn != node->next) {
+				WARN_ON_ONCE(1);
+				goto again;
+			}
 		}
 	}
 	spin_unlock_bh(&ax25_list_lock);

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH][ROSE][AX25] af_ax25: possible circular locking
  2007-12-28 21:48     ` [PATCH][ROSE][AX25] af_ax25: " Jarek Poplawski
@ 2007-12-30  3:14       ` David Miller
  2007-12-30 14:13         ` Jarek Poplawski
  0 siblings, 1 reply; 21+ messages in thread
From: David Miller @ 2007-12-30  3:14 UTC (permalink / raw)
  To: jarkao2; +Cc: f6bvp, ralf, adobriyan, netdev

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Fri, 28 Dec 2007 22:48:57 +0100

> lockdep is worried about the different order here:
> 
> #1 (rose_neigh_list_lock){-+..}:
> #3 (ax25_list_lock){-+..}:
> 
> #0 (linkfail_lock){-+..}:
> #1 (rose_neigh_list_lock){-+..}:
> 
> #3 (ax25_list_lock){-+..}:
> #0 (linkfail_lock){-+..}:
> 
> So, ax25_list_lock could be taken before and after linkfail_lock. 
> I don't know if this three-thread clutch is very probable (or
> possible at all), but it seems another bug reported by Bernard
> ("[...] system impossible to reboot with linux-2.6.24-rc5")
> could have similar source - namely ax25_list_lock held by
> ax25_kill_by_device() during ax25_disconnect(). It looks like the
> only place which calls ax25_disconnect() this way, so I guess, it
> isn't necessary.
> 
> This patch is breaking the lock for ax25_disconnect(), with some
> failsafe and debugging added to detect unforeseen problems.
> 
> 
> Reported-and-tested-by: Bernard Pidoux <f6bvp@free.fr>
> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>

I can't apply this fix, sorry.

You can't just drop this linked list lock and expect it to stay
consistent like that.

Once you drop it, any thread of control can get in there and delete
entries from the list.

Since we know it can happen, using a WARN_ON_ONCE(1) is not
appropriate.  And if it triggers it will do the wrong thing, because
by branching back to "again" we can call ax25_disconnect() multiple
times on the same entry which isn't right.

You'll thus need to resolve this locking conflict more properly.
I know it's hard, but your current fix is worse because it adds
a new known bug.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH][ROSE][AX25] af_ax25: possible circular locking
  2007-12-30  3:14       ` David Miller
@ 2007-12-30 14:13         ` Jarek Poplawski
  2007-12-31  5:00           ` David Miller
  2008-01-11  5:22           ` David Miller
  0 siblings, 2 replies; 21+ messages in thread
From: Jarek Poplawski @ 2007-12-30 14:13 UTC (permalink / raw)
  To: David Miller; +Cc: f6bvp, ralf, adobriyan, netdev

On Sat, Dec 29, 2007 at 07:14:43PM -0800, David Miller wrote:
...
> You can't just drop this linked list lock and expect it to stay
> consistent like that.
> 
> Once you drop it, any thread of control can get in there and delete
> entries from the list.
> 
> Since we know it can happen, using a WARN_ON_ONCE(1) is not
> appropriate.

The problem is 'we' don't know if it can happen... In the first
message with this patch I've tried to get this information, and
now it seems you are the only one with this knowledge, but of
course this is more than enough for me to agree with your decision
to dump this patch.

> [...]  And if it triggers it will do the wrong thing, because
> by branching back to "again" we can call ax25_disconnect() multiple
> times on the same entry which isn't right.

This is an equivalent of list_for_each_entry_safe(), and if such
WARN_ON is ever reported this would confirm the solution is wrong.
But, it seems Bernard's case was too simple to trigger this bug.
Alas it was complex enough to trigger this other bug...

"again" loop should skip this entry next time: s->ax25_dev should
be NULL, so not equal to ax25dev. (But of course there could be
this unknown to me place, which changes this back in the meantime.)

> You'll thus need to resolve this locking conflict more properly.
> I know it's hard, but your current fix is worse because it adds
> a new known bug.

Sorry, it seems this will've to wait until Ralf finds some time,
because I really can't give this more time, considering I never
used nor have any plans to use this code, and this bug could
suggest there is not so many interested in this, besides Bernard,
either.

Regards,
Jarek P.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH][ROSE][AX25] af_ax25: possible circular locking
  2007-12-30 14:13         ` Jarek Poplawski
@ 2007-12-31  5:00           ` David Miller
  2008-01-11  5:22           ` David Miller
  1 sibling, 0 replies; 21+ messages in thread
From: David Miller @ 2007-12-31  5:00 UTC (permalink / raw)
  To: jarkao2; +Cc: f6bvp, ralf, adobriyan, netdev

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Sun, 30 Dec 2007 15:13:23 +0100

> On Sat, Dec 29, 2007 at 07:14:43PM -0800, David Miller wrote:
> ...
> "again" loop should skip this entry next time: s->ax25_dev should
> be NULL, so not equal to ax25dev. (But of course there could be
> this unknown to me place, which changes this back in the meantime.)

Indeed you are right.

> > You'll thus need to resolve this locking conflict more properly.
> > I know it's hard, but your current fix is worse because it adds
> > a new known bug.
> 
> Sorry, it seems this will've to wait until Ralf finds some time,
> because I really can't give this more time, considering I never
> used nor have any plans to use this code, and this bug could
> suggest there is not so many interested in this, besides Bernard,
> either.

I'll try to look at this.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH][ROSE][AX25] af_ax25: possible circular locking
  2007-12-30 14:13         ` Jarek Poplawski
  2007-12-31  5:00           ` David Miller
@ 2008-01-11  5:22           ` David Miller
  2008-01-11  9:40             ` Jarek Poplawski
  2008-01-11 21:40             ` [PATCH] [ROSE] two extra tab characters removed Bernard Pidoux F6BVP
  1 sibling, 2 replies; 21+ messages in thread
From: David Miller @ 2008-01-11  5:22 UTC (permalink / raw)
  To: jarkao2; +Cc: f6bvp, ralf, adobriyan, netdev

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Sun, 30 Dec 2007 15:13:23 +0100

> On Sat, Dec 29, 2007 at 07:14:43PM -0800, David Miller wrote:
> ...
> > You can't just drop this linked list lock and expect it to stay
> > consistent like that.
> > 
> > Once you drop it, any thread of control can get in there and delete
> > entries from the list.
> > 
> > Since we know it can happen, using a WARN_ON_ONCE(1) is not
> > appropriate.
> 
> The problem is 'we' don't know if it can happen... In the first
> message with this patch I've tried to get this information, and
> now it seems you are the only one with this knowledge, but of
> course this is more than enough for me to agree with your decision
> to dump this patch.

I've removed the warning and made the branch back to 'again'
unconditional as I think this is the safest version of the
change.

I'll push this upstream, thanks for fixing this Jarek.

diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index ecb14ee..b4725ff 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -87,10 +87,22 @@ static void ax25_kill_by_device(struct net_device *dev)
 		return;
 
 	spin_lock_bh(&ax25_list_lock);
+again:
 	ax25_for_each(s, node, &ax25_list) {
 		if (s->ax25_dev == ax25_dev) {
 			s->ax25_dev = NULL;
+			spin_unlock_bh(&ax25_list_lock);
 			ax25_disconnect(s, ENETUNREACH);
+			spin_lock_bh(&ax25_list_lock);
+
+			/* The entry could have been deleted from the
+			 * list meanwhile and thus the next pointer is
+			 * no longer valid.  Play it safe and restart
+			 * the scan.  Forward progress is ensured
+			 * because we set s->ax25_dev to NULL and we
+			 * are never passed a NULL 'dev' argument.
+			 */
+			goto again;
 		}
 	}
 	spin_unlock_bh(&ax25_list_lock);
-- 
1.5.4.rc2.84.gf85fd


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH][ROSE][AX25] af_ax25: possible circular locking
  2008-01-11  5:22           ` David Miller
@ 2008-01-11  9:40             ` Jarek Poplawski
  2008-01-12 19:48               ` Bernard Pidoux F6BVP
  2008-01-11 21:40             ` [PATCH] [ROSE] two extra tab characters removed Bernard Pidoux F6BVP
  1 sibling, 1 reply; 21+ messages in thread
From: Jarek Poplawski @ 2008-01-11  9:40 UTC (permalink / raw)
  To: David Miller; +Cc: f6bvp, ralf, adobriyan, netdev

On Thu, Jan 10, 2008 at 09:22:42PM -0800, David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Sun, 30 Dec 2007 15:13:23 +0100
> 
> > On Sat, Dec 29, 2007 at 07:14:43PM -0800, David Miller wrote:
...
> I've removed the warning and made the branch back to 'again'
> unconditional as I think this is the safest version of the
> change.
> 
> I'll push this upstream, thanks for fixing this Jarek.
> 

Thanks for checking this and making safer!

Regards,
Jarek P.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH] [ROSE] two extra tab characters removed
  2008-01-11  5:22           ` David Miller
  2008-01-11  9:40             ` Jarek Poplawski
@ 2008-01-11 21:40             ` Bernard Pidoux F6BVP
  1 sibling, 0 replies; 21+ messages in thread
From: Bernard Pidoux F6BVP @ 2008-01-11 21:40 UTC (permalink / raw)
  To: David Miller; +Cc: ralf, netdev


Signed-off-by: Bernard Pidoux <f6bvp@amsat.org>
---
  include/net/rose.h |    4 ++--
  1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/net/rose.h b/include/net/rose.h
index d3ab453..0cfdc0e 100644
--- a/include/net/rose.h
+++ b/include/net/rose.h
@@ -86,7 +86,7 @@ struct rose_neigh {
         ax25_address            callsign;
         ax25_digi               *digipeat;
         ax25_cb                 *ax25;
-       struct net_device               *dev;
+       struct net_device       *dev;
         unsigned short          count;
         unsigned short          use;
         unsigned int            number;
@@ -124,7 +124,7 @@ struct rose_sock {
         ax25_address            source_digis[ROSE_MAX_DIGIS];
         ax25_address            dest_digis[ROSE_MAX_DIGIS];
         struct rose_neigh       *neighbour;
-       struct net_device               *device;
+       struct net_device       *device;
         unsigned int            lci, rand;
         unsigned char           state, condition, qbitincl, defer;
         unsigned char           cause, diagnostic;
--
1.5.3.7

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH][ROSE][AX25] af_ax25: possible circular locking
  2008-01-11  9:40             ` Jarek Poplawski
@ 2008-01-12 19:48               ` Bernard Pidoux F6BVP
  0 siblings, 0 replies; 21+ messages in thread
From: Bernard Pidoux F6BVP @ 2008-01-12 19:48 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David Miller, ralf, adobriyan, netdev



Jarek Poplawski wrote:
> On Thu, Jan 10, 2008 at 09:22:42PM -0800, David Miller wrote:
>> From: Jarek Poplawski <jarkao2@gmail.com>
>> Date: Sun, 30 Dec 2007 15:13:23 +0100
>>
>>> On Sat, Dec 29, 2007 at 07:14:43PM -0800, David Miller wrote:
> ...
>> I've removed the warning and made the branch back to 'again'
>> unconditional as I think this is the safest version of the
>> change.
>>
>> I'll push this upstream, thanks for fixing this Jarek.
>>
> 
> Thanks for checking this and making safer!
> 
> Regards,
> Jarek P.
> 
> 

I would also like to thank both Jarek and David for this patch.
It is very helpful and probably safe for after killing kissattach
we don't want to use attached ax25 devices.

Regards,

Bernard Pidoux

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH][AX25] ax25_ds_timer: use mod_timer instead of add_timer
  2007-12-18 13:52 ` Jarek Poplawski
                     ` (2 preceding siblings ...)
       [not found]   ` <47755FDB.2070501@free.fr>
@ 2008-02-09 18:44   ` Bernard Pidoux F6BVP
  2008-02-09 19:39     ` Jarek Poplawski
                       ` (2 more replies)
  3 siblings, 3 replies; 21+ messages in thread
From: Bernard Pidoux F6BVP @ 2008-02-09 18:44 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Ralf Baechle DL5RB, Linux Netdev List

[-- Attachment #1: Type: text/plain, Size: 110 bytes --]

Hi,

With AX25 patches applied I still get this possible circular locking 
message.

Regards,

Bernard P.





[-- Attachment #2: possible_circular_locking --]
[-- Type: text/plain, Size: 2911 bytes --]


=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.24 #3
-------------------------------------------------------
swapper/0 is trying to acquire lock:
 (ax25_list_lock){-+..}, at: [<f91dd3b1>] ax25_destroy_socket+0x171/0x1f0 [ax25]

but task is already holding lock:
 (slock-AF_AX25){-+..}, at: [<f91dbabc>] ax25_std_heartbeat_expiry+0x1c/0xe0 [ax25]

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (slock-AF_AX25){-+..}:
       [<c014c8ca>] __lock_acquire+0xc4a/0x10d0
       [<c014cdd1>] lock_acquire+0x81/0xa0
       [<c03280e2>] _spin_lock+0x32/0x60
       [<f91dce3a>] ax25_info_show+0x24a/0x2c0 [ax25]
       [<c01a4aa1>] seq_read+0xa1/0x2a0
       [<c01c20fd>] proc_reg_read+0x5d/0x90
       [<c018a41a>] vfs_read+0xaa/0x130
       [<c018a8ed>] sys_read+0x3d/0x70
       [<c01042ae>] sysenter_past_esp+0x5f/0xa5
       [<ffffffff>] 0xffffffff

-> #0 (ax25_list_lock){-+..}:
       [<c014c6f1>] __lock_acquire+0xa71/0x10d0
       [<c014cdd1>] lock_acquire+0x81/0xa0
       [<c0328143>] _spin_lock_bh+0x33/0x60
       [<f91dd3b1>] ax25_destroy_socket+0x171/0x1f0 [ax25]
       [<f91dbaf3>] ax25_std_heartbeat_expiry+0x53/0xe0 [ax25]
       [<f91dc5ab>] ax25_heartbeat_expiry+0x1b/0x40 [ax25]
       [<c01325dd>] run_timer_softirq+0x15d/0x1c0
       [<c012e343>] __do_softirq+0x93/0x120
       [<c012e427>] do_softirq+0x57/0x60
       [<c012e868>] irq_exit+0x48/0x60
       [<c0119b3b>] smp_apic_timer_interrupt+0x5b/0x90
       [<c0104e27>] apic_timer_interrupt+0x33/0x38
       [<c0102512>] mwait_idle+0x12/0x20
       [<c0102631>] cpu_idle+0x71/0xc0
       [<c0325549>] rest_init+0x49/0x50
       [<c0427a9a>] start_kernel+0x2ea/0x370
       [<00000000>] 0x0
       [<ffffffff>] 0xffffffff

other info that might help us debug this:

1 lock held by swapper/0:
 #0:  (slock-AF_AX25){-+..}, at: [<f91dbabc>] ax25_std_heartbeat_expiry+0x1c/0xe0 [ax25]

stack backtrace:
Pid: 0, comm: swapper Not tainted 2.6.24 #3
 [<c01053da>] show_trace_log_lvl+0x1a/0x30
 [<c0105e12>] show_trace+0x12/0x20
 [<c01067cc>] dump_stack+0x6c/0x80
 [<c014a32f>] print_circular_bug_tail+0x6f/0x80
 [<c014c6f1>] __lock_acquire+0xa71/0x10d0
 [<c014cdd1>] lock_acquire+0x81/0xa0
 [<c0328143>] _spin_lock_bh+0x33/0x60
 [<f91dd3b1>] ax25_destroy_socket+0x171/0x1f0 [ax25]
 [<f91dbaf3>] ax25_std_heartbeat_expiry+0x53/0xe0 [ax25]
 [<f91dc5ab>] ax25_heartbeat_expiry+0x1b/0x40 [ax25]
 [<c01325dd>] run_timer_softirq+0x15d/0x1c0
 [<c012e343>] __do_softirq+0x93/0x120
 [<c012e427>] do_softirq+0x57/0x60
 [<c012e868>] irq_exit+0x48/0x60
 [<c0119b3b>] smp_apic_timer_interrupt+0x5b/0x90
 [<c0104e27>] apic_timer_interrupt+0x33/0x38
 [<c0102512>] mwait_idle+0x12/0x20
 [<c0102631>] cpu_idle+0x71/0xc0
 [<c0325549>] rest_init+0x49/0x50
 [<c0427a9a>] start_kernel+0x2ea/0x370
 [<00000000>] 0x0
 =======================

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH][AX25] ax25_ds_timer: use mod_timer instead of add_timer
  2008-02-09 18:44   ` [PATCH][AX25] ax25_ds_timer: use mod_timer instead of add_timer Bernard Pidoux F6BVP
@ 2008-02-09 19:39     ` Jarek Poplawski
  2008-02-10 18:07       ` Bernard Pidoux F6BVP
  2008-02-09 23:50     ` [PATCH][AX25] af_ax25: remove sock lock in ax25_info_show() Jarek Poplawski
  2008-02-10 13:10     ` [PATCH v2][AX25] " Jarek Poplawski
  2 siblings, 1 reply; 21+ messages in thread
From: Jarek Poplawski @ 2008-02-09 19:39 UTC (permalink / raw)
  To: Bernard Pidoux F6BVP
  Cc: Ralf Baechle DL5RB, Jann Traschewski, Linux Netdev List

On Sat, Feb 09, 2008 at 07:44:50PM +0100, Bernard Pidoux F6BVP wrote:
> Hi,
>
> With AX25 patches applied I still get this possible circular locking  
> message.

Hi Bernard,

Could you confirm which exactly patches did you try? Is this vanilla
2.6.24 plus these two: ax25_timer and ax25_ds_timer or something more?
And I'm not sure what do you mean by "still": the warning came back
just after these last patches? At least these timer patches don't seem
to change anything around locking?

Thanks for testing this,
Jarek P.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH][AX25] af_ax25: remove sock lock in ax25_info_show()
  2008-02-09 18:44   ` [PATCH][AX25] ax25_ds_timer: use mod_timer instead of add_timer Bernard Pidoux F6BVP
  2008-02-09 19:39     ` Jarek Poplawski
@ 2008-02-09 23:50     ` Jarek Poplawski
  2008-02-10 13:10     ` [PATCH v2][AX25] " Jarek Poplawski
  2 siblings, 0 replies; 21+ messages in thread
From: Jarek Poplawski @ 2008-02-09 23:50 UTC (permalink / raw)
  To: David Miller
  Cc: Bernard Pidoux F6BVP, Ralf Baechle DL5RB, Jann Traschewski, netdev

On Sat, Feb 09, 2008 at 07:44:50PM +0100, Bernard Pidoux F6BVP wrote:
> Hi,
>
> With AX25 patches applied I still get this possible circular locking  
> message.

IMHO this warning could happen earlier too...

Thanks,
Jarek P.

-------------->

Subject: [AX25] af_ax25: remove sock lock in ax25_info_show()
 
This lockdep warning:

> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.24 #3
> -------------------------------------------------------
> swapper/0 is trying to acquire lock:
>  (ax25_list_lock){-+..}, at: [<f91dd3b1>] ax25_destroy_socket+0x171/0x1f0 [ax25]
> 
> but task is already holding lock:
>  (slock-AF_AX25){-+..}, at: [<f91dbabc>] ax25_std_heartbeat_expiry+0x1c/0xe0 [ax25]
> 
> which lock already depends on the new lock.
...

shows that ax25_list_lock and slock-AF_AX25 are taken in different
order: ax25_info_show() takes slock (bh_lock_sock(ax25->sk)) while
ax25_list_lock is held, so reversely to other functions. To fix this
the sock lock should be moved to ax25_info_start(), but since it's
only for reading proc info it seems this is not necessary (e.g.
ax25_send_to_raw() does similar reading without this lock too). So,
this patch removes this lock to avoid deadlock possibility.


Reported-by: Bernard Pidoux F6BVP <f6bvp@free.fr>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>

---

 net/ax25/af_ax25.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index 94b2b1b..68b5171 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -1924,12 +1924,10 @@ static int ax25_info_show(struct seq_file *seq, void *v)
 		   ax25->paclen);
 
 	if (ax25->sk != NULL) {
-		bh_lock_sock(ax25->sk);
 		seq_printf(seq," %d %d %ld\n",
 			   atomic_read(&ax25->sk->sk_wmem_alloc),
 			   atomic_read(&ax25->sk->sk_rmem_alloc),
 			   ax25->sk->sk_socket != NULL ? SOCK_INODE(ax25->sk->sk_socket)->i_ino : 0L);
-		bh_unlock_sock(ax25->sk);
 	} else {
 		seq_puts(seq, " * * *\n");
 	}

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2][AX25] af_ax25: remove sock lock in ax25_info_show()
  2008-02-09 18:44   ` [PATCH][AX25] ax25_ds_timer: use mod_timer instead of add_timer Bernard Pidoux F6BVP
  2008-02-09 19:39     ` Jarek Poplawski
  2008-02-09 23:50     ` [PATCH][AX25] af_ax25: remove sock lock in ax25_info_show() Jarek Poplawski
@ 2008-02-10 13:10     ` Jarek Poplawski
  2008-02-12  5:25       ` David Miller
  2 siblings, 1 reply; 21+ messages in thread
From: Jarek Poplawski @ 2008-02-10 13:10 UTC (permalink / raw)
  To: David Miller
  Cc: Bernard Pidoux F6BVP, Ralf Baechle DL5RB, Jann Traschewski, netdev

Hi,

Here is a little bit better version, I hope.

Regards,
Jarek P.

--------------> (take 2)

Subject: [AX25] af_ax25: remove sock lock in ax25_info_show()
 
This lockdep warning:

> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.24 #3
> -------------------------------------------------------
> swapper/0 is trying to acquire lock:
>  (ax25_list_lock){-+..}, at: [<f91dd3b1>] ax25_destroy_socket+0x171/0x1f0 [ax25]
> 
> but task is already holding lock:
>  (slock-AF_AX25){-+..}, at: [<f91dbabc>] ax25_std_heartbeat_expiry+0x1c/0xe0 [ax25]
> 
> which lock already depends on the new lock.
...

shows that ax25_list_lock and slock-AF_AX25 are taken in different
order: ax25_info_show() takes slock (bh_lock_sock(ax25->sk)) while
ax25_list_lock is held, so reversely to other functions. To fix this
the sock lock should be moved to ax25_info_start(), and there would
be still problem with breaking ax25_list_lock (it seems this "proper"
order isn't optimal yet). But, since it's only for reading proc info
it seems this is not necessary (e.g.  ax25_send_to_raw() does similar
reading without this lock too).

So, this patch removes sock lock to avoid deadlock possibility; there
is also used sock_i_ino() function, which reads sk_socket under proper
read lock. Additionally printf format of this i_ino is changed to %lu.


Reported-by: Bernard Pidoux F6BVP <f6bvp@free.fr>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>

---

 net/ax25/af_ax25.c |    6 ++----
 1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index 94b2b1b..48bfcc7 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -1924,12 +1924,10 @@ static int ax25_info_show(struct seq_file *seq, void *v)
 		   ax25->paclen);
 
 	if (ax25->sk != NULL) {
-		bh_lock_sock(ax25->sk);
-		seq_printf(seq," %d %d %ld\n",
+		seq_printf(seq, " %d %d %lu\n",
 			   atomic_read(&ax25->sk->sk_wmem_alloc),
 			   atomic_read(&ax25->sk->sk_rmem_alloc),
-			   ax25->sk->sk_socket != NULL ? SOCK_INODE(ax25->sk->sk_socket)->i_ino : 0L);
-		bh_unlock_sock(ax25->sk);
+			   sock_i_ino(ax25->sk));
 	} else {
 		seq_puts(seq, " * * *\n");
 	}

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH][AX25] ax25_ds_timer: use mod_timer instead of add_timer
  2008-02-09 19:39     ` Jarek Poplawski
@ 2008-02-10 18:07       ` Bernard Pidoux F6BVP
  0 siblings, 0 replies; 21+ messages in thread
From: Bernard Pidoux F6BVP @ 2008-02-10 18:07 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Ralf Baechle DL5RB, Jann Traschewski, Linux Netdev List

Hi Jarek,

Sorry, I should have been more explicit about the patches I applied.

My CPU is an Intel Core 2 duo and I compiled 2.6.24 with SMP option.

I applied 3 patches you have submitted for ax25 before I observed the 
possible locking.

That is :

mkiss patch
diff --git a/drivers/net/hamradio/mkiss.c b/drivers/net/hamradio/mkiss.c
index cfcd15a..30c9b3b 100644

[AX25] ax25_timer: use mod_timer instead of add_timer

[AX25] ax25_ds_timer: use mod_timer instead of add_timer

I also applied 4 patches for rose.ko I sent to the list a while ago.
But for this report I did not installed any ROSE driver nor application.

I will try your new patch version 2 and report any possible change.


Regards,


Bernard Pidoux,
F6BVP


Jarek Poplawski wrote:
> On Sat, Feb 09, 2008 at 07:44:50PM +0100, Bernard Pidoux F6BVP wrote:
>> Hi,
>>
>> With AX25 patches applied I still get this possible circular locking  
>> message.
> 
> Hi Bernard,
> 
> Could you confirm which exactly patches did you try? Is this vanilla
> 2.6.24 plus these two: ax25_timer and ax25_ds_timer or something more?
> And I'm not sure what do you mean by "still": the warning came back
> just after these last patches? At least these timer patches don't seem
> to change anything around locking?
> 
> Thanks for testing this,
> Jarek P.
> 
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2][AX25] af_ax25: remove sock lock in ax25_info_show()
  2008-02-10 13:10     ` [PATCH v2][AX25] " Jarek Poplawski
@ 2008-02-12  5:25       ` David Miller
  0 siblings, 0 replies; 21+ messages in thread
From: David Miller @ 2008-02-12  5:25 UTC (permalink / raw)
  To: jarkao2; +Cc: f6bvp, ralf, jann, netdev

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Sun, 10 Feb 2008 14:10:51 +0100

> [AX25] af_ax25: remove sock lock in ax25_info_show()
>  
> This lockdep warning:
> 
> > =======================================================
> > [ INFO: possible circular locking dependency detected ]
> > 2.6.24 #3
> > -------------------------------------------------------
> > swapper/0 is trying to acquire lock:
> >  (ax25_list_lock){-+..}, at: [<f91dd3b1>] ax25_destroy_socket+0x171/0x1f0 [ax25]
> > 
> > but task is already holding lock:
> >  (slock-AF_AX25){-+..}, at: [<f91dbabc>] ax25_std_heartbeat_expiry+0x1c/0xe0 [ax25]
> > 
> > which lock already depends on the new lock.
> ...
> 
> shows that ax25_list_lock and slock-AF_AX25 are taken in different
> order: ax25_info_show() takes slock (bh_lock_sock(ax25->sk)) while
> ax25_list_lock is held, so reversely to other functions. To fix this
> the sock lock should be moved to ax25_info_start(), and there would
> be still problem with breaking ax25_list_lock (it seems this "proper"
> order isn't optimal yet). But, since it's only for reading proc info
> it seems this is not necessary (e.g.  ax25_send_to_raw() does similar
> reading without this lock too).
> 
> So, this patch removes sock lock to avoid deadlock possibility; there
> is also used sock_i_ino() function, which reads sk_socket under proper
> read lock. Additionally printf format of this i_ino is changed to %lu.
> 
> Reported-by: Bernard Pidoux F6BVP <f6bvp@free.fr>
> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>

Applied, thanks Jarek.

> +			   sock_i_ino(ax25->sk));

Note that this taks the sk callback lock, it should be OK but
let's keep a watch out for any new lockdep warnings this
ends up causing :-)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH][AX25] ax25_ds_timer: use mod_timer instead of add_timer
  2008-02-06  9:14   ` [PATCH][AX25] ax25_ds_timer: use mod_timer instead of add_timer Jarek Poplawski
  2008-02-10 18:23     ` Jann Traschewski
@ 2008-02-12  5:38     ` David Miller
  1 sibling, 0 replies; 21+ messages in thread
From: David Miller @ 2008-02-12  5:38 UTC (permalink / raw)
  To: jarkao2; +Cc: netdev, ralf, jann

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Wed, 6 Feb 2008 09:14:13 +0000

> [AX25] ax25_ds_timer: use mod_timer instead of add_timer
> 
> This patch changes current use of: init_timer(), add_timer()
> and del_timer() to setup_timer() with mod_timer(), which
> should be safer anyway.
> 
> 
> Reported-by: Jann Traschewski <jann@gmx.de>
> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>

Applied, thanks.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH][AX25] ax25_ds_timer: use mod_timer instead of add_timer
  2008-02-06  9:14   ` [PATCH][AX25] ax25_ds_timer: use mod_timer instead of add_timer Jarek Poplawski
@ 2008-02-10 18:23     ` Jann Traschewski
  2008-02-12  5:38     ` David Miller
  1 sibling, 0 replies; 21+ messages in thread
From: Jann Traschewski @ 2008-02-10 18:23 UTC (permalink / raw)
  To: 'Jarek Poplawski', netdev
  Cc: 'Ralf Baechle', 'David Miller'

Patches from Jarek applied (incl. both "testing" patches). Machine is stable
since 2 days now.
Regards,
Jann

> -----Ursprüngliche Nachricht-----
> Von: Jarek Poplawski [mailto:jarkao2@gmail.com] 
> Gesendet: Mittwoch, 6. Februar 2008 10:14
> An: netdev@vger.kernel.org
> Cc: Ralf Baechle; Jann Traschewski; David Miller
> Betreff: [PATCH][AX25] ax25_ds_timer: use mod_timer instead 
> of add_timer
> 
> On Wed, Feb 06, 2008 at 08:15:09AM +0000, Jarek Poplawski wrote:
> > On Wed, Feb 06, 2008 at 07:45:29AM +0000, Jarek Poplawski wrote:
> > ...
> > > From: Jann Traschewski <jann@gmx.de>
> > > Subject: SMP with AX.25
> ...
> 
> 
> [AX25] ax25_ds_timer: use mod_timer instead of add_timer
> 
> This patch changes current use of: init_timer(), add_timer() 
> and del_timer() to setup_timer() with mod_timer(), which 
> should be safer anyway.
> 
> 
> Reported-by: Jann Traschewski <jann@gmx.de>
> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
> 
> ---
> 
>  include/net/ax25.h       |    1 +
>  net/ax25/ax25_dev.c      |    2 +-
>  net/ax25/ax25_ds_timer.c |   12 ++++--------
>  3 files changed, 6 insertions(+), 9 deletions(-)
> 
> diff --git a/include/net/ax25.h b/include/net/ax25.h index 
> 3f0236f..717e219 100644
> --- a/include/net/ax25.h
> +++ b/include/net/ax25.h
> @@ -324,6 +324,7 @@ extern void ax25_dama_on(ax25_cb *);  
> extern void ax25_dama_off(ax25_cb *);
>  
>  /* ax25_ds_timer.c */
> +extern void ax25_ds_setup_timer(ax25_dev *);
>  extern void ax25_ds_set_timer(ax25_dev *);  extern void 
> ax25_ds_del_timer(ax25_dev *);  extern void 
> ax25_ds_timer(ax25_cb *); diff --git a/net/ax25/ax25_dev.c 
> b/net/ax25/ax25_dev.c index 528c874..a7a0e0c 100644
> --- a/net/ax25/ax25_dev.c
> +++ b/net/ax25/ax25_dev.c
> @@ -82,7 +82,7 @@ void ax25_dev_device_up(struct net_device *dev)
>  	ax25_dev->values[AX25_VALUES_DS_TIMEOUT]= AX25_DEF_DS_TIMEOUT;
>  
>  #if defined(CONFIG_AX25_DAMA_SLAVE) || 
> defined(CONFIG_AX25_DAMA_MASTER)
> -	init_timer(&ax25_dev->dama.slave_timer);
> +	ax25_ds_setup_timer(ax25_dev);
>  #endif
>  
>  	spin_lock_bh(&ax25_dev_lock);
> diff --git a/net/ax25/ax25_ds_timer.c 
> b/net/ax25/ax25_ds_timer.c index c4e3b02..2ce79df 100644
> --- a/net/ax25/ax25_ds_timer.c
> +++ b/net/ax25/ax25_ds_timer.c
> @@ -40,13 +40,10 @@ static void ax25_ds_timeout(unsigned long);
>   *	1/10th of a second.
>   */
>  
> -static void ax25_ds_add_timer(ax25_dev *ax25_dev)
> +void ax25_ds_setup_timer(ax25_dev *ax25_dev)
>  {
> -	struct timer_list *t = &ax25_dev->dama.slave_timer;
> -	t->data		= (unsigned long) ax25_dev;
> -	t->function	= &ax25_ds_timeout;
> -	t->expires	= jiffies + HZ;
> -	add_timer(t);
> +	setup_timer(&ax25_dev->dama.slave_timer, ax25_ds_timeout,
> +		    (unsigned long)ax25_dev);
>  }
>  
>  void ax25_ds_del_timer(ax25_dev *ax25_dev) @@ -60,10 +57,9 
> @@ void ax25_ds_set_timer(ax25_dev *ax25_dev)
>  	if (ax25_dev == NULL)		/* paranoia */
>  		return;
>  
> -	del_timer(&ax25_dev->dama.slave_timer);
>  	ax25_dev->dama.slave_timeout =
>  		
> msecs_to_jiffies(ax25_dev->values[AX25_VALUES_DS_TIMEOUT]) / 10;
> -	ax25_ds_add_timer(ax25_dev);
> +	mod_timer(&ax25_dev->dama.slave_timer, jiffies + HZ);
>  }
>  
>  /*


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH][AX25] ax25_ds_timer: use mod_timer instead of add_timer
  2008-02-06  8:15 ` [PATCH][AX25] " Jarek Poplawski
@ 2008-02-06  9:14   ` Jarek Poplawski
  2008-02-10 18:23     ` Jann Traschewski
  2008-02-12  5:38     ` David Miller
  0 siblings, 2 replies; 21+ messages in thread
From: Jarek Poplawski @ 2008-02-06  9:14 UTC (permalink / raw)
  To: netdev; +Cc: Ralf Baechle, Jann Traschewski, David Miller

On Wed, Feb 06, 2008 at 08:15:09AM +0000, Jarek Poplawski wrote:
> On Wed, Feb 06, 2008 at 07:45:29AM +0000, Jarek Poplawski wrote:
> ...
> > From: Jann Traschewski <jann@gmx.de>
> > Subject: SMP with AX.25
...


[AX25] ax25_ds_timer: use mod_timer instead of add_timer

This patch changes current use of: init_timer(), add_timer()
and del_timer() to setup_timer() with mod_timer(), which
should be safer anyway.


Reported-by: Jann Traschewski <jann@gmx.de>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>

---

 include/net/ax25.h       |    1 +
 net/ax25/ax25_dev.c      |    2 +-
 net/ax25/ax25_ds_timer.c |   12 ++++--------
 3 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/include/net/ax25.h b/include/net/ax25.h
index 3f0236f..717e219 100644
--- a/include/net/ax25.h
+++ b/include/net/ax25.h
@@ -324,6 +324,7 @@ extern void ax25_dama_on(ax25_cb *);
 extern void ax25_dama_off(ax25_cb *);
 
 /* ax25_ds_timer.c */
+extern void ax25_ds_setup_timer(ax25_dev *);
 extern void ax25_ds_set_timer(ax25_dev *);
 extern void ax25_ds_del_timer(ax25_dev *);
 extern void ax25_ds_timer(ax25_cb *);
diff --git a/net/ax25/ax25_dev.c b/net/ax25/ax25_dev.c
index 528c874..a7a0e0c 100644
--- a/net/ax25/ax25_dev.c
+++ b/net/ax25/ax25_dev.c
@@ -82,7 +82,7 @@ void ax25_dev_device_up(struct net_device *dev)
 	ax25_dev->values[AX25_VALUES_DS_TIMEOUT]= AX25_DEF_DS_TIMEOUT;
 
 #if defined(CONFIG_AX25_DAMA_SLAVE) || defined(CONFIG_AX25_DAMA_MASTER)
-	init_timer(&ax25_dev->dama.slave_timer);
+	ax25_ds_setup_timer(ax25_dev);
 #endif
 
 	spin_lock_bh(&ax25_dev_lock);
diff --git a/net/ax25/ax25_ds_timer.c b/net/ax25/ax25_ds_timer.c
index c4e3b02..2ce79df 100644
--- a/net/ax25/ax25_ds_timer.c
+++ b/net/ax25/ax25_ds_timer.c
@@ -40,13 +40,10 @@ static void ax25_ds_timeout(unsigned long);
  *	1/10th of a second.
  */
 
-static void ax25_ds_add_timer(ax25_dev *ax25_dev)
+void ax25_ds_setup_timer(ax25_dev *ax25_dev)
 {
-	struct timer_list *t = &ax25_dev->dama.slave_timer;
-	t->data		= (unsigned long) ax25_dev;
-	t->function	= &ax25_ds_timeout;
-	t->expires	= jiffies + HZ;
-	add_timer(t);
+	setup_timer(&ax25_dev->dama.slave_timer, ax25_ds_timeout,
+		    (unsigned long)ax25_dev);
 }
 
 void ax25_ds_del_timer(ax25_dev *ax25_dev)
@@ -60,10 +57,9 @@ void ax25_ds_set_timer(ax25_dev *ax25_dev)
 	if (ax25_dev == NULL)		/* paranoia */
 		return;
 
-	del_timer(&ax25_dev->dama.slave_timer);
 	ax25_dev->dama.slave_timeout =
 		msecs_to_jiffies(ax25_dev->values[AX25_VALUES_DS_TIMEOUT]) / 10;
-	ax25_ds_add_timer(ax25_dev);
+	mod_timer(&ax25_dev->dama.slave_timer, jiffies + HZ);
 }
 
 /*

^ permalink raw reply related	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2008-02-12  5:38 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-12-17 10:06 [ROSE] [AX25] possible circular locking Bernard Pidoux F6BVP
2007-12-18 13:52 ` Jarek Poplawski
     [not found]   ` <476837BF.3070207@free.fr>
2007-12-18 22:04     ` Jarek Poplawski
2007-12-28 21:30   ` Pidoux
     [not found]   ` <47755FDB.2070501@free.fr>
2007-12-28 21:48     ` [PATCH][ROSE][AX25] af_ax25: " Jarek Poplawski
2007-12-30  3:14       ` David Miller
2007-12-30 14:13         ` Jarek Poplawski
2007-12-31  5:00           ` David Miller
2008-01-11  5:22           ` David Miller
2008-01-11  9:40             ` Jarek Poplawski
2008-01-12 19:48               ` Bernard Pidoux F6BVP
2008-01-11 21:40             ` [PATCH] [ROSE] two extra tab characters removed Bernard Pidoux F6BVP
2008-02-09 18:44   ` [PATCH][AX25] ax25_ds_timer: use mod_timer instead of add_timer Bernard Pidoux F6BVP
2008-02-09 19:39     ` Jarek Poplawski
2008-02-10 18:07       ` Bernard Pidoux F6BVP
2008-02-09 23:50     ` [PATCH][AX25] af_ax25: remove sock lock in ax25_info_show() Jarek Poplawski
2008-02-10 13:10     ` [PATCH v2][AX25] " Jarek Poplawski
2008-02-12  5:25       ` David Miller
2008-02-06  7:45 [BUG][AX25] Fwd: SMP with AX.25 Jarek Poplawski
2008-02-06  8:15 ` [PATCH][AX25] " Jarek Poplawski
2008-02-06  9:14   ` [PATCH][AX25] ax25_ds_timer: use mod_timer instead of add_timer Jarek Poplawski
2008-02-10 18:23     ` Jann Traschewski
2008-02-12  5:38     ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.