* [PATCH 0/2] drop_monitor: misc fixes for some recently reported bugs @ 2012-04-26 18:47 Neil Horman 2012-04-26 18:47 ` [PATCH 1/2] drop_monitor: fix sleeping in invalid context warning Neil Horman ` (2 more replies) 0 siblings, 3 replies; 17+ messages in thread From: Neil Horman @ 2012-04-26 18:47 UTC (permalink / raw) To: netdev; +Cc: Neil Horman, Eric Dumazet, David Miller Hey- Eric was using dropwatch recently and reported a few bugs to me that he had noted. This short series should fix them all up. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: David Miller <davem@davemloft.net> ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 1/2] drop_monitor: fix sleeping in invalid context warning 2012-04-26 18:47 [PATCH 0/2] drop_monitor: misc fixes for some recently reported bugs Neil Horman @ 2012-04-26 18:47 ` Neil Horman 2012-04-26 19:01 ` Eric Dumazet 2012-04-26 18:47 ` [PATCH 2/2] drop_monitor: Make updating data->skb smp safe Neil Horman 2012-04-27 20:11 ` [PATCH v2 0/2] drop_monitor: misc fixes for some recently reported bugs Neil Horman 2 siblings, 1 reply; 17+ messages in thread From: Neil Horman @ 2012-04-26 18:47 UTC (permalink / raw) To: netdev; +Cc: Neil Horman, David Miller Eric Dumazet pointed out this warning in the drop_monitor protocol to me: [ 38.352571] BUG: sleeping function called from invalid context at kernel/mutex.c:85 [ 38.352576] in_atomic(): 1, irqs_disabled(): 0, pid: 4415, name: dropwatch [ 38.352580] Pid: 4415, comm: dropwatch Not tainted 3.4.0-rc2+ #71 [ 38.352582] Call Trace: [ 38.352592] [<ffffffff8153aaf0>] ? trace_napi_poll_hit+0xd0/0xd0 [ 38.352599] [<ffffffff81063f2a>] __might_sleep+0xca/0xf0 [ 38.352606] [<ffffffff81655b16>] mutex_lock+0x26/0x50 [ 38.352610] [<ffffffff8153aaf0>] ? trace_napi_poll_hit+0xd0/0xd0 [ 38.352616] [<ffffffff810b72d9>] tracepoint_probe_register+0x29/0x90 [ 38.352621] [<ffffffff8153a585>] set_all_monitor_traces+0x105/0x170 [ 38.352625] [<ffffffff8153a8ca>] net_dm_cmd_trace+0x2a/0x40 [ 38.352630] [<ffffffff8154a81a>] genl_rcv_msg+0x21a/0x2b0 [ 38.352636] [<ffffffff810f8029>] ? zone_statistics+0x99/0xc0 [ 38.352640] [<ffffffff8154a600>] ? genl_rcv+0x30/0x30 [ 38.352645] [<ffffffff8154a059>] netlink_rcv_skb+0xa9/0xd0 [ 38.352649] [<ffffffff8154a5f0>] genl_rcv+0x20/0x30 [ 38.352653] [<ffffffff81549a7e>] netlink_unicast+0x1ae/0x1f0 [ 38.352658] [<ffffffff81549d76>] netlink_sendmsg+0x2b6/0x310 [ 38.352663] [<ffffffff8150824f>] sock_sendmsg+0x10f/0x130 [ 38.352668] [<ffffffff8150abe0>] ? move_addr_to_kernel+0x60/0xb0 [ 38.352673] [<ffffffff81515f04>] ? verify_iovec+0x64/0xe0 [ 38.352677] [<ffffffff81509c46>] __sys_sendmsg+0x386/0x390 [ 38.352682] [<ffffffff810ffaf9>] ? handle_mm_fault+0x139/0x210 [ 38.352687] [<ffffffff8165b5bc>] ? do_page_fault+0x1ec/0x4f0 [ 38.352693] [<ffffffff8106ba4d>] ? set_next_entity+0x9d/0xb0 [ 38.352699] [<ffffffff81310b49>] ? tty_ldisc_deref+0x9/0x10 [ 38.352703] [<ffffffff8106d363>] ? pick_next_task_fair+0x63/0x140 [ 38.352708] [<ffffffff8150b8d4>] sys_sendmsg+0x44/0x80 [ 38.352713] [<ffffffff8165f8e2>] system_call_fastpath+0x16/0x1b It stems from holding a spinlock (trace_state_lock) while attempting to register or unregister tracepoint hooks, making in_atomic() true in this context, leading to the warning when the tracepoint calls might_sleep() while its taking a mutex. Since we only use the trace_state_lock to prevent trace protocol state races, as well as hardware stat list updates on an rcu write side, we can just convert the spinlock to a mutex to avoid this problem. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: Eric Dumazet <eric.dumazet@gmail.com> CC: David Miller <davem@davemloft.net> --- net/core/drop_monitor.c | 14 +++++++------- 1 files changed, 7 insertions(+), 7 deletions(-) diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c index 5c3c81a..04ce1dd 100644 --- a/net/core/drop_monitor.c +++ b/net/core/drop_monitor.c @@ -42,7 +42,7 @@ static void send_dm_alert(struct work_struct *unused); * netlink alerts */ static int trace_state = TRACE_OFF; -static DEFINE_SPINLOCK(trace_state_lock); +static DEFINE_MUTEX(trace_state_lock); struct per_cpu_dm_data { struct work_struct dm_alert_work; @@ -214,7 +214,7 @@ static int set_all_monitor_traces(int state) struct dm_hw_stat_delta *new_stat = NULL; struct dm_hw_stat_delta *temp; - spin_lock(&trace_state_lock); + mutex_lock(&trace_state_lock); if (state == trace_state) { rc = -EAGAIN; @@ -253,7 +253,7 @@ static int set_all_monitor_traces(int state) rc = -EINPROGRESS; out_unlock: - spin_unlock(&trace_state_lock); + mutex_unlock(&trace_state_lock); return rc; } @@ -296,12 +296,12 @@ static int dropmon_net_event(struct notifier_block *ev_block, new_stat->dev = dev; new_stat->last_rx = jiffies; - spin_lock(&trace_state_lock); + mutex_lock(&trace_state_lock); list_add_rcu(&new_stat->list, &hw_stats_list); - spin_unlock(&trace_state_lock); + mutex_unlock(&trace_state_lock); break; case NETDEV_UNREGISTER: - spin_lock(&trace_state_lock); + mutex_lock(&trace_state_lock); list_for_each_entry_safe(new_stat, tmp, &hw_stats_list, list) { if (new_stat->dev == dev) { new_stat->dev = NULL; @@ -312,7 +312,7 @@ static int dropmon_net_event(struct notifier_block *ev_block, } } } - spin_unlock(&trace_state_lock); + mutex_unlock(&trace_state_lock); break; } out: -- 1.7.7.6 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] drop_monitor: fix sleeping in invalid context warning 2012-04-26 18:47 ` [PATCH 1/2] drop_monitor: fix sleeping in invalid context warning Neil Horman @ 2012-04-26 19:01 ` Eric Dumazet 2012-04-26 19:21 ` Neil Horman 0 siblings, 1 reply; 17+ messages in thread From: Eric Dumazet @ 2012-04-26 19:01 UTC (permalink / raw) To: Neil Horman; +Cc: netdev, David Miller On Thu, 2012-04-26 at 14:47 -0400, Neil Horman wrote: > Eric Dumazet pointed out this warning in the drop_monitor protocol to me: > > [ 38.352571] BUG: sleeping function called from invalid context at kernel/mutex.c:85 > [ 38.352576] in_atomic(): 1, irqs_disabled(): 0, pid: 4415, name: dropwatch > [ 38.352580] Pid: 4415, comm: dropwatch Not tainted 3.4.0-rc2+ #71 > [ 38.352582] Call Trace: > [ 38.352592] [<ffffffff8153aaf0>] ? trace_napi_poll_hit+0xd0/0xd0 > [ 38.352599] [<ffffffff81063f2a>] __might_sleep+0xca/0xf0 > [ 38.352606] [<ffffffff81655b16>] mutex_lock+0x26/0x50 > [ 38.352610] [<ffffffff8153aaf0>] ? trace_napi_poll_hit+0xd0/0xd0 > [ 38.352616] [<ffffffff810b72d9>] tracepoint_probe_register+0x29/0x90 > [ 38.352621] [<ffffffff8153a585>] set_all_monitor_traces+0x105/0x170 > [ 38.352625] [<ffffffff8153a8ca>] net_dm_cmd_trace+0x2a/0x40 > [ 38.352630] [<ffffffff8154a81a>] genl_rcv_msg+0x21a/0x2b0 > [ 38.352636] [<ffffffff810f8029>] ? zone_statistics+0x99/0xc0 > [ 38.352640] [<ffffffff8154a600>] ? genl_rcv+0x30/0x30 > [ 38.352645] [<ffffffff8154a059>] netlink_rcv_skb+0xa9/0xd0 > [ 38.352649] [<ffffffff8154a5f0>] genl_rcv+0x20/0x30 > [ 38.352653] [<ffffffff81549a7e>] netlink_unicast+0x1ae/0x1f0 > [ 38.352658] [<ffffffff81549d76>] netlink_sendmsg+0x2b6/0x310 > [ 38.352663] [<ffffffff8150824f>] sock_sendmsg+0x10f/0x130 > [ 38.352668] [<ffffffff8150abe0>] ? move_addr_to_kernel+0x60/0xb0 > [ 38.352673] [<ffffffff81515f04>] ? verify_iovec+0x64/0xe0 > [ 38.352677] [<ffffffff81509c46>] __sys_sendmsg+0x386/0x390 > [ 38.352682] [<ffffffff810ffaf9>] ? handle_mm_fault+0x139/0x210 > [ 38.352687] [<ffffffff8165b5bc>] ? do_page_fault+0x1ec/0x4f0 > [ 38.352693] [<ffffffff8106ba4d>] ? set_next_entity+0x9d/0xb0 > [ 38.352699] [<ffffffff81310b49>] ? tty_ldisc_deref+0x9/0x10 > [ 38.352703] [<ffffffff8106d363>] ? pick_next_task_fair+0x63/0x140 > [ 38.352708] [<ffffffff8150b8d4>] sys_sendmsg+0x44/0x80 > [ 38.352713] [<ffffffff8165f8e2>] system_call_fastpath+0x16/0x1b > > It stems from holding a spinlock (trace_state_lock) while attempting to register > or unregister tracepoint hooks, making in_atomic() true in this context, leading > to the warning when the tracepoint calls might_sleep() while its taking a mutex. > Since we only use the trace_state_lock to prevent trace protocol state races, as > well as hardware stat list updates on an rcu write side, we can just convert the > spinlock to a mutex to avoid this problem. > > Signed-off-by: Neil Horman <nhorman@tuxdriver.com> > Reported-by: Eric Dumazet <eric.dumazet@gmail.com> > CC: David Miller <davem@davemloft.net> > --- > net/core/drop_monitor.c | 14 +++++++------- > 1 files changed, 7 insertions(+), 7 deletions(-) > > diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c > index 5c3c81a..04ce1dd 100644 > --- a/net/core/drop_monitor.c > +++ b/net/core/drop_monitor.c > @@ -42,7 +42,7 @@ static void send_dm_alert(struct work_struct *unused); > * netlink alerts > */ > static int trace_state = TRACE_OFF; > -static DEFINE_SPINLOCK(trace_state_lock); > +static DEFINE_MUTEX(trace_state_lock); Maybe rename it to trace_state_mutex ? ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] drop_monitor: fix sleeping in invalid context warning 2012-04-26 19:01 ` Eric Dumazet @ 2012-04-26 19:21 ` Neil Horman 0 siblings, 0 replies; 17+ messages in thread From: Neil Horman @ 2012-04-26 19:21 UTC (permalink / raw) To: Eric Dumazet; +Cc: netdev, David Miller On Thu, Apr 26, 2012 at 09:01:29PM +0200, Eric Dumazet wrote: > On Thu, 2012-04-26 at 14:47 -0400, Neil Horman wrote: > > Eric Dumazet pointed out this warning in the drop_monitor protocol to me: > > > > [ 38.352571] BUG: sleeping function called from invalid context at kernel/mutex.c:85 > > [ 38.352576] in_atomic(): 1, irqs_disabled(): 0, pid: 4415, name: dropwatch > > [ 38.352580] Pid: 4415, comm: dropwatch Not tainted 3.4.0-rc2+ #71 > > [ 38.352582] Call Trace: > > [ 38.352592] [<ffffffff8153aaf0>] ? trace_napi_poll_hit+0xd0/0xd0 > > [ 38.352599] [<ffffffff81063f2a>] __might_sleep+0xca/0xf0 > > [ 38.352606] [<ffffffff81655b16>] mutex_lock+0x26/0x50 > > [ 38.352610] [<ffffffff8153aaf0>] ? trace_napi_poll_hit+0xd0/0xd0 > > [ 38.352616] [<ffffffff810b72d9>] tracepoint_probe_register+0x29/0x90 > > [ 38.352621] [<ffffffff8153a585>] set_all_monitor_traces+0x105/0x170 > > [ 38.352625] [<ffffffff8153a8ca>] net_dm_cmd_trace+0x2a/0x40 > > [ 38.352630] [<ffffffff8154a81a>] genl_rcv_msg+0x21a/0x2b0 > > [ 38.352636] [<ffffffff810f8029>] ? zone_statistics+0x99/0xc0 > > [ 38.352640] [<ffffffff8154a600>] ? genl_rcv+0x30/0x30 > > [ 38.352645] [<ffffffff8154a059>] netlink_rcv_skb+0xa9/0xd0 > > [ 38.352649] [<ffffffff8154a5f0>] genl_rcv+0x20/0x30 > > [ 38.352653] [<ffffffff81549a7e>] netlink_unicast+0x1ae/0x1f0 > > [ 38.352658] [<ffffffff81549d76>] netlink_sendmsg+0x2b6/0x310 > > [ 38.352663] [<ffffffff8150824f>] sock_sendmsg+0x10f/0x130 > > [ 38.352668] [<ffffffff8150abe0>] ? move_addr_to_kernel+0x60/0xb0 > > [ 38.352673] [<ffffffff81515f04>] ? verify_iovec+0x64/0xe0 > > [ 38.352677] [<ffffffff81509c46>] __sys_sendmsg+0x386/0x390 > > [ 38.352682] [<ffffffff810ffaf9>] ? handle_mm_fault+0x139/0x210 > > [ 38.352687] [<ffffffff8165b5bc>] ? do_page_fault+0x1ec/0x4f0 > > [ 38.352693] [<ffffffff8106ba4d>] ? set_next_entity+0x9d/0xb0 > > [ 38.352699] [<ffffffff81310b49>] ? tty_ldisc_deref+0x9/0x10 > > [ 38.352703] [<ffffffff8106d363>] ? pick_next_task_fair+0x63/0x140 > > [ 38.352708] [<ffffffff8150b8d4>] sys_sendmsg+0x44/0x80 > > [ 38.352713] [<ffffffff8165f8e2>] system_call_fastpath+0x16/0x1b > > > > It stems from holding a spinlock (trace_state_lock) while attempting to register > > or unregister tracepoint hooks, making in_atomic() true in this context, leading > > to the warning when the tracepoint calls might_sleep() while its taking a mutex. > > Since we only use the trace_state_lock to prevent trace protocol state races, as > > well as hardware stat list updates on an rcu write side, we can just convert the > > spinlock to a mutex to avoid this problem. > > > > Signed-off-by: Neil Horman <nhorman@tuxdriver.com> > > Reported-by: Eric Dumazet <eric.dumazet@gmail.com> > > CC: David Miller <davem@davemloft.net> > > --- > > net/core/drop_monitor.c | 14 +++++++------- > > 1 files changed, 7 insertions(+), 7 deletions(-) > > > > diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c > > index 5c3c81a..04ce1dd 100644 > > --- a/net/core/drop_monitor.c > > +++ b/net/core/drop_monitor.c > > @@ -42,7 +42,7 @@ static void send_dm_alert(struct work_struct *unused); > > * netlink alerts > > */ > > static int trace_state = TRACE_OFF; > > -static DEFINE_SPINLOCK(trace_state_lock); > > +static DEFINE_MUTEX(trace_state_lock); > > Maybe rename it to trace_state_mutex ? > ACK, no problem > > ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 2/2] drop_monitor: Make updating data->skb smp safe 2012-04-26 18:47 [PATCH 0/2] drop_monitor: misc fixes for some recently reported bugs Neil Horman 2012-04-26 18:47 ` [PATCH 1/2] drop_monitor: fix sleeping in invalid context warning Neil Horman @ 2012-04-26 18:47 ` Neil Horman 2012-04-26 19:00 ` Eric Dumazet 2012-04-27 20:11 ` [PATCH v2 0/2] drop_monitor: misc fixes for some recently reported bugs Neil Horman 2 siblings, 1 reply; 17+ messages in thread From: Neil Horman @ 2012-04-26 18:47 UTC (permalink / raw) To: netdev; +Cc: Neil Horman, David Miller Eric Dumazet pointed out to me that the drop_monitor protocol has some holes in its smp protections. Specifically, its possible to replace data->skb while its being written. This patch corrects that by making data->skb and rcu protected variable. That will prevent it from being overwritten while a tracepoint is modifying it. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: Eric Dumazet <eric.dumazet@gmail.com> CC: David Miller <davem@davemloft.net> --- net/core/drop_monitor.c | 43 ++++++++++++++++++++++++++++++++----------- 1 files changed, 32 insertions(+), 11 deletions(-) diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c index 04ce1dd..852e36b 100644 --- a/net/core/drop_monitor.c +++ b/net/core/drop_monitor.c @@ -46,7 +46,7 @@ static DEFINE_MUTEX(trace_state_lock); struct per_cpu_dm_data { struct work_struct dm_alert_work; - struct sk_buff *skb; + struct sk_buff __rcu *skb; atomic_t dm_hit_count; struct timer_list send_timer; }; @@ -79,29 +79,41 @@ static void reset_per_cpu_data(struct per_cpu_dm_data *data) size_t al; struct net_dm_alert_msg *msg; struct nlattr *nla; + struct sk_buff *skb; al = sizeof(struct net_dm_alert_msg); al += dm_hit_limit * sizeof(struct net_dm_drop_point); al += sizeof(struct nlattr); - data->skb = genlmsg_new(al, GFP_KERNEL); - genlmsg_put(data->skb, 0, 0, &net_drop_monitor_family, + skb = genlmsg_new(al, GFP_KERNEL); + genlmsg_put(skb, 0, 0, &net_drop_monitor_family, 0, NET_DM_CMD_ALERT); - nla = nla_reserve(data->skb, NLA_UNSPEC, sizeof(struct net_dm_alert_msg)); + nla = nla_reserve(skb, NLA_UNSPEC, sizeof(struct net_dm_alert_msg)); msg = nla_data(nla); memset(msg, 0, al); + + /* + * Don't need to lock this, since we are guaranteed to only + * run this on a single cpu at a time + */ + rcu_assign_pointer(data->skb, skb); + + synchronize_rcu(); + atomic_set(&data->dm_hit_count, dm_hit_limit); } static void send_dm_alert(struct work_struct *unused) { struct sk_buff *skb; - struct per_cpu_dm_data *data = &__get_cpu_var(dm_cpu_data); + struct per_cpu_dm_data *data = &get_cpu_var(dm_cpu_data); /* * Grab the skb we're about to send */ - skb = data->skb; + rcu_read_lock(); + skb = rcu_dereference(data->skb); + rcu_read_unlock(); /* * Replace it with a new one @@ -113,6 +125,7 @@ static void send_dm_alert(struct work_struct *unused) */ genlmsg_multicast(skb, 0, NET_DM_GRP_ALERT, GFP_KERNEL); + put_cpu_var(dm_cpu_data); } /* @@ -123,9 +136,11 @@ static void send_dm_alert(struct work_struct *unused) */ static void sched_send_work(unsigned long unused) { - struct per_cpu_dm_data *data = &__get_cpu_var(dm_cpu_data); + struct per_cpu_dm_data *data = &get_cpu_var(dm_cpu_data); + + schedule_work_on(smp_processor_id(), &data->dm_alert_work); - schedule_work(&data->dm_alert_work); + put_cpu_var(dm_cpu_data); } static void trace_drop_common(struct sk_buff *skb, void *location) @@ -134,9 +149,13 @@ static void trace_drop_common(struct sk_buff *skb, void *location) struct nlmsghdr *nlh; struct nlattr *nla; int i; - struct per_cpu_dm_data *data = &__get_cpu_var(dm_cpu_data); + struct sk_buff *dskb; + struct per_cpu_dm_data *data = &get_cpu_var(dm_cpu_data); + rcu_read_lock(); + dskb = rcu_dereference(data->skb); + if (!atomic_add_unless(&data->dm_hit_count, -1, 0)) { /* * we're already at zero, discard this hit @@ -144,7 +163,7 @@ static void trace_drop_common(struct sk_buff *skb, void *location) goto out; } - nlh = (struct nlmsghdr *)data->skb->data; + nlh = (struct nlmsghdr *)dskb->data; nla = genlmsg_data(nlmsg_data(nlh)); msg = nla_data(nla); for (i = 0; i < msg->entries; i++) { @@ -158,7 +177,7 @@ static void trace_drop_common(struct sk_buff *skb, void *location) /* * We need to create a new entry */ - __nla_reserve_nohdr(data->skb, sizeof(struct net_dm_drop_point)); + __nla_reserve_nohdr(dskb, sizeof(struct net_dm_drop_point)); nla->nla_len += NLA_ALIGN(sizeof(struct net_dm_drop_point)); memcpy(msg->points[msg->entries].pc, &location, sizeof(void *)); msg->points[msg->entries].count = 1; @@ -170,6 +189,8 @@ static void trace_drop_common(struct sk_buff *skb, void *location) } out: + rcu_read_unlock(); + put_cpu_var(dm_cpu_data); return; } -- 1.7.7.6 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH 2/2] drop_monitor: Make updating data->skb smp safe 2012-04-26 18:47 ` [PATCH 2/2] drop_monitor: Make updating data->skb smp safe Neil Horman @ 2012-04-26 19:00 ` Eric Dumazet 2012-04-26 19:18 ` Neil Horman 0 siblings, 1 reply; 17+ messages in thread From: Eric Dumazet @ 2012-04-26 19:00 UTC (permalink / raw) To: Neil Horman; +Cc: netdev, David Miller On Thu, 2012-04-26 at 14:47 -0400, Neil Horman wrote: > Eric Dumazet pointed out to me that the drop_monitor protocol has some holes in > its smp protections. Specifically, its possible to replace data->skb while its > being written. This patch corrects that by making data->skb and rcu protected > variable. That will prevent it from being overwritten while a tracepoint is > modifying it. > > Signed-off-by: Neil Horman <nhorman@tuxdriver.com> > Reported-by: Eric Dumazet <eric.dumazet@gmail.com> > CC: David Miller <davem@davemloft.net> > --- > net/core/drop_monitor.c | 43 ++++++++++++++++++++++++++++++++----------- > 1 files changed, 32 insertions(+), 11 deletions(-) > Hi Neil I believe more work is needed on this patch > diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c > index 04ce1dd..852e36b 100644 > --- a/net/core/drop_monitor.c > +++ b/net/core/drop_monitor.c > @@ -46,7 +46,7 @@ static DEFINE_MUTEX(trace_state_lock); > > struct per_cpu_dm_data { > struct work_struct dm_alert_work; > - struct sk_buff *skb; > + struct sk_buff __rcu *skb; > atomic_t dm_hit_count; > struct timer_list send_timer; > }; > @@ -79,29 +79,41 @@ static void reset_per_cpu_data(struct per_cpu_dm_data *data) > size_t al; > struct net_dm_alert_msg *msg; > struct nlattr *nla; > + struct sk_buff *skb; > > al = sizeof(struct net_dm_alert_msg); > al += dm_hit_limit * sizeof(struct net_dm_drop_point); > al += sizeof(struct nlattr); > > - data->skb = genlmsg_new(al, GFP_KERNEL); > - genlmsg_put(data->skb, 0, 0, &net_drop_monitor_family, > + skb = genlmsg_new(al, GFP_KERNEL); skb can be NULL here... > + genlmsg_put(skb, 0, 0, &net_drop_monitor_family, > 0, NET_DM_CMD_ALERT); > - nla = nla_reserve(data->skb, NLA_UNSPEC, sizeof(struct net_dm_alert_msg)); > + nla = nla_reserve(skb, NLA_UNSPEC, sizeof(struct net_dm_alert_msg)); > msg = nla_data(nla); > memset(msg, 0, al); > + > + /* > + * Don't need to lock this, since we are guaranteed to only > + * run this on a single cpu at a time > + */ > + rcu_assign_pointer(data->skb, skb); > + > + synchronize_rcu(); > + > atomic_set(&data->dm_hit_count, dm_hit_limit); > } > > static void send_dm_alert(struct work_struct *unused) > { > struct sk_buff *skb; > - struct per_cpu_dm_data *data = &__get_cpu_var(dm_cpu_data); > + struct per_cpu_dm_data *data = &get_cpu_var(dm_cpu_data); > > /* > * Grab the skb we're about to send > */ > - skb = data->skb; > + rcu_read_lock(); > + skb = rcu_dereference(data->skb); This protects nothing ... > + rcu_read_unlock(); > > /* > * Replace it with a new one > @@ -113,6 +125,7 @@ static void send_dm_alert(struct work_struct *unused) > */ > genlmsg_multicast(skb, 0, NET_DM_GRP_ALERT, GFP_KERNEL); > > + put_cpu_var(dm_cpu_data); > } > > /* > @@ -123,9 +136,11 @@ static void send_dm_alert(struct work_struct *unused) > */ > static void sched_send_work(unsigned long unused) > { > - struct per_cpu_dm_data *data = &__get_cpu_var(dm_cpu_data); > + struct per_cpu_dm_data *data = &get_cpu_var(dm_cpu_data); > + > + schedule_work_on(smp_processor_id(), &data->dm_alert_work); > > - schedule_work(&data->dm_alert_work); > + put_cpu_var(dm_cpu_data); > } > > static void trace_drop_common(struct sk_buff *skb, void *location) > @@ -134,9 +149,13 @@ static void trace_drop_common(struct sk_buff *skb, void *location) > struct nlmsghdr *nlh; > struct nlattr *nla; > int i; > - struct per_cpu_dm_data *data = &__get_cpu_var(dm_cpu_data); > + struct sk_buff *dskb; > + struct per_cpu_dm_data *data = &get_cpu_var(dm_cpu_data); > > > + rcu_read_lock(); > + dskb = rcu_dereference(data->skb); > + dskb can be NULL here > if (!atomic_add_unless(&data->dm_hit_count, -1, 0)) { > /* > * we're already at zero, discard this hit > @@ -144,7 +163,7 @@ static void trace_drop_common(struct sk_buff *skb, void *location) > goto out; > } > > - nlh = (struct nlmsghdr *)data->skb->data; > + nlh = (struct nlmsghdr *)dskb->data; > nla = genlmsg_data(nlmsg_data(nlh)); > msg = nla_data(nla); > for (i = 0; i < msg->entries; i++) { > @@ -158,7 +177,7 @@ static void trace_drop_common(struct sk_buff *skb, void *location) > /* > * We need to create a new entry > */ > - __nla_reserve_nohdr(data->skb, sizeof(struct net_dm_drop_point)); > + __nla_reserve_nohdr(dskb, sizeof(struct net_dm_drop_point)); > nla->nla_len += NLA_ALIGN(sizeof(struct net_dm_drop_point)); > memcpy(msg->points[msg->entries].pc, &location, sizeof(void *)); > msg->points[msg->entries].count = 1; > @@ -170,6 +189,8 @@ static void trace_drop_common(struct sk_buff *skb, void *location) > } > > out: > + rcu_read_unlock(); > + put_cpu_var(dm_cpu_data); > return; > } > Thanks ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2/2] drop_monitor: Make updating data->skb smp safe 2012-04-26 19:00 ` Eric Dumazet @ 2012-04-26 19:18 ` Neil Horman 2012-04-26 19:25 ` Eric Dumazet 0 siblings, 1 reply; 17+ messages in thread From: Neil Horman @ 2012-04-26 19:18 UTC (permalink / raw) To: Eric Dumazet; +Cc: netdev, David Miller On Thu, Apr 26, 2012 at 09:00:09PM +0200, Eric Dumazet wrote: > On Thu, 2012-04-26 at 14:47 -0400, Neil Horman wrote: > > Eric Dumazet pointed out to me that the drop_monitor protocol has some holes in > > its smp protections. Specifically, its possible to replace data->skb while its > > being written. This patch corrects that by making data->skb and rcu protected > > variable. That will prevent it from being overwritten while a tracepoint is > > modifying it. > > > > Signed-off-by: Neil Horman <nhorman@tuxdriver.com> > > Reported-by: Eric Dumazet <eric.dumazet@gmail.com> > > CC: David Miller <davem@davemloft.net> > > --- > > net/core/drop_monitor.c | 43 ++++++++++++++++++++++++++++++++----------- > > 1 files changed, 32 insertions(+), 11 deletions(-) > > > > Hi Neil > > I believe more work is needed on this patch > > > diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c > > index 04ce1dd..852e36b 100644 > > --- a/net/core/drop_monitor.c > > +++ b/net/core/drop_monitor.c > > @@ -46,7 +46,7 @@ static DEFINE_MUTEX(trace_state_lock); > > > > struct per_cpu_dm_data { > > struct work_struct dm_alert_work; > > - struct sk_buff *skb; > > + struct sk_buff __rcu *skb; > > atomic_t dm_hit_count; > > struct timer_list send_timer; > > }; > > @@ -79,29 +79,41 @@ static void reset_per_cpu_data(struct per_cpu_dm_data *data) > > size_t al; > > struct net_dm_alert_msg *msg; > > struct nlattr *nla; > > + struct sk_buff *skb; > > > > al = sizeof(struct net_dm_alert_msg); > > al += dm_hit_limit * sizeof(struct net_dm_drop_point); > > al += sizeof(struct nlattr); > > > > - data->skb = genlmsg_new(al, GFP_KERNEL); > > - genlmsg_put(data->skb, 0, 0, &net_drop_monitor_family, > > + skb = genlmsg_new(al, GFP_KERNEL); > > skb can be NULL here... > Good point, I'll add NULL checks > > + genlmsg_put(skb, 0, 0, &net_drop_monitor_family, > > 0, NET_DM_CMD_ALERT); > > - nla = nla_reserve(data->skb, NLA_UNSPEC, sizeof(struct net_dm_alert_msg)); > > + nla = nla_reserve(skb, NLA_UNSPEC, sizeof(struct net_dm_alert_msg)); > > msg = nla_data(nla); > > memset(msg, 0, al); > > + > > + /* > > + * Don't need to lock this, since we are guaranteed to only > > + * run this on a single cpu at a time > > + */ > > + rcu_assign_pointer(data->skb, skb); > > + > > + synchronize_rcu(); > > + > > atomic_set(&data->dm_hit_count, dm_hit_limit); > > } > > > > static void send_dm_alert(struct work_struct *unused) > > { > > struct sk_buff *skb; > > - struct per_cpu_dm_data *data = &__get_cpu_var(dm_cpu_data); > > + struct per_cpu_dm_data *data = &get_cpu_var(dm_cpu_data); > > > > /* > > * Grab the skb we're about to send > > */ > > - skb = data->skb; > > + rcu_read_lock(); > > + skb = rcu_dereference(data->skb); > > This protects nothing ... > Hmm, it doesn't really need to be protected either, I just added the read_lock to prevent any rcu_dereference from complaining about not holding the rcu_read_lock, but as I'm typing this, it occurs to me that that would make rcu_dereference_protected the call to use here. Thanks for kick starting me on that. > > + rcu_read_unlock(); > > > > > > > /* > > * Replace it with a new one > > @@ -113,6 +125,7 @@ static void send_dm_alert(struct work_struct *unused) > > */ > > genlmsg_multicast(skb, 0, NET_DM_GRP_ALERT, GFP_KERNEL); > > > > + put_cpu_var(dm_cpu_data); > > } > > > > /* > > @@ -123,9 +136,11 @@ static void send_dm_alert(struct work_struct *unused) > > */ > > static void sched_send_work(unsigned long unused) > > { > > - struct per_cpu_dm_data *data = &__get_cpu_var(dm_cpu_data); > > + struct per_cpu_dm_data *data = &get_cpu_var(dm_cpu_data); > > + > > + schedule_work_on(smp_processor_id(), &data->dm_alert_work); > > > > - schedule_work(&data->dm_alert_work); > > + put_cpu_var(dm_cpu_data); > > } > > > > static void trace_drop_common(struct sk_buff *skb, void *location) > > @@ -134,9 +149,13 @@ static void trace_drop_common(struct sk_buff *skb, void *location) > > struct nlmsghdr *nlh; > > struct nlattr *nla; > > int i; > > - struct per_cpu_dm_data *data = &__get_cpu_var(dm_cpu_data); > > + struct sk_buff *dskb; > > + struct per_cpu_dm_data *data = &get_cpu_var(dm_cpu_data); > > > > > > + rcu_read_lock(); > > + dskb = rcu_dereference(data->skb); > > + > > dskb can be NULL here > ACK, I'll check that > > if (!atomic_add_unless(&data->dm_hit_count, -1, 0)) { > > /* > > * we're already at zero, discard this hit > > @@ -144,7 +163,7 @@ static void trace_drop_common(struct sk_buff *skb, void *location) > > goto out; > > } > > > > - nlh = (struct nlmsghdr *)data->skb->data; > > + nlh = (struct nlmsghdr *)dskb->data; > > nla = genlmsg_data(nlmsg_data(nlh)); > > msg = nla_data(nla); > > for (i = 0; i < msg->entries; i++) { > > @@ -158,7 +177,7 @@ static void trace_drop_common(struct sk_buff *skb, void *location) > > /* > > * We need to create a new entry > > */ > > - __nla_reserve_nohdr(data->skb, sizeof(struct net_dm_drop_point)); > > + __nla_reserve_nohdr(dskb, sizeof(struct net_dm_drop_point)); > > nla->nla_len += NLA_ALIGN(sizeof(struct net_dm_drop_point)); > > memcpy(msg->points[msg->entries].pc, &location, sizeof(void *)); > > msg->points[msg->entries].count = 1; > > @@ -170,6 +189,8 @@ static void trace_drop_common(struct sk_buff *skb, void *location) > > } > > > > out: > > + rcu_read_unlock(); > > + put_cpu_var(dm_cpu_data); > > return; > > } > > > > Thanks > > > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2/2] drop_monitor: Make updating data->skb smp safe 2012-04-26 19:18 ` Neil Horman @ 2012-04-26 19:25 ` Eric Dumazet 0 siblings, 0 replies; 17+ messages in thread From: Eric Dumazet @ 2012-04-26 19:25 UTC (permalink / raw) To: Neil Horman; +Cc: netdev, David Miller On Thu, 2012-04-26 at 15:18 -0400, Neil Horman wrote: > On Thu, Apr 26, 2012 at 09:00:09PM +0200, Eric Dumazet wrote: > > > * Grab the skb we're about to send > > > */ > > > - skb = data->skb; > > > + rcu_read_lock(); > > > + skb = rcu_dereference(data->skb); > > > > This protects nothing ... > > > Hmm, it doesn't really need to be protected either, I just added the read_lock > to prevent any rcu_dereference from complaining about not holding the > rcu_read_lock, but as I'm typing this, it occurs to me that that would make > rcu_dereference_protected the call to use here. Thanks for kick starting me on > that. perfect ;) ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v2 0/2] drop_monitor: misc fixes for some recently reported bugs 2012-04-26 18:47 [PATCH 0/2] drop_monitor: misc fixes for some recently reported bugs Neil Horman 2012-04-26 18:47 ` [PATCH 1/2] drop_monitor: fix sleeping in invalid context warning Neil Horman 2012-04-26 18:47 ` [PATCH 2/2] drop_monitor: Make updating data->skb smp safe Neil Horman @ 2012-04-27 20:11 ` Neil Horman 2012-04-27 20:11 ` [PATCH v2 1/2] drop_monitor: fix sleeping in invalid context warning Neil Horman ` (2 more replies) 2 siblings, 3 replies; 17+ messages in thread From: Neil Horman @ 2012-04-27 20:11 UTC (permalink / raw) To: netdev; +Cc: Neil Horman, Eric Dumazet, David Miller Hey- Eric was using dropwatch recently and reported a few bugs to me that he had noted. This short series should fix them all up. Change Notes: V2) renamed trace_state_lock to trace_state_mutex cleaned up rcu write side access to use rcu_dereference_protected handled some NULL allocation failure cases Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: David Miller <davem@davemloft.net> ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v2 1/2] drop_monitor: fix sleeping in invalid context warning 2012-04-27 20:11 ` [PATCH v2 0/2] drop_monitor: misc fixes for some recently reported bugs Neil Horman @ 2012-04-27 20:11 ` Neil Horman 2012-04-28 6:11 ` Eric Dumazet 2012-04-27 20:11 ` [PATCH v2 2/2] drop_monitor: Make updating data->skb smp safe Neil Horman 2012-04-28 6:19 ` [PATCH v2 0/2] drop_monitor: misc fixes for some recently reported bugs David Miller 2 siblings, 1 reply; 17+ messages in thread From: Neil Horman @ 2012-04-27 20:11 UTC (permalink / raw) To: netdev; +Cc: Neil Horman, David Miller Eric Dumazet pointed out this warning in the drop_monitor protocol to me: [ 38.352571] BUG: sleeping function called from invalid context at kernel/mutex.c:85 [ 38.352576] in_atomic(): 1, irqs_disabled(): 0, pid: 4415, name: dropwatch [ 38.352580] Pid: 4415, comm: dropwatch Not tainted 3.4.0-rc2+ #71 [ 38.352582] Call Trace: [ 38.352592] [<ffffffff8153aaf0>] ? trace_napi_poll_hit+0xd0/0xd0 [ 38.352599] [<ffffffff81063f2a>] __might_sleep+0xca/0xf0 [ 38.352606] [<ffffffff81655b16>] mutex_lock+0x26/0x50 [ 38.352610] [<ffffffff8153aaf0>] ? trace_napi_poll_hit+0xd0/0xd0 [ 38.352616] [<ffffffff810b72d9>] tracepoint_probe_register+0x29/0x90 [ 38.352621] [<ffffffff8153a585>] set_all_monitor_traces+0x105/0x170 [ 38.352625] [<ffffffff8153a8ca>] net_dm_cmd_trace+0x2a/0x40 [ 38.352630] [<ffffffff8154a81a>] genl_rcv_msg+0x21a/0x2b0 [ 38.352636] [<ffffffff810f8029>] ? zone_statistics+0x99/0xc0 [ 38.352640] [<ffffffff8154a600>] ? genl_rcv+0x30/0x30 [ 38.352645] [<ffffffff8154a059>] netlink_rcv_skb+0xa9/0xd0 [ 38.352649] [<ffffffff8154a5f0>] genl_rcv+0x20/0x30 [ 38.352653] [<ffffffff81549a7e>] netlink_unicast+0x1ae/0x1f0 [ 38.352658] [<ffffffff81549d76>] netlink_sendmsg+0x2b6/0x310 [ 38.352663] [<ffffffff8150824f>] sock_sendmsg+0x10f/0x130 [ 38.352668] [<ffffffff8150abe0>] ? move_addr_to_kernel+0x60/0xb0 [ 38.352673] [<ffffffff81515f04>] ? verify_iovec+0x64/0xe0 [ 38.352677] [<ffffffff81509c46>] __sys_sendmsg+0x386/0x390 [ 38.352682] [<ffffffff810ffaf9>] ? handle_mm_fault+0x139/0x210 [ 38.352687] [<ffffffff8165b5bc>] ? do_page_fault+0x1ec/0x4f0 [ 38.352693] [<ffffffff8106ba4d>] ? set_next_entity+0x9d/0xb0 [ 38.352699] [<ffffffff81310b49>] ? tty_ldisc_deref+0x9/0x10 [ 38.352703] [<ffffffff8106d363>] ? pick_next_task_fair+0x63/0x140 [ 38.352708] [<ffffffff8150b8d4>] sys_sendmsg+0x44/0x80 [ 38.352713] [<ffffffff8165f8e2>] system_call_fastpath+0x16/0x1b It stems from holding a spinlock (trace_state_lock) while attempting to register or unregister tracepoint hooks, making in_atomic() true in this context, leading to the warning when the tracepoint calls might_sleep() while its taking a mutex. Since we only use the trace_state_lock to prevent trace protocol state races, as well as hardware stat list updates on an rcu write side, we can just convert the spinlock to a mutex to avoid this problem. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: Eric Dumazet <eric.dumazet@gmail.com> CC: David Miller <davem@davemloft.net> --- net/core/drop_monitor.c | 14 +++++++------- 1 files changed, 7 insertions(+), 7 deletions(-) diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c index 5c3c81a..a221a5b 100644 --- a/net/core/drop_monitor.c +++ b/net/core/drop_monitor.c @@ -42,7 +42,7 @@ static void send_dm_alert(struct work_struct *unused); * netlink alerts */ static int trace_state = TRACE_OFF; -static DEFINE_SPINLOCK(trace_state_lock); +static DEFINE_MUTEX(trace_state_mutex); struct per_cpu_dm_data { struct work_struct dm_alert_work; @@ -214,7 +214,7 @@ static int set_all_monitor_traces(int state) struct dm_hw_stat_delta *new_stat = NULL; struct dm_hw_stat_delta *temp; - spin_lock(&trace_state_lock); + mutex_lock(&trace_state_mutex); if (state == trace_state) { rc = -EAGAIN; @@ -253,7 +253,7 @@ static int set_all_monitor_traces(int state) rc = -EINPROGRESS; out_unlock: - spin_unlock(&trace_state_lock); + mutex_unlock(&trace_state_mutex); return rc; } @@ -296,12 +296,12 @@ static int dropmon_net_event(struct notifier_block *ev_block, new_stat->dev = dev; new_stat->last_rx = jiffies; - spin_lock(&trace_state_lock); + mutex_lock(&trace_state_mutex); list_add_rcu(&new_stat->list, &hw_stats_list); - spin_unlock(&trace_state_lock); + mutex_unlock(&trace_state_mutex); break; case NETDEV_UNREGISTER: - spin_lock(&trace_state_lock); + mutex_lock(&trace_state_mutex); list_for_each_entry_safe(new_stat, tmp, &hw_stats_list, list) { if (new_stat->dev == dev) { new_stat->dev = NULL; @@ -312,7 +312,7 @@ static int dropmon_net_event(struct notifier_block *ev_block, } } } - spin_unlock(&trace_state_lock); + mutex_unlock(&trace_state_mutex); break; } out: -- 1.7.7.6 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v2 1/2] drop_monitor: fix sleeping in invalid context warning 2012-04-27 20:11 ` [PATCH v2 1/2] drop_monitor: fix sleeping in invalid context warning Neil Horman @ 2012-04-28 6:11 ` Eric Dumazet 2012-04-28 14:30 ` Neil Horman 0 siblings, 1 reply; 17+ messages in thread From: Eric Dumazet @ 2012-04-28 6:11 UTC (permalink / raw) To: Neil Horman; +Cc: netdev, David Miller On Fri, 2012-04-27 at 16:11 -0400, Neil Horman wrote: > Eric Dumazet pointed out this warning in the drop_monitor protocol to me: > > It stems from holding a spinlock (trace_state_lock) while attempting to register > or unregister tracepoint hooks, making in_atomic() true in this context, leading > to the warning when the tracepoint calls might_sleep() while its taking a mutex. > Since we only use the trace_state_lock to prevent trace protocol state races, as > well as hardware stat list updates on an rcu write side, we can just convert the > spinlock to a mutex to avoid this problem. > > Signed-off-by: Neil Horman <nhorman@tuxdriver.com> > Reported-by: Eric Dumazet <eric.dumazet@gmail.com> > CC: David Miller <davem@davemloft.net> > --- Acked-by: Eric Dumazet <edumazet@google.com> Thanks Neil ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 1/2] drop_monitor: fix sleeping in invalid context warning 2012-04-28 6:11 ` Eric Dumazet @ 2012-04-28 14:30 ` Neil Horman 0 siblings, 0 replies; 17+ messages in thread From: Neil Horman @ 2012-04-28 14:30 UTC (permalink / raw) To: Eric Dumazet; +Cc: netdev, David Miller On Sat, Apr 28, 2012 at 08:11:13AM +0200, Eric Dumazet wrote: > On Fri, 2012-04-27 at 16:11 -0400, Neil Horman wrote: > > Eric Dumazet pointed out this warning in the drop_monitor protocol to me: > > > > > It stems from holding a spinlock (trace_state_lock) while attempting to register > > or unregister tracepoint hooks, making in_atomic() true in this context, leading > > to the warning when the tracepoint calls might_sleep() while its taking a mutex. > > Since we only use the trace_state_lock to prevent trace protocol state races, as > > well as hardware stat list updates on an rcu write side, we can just convert the > > spinlock to a mutex to avoid this problem. > > > > Signed-off-by: Neil Horman <nhorman@tuxdriver.com> > > Reported-by: Eric Dumazet <eric.dumazet@gmail.com> > > CC: David Miller <davem@davemloft.net> > > --- > > Acked-by: Eric Dumazet <edumazet@google.com> > > Thanks Neil > Anytmie, thanks for reporting it. Neil > > > ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v2 2/2] drop_monitor: Make updating data->skb smp safe 2012-04-27 20:11 ` [PATCH v2 0/2] drop_monitor: misc fixes for some recently reported bugs Neil Horman 2012-04-27 20:11 ` [PATCH v2 1/2] drop_monitor: fix sleeping in invalid context warning Neil Horman @ 2012-04-27 20:11 ` Neil Horman 2012-04-28 6:13 ` Eric Dumazet 2012-06-04 7:45 ` Eric Dumazet 2012-04-28 6:19 ` [PATCH v2 0/2] drop_monitor: misc fixes for some recently reported bugs David Miller 2 siblings, 2 replies; 17+ messages in thread From: Neil Horman @ 2012-04-27 20:11 UTC (permalink / raw) To: netdev; +Cc: Neil Horman, David Miller Eric Dumazet pointed out to me that the drop_monitor protocol has some holes in its smp protections. Specifically, its possible to replace data->skb while its being written. This patch corrects that by making data->skb and rcu protected variable. That will prevent it from being overwritten while a tracepoint is modifying it. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: Eric Dumazet <eric.dumazet@gmail.com> CC: David Miller <davem@davemloft.net> --- net/core/drop_monitor.c | 70 ++++++++++++++++++++++++++++++++++++----------- 1 files changed, 54 insertions(+), 16 deletions(-) diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c index a221a5b..4e04cf6 100644 --- a/net/core/drop_monitor.c +++ b/net/core/drop_monitor.c @@ -46,7 +46,7 @@ static DEFINE_MUTEX(trace_state_mutex); struct per_cpu_dm_data { struct work_struct dm_alert_work; - struct sk_buff *skb; + struct sk_buff __rcu *skb; atomic_t dm_hit_count; struct timer_list send_timer; }; @@ -73,35 +73,58 @@ static int dm_hit_limit = 64; static int dm_delay = 1; static unsigned long dm_hw_check_delta = 2*HZ; static LIST_HEAD(hw_stats_list); +static int initalized = 0; static void reset_per_cpu_data(struct per_cpu_dm_data *data) { size_t al; struct net_dm_alert_msg *msg; struct nlattr *nla; + struct sk_buff *skb; + struct sk_buff *oskb = rcu_dereference_protected(data->skb, 1); al = sizeof(struct net_dm_alert_msg); al += dm_hit_limit * sizeof(struct net_dm_drop_point); al += sizeof(struct nlattr); - data->skb = genlmsg_new(al, GFP_KERNEL); - genlmsg_put(data->skb, 0, 0, &net_drop_monitor_family, - 0, NET_DM_CMD_ALERT); - nla = nla_reserve(data->skb, NLA_UNSPEC, sizeof(struct net_dm_alert_msg)); - msg = nla_data(nla); - memset(msg, 0, al); - atomic_set(&data->dm_hit_count, dm_hit_limit); + skb = genlmsg_new(al, GFP_KERNEL); + + if (skb) { + genlmsg_put(skb, 0, 0, &net_drop_monitor_family, + 0, NET_DM_CMD_ALERT); + nla = nla_reserve(skb, NLA_UNSPEC, + sizeof(struct net_dm_alert_msg)); + msg = nla_data(nla); + memset(msg, 0, al); + } else if (initalized) + schedule_work_on(smp_processor_id(), &data->dm_alert_work); + + /* + * Don't need to lock this, since we are guaranteed to only + * run this on a single cpu at a time. + * Note also that we only update data->skb if the old and new skb + * pointers don't match. This ensures that we don't continually call + * synchornize_rcu if we repeatedly fail to alloc a new netlink message. + */ + if (skb != oskb) { + rcu_assign_pointer(data->skb, skb); + + synchronize_rcu(); + + atomic_set(&data->dm_hit_count, dm_hit_limit); + } + } static void send_dm_alert(struct work_struct *unused) { struct sk_buff *skb; - struct per_cpu_dm_data *data = &__get_cpu_var(dm_cpu_data); + struct per_cpu_dm_data *data = &get_cpu_var(dm_cpu_data); /* * Grab the skb we're about to send */ - skb = data->skb; + skb = rcu_dereference_protected(data->skb, 1); /* * Replace it with a new one @@ -111,8 +134,10 @@ static void send_dm_alert(struct work_struct *unused) /* * Ship it! */ - genlmsg_multicast(skb, 0, NET_DM_GRP_ALERT, GFP_KERNEL); + if (skb) + genlmsg_multicast(skb, 0, NET_DM_GRP_ALERT, GFP_KERNEL); + put_cpu_var(dm_cpu_data); } /* @@ -123,9 +148,11 @@ static void send_dm_alert(struct work_struct *unused) */ static void sched_send_work(unsigned long unused) { - struct per_cpu_dm_data *data = &__get_cpu_var(dm_cpu_data); + struct per_cpu_dm_data *data = &get_cpu_var(dm_cpu_data); + + schedule_work_on(smp_processor_id(), &data->dm_alert_work); - schedule_work(&data->dm_alert_work); + put_cpu_var(dm_cpu_data); } static void trace_drop_common(struct sk_buff *skb, void *location) @@ -134,9 +161,16 @@ static void trace_drop_common(struct sk_buff *skb, void *location) struct nlmsghdr *nlh; struct nlattr *nla; int i; - struct per_cpu_dm_data *data = &__get_cpu_var(dm_cpu_data); + struct sk_buff *dskb; + struct per_cpu_dm_data *data = &get_cpu_var(dm_cpu_data); + rcu_read_lock(); + dskb = rcu_dereference(data->skb); + + if (!dskb) + goto out; + if (!atomic_add_unless(&data->dm_hit_count, -1, 0)) { /* * we're already at zero, discard this hit @@ -144,7 +178,7 @@ static void trace_drop_common(struct sk_buff *skb, void *location) goto out; } - nlh = (struct nlmsghdr *)data->skb->data; + nlh = (struct nlmsghdr *)dskb->data; nla = genlmsg_data(nlmsg_data(nlh)); msg = nla_data(nla); for (i = 0; i < msg->entries; i++) { @@ -158,7 +192,7 @@ static void trace_drop_common(struct sk_buff *skb, void *location) /* * We need to create a new entry */ - __nla_reserve_nohdr(data->skb, sizeof(struct net_dm_drop_point)); + __nla_reserve_nohdr(dskb, sizeof(struct net_dm_drop_point)); nla->nla_len += NLA_ALIGN(sizeof(struct net_dm_drop_point)); memcpy(msg->points[msg->entries].pc, &location, sizeof(void *)); msg->points[msg->entries].count = 1; @@ -170,6 +204,8 @@ static void trace_drop_common(struct sk_buff *skb, void *location) } out: + rcu_read_unlock(); + put_cpu_var(dm_cpu_data); return; } @@ -375,6 +411,8 @@ static int __init init_net_drop_monitor(void) data->send_timer.function = sched_send_work; } + initalized = 1; + goto out; out_unreg: -- 1.7.7.6 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v2 2/2] drop_monitor: Make updating data->skb smp safe 2012-04-27 20:11 ` [PATCH v2 2/2] drop_monitor: Make updating data->skb smp safe Neil Horman @ 2012-04-28 6:13 ` Eric Dumazet 2012-06-04 7:45 ` Eric Dumazet 1 sibling, 0 replies; 17+ messages in thread From: Eric Dumazet @ 2012-04-28 6:13 UTC (permalink / raw) To: Neil Horman; +Cc: netdev, David Miller On Fri, 2012-04-27 at 16:11 -0400, Neil Horman wrote: > Eric Dumazet pointed out to me that the drop_monitor protocol has some holes in > its smp protections. Specifically, its possible to replace data->skb while its > being written. This patch corrects that by making data->skb and rcu protected > variable. That will prevent it from being overwritten while a tracepoint is > modifying it. > > Signed-off-by: Neil Horman <nhorman@tuxdriver.com> > Reported-by: Eric Dumazet <eric.dumazet@gmail.com> > CC: David Miller <davem@davemloft.net> > --- > net/core/drop_monitor.c | 70 ++++++++++++++++++++++++++++++++++++----------- > 1 files changed, 54 insertions(+), 16 deletions(-) Acked-by: Eric Dumazet <edumazet@google.com> Thanks Neil ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 2/2] drop_monitor: Make updating data->skb smp safe 2012-04-27 20:11 ` [PATCH v2 2/2] drop_monitor: Make updating data->skb smp safe Neil Horman 2012-04-28 6:13 ` Eric Dumazet @ 2012-06-04 7:45 ` Eric Dumazet 2012-06-04 10:39 ` Neil Horman 1 sibling, 1 reply; 17+ messages in thread From: Eric Dumazet @ 2012-06-04 7:45 UTC (permalink / raw) To: Neil Horman; +Cc: netdev, David Miller On Fri, 2012-04-27 at 16:11 -0400, Neil Horman wrote: > Eric Dumazet pointed out to me that the drop_monitor protocol has some holes in > its smp protections. Specifically, its possible to replace data->skb while its > being written. This patch corrects that by making data->skb and rcu protected > variable. That will prevent it from being overwritten while a tracepoint is > modifying it. > > static void send_dm_alert(struct work_struct *unused) > { > struct sk_buff *skb; > - struct per_cpu_dm_data *data = &__get_cpu_var(dm_cpu_data); > + struct per_cpu_dm_data *data = &get_cpu_var(dm_cpu_data); > > /* > * Grab the skb we're about to send > */ > - skb = data->skb; > + skb = rcu_dereference_protected(data->skb, 1); > > /* > * Replace it with a new one > @@ -111,8 +134,10 @@ static void send_dm_alert(struct work_struct *unused) > /* > * Ship it! > */ > - genlmsg_multicast(skb, 0, NET_DM_GRP_ALERT, GFP_KERNEL); > + if (skb) > + genlmsg_multicast(skb, 0, NET_DM_GRP_ALERT, GFP_KERNEL); > > + put_cpu_var(dm_cpu_data); > } > Oh well, drop_monitor can still trigger alerts : Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161774] BUG: sleeping function called from invalid context at mm/slub.c:943 Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161779] in_atomic(): 1, irqs_disabled(): 0, pid: 2103, name: kworker/0:2 Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161782] Pid: 2103, comm: kworker/0:2 Not tainted 3.5.0-rc1+ #55 Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161784] Call Trace: Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161793] [<ffffffff810697ca>] __might_sleep+0xca/0xf0 Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161798] [<ffffffff811345a3>] kmem_cache_alloc_node+0x1b3/0x1c0 Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161804] [<ffffffff8105578c>] ? queue_delayed_work_on+0x11c/0x130 Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161808] [<ffffffff815343fb>] __alloc_skb+0x4b/0x230 Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161813] [<ffffffffa00b0360>] ? reset_per_cpu_data+0x160/0x160 [drop_monitor] Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161817] [<ffffffffa00b022f>] reset_per_cpu_data+0x2f/0x160 [drop_monitor] Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161820] [<ffffffffa00b03ab>] send_dm_alert+0x4b/0xb0 [drop_monitor] Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161824] [<ffffffff810568e0>] process_one_work+0x130/0x4c0 Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161827] [<ffffffff81058249>] worker_thread+0x159/0x360 Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161830] [<ffffffff810580f0>] ? manage_workers.isra.27+0x240/0x240 Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161834] [<ffffffff8105d403>] kthread+0x93/0xa0 Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161839] [<ffffffff816be6d4>] kernel_thread_helper+0x4/0x10 Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161843] [<ffffffff8105d370>] ? kthread_freezable_should_stop+0x80/0x80 Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161846] [<ffffffff816be6d0>] ? gs_change+0xb/0xb Also synchronize_rcu() cant be called in reset_per_cpu_data() for the same reason. Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161865] BUG: scheduling while atomic: kworker/0:2/2103/0x00000002 Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161881] Modules linked in: drop_monitor ip6table_filter ip6_tables ebtable_nat ebtables fuse ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT iptable_filter bridge stp rt61pci crc_itu_t rt2x00pci rt2x00lib eeprom_93cx6 igb ixgbe mdio Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161884] Pid: 2103, comm: kworker/0:2 Not tainted 3.5.0-rc1+ #55 Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161885] Call Trace: Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161890] [<ffffffff816ab9c3>] __schedule_bug+0x48/0x54 Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161895] [<ffffffff816b42d3>] __schedule+0x793/0x7e0 Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161898] [<ffffffff811314b2>] ? set_track+0x62/0x1a0 Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161901] [<ffffffff816b43d9>] schedule+0x29/0x70 Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161904] [<ffffffff816b2a15>] schedule_timeout+0x2c5/0x340 Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161907] [<ffffffffa00b022f>] ? reset_per_cpu_data+0x2f/0x160 [drop_monitor] Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161910] [<ffffffff815343fb>] ? __alloc_skb+0x4b/0x230 Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161914] [<ffffffff816b3a1a>] wait_for_common+0x13a/0x180 Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161917] [<ffffffff8106f1f0>] ? try_to_wake_up+0x2e0/0x2e0 Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161920] [<ffffffffa00b022f>] ? reset_per_cpu_data+0x2f/0x160 [drop_monitor] Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161925] [<ffffffff810be860>] ? call_rcu_bh+0x20/0x20 Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161928] [<ffffffffa00b0360>] ? reset_per_cpu_data+0x160/0x160 [drop_monitor] Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161931] [<ffffffff816b3b3d>] wait_for_completion+0x1d/0x20 Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161934] [<ffffffff8105a47d>] wait_rcu_gp+0x4d/0x60 Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161937] [<ffffffff8105a490>] ? wait_rcu_gp+0x60/0x60 Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161941] [<ffffffff812a0101>] ? uuid_le_gen+0x1/0x30 Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161944] [<ffffffff810bf364>] synchronize_sched+0x44/0x50 Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161948] [<ffffffffa00b02b5>] reset_per_cpu_data+0xb5/0x160 [drop_monitor] Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161951] [<ffffffffa00b03ab>] send_dm_alert+0x4b/0xb0 [drop_monitor] Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161955] [<ffffffff810568e0>] process_one_work+0x130/0x4c0 Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161957] [<ffffffff81058249>] worker_thread+0x159/0x360 Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161960] [<ffffffff810580f0>] ? manage_workers.isra.27+0x240/0x240 Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161963] [<ffffffff8105d403>] kthread+0x93/0xa0 Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161966] [<ffffffff816be6d4>] kernel_thread_helper+0x4/0x10 Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161970] [<ffffffff8105d370>] ? kthread_freezable_should_stop+0x80/0x80 Jun 4 09:03:46 edumazet-laptop kernel: [ 2999.161973] [<ffffffff816be6d0>] ? gs_change+0xb/0xb ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 2/2] drop_monitor: Make updating data->skb smp safe 2012-06-04 7:45 ` Eric Dumazet @ 2012-06-04 10:39 ` Neil Horman 0 siblings, 0 replies; 17+ messages in thread From: Neil Horman @ 2012-06-04 10:39 UTC (permalink / raw) To: Eric Dumazet; +Cc: netdev, David Miller On Mon, Jun 04, 2012 at 09:45:10AM +0200, Eric Dumazet wrote: > On Fri, 2012-04-27 at 16:11 -0400, Neil Horman wrote: > > Eric Dumazet pointed out to me that the drop_monitor protocol has some holes in > > its smp protections. Specifically, its possible to replace data->skb while its > > being written. This patch corrects that by making data->skb and rcu protected > > variable. That will prevent it from being overwritten while a tracepoint is > > modifying it. > > > > > static void send_dm_alert(struct work_struct *unused) > > { > > struct sk_buff *skb; > > - struct per_cpu_dm_data *data = &__get_cpu_var(dm_cpu_data); > > + struct per_cpu_dm_data *data = &get_cpu_var(dm_cpu_data); > > > > /* > > * Grab the skb we're about to send > > */ > > - skb = data->skb; > > + skb = rcu_dereference_protected(data->skb, 1); > > > > /* > > * Replace it with a new one > > @@ -111,8 +134,10 @@ static void send_dm_alert(struct work_struct *unused) > > /* > > * Ship it! > > */ > > - genlmsg_multicast(skb, 0, NET_DM_GRP_ALERT, GFP_KERNEL); > > + if (skb) > > + genlmsg_multicast(skb, 0, NET_DM_GRP_ALERT, GFP_KERNEL); > > > > + put_cpu_var(dm_cpu_data); > > } > > > > Oh well, drop_monitor can still trigger alerts : > Grr, Not sure why I didn't see this before. I'll take care of it shortly. Neil ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 0/2] drop_monitor: misc fixes for some recently reported bugs 2012-04-27 20:11 ` [PATCH v2 0/2] drop_monitor: misc fixes for some recently reported bugs Neil Horman 2012-04-27 20:11 ` [PATCH v2 1/2] drop_monitor: fix sleeping in invalid context warning Neil Horman 2012-04-27 20:11 ` [PATCH v2 2/2] drop_monitor: Make updating data->skb smp safe Neil Horman @ 2012-04-28 6:19 ` David Miller 2 siblings, 0 replies; 17+ messages in thread From: David Miller @ 2012-04-28 6:19 UTC (permalink / raw) To: nhorman; +Cc: netdev, eric.dumazet From: Neil Horman <nhorman@tuxdriver.com> Date: Fri, 27 Apr 2012 16:11:47 -0400 > Eric was using dropwatch recently and reported a few bugs to me that he > had noted. This short series should fix them all up. > > Change Notes: > V2) > renamed trace_state_lock to trace_state_mutex > cleaned up rcu write side access to use rcu_dereference_protected > handled some NULL allocation failure cases Both applied, but in the second patch I had to fix the spelling of the "initialized" variable :-) Thanks Neil. ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2012-06-04 10:39 UTC | newest] Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2012-04-26 18:47 [PATCH 0/2] drop_monitor: misc fixes for some recently reported bugs Neil Horman 2012-04-26 18:47 ` [PATCH 1/2] drop_monitor: fix sleeping in invalid context warning Neil Horman 2012-04-26 19:01 ` Eric Dumazet 2012-04-26 19:21 ` Neil Horman 2012-04-26 18:47 ` [PATCH 2/2] drop_monitor: Make updating data->skb smp safe Neil Horman 2012-04-26 19:00 ` Eric Dumazet 2012-04-26 19:18 ` Neil Horman 2012-04-26 19:25 ` Eric Dumazet 2012-04-27 20:11 ` [PATCH v2 0/2] drop_monitor: misc fixes for some recently reported bugs Neil Horman 2012-04-27 20:11 ` [PATCH v2 1/2] drop_monitor: fix sleeping in invalid context warning Neil Horman 2012-04-28 6:11 ` Eric Dumazet 2012-04-28 14:30 ` Neil Horman 2012-04-27 20:11 ` [PATCH v2 2/2] drop_monitor: Make updating data->skb smp safe Neil Horman 2012-04-28 6:13 ` Eric Dumazet 2012-06-04 7:45 ` Eric Dumazet 2012-06-04 10:39 ` Neil Horman 2012-04-28 6:19 ` [PATCH v2 0/2] drop_monitor: misc fixes for some recently reported bugs David Miller
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.