Linux-rt-users archive on lore.kernel.org
 help / color / Atom feed
* [PATCH] net/xfrm/xfrm_ipcomp: Use {get,put}_cpu_light
@ 2019-07-17  7:20 Juri Lelli
  2019-07-17  7:22 ` Daniel Bristot de Oliveira
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Juri Lelli @ 2019-07-17  7:20 UTC (permalink / raw)
  To: rostedt, tglx, bigeasy; +Cc: linux-rt-users, linux-kernel, williams, Juri Lelli

The following BUG has been reported while running ipsec tests.

 BUG: scheduling while atomic: irq/78-eno3-rx-/12023/0x00000002
 Modules linked in: ipcomp xfrm_ipcomp ...
 Preemption disabled at:
 [<ffffffffc0b29730>] ipcomp_input+0xd0/0x9a0 [xfrm_ipcomp]
 CPU: 1 PID: 12023 Comm: irq/78-eno3-rx- Kdump: loaded Not tainted [...] #1
 Hardware name: [...]
 Call Trace:
  dump_stack+0x5c/0x80
  ? ipcomp_input+0xd0/0x9a0 [xfrm_ipcomp]
  __schedule_bug.cold.81+0x44/0x51
  __schedule+0x5bf/0x6a0
  schedule+0x39/0xd0
  rt_spin_lock_slowlock_locked+0x10e/0x2b0
  rt_spin_lock_slowlock+0x50/0x80
  get_page_from_freelist+0x609/0x1560
  ? zlib_updatewindow+0x5a/0xd0
  __alloc_pages_nodemask+0xd9/0x280
  ipcomp_input+0x299/0x9a0 [xfrm_ipcomp]
  xfrm_input+0x5e3/0x960
  xfrm4_ipcomp_rcv+0x34/0x50
  ip_local_deliver_finish+0x22d/0x250
  ip_local_deliver+0x6d/0x110
  ? ip_rcv_finish+0xac/0x480
  ip_rcv+0x28e/0x3f9
  ? packet_rcv+0x43/0x4c0
  __netif_receive_skb_core+0xb7c/0xd10
  ? inet_gro_receive+0x8e/0x2f0
  netif_receive_skb_internal+0x4a/0x160
  napi_gro_receive+0xee/0x110
  tg3_rx+0x2a8/0x810 [tg3]
  tg3_poll_work+0x3b3/0x830 [tg3]
  tg3_poll_msix+0x3b/0x170 [tg3]
  net_rx_action+0x1ff/0x470
  ? __switch_to_asm+0x41/0x70
  do_current_softirqs+0x223/0x3e0
  ? irq_thread_check_affinity+0x20/0x20
  __local_bh_enable+0x51/0x60
  irq_forced_thread_fn+0x5e/0x80
  ? irq_finalize_oneshot.part.45+0xf0/0xf0
  irq_thread+0x13d/0x1a0
  ? wake_threads_waitq+0x30/0x30
  kthread+0x112/0x130
  ? kthread_create_worker_on_cpu+0x70/0x70
  ret_from_fork+0x35/0x40

The problem resides in the fact that get_cpu() called from ipcomp_input()
disables preemption, and that triggers the scheduling while atomic BUG further
down the callpath chain of get_page_from_freelist(), i.e.,

  ipcomp_input
    ipcomp_decompress
      <-- get_cpu()
      alloc_page(GFP_ATOMIC)
        alloc_pages(GFP_ATOMIC, 0)
          alloc_pages_current
            __alloc_pages_nodemask
              get_page_from_freelist
                (try_this_zone:) rmqueue
                  rmqueue_pcplist
                    __rmqueue_pcplist
                      rmqueue_bulk
                        <-- spin_lock(&zone->lock) - BUG

Fix this by using {get,put}_cpu_light() in ipcomp_decompress().

Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
---
Hi,

This has been found on a 4.19.x-rt kernel, but 5.x-rt(s) are affected as
well.

Best,

Juri
---
 net/xfrm/xfrm_ipcomp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_ipcomp.c b/net/xfrm/xfrm_ipcomp.c
index a00ec715aa46..39d9e663384f 100644
--- a/net/xfrm/xfrm_ipcomp.c
+++ b/net/xfrm/xfrm_ipcomp.c
@@ -45,7 +45,7 @@ static int ipcomp_decompress(struct xfrm_state *x, struct sk_buff *skb)
 	const int plen = skb->len;
 	int dlen = IPCOMP_SCRATCH_SIZE;
 	const u8 *start = skb->data;
-	const int cpu = get_cpu();
+	const int cpu = get_cpu_light();
 	u8 *scratch = *per_cpu_ptr(ipcomp_scratches, cpu);
 	struct crypto_comp *tfm = *per_cpu_ptr(ipcd->tfms, cpu);
 	int err = crypto_comp_decompress(tfm, start, plen, scratch, &dlen);
@@ -103,7 +103,7 @@ static int ipcomp_decompress(struct xfrm_state *x, struct sk_buff *skb)
 	err = 0;
 
 out:
-	put_cpu();
+	put_cpu_light();
 	return err;
 }
 
-- 
2.17.2


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] net/xfrm/xfrm_ipcomp: Use {get,put}_cpu_light
  2019-07-17  7:20 [PATCH] net/xfrm/xfrm_ipcomp: Use {get,put}_cpu_light Juri Lelli
@ 2019-07-17  7:22 ` Daniel Bristot de Oliveira
  2019-07-17 13:58 ` Pankaj Gupta
  2019-08-13 15:58 ` Sebastian Andrzej Siewior
  2 siblings, 0 replies; 4+ messages in thread
From: Daniel Bristot de Oliveira @ 2019-07-17  7:22 UTC (permalink / raw)
  To: Juri Lelli, rostedt, tglx, bigeasy; +Cc: linux-rt-users, linux-kernel, williams



On 17/07/2019 09:20, Juri Lelli wrote:
> The following BUG has been reported while running ipsec tests.
> 
>  BUG: scheduling while atomic: irq/78-eno3-rx-/12023/0x00000002
>  Modules linked in: ipcomp xfrm_ipcomp ...
>  Preemption disabled at:
>  [<ffffffffc0b29730>] ipcomp_input+0xd0/0x9a0 [xfrm_ipcomp]
>  CPU: 1 PID: 12023 Comm: irq/78-eno3-rx- Kdump: loaded Not tainted [...] #1
>  Hardware name: [...]
>  Call Trace:
>   dump_stack+0x5c/0x80
>   ? ipcomp_input+0xd0/0x9a0 [xfrm_ipcomp]
>   __schedule_bug.cold.81+0x44/0x51
>   __schedule+0x5bf/0x6a0
>   schedule+0x39/0xd0
>   rt_spin_lock_slowlock_locked+0x10e/0x2b0
>   rt_spin_lock_slowlock+0x50/0x80
>   get_page_from_freelist+0x609/0x1560
>   ? zlib_updatewindow+0x5a/0xd0
>   __alloc_pages_nodemask+0xd9/0x280
>   ipcomp_input+0x299/0x9a0 [xfrm_ipcomp]
>   xfrm_input+0x5e3/0x960
>   xfrm4_ipcomp_rcv+0x34/0x50
>   ip_local_deliver_finish+0x22d/0x250
>   ip_local_deliver+0x6d/0x110
>   ? ip_rcv_finish+0xac/0x480
>   ip_rcv+0x28e/0x3f9
>   ? packet_rcv+0x43/0x4c0
>   __netif_receive_skb_core+0xb7c/0xd10
>   ? inet_gro_receive+0x8e/0x2f0
>   netif_receive_skb_internal+0x4a/0x160
>   napi_gro_receive+0xee/0x110
>   tg3_rx+0x2a8/0x810 [tg3]
>   tg3_poll_work+0x3b3/0x830 [tg3]
>   tg3_poll_msix+0x3b/0x170 [tg3]
>   net_rx_action+0x1ff/0x470
>   ? __switch_to_asm+0x41/0x70
>   do_current_softirqs+0x223/0x3e0
>   ? irq_thread_check_affinity+0x20/0x20
>   __local_bh_enable+0x51/0x60
>   irq_forced_thread_fn+0x5e/0x80
>   ? irq_finalize_oneshot.part.45+0xf0/0xf0
>   irq_thread+0x13d/0x1a0
>   ? wake_threads_waitq+0x30/0x30
>   kthread+0x112/0x130
>   ? kthread_create_worker_on_cpu+0x70/0x70
>   ret_from_fork+0x35/0x40
> 
> The problem resides in the fact that get_cpu() called from ipcomp_input()
> disables preemption, and that triggers the scheduling while atomic BUG further
> down the callpath chain of get_page_from_freelist(), i.e.,
> 
>   ipcomp_input
>     ipcomp_decompress
>       <-- get_cpu()
>       alloc_page(GFP_ATOMIC)
>         alloc_pages(GFP_ATOMIC, 0)
>           alloc_pages_current
>             __alloc_pages_nodemask
>               get_page_from_freelist
>                 (try_this_zone:) rmqueue
>                   rmqueue_pcplist
>                     __rmqueue_pcplist
>                       rmqueue_bulk
>                         <-- spin_lock(&zone->lock) - BUG
> 
> Fix this by using {get,put}_cpu_light() in ipcomp_decompress().
> 
> Signed-off-by: Juri Lelli <juri.lelli@redhat.com>

Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>

Thanks!
-- Daniel

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] net/xfrm/xfrm_ipcomp: Use {get,put}_cpu_light
  2019-07-17  7:20 [PATCH] net/xfrm/xfrm_ipcomp: Use {get,put}_cpu_light Juri Lelli
  2019-07-17  7:22 ` Daniel Bristot de Oliveira
@ 2019-07-17 13:58 ` Pankaj Gupta
  2019-08-13 15:58 ` Sebastian Andrzej Siewior
  2 siblings, 0 replies; 4+ messages in thread
From: Pankaj Gupta @ 2019-07-17 13:58 UTC (permalink / raw)
  To: Juri Lelli; +Cc: rostedt, tglx, bigeasy, linux-rt-users, linux-kernel, williams


> 
> The following BUG has been reported while running ipsec tests.
> 
>  BUG: scheduling while atomic: irq/78-eno3-rx-/12023/0x00000002
>  Modules linked in: ipcomp xfrm_ipcomp ...
>  Preemption disabled at:
>  [<ffffffffc0b29730>] ipcomp_input+0xd0/0x9a0 [xfrm_ipcomp]
>  CPU: 1 PID: 12023 Comm: irq/78-eno3-rx- Kdump: loaded Not tainted [...] #1
>  Hardware name: [...]
>  Call Trace:
>   dump_stack+0x5c/0x80
>   ? ipcomp_input+0xd0/0x9a0 [xfrm_ipcomp]
>   __schedule_bug.cold.81+0x44/0x51
>   __schedule+0x5bf/0x6a0
>   schedule+0x39/0xd0
>   rt_spin_lock_slowlock_locked+0x10e/0x2b0
>   rt_spin_lock_slowlock+0x50/0x80
>   get_page_from_freelist+0x609/0x1560
>   ? zlib_updatewindow+0x5a/0xd0
>   __alloc_pages_nodemask+0xd9/0x280
>   ipcomp_input+0x299/0x9a0 [xfrm_ipcomp]
>   xfrm_input+0x5e3/0x960
>   xfrm4_ipcomp_rcv+0x34/0x50
>   ip_local_deliver_finish+0x22d/0x250
>   ip_local_deliver+0x6d/0x110
>   ? ip_rcv_finish+0xac/0x480
>   ip_rcv+0x28e/0x3f9
>   ? packet_rcv+0x43/0x4c0
>   __netif_receive_skb_core+0xb7c/0xd10
>   ? inet_gro_receive+0x8e/0x2f0
>   netif_receive_skb_internal+0x4a/0x160
>   napi_gro_receive+0xee/0x110
>   tg3_rx+0x2a8/0x810 [tg3]
>   tg3_poll_work+0x3b3/0x830 [tg3]
>   tg3_poll_msix+0x3b/0x170 [tg3]
>   net_rx_action+0x1ff/0x470
>   ? __switch_to_asm+0x41/0x70
>   do_current_softirqs+0x223/0x3e0
>   ? irq_thread_check_affinity+0x20/0x20
>   __local_bh_enable+0x51/0x60
>   irq_forced_thread_fn+0x5e/0x80
>   ? irq_finalize_oneshot.part.45+0xf0/0xf0
>   irq_thread+0x13d/0x1a0
>   ? wake_threads_waitq+0x30/0x30
>   kthread+0x112/0x130
>   ? kthread_create_worker_on_cpu+0x70/0x70
>   ret_from_fork+0x35/0x40
> 
> The problem resides in the fact that get_cpu() called from ipcomp_input()
> disables preemption, and that triggers the scheduling while atomic BUG
> further
> down the callpath chain of get_page_from_freelist(), i.e.,
> 
>   ipcomp_input
>     ipcomp_decompress
>       <-- get_cpu()
>       alloc_page(GFP_ATOMIC)
>         alloc_pages(GFP_ATOMIC, 0)
>           alloc_pages_current
>             __alloc_pages_nodemask
>               get_page_from_freelist
>                 (try_this_zone:) rmqueue
>                   rmqueue_pcplist
>                     __rmqueue_pcplist
>                       rmqueue_bulk
>                         <-- spin_lock(&zone->lock) - BUG
> 
> Fix this by using {get,put}_cpu_light() in ipcomp_decompress().
> 
> Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
> ---
> Hi,
> 
> This has been found on a 4.19.x-rt kernel, but 5.x-rt(s) are affected as
> well.
> 
> Best,
> 
> Juri
> ---
>  net/xfrm/xfrm_ipcomp.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/xfrm/xfrm_ipcomp.c b/net/xfrm/xfrm_ipcomp.c
> index a00ec715aa46..39d9e663384f 100644
> --- a/net/xfrm/xfrm_ipcomp.c
> +++ b/net/xfrm/xfrm_ipcomp.c
> @@ -45,7 +45,7 @@ static int ipcomp_decompress(struct xfrm_state *x, struct
> sk_buff *skb)
>  	const int plen = skb->len;
>  	int dlen = IPCOMP_SCRATCH_SIZE;
>  	const u8 *start = skb->data;
> -	const int cpu = get_cpu();
> +	const int cpu = get_cpu_light();
>  	u8 *scratch = *per_cpu_ptr(ipcomp_scratches, cpu);
>  	struct crypto_comp *tfm = *per_cpu_ptr(ipcd->tfms, cpu);
>  	int err = crypto_comp_decompress(tfm, start, plen, scratch, &dlen);
> @@ -103,7 +103,7 @@ static int ipcomp_decompress(struct xfrm_state *x, struct
> sk_buff *skb)
>  	err = 0;
>  
>  out:
> -	put_cpu();
> +	put_cpu_light();
>  	return err;
>  }
>  
> --

Reviewed-by: Pankaj Gupta <pagupta@redhat.com>

> 2.17.2
> 
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] net/xfrm/xfrm_ipcomp: Use {get,put}_cpu_light
  2019-07-17  7:20 [PATCH] net/xfrm/xfrm_ipcomp: Use {get,put}_cpu_light Juri Lelli
  2019-07-17  7:22 ` Daniel Bristot de Oliveira
  2019-07-17 13:58 ` Pankaj Gupta
@ 2019-08-13 15:58 ` Sebastian Andrzej Siewior
  2 siblings, 0 replies; 4+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-08-13 15:58 UTC (permalink / raw)
  To: Juri Lelli; +Cc: rostedt, tglx, linux-rt-users, linux-kernel, williams

On 2019-07-17 09:20:19 [+0200], Juri Lelli wrote:
> The following BUG has been reported while running ipsec tests.
> Hi,
> 
> This has been found on a 4.19.x-rt kernel, but 5.x-rt(s) are affected as
> well.
> 
> Best,
> 
> Juri
> ---
>  net/xfrm/xfrm_ipcomp.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/xfrm/xfrm_ipcomp.c b/net/xfrm/xfrm_ipcomp.c
> index a00ec715aa46..39d9e663384f 100644
> --- a/net/xfrm/xfrm_ipcomp.c
> +++ b/net/xfrm/xfrm_ipcomp.c
> @@ -45,7 +45,7 @@ static int ipcomp_decompress(struct xfrm_state *x, struct sk_buff *skb)
>  	const int plen = skb->len;
>  	int dlen = IPCOMP_SCRATCH_SIZE;
>  	const u8 *start = skb->data;
> -	const int cpu = get_cpu();
> +	const int cpu = get_cpu_light();

By using get_cpu_light() you don't forbid another function to invoke
ipcomp_decompress() on the same CPU. That means that

>  	u8 *scratch = *per_cpu_ptr(ipcomp_scratches, cpu);

scratch buffer here could be used by two tasks on the same CPU. You are
aware of that right?
According to your backtrace you get here from NAPI which means BH which
means it is enough to use smp_processor_id() in such a case.

ipcomp_compress() is using the very same buffer while invoking
local_bh_disable() before using the buffer to ensure nothing else is
using the buffer on this CPU. This will work in v5.2-RT because the new
softirq code uses a local_lock() as part of local_bh_disable(). This
does not work on v4.19-RT and earlier. 

For v4.19 and earlier I suggest to use a local_lock().
For v5.2 and later I suggest to replace get_cpu() with
smp_processor_id(). Ideally a with a lockdep annotation to ensure that
BH is disabled (which we don't have).

>  	struct crypto_comp *tfm = *per_cpu_ptr(ipcd->tfms, cpu);
>  	int err = crypto_comp_decompress(tfm, start, plen, scratch, &dlen);

Sebastian

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, back to index

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-17  7:20 [PATCH] net/xfrm/xfrm_ipcomp: Use {get,put}_cpu_light Juri Lelli
2019-07-17  7:22 ` Daniel Bristot de Oliveira
2019-07-17 13:58 ` Pankaj Gupta
2019-08-13 15:58 ` Sebastian Andrzej Siewior

Linux-rt-users archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-rt-users/0 linux-rt-users/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-rt-users linux-rt-users/ https://lore.kernel.org/linux-rt-users \
		linux-rt-users@vger.kernel.org linux-rt-users@archiver.kernel.org
	public-inbox-index linux-rt-users


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-rt-users


AGPL code for this site: git clone https://public-inbox.org/ public-inbox