netfilter-devel.vger.kernel.org archive mirror
* Re: Bug URGENT Report with new kernel 5.5.10-5.6-rc6
       [not found]   ` <CALidq=UXHz+rjiG5JxAz-CJ1mKsFLVupsH3W+z58L2nSPKE-7w@mail.gmail.com>
@ 2020-03-18 23:38     ` Stefano Brivio
       [not found]       ` <CALidq=Xow0EkAP4LkqvQiDOmVDduEwLKa4c-A54or3GMj6+qVw@mail.gmail.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Stefano Brivio @ 2020-03-18 23:38 UTC (permalink / raw)
  To: Martin Zaharinov
  Cc: ecree, Eric Dumazet, David Miller, pablo, Florian Westphal,
	netfilter-devel, netdev, Marco Oliverio

[Adding netfilter-devel, netdev, Marco]

Martin,

On Thu, 19 Mar 2020 00:53:53 +0200
Martin Zaharinov <micron10@gmail.com> wrote:

> I checked again with the latest 5.4.26 kernel: the machine runs stable, without crashes.
> The change comes from the 5.5.x kernel releases; I see on the mailing list that Florian
> added nf_hook_slow_list and other changes.
> But this crash needs to be investigated...

I just had a very quick look, I might be wrong, but can you try without:

commit 0b9173f4688dfa7c5d723426be1d979c24ce3d51
Author: Marco Oliverio <marco.oliverio@tanaza.com>
Date:   Mon Dec 2 19:54:30 2019 +0100

    netfilter: nf_queue: enqueue skbs with NULL dst

? To me it looks like we're hitting nf_queue_entry_get_br_nf_refs()
with an skb that's not supposed to end up there, and this commit might
reveal some issue in that sense.

-- 
Stefano

> 
> Martin
> 
> On Thu, 19.03.2020 at 0:29, Martin Zaharinov <micron10@gmail.com> wrote:
> 
> >
> >
> > ---------- Forwarded message ---------
> > From: Martin Zaharinov <micron10@gmail.com>
> > Date: Wed, 18.03.2020 at 23:31
> > Subject: Bug URGENT Report with new kernel 5.5.10-5.6-rc6
> > To: <sbrivio@redhat.com>, <pablo@netfilter.org>, Florian Westphal <fw@strlen.de>
> >
> >
> > Hi all,
> > Sorry for writing here rather than to the kernel bug list; I could not find how to
> > report a bug there.
> > The server has 300 PPPoE customers connected, with 400 Mbit/s of traffic.
> > Once the machine is running and all rules are loaded, it takes 20-30 minutes until it
> > crashes with this bug. To me this is an old bug, but with the new kernel it shows up
> > immediately.
> > Please help.
> > Please check this BUG:
> >
> > Mar 17 22:26:16  [ 2344.252448][    C5] general protection fault, probably
> > for non-canonical address 0x9a830ebedfe5c683: 0000 [#1] SMP PTI
> >
> > Mar 17 22:26:16  [ 2344.253382][    C5] CPU: 5 PID: 12224 Comm: xmrig
> > Tainted: G           O      5.6.0 #1
> >
> > Mar 17 22:26:16  [ 2344.254060][    C5] Hardware name: Supermicro Super
> > Server/X11SPi-TF, BIOS 3.2 10/17/2019
> >
> > Mar 17 22:26:16  [ 2344.254773][    C5] RIP:
> > 0010:nf_queue_entry_get_refs+0x14/0xe0
> >
> > Mar 17 22:26:16  [ 2344.255279][    C5] Code: 5b c3 be 03 00 00 00 4c 89
> > c7 e8 77 b8 be ff e9 7c ff ff ff 66 90 53 48 8b 47 28 48 89 fb 48 85 c0 74
> > 0a 48 8b 80 80 04 00 00 <65> ff 00 48 8b 43 30 48 85 c0 74 0a 48 8b 80 80
> > 04 00 00 65 ff 00
> >
> > Mar 17 22:26:16  [ 2344.256950][    C5] RSP: 0000:ffffa7e44033cc50 EFLAGS:
> > 00010286
> >
> > Mar 17 22:26:16  [ 2344.257456][    C5] RAX: 9a837d63c011c683 RBX:
> > ffff915af771cf80 RCX: ffff915aecf23780
> >
> > Mar 17 22:26:16  [ 2344.258127][    C5] RDX: ffffffff9c82bad0 RSI:
> > 0000000000000000 RDI: ffff915af771cf80
> >
> > Mar 17 22:26:16  [ 2344.258798][    C5] RBP: ffffa7e44033cca8 R08:
> > ffffffff9d6aaac0 R09: ffff915af7ece000
> >
> > Mar 17 22:26:16  [ 2344.259469][    C5] R10: 0000000000000002 R11:
> > 0000000000000004 R12: ffff915af771cf80
> >
> > Mar 17 22:26:16  [ 2344.260140][    C5] R13: ffff915aeccee6f0 R14:
> > 0000000000000006 R15: ffffffffc03da3b0
> >
> > Mar 17 22:26:16  [ 2344.260811][    C5] FS:  00007fd1237fe700(0000)
> > GS:ffff915b1fd40000(0000) knlGS:0000000000000000
> >
> > Mar 17 22:26:16  [ 2344.261564][    C5] CS:  0010 DS: 0000 ES: 0000 CR0:
> > 0000000080050033
> >
> > Mar 17 22:26:16  [ 2344.276319][    C5] CR2: 00007fec73ad5cd0 CR3:
> > 00000007ff81e005 CR4: 00000000001606e0
> >
> > Mar 17 22:26:16  [ 2344.306107][    C5] DR0: 0000000000000000 DR1:
> > 0000000000000000 DR2: 0000000000000000
> >
> > Mar 17 22:26:16  [ 2344.336579][    C5] DR3: 0000000000000000 DR6:
> > 00000000fffe0ff0 DR7: 0000000000000400
> >
> > Mar 17 22:26:16  [ 2344.367000][    C5] Call Trace:
> >
> > Mar 17 22:26:16  [ 2344.381799][    C5]  <IRQ>
> >
> > Mar 17 22:26:16  [ 2344.396244][    C5]  nf_queue+0x14f/0x2d0
> >
> > Mar 17 22:26:16  [ 2344.410633][    C5]  nf_hook_slow+0x84/0xe0
> >
> > Mar 17 22:26:16  [ 2344.424672][    C5]  ip_output+0xcd/0x1b0
> >
> > Mar 17 22:26:16  [ 2344.438376][    C5]  ? ip_finish_output_gso+0x160/0x160
> >
> > Mar 17 22:26:16  [ 2344.452012][    C5]  __ip_queue_xmit+0x17a/0x370
> >
> > Mar 17 22:26:16  [ 2344.465466][    C5]  __tcp_transmit_skb+0x57a/0xce0
> >
> > Mar 17 22:26:16  [ 2344.478628][    C5]  ? tcp_v4_rcv+0xd5d/0xe30
> >
> > Mar 17 22:26:16  [ 2344.491600][    C5]  __tcp_retransmit_skb+0x177/0x870
> >
> > Mar 17 22:26:16  [ 2344.504406][    C5]  tcp_xmit_retransmit_queue.part.0+0x194/0x390
> >
> > Mar 17 22:26:16  [ 2344.517311][    C5]  tcp_pace_kick+0x161/0x180
> >
> > Mar 17 22:26:16  [ 2344.529847][    C5]  ? tcp_tasklet_func+0x1f0/0x1f0
> >
> > Mar 17 22:26:16  [ 2344.542148][    C5]  __hrtimer_run_queues+0x10b/0x1b0
> >
> > Mar 17 22:26:16  [ 2344.554178][    C5]  hrtimer_run_softirq+0x7f/0x170
> >
> > Mar 17 22:26:16  [ 2344.565940][    C5]  __do_softirq+0xc8/0x206
> >
> > Mar 17 22:26:16  [ 2344.577389][    C5]  irq_exit+0xda/0xf0
> >
> > Mar 17 22:26:16  [ 2344.588474][    C5]  smp_apic_timer_interrupt+0x55/0x80
> >
> > Mar 17 22:26:16  [ 2344.599449][    C5]  apic_timer_interrupt+0xf/0x20
> >
> > Mar 17 22:26:16  [ 2344.610107][    C5]  </IRQ>
> >
> > Mar 17 22:26:16  [ 2344.620341][    C5] RIP: 0033:0x7fd128ed01c3
> >
> > Mar 17 22:26:16  [ 2344.630378][    C5] Code: f2 25 f8 3f 00 00 f3 44 0f
> > e6 24 06 66 41 0f 5c c4 4d 0f af c4 41 8d 82 4d dd 34 ec 25 f8 3f 00 00 4c
> > 89 1c 06 66 41 0f 58 d0 <66> 41 0f 59 f0 49 81 c0 ff 42 83 88 49 f7 c0 00
> > 00 80 7f 74 d6 41
> >
> > Mar 17 22:26:16  [ 2344.660620][    C5] RSP: 002b:00007fd1237fdd78 EFLAGS:
> > 00000206 ORIG_RAX: ffffffffffffff13
> >
> > Mar 17 22:26:16  [ 2344.680376][    C5] RAX: 0000000000000fc0 RBX:
> > 00000000000000fe RCX: 000000003b741dc9
> >
> > Mar 17 22:26:16  [ 2344.700118][    C5] RDX: 62b3a34bbd2445be RSI:
> > 00007fd128200000 RDI: 00007fd09abec0c0
> >
> > Mar 17 22:26:16  [ 2344.720222][    C5] RBP: 1791b95bb8165a3d R08:
> > 0086c4305d0ac11c R09: cb4d89df4f950a70
> >
> > Mar 17 22:26:16  [ 2344.741734][    C5] R10: 10ce58330b1f3279 R11:
> > 0e9fac5dfa9ec7b8 R12: f4e400dfd4176ea4
> >
> > Mar 17 22:26:16  [ 2344.764623][    C5] R13: 454baf3f4a564cae R14:
> > 47331223df7be353 R15: b8ab1194f474425a
> >
> > Mar 17 22:26:16  [ 2344.788559][    C5] Modules linked in: udp_diag
> > raw_diag unix_diag af_packet_diag sch_hfsc iptable_filter iptable_mangle
> > xt_addrtype xt_nat xt_MASQUERADE iptable_nat ip_tables bpfilter  sch_fq_pie
> > sch_pie netconsole coretemp tg3 e1000e e1000 igb i2c_algo_bit ixgbe mdio
> > libphy i40e nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp
> > nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6
> > nf_defrag_ipv4 pppoe pptp gre pppox ppp_mppe ppp_generic slhc libarc4 tun
> > hpsa scsi_transport_sas ipmi_si ipmi_devintf ipmi_msghandler  sch_fq_codel
> >
> > Mar 17 22:26:16  [ 2344.898031][    C5] ---[ end trace d15fca245f16372d
> > ]---
> >
> > Mar 17 22:26:16  [ 2344.912955][    C5] RIP:
> > 0010:nf_queue_entry_get_refs+0x14/0xe0
> >
> > Mar 17 22:26:17  [ 2344.928110][    C5] Code: 5b c3 be 03 00 00 00 4c 89
> > c7 e8 77 b8 be ff e9 7c ff ff ff 66 90 53 48 8b 47 28 48 89 fb 48 85 c0 74
> > 0a 48 8b 80 80 04 00 00 <65> ff 00 48 8b 43 30 48 85 c0 74 0a 48 8b 80 80
> > 04 00 00 65 ff 00
> >
> > Mar 17 22:26:17  [ 2344.974788][    C5] RSP: 0000:ffffa7e44033cc50 EFLAGS:
> > 00010286
> >
> > Mar 17 22:26:17  [ 2344.990738][    C5] RAX: 9a837d63c011c683 RBX:
> > ffff915af771cf80 RCX: ffff915aecf23780
> >
> > Mar 17 22:26:17  [ 2345.022183][    C5] RDX: ffffffff9c82bad0 RSI:
> > 0000000000000000 RDI: ffff915af771cf80
> >
> > Mar 17 22:26:17  [ 2345.053943][    C5] RBP: ffffa7e44033cca8 R08:
> > ffffffff9d6aaac0 R09: ffff915af7ece000
> >
> > Mar 17 22:26:17  [ 2345.085639][    C5] R10: 0000000000000002 R11:
> > 0000000000000004 R12: ffff915af771cf80
> >
> > Mar 17 22:26:17  [ 2345.117285][    C5] R13: ffff915aeccee6f0 R14:
> > 0000000000000006 R15: ffffffffc03da3b0
> >
> > Mar 17 22:26:17  [ 2345.148948][    C5] FS:  00007fd1237fe700(0000)
> > GS:ffff915b1fd40000(0000) knlGS:0000000000000000
> >
> > Mar 17 22:26:17  [ 2345.180715][    C5] CS:  0010 DS: 0000 ES: 0000 CR0:
> > 0000000080050033
> >
> > Mar 17 22:26:17  [ 2345.196835][    C5] CR2: 00007fec73ad5cd0 CR3:
> > 00000007ff81e005 CR4: 00000000001606e0
> >
> > Mar 17 22:26:17  [ 2345.228199][    C5] DR0: 0000000000000000 DR1:
> > 0000000000000000 DR2: 0000000000000000
> >
> > Mar 17 22:26:17  [ 2345.259580][    C5] DR3: 0000000000000000 DR6:
> > 00000000fffe0ff0 DR7: 0000000000000400
> >
> > Mar 17 22:26:17  [ 2345.290736][    C5] Kernel panic - not syncing: Fatal
> > exception in interrupt
> >
> > Mar 17 22:26:17  [ 2345.359056][    C5] Kernel Offset: 0x1b000000 from
> > 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> >
> > Mar 17 22:26:17  [ 2345.389933][    C5] Rebooting in 10 seconds..
> >
> > Mar 17 22:26:27  [ 2355.405624][    C5] ACPI MEMORY or I/O RESET_REG.
> >
> >
> >
> > best Regards,
> >
> > Martin
> >  



* Re: Bug URGENT Report with new kernel 5.5.10-5.6-rc6
       [not found]       ` <CALidq=Xow0EkAP4LkqvQiDOmVDduEwLKa4c-A54or3GMj6+qVw@mail.gmail.com>
@ 2020-03-19 10:34         ` Florian Westphal
  2020-03-19 10:47           ` Pablo Neira Ayuso
  0 siblings, 1 reply; 9+ messages in thread
From: Florian Westphal @ 2020-03-19 10:34 UTC (permalink / raw)
  To: Martin Zaharinov; +Cc: netfilter-devel, netdev

Martin Zaharinov <micron10@gmail.com> wrote:

[ trimming CC ]

Please revert

commit 28f8bfd1ac948403ebd5c8070ae1e25421560059
netfilter: Support iif matches in POSTROUTING



* Re: Bug URGENT Report with new kernel 5.5.10-5.6-rc6
  2020-03-19 10:34         ` Florian Westphal
@ 2020-03-19 10:47           ` Pablo Neira Ayuso
  2020-03-19 10:52             ` Florian Westphal
  0 siblings, 1 reply; 9+ messages in thread
From: Pablo Neira Ayuso @ 2020-03-19 10:47 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Martin Zaharinov, netfilter-devel, netdev

On Thu, Mar 19, 2020 at 11:34:38AM +0100, Florian Westphal wrote:
> Martin Zaharinov <micron10@gmail.com> wrote:
> 
> [ trimming CC ]
> 
> Please revert
> 
> commit 28f8bfd1ac948403ebd5c8070ae1e25421560059
> netfilter: Support iif matches in POSTROUTING

Please, specify a short description to append to the revert.

Thanks.


* Re: Bug URGENT Report with new kernel 5.5.10-5.6-rc6
  2020-03-19 10:47           ` Pablo Neira Ayuso
@ 2020-03-19 10:52             ` Florian Westphal
  2020-03-19 16:40               ` Eric Dumazet
  0 siblings, 1 reply; 9+ messages in thread
From: Florian Westphal @ 2020-03-19 10:52 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Florian Westphal, Martin Zaharinov, netfilter-devel, netdev

Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Thu, Mar 19, 2020 at 11:34:38AM +0100, Florian Westphal wrote:
> > Martin Zaharinov <micron10@gmail.com> wrote:
> > 
> > [ trimming CC ]
> > 
> > Please revert
> > 
> > commit 28f8bfd1ac948403ebd5c8070ae1e25421560059
> > netfilter: Support iif matches in POSTROUTING
> 
> Please, specify a short description to append to the revert.

TCP makes use of the rb_node in sk_buff for its retransmit queue,
amongst others.  skb->dev aliases to this storage, i.e., passing
skb->dev as the input interface in postrouting may point to another
sk_buff instead.
This will cause crashes and data corruption with nf_queue, as we will
attempt to increment a random pcpu variable when calling dev_hold().

The memory at that address may also have been freed already, which gives a UAF splat.
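
The aliasing described above can be illustrated outside the kernel. The following is a minimal user-space sketch, not the kernel's definitions: `skb_sketch`, `rb_node_sketch` and `net_device_sketch` are hypothetical stand-ins, laid out on the assumption that `dev` follows the `next`/`prev` pointers and shares a union with the rb-tree node. Once the node is linked into a tree, reading `dev` returns a pointer to another buffer rather than to a device.

```c
#include <stddef.h>

/* Hypothetical stand-in for an rb-tree node: one word plus two child pointers. */
struct rb_node_sketch {
	unsigned long parent_color;
	struct rb_node_sketch *rb_right;
	struct rb_node_sketch *rb_left;
};

/* Hypothetical stand-in for a network device. */
struct net_device_sketch {
	int refcnt;
};

/* Simplified buffer: list linkage plus dev overlaps the tree node (assumed layout). */
struct skb_sketch {
	union {
		struct {
			struct skb_sketch *next;
			struct skb_sketch *prev;
			struct net_device_sketch *dev;
		};
		struct rb_node_sketch rbnode;
	};
};

/* Returns 1 when dev and rbnode.rb_left occupy the same bytes. */
static int dev_aliases_rb_left(void)
{
	return offsetof(struct skb_sketch, dev) ==
	       offsetof(struct skb_sketch, rbnode.rb_left);
}

/*
 * Link child as the left child of parent, as a tree insert would,
 * then return whatever parent->dev now reads back as.
 */
static void *dev_after_tree_link(struct skb_sketch *parent,
				 struct skb_sketch *child)
{
	parent->rbnode.rb_left = &child->rbnode;
	return (void *)parent->dev;
}
```

In this sketch, code that treats the returned value as a device and bumps a counter through it would be writing into the other buffer, which is the kind of corruption described for dev_hold().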


* Re: Bug URGENT Report with new kernel 5.5.10-5.6-rc6
  2020-03-19 10:52             ` Florian Westphal
@ 2020-03-19 16:40               ` Eric Dumazet
  2020-03-19 16:45                 ` Eric Dumazet
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2020-03-19 16:40 UTC (permalink / raw)
  To: Florian Westphal, Pablo Neira Ayuso
  Cc: Martin Zaharinov, netfilter-devel, netdev



On 3/19/20 3:52 AM, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>> On Thu, Mar 19, 2020 at 11:34:38AM +0100, Florian Westphal wrote:
>>> Martin Zaharinov <micron10@gmail.com> wrote:
>>>
>>> [ trimming CC ]
>>>
>>> Please revert
>>>
>>> commit 28f8bfd1ac948403ebd5c8070ae1e25421560059
>>> netfilter: Support iif matches in POSTROUTING
>>
>> Please, specify a short description to append to the revert.
> 
> TCP makes use of the rb_node in sk_buff for its retransmit queue,
> amongst others.


Only for master skbs kept in TCP internal queues (rtx rb tree).

However, the packets leaving the TCP stack are clones.

> skb->dev aliases to this storage, i.e., passing
> skb->dev as the input interface in postrouting may point to another
> sk_buff instead.
> This will cause crashes and data corruption with nf_queue, as we will
> attempt to increment a random pcpu variable when calling dev_hold().
> 
> Also, the memory address may also be free'd, which gives UAF splat.
> 

This seems to suggest that a clone's skb->dev should be cleared before it leaves the TCP stack,
if some layer is confused because skb->dev has not yet been set by the IP layer?

Untested patch :

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 306e25d743e8de1bfe23d6e3b3a9fb0f23664912..c40fb3880307aa3156d01a8b49f1296657346cfd 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1228,6 +1228,7 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
        /* Cleanup our debris for IP stacks */
        memset(skb->cb, 0, max(sizeof(struct inet_skb_parm),
                               sizeof(struct inet6_skb_parm)));
+       skb->dev = NULL;
 
        tcp_add_tx_delay(skb, tp);
 



* Re: Bug URGENT Report with new kernel 5.5.10-5.6-rc6
  2020-03-19 16:40               ` Eric Dumazet
@ 2020-03-19 16:45                 ` Eric Dumazet
       [not found]                   ` <CALidq=VJuhEPO-FWOuUdSG+-VO+h7VHfmtQiAxikxH+vMB+vdQ@mail.gmail.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2020-03-19 16:45 UTC (permalink / raw)
  To: Florian Westphal, Pablo Neira Ayuso
  Cc: Martin Zaharinov, netfilter-devel, netdev



On 3/19/20 9:40 AM, Eric Dumazet wrote:
> 
> 
> On 3/19/20 3:52 AM, Florian Westphal wrote:
>> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>>> On Thu, Mar 19, 2020 at 11:34:38AM +0100, Florian Westphal wrote:
>>>> Martin Zaharinov <micron10@gmail.com> wrote:
>>>>
>>>> [ trimming CC ]
>>>>
>>>> Please revert
>>>>
>>>> commit 28f8bfd1ac948403ebd5c8070ae1e25421560059
>>>> netfilter: Support iif matches in POSTROUTING
>>>
>>> Please, specify a short description to append to the revert.
>>
>> TCP makes use of the rb_node in sk_buff for its retransmit queue,
>> amongst others.
> 
> 
> Only for master skbs kept in TCP internal queues (rtx rb tree)
> 
> However the packets leaving TCP stack are clones.
> 
>> skb->dev aliases to this storage, i.e., passing
>> skb->dev as the input interface in postrouting may point to another
>> sk_buff instead.
>> This will cause crashes and data corruption with nf_queue, as we will
>> attempt to increment a random pcpu variable when calling dev_hold().
>>
>> Also, the memory address may also be free'd, which gives UAF splat.
>>
> 
> This seems to suggest clones skb->dev should be cleared before leaving TCP stack,
> if some layer is confused because skb->dev has not yet been set by IP layer ?
> 
> Untested patch :
> 
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 306e25d743e8de1bfe23d6e3b3a9fb0f23664912..c40fb3880307aa3156d01a8b49f1296657346cfd 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1228,6 +1228,7 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
>         /* Cleanup our debris for IP stacks */
>         memset(skb->cb, 0, max(sizeof(struct inet_skb_parm),
>                                sizeof(struct inet6_skb_parm)));
> +       skb->dev = NULL;
>  
>         tcp_add_tx_delay(skb, tp);
>  
> 

Or clear the field only after cloning :

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 306e25d743e8de1bfe23d6e3b3a9fb0f23664912..13dd0d8003baee3febcfb85df84421f8f91132ef 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1109,6 +1109,7 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
 
                if (unlikely(!skb))
                        return -ENOBUFS;
+               skb->dev = NULL;
        }
 
        inet = inet_sk(sk);
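
The idea behind the second variant can be sketched in user space. This is an illustration only, under the assumption that a clone inherits the original's `dev` field; `skb_sketch`, `dev_sketch`, `clone_header()` and `clone_for_xmit()` are hypothetical names, not kernel APIs. The original keeps its (possibly tree-linked) value, while the copy handed to lower layers starts with `dev == NULL`.

```c
#include <stddef.h>

/* Hypothetical stand-in for a network device. */
struct dev_sketch {
	int ifindex;
};

/* Simplified buffer; on a queued original, dev may really hold tree linkage. */
struct skb_sketch {
	struct dev_sketch *dev;
	int len;
};

/* Copy all header fields to the clone, dev included (assumed clone behaviour). */
static void clone_header(struct skb_sketch *clone, const struct skb_sketch *orig)
{
	*clone = *orig;
}

/* Clone for transmit and drop the inherited dev value, as in the diff above. */
static void clone_for_xmit(struct skb_sketch *clone, const struct skb_sketch *orig)
{
	clone_header(clone, orig);
	clone->dev = NULL;
}
```

Clearing the field on the clone leaves the original's storage untouched, so the retransmit queue linkage is not disturbed in this sketch.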



* Re: Bug URGENT Report with new kernel 5.5.10-5.6-rc6
       [not found]                       ` <CALidq=VYSt3WbtapwL-n8cG71=ysYDJTo3L---xj4U1rEC63KQ@mail.gmail.com>
@ 2020-03-24 13:18                         ` Florian Westphal
       [not found]                           ` <CALidq=WBGwMWZeK95WpunO=+yiCo=iFFijXmjQdOMKxj7-XC1A@mail.gmail.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Florian Westphal @ 2020-03-24 13:18 UTC (permalink / raw)
  To: Martin Zaharinov
  Cc: Eric Dumazet, Florian Westphal, Pablo Neira Ayuso,
	netfilter-devel, netdev

Martin Zaharinov <micron10@gmail.com> wrote:
> Hi All
> More information :
> 
> After this bug, one of the CPUs locks up at 100% load.
> After a reboot the machine starts and works fine, but at peak load time at night,
> when users are online, the same crash log appears in dmesg and another
> CPU locks up.
> Here is the bug report:
> 
> 
> [21542.828151] ------------[ cut here ]------------
> [21542.828979] refcount_t: underflow; use-after-free.
> [21542.829840] WARNING: CPU: 52 PID: 0 at lib/refcount.c:28
> refcount_warn_saturate+0xd8/0xe0
> [21542.831211] Modules linked in: udp_diag raw_diag unix_diag
> af_packet_diag sch_hfsc iptable_filter xt_IMQ iptable_mangle xt_addrtype
> xt_nat xt_MASQUERADE iptable_nat ip_tables bpfilter  sch_fq_pie sch_pie
> netconsole imq r8169 realtek tg3 igb i2c_algo_bit ixgbe mdio libphy
> nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp
> nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack
> nf_defrag_ipv6 nf_defrag_ipv4 pppoe pptp gre pppox ppp_mppe ppp_generic
> slhc libarc4 tun megaraid_sas ipmi_si ipmi_devintf ipmi_msghandler
> sch_fq_codel

Does this patch help?

diff --git a/include/net/netfilter/nf_queue.h b/include/net/netfilter/nf_queue.h
--- a/include/net/netfilter/nf_queue.h
+++ b/include/net/netfilter/nf_queue.h
@@ -14,7 +14,10 @@ struct nf_queue_entry {
 	struct sk_buff		*skb;
 	unsigned int		id;
 	unsigned int		hook_index;	/* index in hook_entries->hook[] */
-
+#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
+	struct net_device	*physin;
+	struct net_device	*physout;
+#endif
 	struct nf_hook_state	state;
 	u16			size; /* sizeof(entry) + saved route keys */
 
@@ -35,7 +38,7 @@ void nf_unregister_queue_handler(struct net *net);
 void nf_reinject(struct nf_queue_entry *entry, unsigned int verdict);
 
 void nf_queue_entry_get_refs(struct nf_queue_entry *entry);
-void nf_queue_entry_release_refs(struct nf_queue_entry *entry);
+void nf_queue_entry_free(struct nf_queue_entry *entry);
 
 static inline void init_hashrandom(u32 *jhash_initval)
 {
diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
--- a/net/netfilter/nf_queue.c
+++ b/net/netfilter/nf_queue.c
@@ -46,25 +46,7 @@ void nf_unregister_queue_handler(struct net *net)
 }
 EXPORT_SYMBOL(nf_unregister_queue_handler);
 
-static void nf_queue_entry_release_br_nf_refs(struct sk_buff *skb)
-{
-#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
-	struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb);
-
-	if (nf_bridge) {
-		struct net_device *physdev;
-
-		physdev = nf_bridge_get_physindev(skb);
-		if (physdev)
-			dev_put(physdev);
-		physdev = nf_bridge_get_physoutdev(skb);
-		if (physdev)
-			dev_put(physdev);
-	}
-#endif
-}
-
-void nf_queue_entry_release_refs(struct nf_queue_entry *entry)
+static void nf_queue_entry_release_refs(struct nf_queue_entry *entry)
 {
 	struct nf_hook_state *state = &entry->state;
 
@@ -76,24 +58,34 @@ void nf_queue_entry_release_refs(struct nf_queue_entry *entry)
 	if (state->sk)
 		sock_put(state->sk);
 
-	nf_queue_entry_release_br_nf_refs(entry->skb);
+#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
+	if (entry->physin)
+		dev_put(entry->physin);
+	if (entry->physout)
+		dev_put(entry->physout);
+#endif
 }
-EXPORT_SYMBOL_GPL(nf_queue_entry_release_refs);
 
-static void nf_queue_entry_get_br_nf_refs(struct sk_buff *skb)
+void nf_queue_entry_free(struct nf_queue_entry *entry)
+{
+	nf_queue_entry_release_refs(entry);
+	kfree(entry);
+}
+EXPORT_SYMBOL_GPL(nf_queue_entry_free);
+
+static void __nf_queue_entry_init_physdevs(struct nf_queue_entry *entry)
 {
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
-	struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb);
+	const struct sk_buff *skb = entry->skb;
+	struct nf_bridge_info *nf_bridge;
 
+	nf_bridge = nf_bridge_info_get(skb);
 	if (nf_bridge) {
-		struct net_device *physdev;
-
-		physdev = nf_bridge_get_physindev(skb);
-		if (physdev)
-			dev_hold(physdev);
-		physdev = nf_bridge_get_physoutdev(skb);
-		if (physdev)
-			dev_hold(physdev);
+		entry->physin = nf_bridge_get_physindev(skb);
+		entry->physout = nf_bridge_get_physoutdev(skb);
+	} else {
+		entry->physin = NULL;
+		entry->physout = NULL;
 	}
 #endif
 }
@@ -110,7 +102,12 @@ void nf_queue_entry_get_refs(struct nf_queue_entry *entry)
 	if (state->sk)
 		sock_hold(state->sk);
 
-	nf_queue_entry_get_br_nf_refs(entry->skb);
+#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
+	if (entry->physin)
+		dev_hold(entry->physin);
+	if (entry->physout)
+		dev_hold(entry->physout);
+#endif
 }
 EXPORT_SYMBOL_GPL(nf_queue_entry_get_refs);
 
@@ -201,6 +198,8 @@ static int __nf_queue(struct sk_buff *skb, const struct nf_hook_state *state,
 		.size	= sizeof(*entry) + route_key_size,
 	};
 
+	__nf_queue_entry_init_physdevs(entry);
+
 	nf_queue_entry_get_refs(entry);
 
 	switch (entry->state.pf) {
@@ -304,12 +303,10 @@ void nf_reinject(struct nf_queue_entry *entry, unsigned int verdict)
 
 	hooks = nf_hook_entries_head(net, pf, entry->state.hook);
 
-	nf_queue_entry_release_refs(entry);
-
 	i = entry->hook_index;
 	if (WARN_ON_ONCE(!hooks || i >= hooks->num_hook_entries)) {
 		kfree_skb(skb);
-		kfree(entry);
+		nf_queue_entry_free(entry);
 		return;
 	}
 
@@ -348,6 +345,6 @@ void nf_reinject(struct nf_queue_entry *entry, unsigned int verdict)
 		kfree_skb(skb);
 	}
 
-	kfree(entry);
+	nf_queue_entry_free(entry);
 }
 EXPORT_SYMBOL(nf_reinject);
diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index 76535fd9278c..3243a31f6e82 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -737,12 +737,6 @@ static void nf_bridge_adjust_segmented_data(struct sk_buff *skb)
 #define nf_bridge_adjust_segmented_data(s) do {} while (0)
 #endif
 
-static void free_entry(struct nf_queue_entry *entry)
-{
-	nf_queue_entry_release_refs(entry);
-	kfree(entry);
-}
-
 static int
 __nfqnl_enqueue_packet_gso(struct net *net, struct nfqnl_instance *queue,
 			   struct sk_buff *skb, struct nf_queue_entry *entry)
@@ -768,7 +762,7 @@ __nfqnl_enqueue_packet_gso(struct net *net, struct nfqnl_instance *queue,
 		entry_seg->skb = skb;
 		ret = __nfqnl_enqueue_packet(net, queue, entry_seg);
 		if (ret)
-			free_entry(entry_seg);
+			nf_queue_entry_free(entry_seg);
 	}
 	return ret;
 }
@@ -827,7 +821,7 @@ nfqnl_enqueue_packet(struct nf_queue_entry *entry, unsigned int queuenum)
 
 	if (queued) {
 		if (err) /* some segments are already queued */
-			free_entry(entry);
+			nf_queue_entry_free(entry);
 		kfree_skb(skb);
 		return 0;
 	}
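
The patch above records the bridge ports in the queue entry once and then takes and drops references through those stored pointers, instead of looking them up from the skb each time. A minimal user-space sketch of that pattern follows; `dev_sketch` and `entry_sketch` are hypothetical stand-ins, and a plain integer counter stands in for dev_hold()/dev_put().

```c
#include <stddef.h>

/* Hypothetical stand-in for a device with a reference counter. */
struct dev_sketch {
	int refcnt;
};

/* Hypothetical stand-in for a queue entry carrying its own device pointers. */
struct entry_sketch {
	struct dev_sketch *physin;
	struct dev_sketch *physout;
};

/* Record the devices once, when the entry is set up; either may be NULL. */
static void entry_init_physdevs(struct entry_sketch *e,
				struct dev_sketch *in, struct dev_sketch *out)
{
	e->physin = in;
	e->physout = out;
}

/* Take references on exactly the devices recorded in the entry. */
static void entry_get_refs(struct entry_sketch *e)
{
	if (e->physin)
		e->physin->refcnt++;
	if (e->physout)
		e->physout->refcnt++;
}

/* Drop references on the same recorded devices. */
static void entry_release_refs(struct entry_sketch *e)
{
	if (e->physin)
		e->physin->refcnt--;
	if (e->physout)
		e->physout->refcnt--;
}
```

Because get and release both read the entry, they always act on the same pair of pointers in this sketch, whatever happens to the packet in between.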


* Re: Bug URGENT Report with new kernel 5.5.10-5.6-rc6
       [not found]                             ` <CALidq=X1fVQgr1CFNqswNK=me42aYtrqp8cmbFO63ekimn4O-g@mail.gmail.com>
@ 2020-03-25 15:22                               ` Florian Westphal
  2020-03-25 15:38                               ` Florian Westphal
  1 sibling, 0 replies; 9+ messages in thread
From: Florian Westphal @ 2020-03-25 15:22 UTC (permalink / raw)
  To: Martin Zaharinov
  Cc: Florian Westphal, Eric Dumazet, Pablo Neira Ayuso,
	netfilter-devel, netdev

Martin Zaharinov <micron10@gmail.com> wrote:
> Hi Florian
> 
> After running the machine for 7-8 hours, I get the same debug output in dmesg:

Do you have a reproducer that doesn't need an out-of-tree module?


* Re: Bug URGENT Report with new kernel 5.5.10-5.6-rc6
       [not found]                             ` <CALidq=X1fVQgr1CFNqswNK=me42aYtrqp8cmbFO63ekimn4O-g@mail.gmail.com>
  2020-03-25 15:22                               ` Florian Westphal
@ 2020-03-25 15:38                               ` Florian Westphal
  1 sibling, 0 replies; 9+ messages in thread
From: Florian Westphal @ 2020-03-25 15:38 UTC (permalink / raw)
  To: Martin Zaharinov
  Cc: Florian Westphal, Eric Dumazet, Pablo Neira Ayuso,
	netfilter-devel, netdev

Martin Zaharinov <micron10@gmail.com> wrote:
> Hi Florian
> 
> After running the machine for 7-8 hours, I get the same debug output in dmesg:

Mhhh, are you sure you applied the patch?

> [28514.488813] Call Trace:
> [28514.517959]  <IRQ>
> [28514.546187]  nf_queue_entry_release_refs+0x77/0x90
> [28514.574371]  nf_reinject+0x65/0x170

nf_reinject() doesn't call nf_queue_entry_release_refs() anymore
after the patch I made.

