* Silently dropped UDP packets on kernel 4.14
@ 2018-05-01 18:50 Kristian Evensen
  2018-05-01 18:58 ` Kristian Evensen
  0 siblings, 1 reply; 9+ messages in thread
From: Kristian Evensen @ 2018-05-01 18:50 UTC (permalink / raw)
  To: Netfilter Development Mailing list, Network Development

Hello,

I am currently debugging an issue where UDP packets appear to be
silently dropped. My setup is a client with a tool that sends two UDP
packets (DNS requests) "simultaneously" over the same socket, and a
router running the latest OpenWrt (with kernel 4.14.37). What I am
seeing is that the first request goes through, while the second seems
to be lost somewhere. I can see the second packet in my firewall logs
on the router (all the way past the POSTROUTING chain in the nat
table), but the packet is never sent over the wire (checked with
tcpdump).
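
For reference, the client behaviour is essentially the following (a
minimal sketch, not the actual tool; the payloads are made up and
merely give the two packets different lengths):

  /* Send two UDP "requests" back-to-back on the same socket;
   * error handling omitted for brevity. */
  #include <arpa/inet.h>
  #include <netinet/in.h>
  #include <sys/socket.h>
  #include <unistd.h>

  int main(void)
  {
          int fd = socket(AF_INET, SOCK_DGRAM, 0);
          struct sockaddr_in dst = {
                  .sin_family = AF_INET,
                  .sin_port   = htons(53),
          };

          inet_pton(AF_INET, "1.1.1.1", &dst.sin_addr);
          connect(fd, (struct sockaddr *)&dst, sizeof(dst));

          /* No delay between the sends: the second packet reaches
           * the router before the first one's conntrack entry is
           * confirmed. */
          send(fd, "first-request-payload", 21, 0);
          send(fd, "second-request-payload!", 23, 0);

          close(fd);
          return 0;
  }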

In order to figure out what was going on, I added some LOG rules to
the firewall: in the PREROUTING chains of the raw, mangle and nat
tables, in the filter FORWARD chain, and in the nat and mangle
POSTROUTING chains.
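
The LOG rules were along these lines (a reconstruction for
illustration; the exact rules are not part of this thread):

  iptables -t raw -A PREROUTING -s 10.0.0.3 -p udp --dport 53 \
           -j LOG --log-prefix "raw-pre: "
  iptables -t mangle -A PREROUTING -s 10.0.0.3 -p udp --dport 53 \
           -j LOG --log-prefix "mangle-pre: "
  iptables -t mangle -A PREROUTING -s 10.0.0.3 -p udp --dport 53 \
           -m conntrack --ctstate NEW -j LOG --log-prefix "mangle-pre #2: "

Here are the logs for one set of requests (with some notes):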

First DNS request (verified by packet length):
[723269.968256] raw-pre: IN=br-lan OUT=
MAC=78:a3:51:34:e4:86:2e:8e:de:be:22:5a:08:00 SRC=10.0.0.3 DST=1.1.1.1
LEN=56 TOS=0x00 PREC=0x00 TTL=64 ID=8147 DF PROTO=UDP SPT=42397 DPT=53
LEN=36
[723269.984892] mangle-pre: IN=br-lan OUT=
MAC=78:a3:51:34:e4:86:2e:8e:de:be:22:5a:08:00 SRC=10.0.0.3 DST=1.1.1.1
LEN=56 TOS=0x00 PREC=0x00 TTL=64 ID=8147 DF PROTO=UDP SPT=42397 DPT=53
LEN=36
[723270.001769] mangle-pre #2: IN=br-lan OUT=
MAC=78:a3:51:34:e4:86:2e:8e:de:be:22:5a:08:00 SRC=10.0.0.3 DST=1.1.1.1
LEN=56 TOS=0x00 PREC=0x00 TTL=64 ID=8147 DF PROTO=UDP SPT=42397 DPT=53
LEN=36 -> this LOG-rule only matches when ctstate is NEW.

Second DNS request (verified by packet length):
[723270.019033] raw-pre: IN=br-lan OUT=
MAC=78:a3:51:34:e4:86:2e:8e:de:be:22:5a:08:00 SRC=10.0.0.3 DST=1.1.1.1
LEN=58 TOS=0x00 PREC=0x00 TTL=64 ID=8148 DF PROTO=UDP SPT=42397 DPT=53
LEN=38
[723270.035734] mangle-pre: IN=br-lan OUT=
MAC=78:a3:51:34:e4:86:2e:8e:de:be:22:5a:08:00 SRC=10.0.0.3 DST=1.1.1.1
LEN=58 TOS=0x00 PREC=0x00 TTL=64 ID=8148 DF PROTO=UDP SPT=42397 DPT=53
LEN=38
-> No mangle-pre #2, so the conntrack entry is found

Processing first request:
[723270.036213] nat-pre: IN=br-lan OUT=
MAC=78:a3:51:34:e4:86:2e:8e:de:be:22:5a:08:00 SRC=10.0.0.3 DST=1.1.1.1
LEN=56 TOS=0x00 PREC=0x00 TTL=64 ID=8147 DF PROTO=UDP SPT=42397 DPT=53
LEN=36 MARK=0x4000200 -> I use policy routing, therefore mark. Routing
works correctly.
[723270.036366] filter-fwd: IN=br-lan OUT=eth0.2
MAC=78:a3:51:34:e4:86:2e:8e:de:be:22:5a:08:00 SRC=10.0.0.3 DST=1.1.1.1
LEN=56 TOS=0x00 PREC=0x00 TTL=63 ID=8147 DF PROTO=UDP SPT=42397 DPT=53
LEN=36 MARK=0x4000200
[723270.036419] mangle-post: IN= OUT=eth0.2 SRC=10.0.0.3 DST=1.1.1.1
LEN=56 TOS=0x00 PREC=0x00 TTL=63 ID=8147 DF PROTO=UDP SPT=42397 DPT=53
LEN=36 MARK=0x4000200
[723270.036457] nat-post: IN= OUT=eth0.2 SRC=10.0.0.3 DST=1.1.1.1
LEN=56 TOS=0x00 PREC=0x00 TTL=63 ID=8147 DF PROTO=UDP SPT=42397 DPT=53
LEN=36 MARK=0x4000200

Processing second request:
[723270.117394] mangle-pre #2: IN=br-lan OUT=
MAC=78:a3:51:34:e4:86:2e:8e:de:be:22:5a:08:00 SRC=10.0.0.3 DST=1.1.1.1
LEN=58 TOS=0x00 PREC=0x00 TTL=64 ID=8148 DF PROTO=UDP SPT=42397 DPT=53
LEN=38
[723270.136644] nat-pre: IN=br-lan OUT=
MAC=78:a3:51:34:e4:86:2e:8e:de:be:22:5a:08:00 SRC=10.0.0.3 DST=1.1.1.1
LEN=58 TOS=0x00 PREC=0x00 TTL=64 ID=8148 DF PROTO=UDP SPT=42397 DPT=53
LEN=38 MARK=0x4000200
[723270.154716] filter-fwd: IN=br-lan OUT=eth0.2
MAC=78:a3:51:34:e4:86:2e:8e:de:be:22:5a:08:00 SRC=10.0.0.3 DST=1.1.1.1
LEN=58 TOS=0x00 PREC=0x00 TTL=63 ID=8148 DF PROTO=UDP SPT=42397 DPT=53
LEN=38 MARK=0x4000200
[723270.173501] mangle-post: IN= OUT=eth0.2 SRC=10.0.0.3 DST=1.1.1.1
LEN=58 TOS=0x00 PREC=0x00 TTL=63 ID=8148 DF PROTO=UDP SPT=42397 DPT=53
LEN=38 MARK=0x4000200
[723270.187787] nat-post: IN= OUT=eth0.2 SRC=10.0.0.3 DST=1.1.1.1
LEN=58 TOS=0x00 PREC=0x00 TTL=63 ID=8148 DF PROTO=UDP SPT=42397 DPT=53
LEN=38 MARK=0x4000200

When dumping conntrack after the second request has been processed, I
get the following:
udp      17 55 src=10.0.0.3 dst=1.1.1.1 sport=42397 dport=53 packets=1
bytes=56 src=1.1.1.1 dst=193.213.155.210 sport=53 dport=42397
packets=1 bytes=72 mark=67109120 delta-time=4 use=1

This looks a bit strange to me; it seems that only the first request
has been counted (packets=1 and bytes=56 match the first, 56-byte
request exactly).

Does anyone have any idea of what could be wrong, where I should look
or other things I can try? I tried to space the requests out a bit in
time (I inserted a sleep 1 between them), and then the problem went
away.

Thanks in advance for any help,
Kristian

* Re: Silently dropped UDP packets on kernel 4.14
  2018-05-01 18:50 Silently dropped UDP packets on kernel 4.14 Kristian Evensen
@ 2018-05-01 18:58 ` Kristian Evensen
  2018-05-01 22:24   ` Florian Westphal
  0 siblings, 1 reply; 9+ messages in thread
From: Kristian Evensen @ 2018-05-01 18:58 UTC (permalink / raw)
  To: Netfilter Development Mailing list, Network Development

On Tue, May 1, 2018 at 8:50 PM, Kristian Evensen
<kristian.evensen@gmail.com> wrote:
> Does anyone have any idea of what could be wrong, where I should look
> or other things I can try? I tried to space the requests out a bit in
> time (I inserted a sleep 1 between them), and then the problem went
> away.

I should learn to always go through everything one last time before
sending an email. First of all, I see that both requests are treated
as new. Second, on my router, new requests are sent to user space for
marking, which explains the large delay in processing. When I remove
the NFQUEUE rule and instead handle the marking statically, the
problem goes away and I get an answer to both packets.

However, I do have one question. In my application, both packets are
assigned the same mark. Shouldn't they then match the same conntrack
entry? That does not seem to be the case, so am I missing something?

BR,
Kristian

* Re: Silently dropped UDP packets on kernel 4.14
  2018-05-01 18:58 ` Kristian Evensen
@ 2018-05-01 22:24   ` Florian Westphal
  2018-05-01 22:42     ` Kristian Evensen
  0 siblings, 1 reply; 9+ messages in thread
From: Florian Westphal @ 2018-05-01 22:24 UTC (permalink / raw)
  To: Kristian Evensen; +Cc: Netfilter Development Mailing list, Network Development

Kristian Evensen <kristian.evensen@gmail.com> wrote:
> On Tue, May 1, 2018 at 8:50 PM, Kristian Evensen
> <kristian.evensen@gmail.com> wrote:
> > Does anyone have any idea of what could be wrong, where I should look
> > or other things I can try? I tried to space the requests out a bit in
> > time (I inserted a sleep 1 between them), and then the problem went
> > away.
> 
> I should learn to always go through everything one last time before
> sending an email. First of all, I see that both requests are treated
> as new.

Normal when nfqueue is in use.

> Second, on my router, new requests are sent to user space for
> marking, which explains the large delay in processing. When I remove
> the NFQUEUE rule and instead handle the marking statically, the
> problem goes away and I get an answer to both packets.

Yes, because 2nd packet finds the conntrack entry created by the first
one.

> However, I do have one question. In my application, both packets are
> assigned the same mark. Shouldn't they then match the same conntrack
> entry? That does not seem to be the case, so am I missing something?

The 2nd packet creates a new conntrack entry, because the conntrack
entry created by the first one has not been inserted into the global
table yet.

This happens as the last step, after the packet has traversed all
chains. When nfqueue is used, this step gets delayed, and the 2nd
packet will be dropped when the insertion step finds that another
packet has already created the same flow.
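
In pseudo-code, the confirm step that runs once the packet has left
the last hook does roughly this (heavily simplified, not the exact
4.14 source; tuple_exists() is a stand-in for the real chain walk):

  /* __nf_conntrack_confirm(), in essence. The entry is filed under
   * buckets computed from both of its tuples. */
  hash = hash_conntrack(net, &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
  reply_hash = hash_conntrack(net, &ct->tuplehash[IP_CT_DIR_REPLY].tuple);

  /* Another packet may have confirmed the same flow while this one
   * sat in the queue; if so, this packet is silently dropped. */
  if (tuple_exists(net, hash, ct) || tuple_exists(net, reply_hash, ct))
          return NF_DROP;

  /* Otherwise the entry becomes visible in the global table. */
  __nf_conntrack_hash_insert(ct, hash, reply_hash);
  return NF_ACCEPT;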

I'm not sure what the best way to solve this is; we either need
to insert earlier in the nfqueue case, or do conflict resolution in
the nfqueue case (and perhaps also undo the NAT? not sure).

* Re: Silently dropped UDP packets on kernel 4.14
  2018-05-01 22:24   ` Florian Westphal
@ 2018-05-01 22:42     ` Kristian Evensen
  2018-05-02 23:54       ` Kristian Evensen
  0 siblings, 1 reply; 9+ messages in thread
From: Kristian Evensen @ 2018-05-01 22:42 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Netfilter Development Mailing list, Network Development

Hi,

Thanks for your quick and detailed reply!

On Wed, May 2, 2018 at 12:24 AM, Florian Westphal <fw@strlen.de> wrote:
> I'm not sure what the best way to solve this is, we either need
> to insert earlier in nfqueue case, or do conflict resolution in nfqueue
> case (and perhaps also nat undo? not sure).

My knowledge of the conntrack/NAT subsystem is not that great, and I
don't know the implications of what I am about to suggest. However,
considering that the two packets represent the same flow, wouldn't it
be possible to apply the existing NAT mapping to the second packet
and then let it pass?

BR,
Kristian

* Re: Silently dropped UDP packets on kernel 4.14
  2018-05-01 22:42     ` Kristian Evensen
@ 2018-05-02 23:54       ` Kristian Evensen
  2018-05-03  5:03         ` Florian Westphal
  0 siblings, 1 reply; 9+ messages in thread
From: Kristian Evensen @ 2018-05-02 23:54 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Netfilter Development Mailing list, Network Development

[-- Attachment #1: Type: text/plain, Size: 12216 bytes --]

Hello,

On Wed, May 2, 2018 at 12:42 AM, Kristian Evensen
<kristian.evensen@gmail.com> wrote:
> My knowledge of the conntrack/NAT subsystem is not that great, and I
> don't know the implications of what I am about to suggest. However,
> considering that the two packets represent the same flow, wouldn't it
> be possible to apply the existing NAT mapping to the second packet
> and then let it pass?

I have spent the day today trying to solve my problem, and I think I
am almost there. I have attached my work-in-progress patch (for
kernel 4.14) to this email if anyone wants to take a look.

I went for the early-insert approach and have patched
nfnetlink_queue to perform an insert if no conntrack entry is found
when the verdict is passed from user space. If a conntrack entry is
found, I instead replace the ct attached to the skb with the existing
entry. I have verified that my approach works by artificially
delaying the verdict from my application for the second packet (so
that the first packet has passed all netfilter hooks). After
replacing the ct, the second packet is handled correctly and sent
over the wire.

However, something goes wrong after the conntrack entry has been
inserted (using nf_conntrack_hash_check_insert()). At some random (but
very short) time after I see a couple of "early insert ..."/"early
insert confirmed" messages, I get an RCU stall. The trace for the
stall looks as follows:

[  105.420024] INFO: rcu_sched self-detected stall on CPU
[  105.425191]  2-...: (5999 ticks this GP) idle=12a/140000000000001/0
softirq=2674/2674 fqs=2543
[  105.433845]   (t=6001 jiffies g=587 c=586 q=5896)
[  105.438545] CPU: 2 PID: 3632 Comm: dlb Not tainted 4.14.36 #0
[  105.444261] Stack : 00000000 00000000 00000000 00000000 805c7ada
00000031 00000000 8052c588
[  105.452610]         8fd48ff4 805668e7 804f5a2c 00000002 00000e30
00000001 8fc11d20 00000007
[  105.460957]         00000000 00000000 805c0000 00007448 00000000
0000016e 00000007 00000000
[  105.469303]         00000000 80570000 0006b111 00000000 80000000
00000000 80590000 804608e4
[  105.477650]         8056c2c0 8056408c 000000e0 80560000 00000000
8027c418 00000008 805c0008
[  105.485996]         ...
[  105.488437] Call Trace:
[  105.490909] [<800103f8>] show_stack+0x58/0x100
[  105.495363] [<8043c1dc>] dump_stack+0x9c/0xe0
[  105.499707] [<8000d938>] arch_trigger_cpumask_backtrace+0x50/0x78
[  105.505785] [<80083540>] rcu_dump_cpu_stacks+0xc4/0x134
[  105.510991] [<800829c4>] rcu_check_callbacks+0x310/0x814
[  105.516295] [<80085f04>] update_process_times+0x34/0x70
[  105.521522] [<8009691c>] tick_handle_periodic+0x34/0xd0
[  105.526749] [<802f6e98>] gic_compare_interrupt+0x48/0x58
[  105.532047] [<80076450>] handle_percpu_devid_irq+0xbc/0x1a8
[  105.537619] [<800707c0>] generic_handle_irq+0x40/0x58
[  105.542679] [<8023a274>] gic_handle_local_int+0x84/0xd0
[  105.547886] [<8023a434>] gic_irq_dispatch+0x10/0x20
[  105.552747] [<800707c0>] generic_handle_irq+0x40/0x58
[  105.557806] [<804591b4>] do_IRQ+0x1c/0x2c
[  105.561805] [<802394bc>] plat_irq_dispatch+0xfc/0x138
[  105.566839] [<8000b508>] except_vec_vi_end+0xb8/0xc4
[  105.571904] [<8f200ca4>] nf_conntrack_lock+0x28c/0x440 [nf_conntrack]
[  105.578341] ------------[ cut here ]------------
[  105.582948] WARNING: CPU: 2 PID: 3632 at kernel/smp.c:416
smp_call_function_many+0xc8/0x3bc
[  105.591257] Modules linked in: rt2800pci rt2800mmio rt2800lib
qcserial ppp_async option usb_wwan rt2x00pci rt2x00mmio rt2x00lib
rndis_host qmi_wwan ppp_generic nf_nat_pptp nf_conntrack_pptp
nf_conntrack_ipv6p
[  105.662308]  nf_nat_snmp_basic nf_nat_sip nf_nat_redirect
nf_nat_proto_gre nf_nat_masquerade_ipv4 nf_nat_irc nf_conntrack_ipv4
nf_nat_ipv4 nf_nat_h323 nf_nat_ftp nf_nat_amanda nf_nat nf_log_ipv4
nf_flow_tablt
[  105.733288]  ip_set_hash_netiface ip_set_hash_netport
ip_set_hash_netnet ip_set_hash_net ip_set_hash_netportnet
ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip
ip_set_hash_ipport ip_set_hash_ipmarm
[  105.804724]  ohci_hcd ehci_platform sd_mod scsi_mod ehci_hcd
gpio_button_hotplug usbcore nls_base usb_common mii
[  105.814899] CPU: 2 PID: 3632 Comm: dlb Not tainted 4.14.36 #0
[  105.820615] Stack : 00000000 00000000 00000000 00000000 805c7ada
00000031 00000000 8052c588
[  105.828961]         8fd48ff4 805668e7 804f5a2c 00000002 00000e30
00000001 8fc11c88 00000007
[  105.837308]         00000000 00000000 805c0000 00008590 00000000
0000018d 00000007 00000000
[  105.845654]         00000000 80570000 000c6f33 00000000 80000000
00000000 80590000 8009e304
[  105.854001]         00000009 000001a0 8000d0fc 00000000 00000000
8027c418 00000008 805c0008
[  105.862348]         ...
[  105.864786] Call Trace:
[  105.867230] [<800103f8>] show_stack+0x58/0x100
[  105.871662] [<8043c1dc>] dump_stack+0x9c/0xe0
[  105.876009] [<8002e190>] __warn+0xe0/0x114
[  105.880091] [<8002e254>] warn_slowpath_null+0x1c/0x28
[  105.885124] [<8009e304>] smp_call_function_many+0xc8/0x3bc
[  105.890591] [<8000d950>] arch_trigger_cpumask_backtrace+0x68/0x78
[  105.896663] [<80083540>] rcu_dump_cpu_stacks+0xc4/0x134
[  105.901869] [<800829c4>] rcu_check_callbacks+0x310/0x814
[  105.907164] [<80085f04>] update_process_times+0x34/0x70
[  105.912374] [<8009691c>] tick_handle_periodic+0x34/0xd0
[  105.917585] [<802f6e98>] gic_compare_interrupt+0x48/0x58
[  105.922878] [<80076450>] handle_percpu_devid_irq+0xbc/0x1a8
[  105.928434] [<800707c0>] generic_handle_irq+0x40/0x58
[  105.933474] [<8023a274>] gic_handle_local_int+0x84/0xd0
[  105.938681] [<8023a434>] gic_irq_dispatch+0x10/0x20
[  105.943542] [<800707c0>] generic_handle_irq+0x40/0x58
[  105.948581] [<804591b4>] do_IRQ+0x1c/0x2c
[  105.952580] [<802394bc>] plat_irq_dispatch+0xfc/0x138
[  105.957613] [<8000b508>] except_vec_vi_end+0xb8/0xc4
[  105.962602] [<8f200ca4>] nf_conntrack_lock+0x28c/0x440 [nf_conntrack]
[  105.969038] ---[ end trace 0b9e3b8a6909a0fb ]---
[  105.973638] INFO: rcu_sched detected stalls on CPUs/tasks:
[  105.979149]  2-...: (6054 ticks this GP) idle=12a/140000000000000/0
softirq=2674/2674 fqs=2544
[  105.987818]  (detected by 3, t=6056 jiffies, g=587, c=586, q=5896)
[  105.993996] ------------[ cut here ]------------
[  105.998615] WARNING: CPU: 3 PID: 0 at kernel/smp.c:291
smp_call_function_single+0xc0/0x18c
[  106.006841] Modules linked in: rt2800pci rt2800mmio rt2800lib
qcserial ppp_async option usb_wwan rt2x00pci rt2x00mmio rt2x00lib
rndis_host qmi_wwan ppp_generic nf_nat_pptp nf_conntrack_pptp
nf_conntrack_ipv6p
[  106.077960]  nf_nat_snmp_basic nf_nat_sip nf_nat_redirect
nf_nat_proto_gre nf_nat_masquerade_ipv4 nf_nat_irc nf_conntrack_ipv4
nf_nat_ipv4 nf_nat_h323 nf_nat_ftp nf_nat_amanda nf_nat nf_log_ipv4
nf_flow_tablt
[  106.148997]  ip_set_hash_netiface ip_set_hash_netport
ip_set_hash_netnet ip_set_hash_net ip_set_hash_netportnet
ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip
ip_set_hash_ipport ip_set_hash_ipmarm
[  106.220489]  ohci_hcd ehci_platform sd_mod scsi_mod ehci_hcd
gpio_button_hotplug usbcore nls_base usb_common mii
[  106.230686] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G        W
4.14.36 #0
[  106.237876] Stack : 00000000 00000000 00000000 00000000 805c7ada
00000042 00000000 8052c588
[  106.246232]         8fc44e74 805668e7 804f5a2c 00000003 00000000
00000001 8fc15c80 00000007
[  106.254587]         00000000 00000000 805c0000 000098c0 00000000
000001b3 00000007 00000000
[  106.262942]         00000000 80570000 0003851e 00000000 00000000
00000000 80590000 8009dfb0
[  106.271299]         00000009 00000123 000000e0 80560000 00000001
8027c418 0000000c 805c000c
[  106.279653]         ...
[  106.282097] Call Trace:
[  106.284563] [<800103f8>] show_stack+0x58/0x100
[  106.289026] [<8043c1dc>] dump_stack+0x9c/0xe0
[  106.293380] [<8002e190>] __warn+0xe0/0x114
[  106.297467] [<8002e254>] warn_slowpath_null+0x1c/0x28
[  106.302506] [<8009dfb0>] smp_call_function_single+0xc0/0x18c
[  106.308150] [<8000d950>] arch_trigger_cpumask_backtrace+0x68/0x78
[  106.314235] [<80083540>] rcu_dump_cpu_stacks+0xc4/0x134
[  106.319445] [<80082c8c>] rcu_check_callbacks+0x5d8/0x814
[  106.324757] [<80085f04>] update_process_times+0x34/0x70
[  106.329992] [<8009691c>] tick_handle_periodic+0x34/0xd0
[  106.335230] [<802f6e98>] gic_compare_interrupt+0x48/0x58
[  106.340537] [<80076450>] handle_percpu_devid_irq+0xbc/0x1a8
[  106.346118] [<800707c0>] generic_handle_irq+0x40/0x58
[  106.351190] [<8023a274>] gic_handle_local_int+0x84/0xd0
[  106.356402] [<8023a434>] gic_irq_dispatch+0x10/0x20
[  106.361270] [<800707c0>] generic_handle_irq+0x40/0x58
[  106.366339] [<804591b4>] do_IRQ+0x1c/0x2c
[  106.370344] [<802394bc>] plat_irq_dispatch+0xfc/0x138
[  106.375383] [<8000b508>] except_vec_vi_end+0xb8/0xc4
[  106.380336] [<8000ced0>] r4k_wait_irqoff+0x1c/0x24
[  106.385146] [<80065f6c>] do_idle+0xe4/0x168
[  106.389322] [<800661e8>] cpu_startup_entry+0x24/0x2c
[  106.394275] ---[ end trace 0b9e3b8a6909a0fc ]---
[  106.398888] CPU: 2 PID: 3632 Comm: dlb Tainted: G        W       4.14.36 #0
[  106.405819] task: 8fd48c80 task.stack: 8e274000
[  106.410324] $ 0   : 00000000 00000001 814b7588 00000001
[  106.415544] $ 4   : 09036633 80564198 71804fce d0db5a72
[  106.420763] $ 8   : 00000000 8aee9400 814bee18 00157558
[  106.425982] $12   : 00000000 00000000 fffffffe 00000114
[  106.431201] $16   : 00005b78 8e275998 8e874540 80560000
[  106.436421] $20   : 00000001 8f210000 00005a31 8f210000
[  106.441639] $24   : 00000000 80015e5c
[  106.446857] $28   : 8e274000 8e275940 8f210000 8f201634
[  106.452078] Hi    : 00005a31
[  106.454942] Lo    : 6e620000
[  106.457912] epc   : 8f2017f4 nf_conntrack_tuple_taken+0x258/0x308
[nf_conntrack]
[  106.465299] ra    : 8f201634 nf_conntrack_tuple_taken+0x98/0x308
[nf_conntrack]
[  106.472571] Status: 11007c03 KERNEL EXL IE
[  106.476748] Cause : 50800400 (ExcCode 00)
[  106.480738] PrId  : 0001992f (MIPS 1004Kc)
[  106.484819] CPU: 2 PID: 3632 Comm: dlb Tainted: G        W       4.14.36 #0
[  106.491742] Stack : 00000000 00000000 00000000 00000000 805c7ada
0000003f 00000000 8052c588
[  106.500088]         8fd48ff4 805668e7 804f5a2c 00000002 00000e30
00000001 8fc11d68 00000007
[  106.508435]         00000000 00000000 805c0000 0000a498 00000000
000001e3 00000007 00000000
[  106.516779]         00000000 80570000 000765d3 00000000 80000000
00000000 80590000 8000d0fc
[  106.525126]         804fce50 805574e0 804faa5c 80570000 00000002
8027c418 00000008 805c0008
[  106.533473]         ...
[  106.535911] Call Trace:
[  106.538356] [<800103f8>] show_stack+0x58/0x100
[  106.542788] [<8043c1dc>] dump_stack+0x9c/0xe0
[  106.547130] [<8009dccc>] flush_smp_call_function_queue+0x144/0x1d8
[  106.553294] [<80015830>] ipi_call_interrupt+0x10/0x20
[  106.558336] [<80071380>] __handle_irq_event_percpu+0x78/0x18c
[  106.564061] [<800714b4>] handle_irq_event_percpu+0x20/0x64
[  106.569528] [<80071558>] handle_irq_event+0x60/0xa8
[  106.574388] [<800757d0>] handle_edge_irq+0x204/0x25c
[  106.579337] [<800707c0>] generic_handle_irq+0x40/0x58
[  106.584377] [<8023a3c8>] gic_handle_shared_int+0x108/0x164
[  106.589844] [<800707c0>] generic_handle_irq+0x40/0x58
[  106.594883] [<804591b4>] do_IRQ+0x1c/0x2c
[  106.598880] [<802394bc>] plat_irq_dispatch+0xfc/0x138
[  106.603913] [<8000b508>] except_vec_vi_end+0xb8/0xc4
[  106.608901] [<8f2017f4>] nf_conntrack_tuple_taken+0x258/0x308 [nf_conntrack]
[  106.615963] [<8f1fcd88>] nf_ct_nat_ext_add+0x7ac/0x908 [nf_nat]

I instrumented nf_conntrack_tuple_taken() and it seems that what
happens is that it spins on the "goto begin" part. I am probably
doing something wrong with my insert, but I am not able to see what.
Can anyone see what is wrong, or provide me with pointers on where to
look? When looking at the syslog, I also noticed the following:

[   45.394817] early insert: tuple 17 192.168.6.2:36376 -> 1.1.1.1:53
[   45.401056] early insert confirmed
[   45.405468] early insert: tuple 17 192.168.6.2:36376 -> 1.1.1.1:53
[   45.411996] early insert confirmed

I.e., the entry created by the first packet is not found when the
second packet is handled. Could it be that I am missing some
synchronization step?
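
For reference, the retry loop in nf_conntrack_tuple_taken() that I
see it spinning in looks roughly like this (simplified from 4.14's
nf_conntrack_core.c, from memory):

  begin:
          hlist_nulls_for_each_entry_rcu(h, n, &ct_hash[hash], hnnode) {
                  /* ... skip the ignored conntrack, compare tuples ... */
          }
          /* The walk must end on the nulls marker of the bucket it
           * started in; otherwise the chain changed under us and we
           * restart the walk. */
          if (get_nulls_value(n) != hash)
                  goto begin;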

Thanks in advance for any help,
Kristian

[-- Attachment #2: 0001-NFQUEUE-early-insert.patch --]
[-- Type: text/x-patch, Size: 3148 bytes --]

From cfb4e8187940b89acf163a7f4181bf04abd333d0 Mon Sep 17 00:00:00 2001
From: Kristian Evensen <kristian.evensen@gmail.com>
Date: Tue, 1 May 2018 22:44:16 +0200
Subject: [PATCH] NFQUEUE early insert

---
 net/netfilter/nfnetlink_queue.c | 69 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 69 insertions(+)

diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index c97966298..254866d5c 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -43,6 +43,9 @@
 
 #if IS_ENABLED(CONFIG_NF_CONNTRACK)
 #include <net/netfilter/nf_conntrack.h>
+#include <net/netfilter/nf_conntrack_core.h>
+#include <net/netfilter/nf_conntrack_l3proto.h>
+#include <net/netfilter/nf_conntrack_l4proto.h>
 #endif
 
 #define NFQNL_QMAX_DEFAULT 1024
@@ -1151,6 +1154,64 @@ static int nfqa_parse_bridge(struct nf_queue_entry *entry,
 	return 0;
 }
 
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+static void nfqnl_update_ct(struct net *net, struct sk_buff *skb)
+{
+	struct nf_conn *ct = NULL;
+	enum ip_conntrack_info ctinfo;
+	const struct nf_conntrack_l3proto *l3proto;
+	const struct nf_conntrack_l4proto *l4proto;
+	u_int16_t l3num;
+	unsigned int dataoff;
+	u_int8_t l4num;
+	struct nf_conntrack_tuple tuple;
+	struct nf_conntrack_tuple_hash *h;
+
+	ct = nf_ct_get(skb, &ctinfo);
+	l3num = nf_ct_l3num(ct);
+	l3proto = nf_ct_l3proto_find_get(l3num);
+
+	if (l3proto->get_l4proto(skb, skb_network_offset(skb), &dataoff,
+				 &l4num) <= 0) {
+		pr_info("Failed to get l4 protonum\n");
+		return;
+	}
+
+	l4proto = nf_ct_l4proto_find_get(l3num, l4num);
+
+	if (!nf_ct_get_tuple(skb, skb_network_offset(skb), dataoff, l3num,
+			     l4num, net, &tuple, l3proto, l4proto)) {
+		pr_info("Failed to get tuple\n");
+		return;
+	}
+
+#if IS_ENABLED(CONFIG_NF_CONNTRACK_ZONES)
+	h = nf_conntrack_find_get(net, &(ct->zone), &tuple);
+#else
+	h = nf_conntrack_find_get(net, NULL, &tuple);
+#endif
+
+	if (!h) {
+		pr_info("early insert: tuple %u %pI4:%hu -> %pI4:%hu\n",
+		       tuple.dst.protonum, &(tuple.src.u3.ip), ntohs(tuple.src.u.all),
+	       	       &(tuple.dst.u3.ip), ntohs(tuple.dst.u.all));
+
+		rcu_read_lock();
+		if (!nf_conntrack_hash_check_insert(ct)) {
+			pr_info("early insert confirmed\n");
+		}
+		rcu_read_unlock();
+	} else {
+		pr_info("replace: tuple %u %pI4:%hu -> %pI4:%hu\n",
+		       tuple.dst.protonum, &(tuple.src.u3.ip), ntohs(tuple.src.u.all),
+	       	       &(tuple.dst.u3.ip), ntohs(tuple.dst.u.all));
+		nf_ct_put(ct);
+		ct = nf_ct_tuplehash_to_ctrack(h);
+		nf_ct_set(skb, ct, IP_CT_NEW);
+	}
+}
+#endif
+
 static int nfqnl_recv_verdict(struct net *net, struct sock *ctnl,
 			      struct sk_buff *skb,
 			      const struct nlmsghdr *nlh,
@@ -1213,6 +1274,14 @@ static int nfqnl_recv_verdict(struct net *net, struct sock *ctnl,
 	if (nfqa[NFQA_MARK])
 		entry->skb->mark = ntohl(nla_get_be32(nfqa[NFQA_MARK]));
 
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+	nf_ct_get(entry->skb, &ctinfo);
+
+	if (ctinfo == IP_CT_NEW && verdict != NF_STOLEN && verdict != NF_DROP) {
+		nfqnl_update_ct(net, entry->skb);
+	}
+#endif
+
 	nf_reinject(entry, verdict);
 	return 0;
 }
-- 
2.14.1


* Re: Silently dropped UDP packets on kernel 4.14
  2018-05-02 23:54       ` Kristian Evensen
@ 2018-05-03  5:03         ` Florian Westphal
  2018-05-03  9:06           ` Kristian Evensen
  2018-05-03  9:42           ` Michal Kubecek
  0 siblings, 2 replies; 9+ messages in thread
From: Florian Westphal @ 2018-05-03  5:03 UTC (permalink / raw)
  To: Kristian Evensen
  Cc: Florian Westphal, Netfilter Development Mailing list,
	Network Development

Kristian Evensen <kristian.evensen@gmail.com> wrote:
> I went for the early-insert approach

I'm sorry for suggesting that.

It doesn't work, because of NAT.
NAT rewrites the packet content and changes the reply tuple, but the
tuples determine the hash insertion location. An entry inserted before
NAT has run therefore sits in a bucket that no longer matches its
final reply tuple (presumably the reason for the endless search
restarts you are seeing).

I don't know how to solve this problem.

* Re: Silently dropped UDP packets on kernel 4.14
  2018-05-03  5:03         ` Florian Westphal
@ 2018-05-03  9:06           ` Kristian Evensen
  2018-05-03  9:42           ` Michal Kubecek
  1 sibling, 0 replies; 9+ messages in thread
From: Kristian Evensen @ 2018-05-03  9:06 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Netfilter Development Mailing list, Network Development

Hi Florian,

On Thu, May 3, 2018 at 7:03 AM, Florian Westphal <fw@strlen.de> wrote:
> I'm sorry for suggesting that.
>
> It doesn't work, because of NAT.
> NAT rewrites packet content and changes the reply tuple, but the tuples
> determine the hash insertion location.
>
> I don't know how to solve this problem.

No problem. In any case, this has served as a good exercise for
getting more familiar with the conntrack/NAT code in the kernel. I
did some more tests, and I see that on my router (or routers,
actually), just replacing the ct solves the issue. While not a
perfect solution, it improves the situation considerably. Do you
think a patch where the ct is replaced would be acceptable, or would
upstream rather wait for a "proper" fix to this problem? When the ct
is replaced, it is at least possible to work around the problem in
user space, while without replacing it we are stuck with the original
entry.

BR,
Kristian

* Re: Silently dropped UDP packets on kernel 4.14
  2018-05-03  5:03         ` Florian Westphal
  2018-05-03  9:06           ` Kristian Evensen
@ 2018-05-03  9:42           ` Michal Kubecek
  2018-05-03 11:19             ` Kristian Evensen
  1 sibling, 1 reply; 9+ messages in thread
From: Michal Kubecek @ 2018-05-03  9:42 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Kristian Evensen, Netfilter Development Mailing list,
	Network Development

On Thu, May 03, 2018 at 07:03:45AM +0200, Florian Westphal wrote:
> Kristian Evensen <kristian.evensen@gmail.com> wrote:
> > I went for the early-insert approach
> 
> I'm sorry for suggesting that.
> 
> It doesn't work, because of NAT.
> NAT rewrites packet content and changes the reply tuple, but the tuples
> determine the hash insertion location.
> 
> I don't know how to solve this problem.

It's an old problem which surfaces from time to time when some special
conditions make it more visible. When I was facing it in 2015, I found
this thread from as early as 2009:

  https://www.spinics.net/lists/linux-net/msg16712.html

In our case, the customer was using IPVS in "one packet scheduling"
mode (it drops the conntrack entry after each packet), which increased
the probability of insert collisions significantly. Using NFQUEUE has
a similar effect, since it delays the conntrack confirmation and so
widens the window for such collisions.

We were lucky, though, as it turned out the only reason the customer
needed connection tracking was to make sure fragments of long UDP
datagrams were not sent to different real servers. For newer kernels
with commit 6aafeef03b9d ("netfilter: push reasm skb through instead
of original frag skbs"), this was no longer necessary, so they could
disable connection tracking for these packets.

For older kernels without this change, I tried several ideas, each of
which didn't work for some reason. We ended up with a rather hacky
workaround: not dropping the packet on collision (so that its
conntrack wasn't inserted into the table and was dropped once the
packet was sent). It worked fine for our customer, but like the
early-insert approach, it wouldn't work with NAT.

One of the ideas I had was this:

  - keep also unconfirmed conntracks in some data structure
  - check new packets also against unconfirmed conntracks
  - if it matches an unconfirmed conntrack, defer its processing
    until that conntrack is either inserted or discarded

But as it would be rather complicated to implement without races or
harming performance, I didn't want to actually try it until I had run
out of other ideas. With NAT coming into play, there don't seem to be
many other options.
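
A very rough sketch of that idea, for concreteness (hypothetical and
never implemented; all names here are made up):

  /* Hypothetical per-netns table of not-yet-confirmed conntracks. */
  struct unconfirmed_table {
          spinlock_t lock;
          struct hlist_head buckets[UNCONFIRMED_BUCKETS];
  };

  /* On packet ingress, before creating a new conntrack: */
  if (unconfirmed_lookup(net, &tuple)) {
          /* A sibling packet already owns this flow but has not
           * been confirmed yet; park this skb and re-run it once
           * the owning entry is inserted or discarded. */
          return defer_packet(skb);
  }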

Michal Kubecek

* Re: Silently dropped UDP packets on kernel 4.14
  2018-05-03  9:42           ` Michal Kubecek
@ 2018-05-03 11:19             ` Kristian Evensen
  0 siblings, 0 replies; 9+ messages in thread
From: Kristian Evensen @ 2018-05-03 11:19 UTC (permalink / raw)
  To: Michal Kubecek
  Cc: Florian Westphal, Netfilter Development Mailing list,
	Network Development

Hi Michal,

Thanks for providing a nice summary of your experience when dealing
with this problem. Always nice to know that I am not alone :)

On Thu, May 3, 2018 at 11:42 AM, Michal Kubecek <mkubecek@suse.cz> wrote:
> One of the ideas I had was this:
>
>   - keep also unconfirmed conntracks in some data structure
>   - check new packets also against unconfirmed conntracks
>   - if it matches an unconfirmed conntrack, defer its processing
>     until that conntrack is either inserted or discarded

I was thinking about something along the same lines and came to the
same conclusion: it is a lot of hassle and work for a very special
case. I think that replacing the conntrack entry is a good
compromise; it improves on the current situation and allows for the
creation of "perfect" solutions in user space. For example, a user
can keep track of seen UDP flows and only release new packets
belonging to the same flow once the conntrack entry has been created.
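
In user-space terms, the verdict handler could do something like this
(a sketch only; flow_seen(), remember_flow(), hold_packet(),
conntrack_exists() and set_verdict() are made-up helpers around
libnetfilter_queue and libnetfilter_conntrack):

  /* Release the first packet of a UDP flow immediately; hold later
   * packets until the flow's conntrack entry actually exists. */
  if (!flow_seen(&tuple)) {
          remember_flow(&tuple);
          set_verdict(pkt, NF_ACCEPT);
  } else if (!conntrack_exists(&tuple)) {
          hold_packet(pkt);               /* verdict given later */
  } else {
          set_verdict(pkt, NF_ACCEPT);
  }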

BR,
Kristian
