linux-kernel.vger.kernel.org archive mirror
* netfilter regression causes lost pings "operation not permitted"
From: Trevor Cordes @ 2016-12-07  8:03 UTC
  To: linux-kernel; +Cc: Florian Westphal, Pablo Neira Ayuso

Bisected down to:
870190a9ec9075205c0fa795a09fa931694a3ff1
7c9664351980aaa6a4b8837a314360b3a4ad382a

Hi!  Since 4.8.x, a script of mine that pings every IP on my LAN /24 subnet 
in about 0.5s (and nmap doing the same) fails on the send() call with 
"operation not permitted".  The error appears after a somewhat random number 
of packets have already been sent.  That number shrinks each time you run the 
script: the first run gets to around 200 pings before the error, later runs 
only around 50.  If you wait a while, it goes back up to around 200 pings.  
The script almost never completes all 253 of them.

Interestingly, the problem only occurs when you ping different IPs.  If 
you send the same ping count using my script to just one IP, there is no 
bug.

4.7.0 kernels don't have this problem: the pings go out and everything is 
fine no matter how fast you repeat the script.

I bisected the bug to the two commits above.  I had to skip 
7c9664351980aaa6a4b8837a314360b3a4ad382a because it wouldn't boot (it 
panicked on every try), so I can't narrow it down any further than these 2 
commits.

You can reproduce this bug in 4.8.8 or newer with:

# change to your LAN subnet
nmap -PE 192.168.100.0/24

Or use the test script pasted below.  (Modify the top lines to suit your 
LAN IPs; other netmasks will need more changes.)  Sometimes you have to 
run the script a few times before the error occurs.

When you see "operation not permitted", that's the symptom.  Boot into 
4.7.10, say, and you don't get any error.
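
Because the error does not always show up on the first run, it can help to repeat the test and note which run fails first.  A minimal shell sketch, assuming the test script is saved as ./pingtest.pl (a hypothetical file name) and run as root; run_until_fail is an illustrative helper made up for this sketch, not part of any tool:

```shell
#!/bin/sh
# run_until_fail MAX CMD [ARGS...]
# Runs CMD up to MAX times and reports the first run that exits non-zero.
# (Illustrative helper; the name is made up for this sketch.)
run_until_fail() {
    max=$1
    shift
    n=1
    while [ "$n" -le "$max" ]; do
        # the command's own output is discarded; only its exit status matters
        if ! "$@" >/dev/null 2>&1; then
            echo "failed on run $n"
            return 1
        fi
        n=$((n + 1))
    done
    echo "all $max runs OK"
}

# Example (assumes the test script is saved as ./pingtest.pl, run as root):
#   run_until_fail 10 perl ./pingtest.pl
```

The helper relies only on the exit status, so it works for the Perl script (die exits non-zero) as well as for any other command that signals failure the same way.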

I played with all the sysctls that looked relevant (ratelimit, per_sec, 
max, etc.) and modified everything I could find, but nothing made the 
problem go away.  I *think* some had a modest effect on how many times I 
could run the script before the error popped up, but even at extreme 
values the bug never went away.

I'm back to the Fedora defaults now:

#sysctl -a | grep -iP 'icmp|nf_|conntrack|iptable'|grep -viP 'nf_log'
net.ipv4.icmp_echo_ignore_all = 0
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.icmp_errors_use_inbound_ifaddr = 0
net.ipv4.icmp_ignore_bogus_error_responses = 1
net.ipv4.icmp_msgs_burst = 50
net.ipv4.icmp_msgs_per_sec = 1000
net.ipv4.icmp_ratelimit = 1000
net.ipv4.icmp_ratemask = 6168
net.ipv6.icmp.ratelimit = 1000
net.netfilter.nf_conntrack_acct = 0
net.netfilter.nf_conntrack_buckets = 65536
net.netfilter.nf_conntrack_checksum = 1
net.netfilter.nf_conntrack_count = 201
net.netfilter.nf_conntrack_events = 1
net.netfilter.nf_conntrack_expect_max = 1024
net.netfilter.nf_conntrack_generic_timeout = 600
net.netfilter.nf_conntrack_helper = 0
net.netfilter.nf_conntrack_icmp_timeout = 30
net.netfilter.nf_conntrack_log_invalid = 0
net.netfilter.nf_conntrack_max = 262144
net.netfilter.nf_conntrack_tcp_be_liberal = 0
net.netfilter.nf_conntrack_tcp_loose = 1
net.netfilter.nf_conntrack_tcp_max_retrans = 3
net.netfilter.nf_conntrack_tcp_timeout_close = 10
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 60
net.netfilter.nf_conntrack_tcp_timeout_established = 432000
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_last_ack = 30
net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 60
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 120
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 300
net.netfilter.nf_conntrack_timestamp = 0
net.netfilter.nf_conntrack_udp_timeout = 30
net.netfilter.nf_conntrack_udp_timeout_stream = 180
net.nf_conntrack_max = 262144


Thanks for your help!



TEST SCRIPT:::::

#!/usr/bin/perl -w
# sorry, cheesy formatting, this is a test case I just slapped together

my $subnet = '192.168.100.';
#my $single = '192.168.101.110';

use Socket;
use Symbol;
use NetAddr::IP::Lite;

sub ICMP_ECHO       ()  { 8 }
sub ICMP_SUBCODE    ()  { 0 }
sub ICMP_STRUCT     ()  { 'C2S3A56' }
sub ICMP_FLAGS      ()  { 0 }
sub ICMP_PORT       ()  { 0 }

$sequence=0;

for $i (2..254) {

    # open a fresh raw ICMP socket for each ping (needs root)
    $protocol = (getprotobyname('icmp'))[2] or
        die('Cannot get ICMP protocol number by name - ', $!);

    $socket = Symbol::gensym;
    socket($socket, PF_INET, SOCK_RAW, $protocol) or
        die('Cannot create ICMP socket - ', $!);

    $sequence = ($sequence+1) & 0xFFFF;

    # build the echo request with a zero checksum first
    my $checksum = 0;
    my $msg = pack(
        ICMP_STRUCT,
        ICMP_ECHO,
        ICMP_SUBCODE,
        $checksum,
        $$ & 0xFFFF,
        $sequence,
        '0' x 56
    );

    # Internet checksum: sum the 16-bit words, fold the carries, complement
    my $short = int(length($msg) / 2);
    $checksum += $_ for unpack "S$short", $msg;
    $checksum += ord(substr($msg, -1)) if length($msg) % 2;
    $checksum = ($checksum >> 16) + ($checksum & 0xFFFF);
    $checksum = ~(($checksum >> 16) + $checksum) & 0xFFFF;

    # repack the same message with the real checksum filled in
    $msg = pack(
        ICMP_STRUCT,
        ICMP_ECHO,
        ICMP_SUBCODE,
        $checksum,
        $$ & 0xFFFF,
        $sequence,
        '0' x 56
    );

    my($address)=$single?$single:"$subnet$i";
    my $netaddr = inet_aton($address);
    my $sockaddr = pack_sockaddr_in(ICMP_PORT, $netaddr);
    send($socket, $msg, ICMP_FLAGS, $sockaddr) or
        die("ERROR ($address) sending ICMP packet - $!");
}
print "OK\n";


* Re: netfilter regression causes lost pings "operation not permitted"
From: Trevor Cordes @ 2016-12-07  8:23 UTC
  To: linux-kernel; +Cc: Florian Westphal, Pablo Neira Ayuso

On 2016-12-07 Trevor Cordes wrote:
> Bisected down to:
> 870190a9ec9075205c0fa795a09fa931694a3ff1
> 7c9664351980aaa6a4b8837a314360b3a4ad382a

Oh!  I forgot to mention the most important point: iptable_nat module
MUST be loaded for the bug to show up!

modprobe iptable_nat

If you rmmod it, the bug goes away.  Interestingly, the bug occurs even
if you have every iptables table (including -t nat) completely empty
(no rules).  All that is required is iptable_nat simply to be loaded.
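
Since the bug depends only on whether the module is loaded, it is worth checking that state before each test run.  A small sketch, assuming the usual /proc/modules layout (module name in the first column); module_loaded is an illustrative helper name, not an existing command:

```shell
# module_loaded NAME [FILE]
# Succeeds if NAME appears as a module name (first column) in FILE.
# FILE defaults to /proc/modules; the second argument exists only so the
# helper can be tried against a saved copy of that file.
module_loaded() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit(found ? 0 : 1) }' \
        "${2:-/proc/modules}"
}

# Example:
#   module_loaded iptable_nat && echo "iptable_nat is loaded"
```

The match is on the whole first column, so a module that merely lists iptable_nat as a dependency does not count as a hit.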


* Re: netfilter regression causes lost pings "operation not permitted"
From: Florian Westphal @ 2016-12-17 12:25 UTC
  To: Trevor Cordes; +Cc: linux-kernel, Florian Westphal, Pablo Neira Ayuso

Trevor Cordes <trevor@tecnopolis.ca> wrote:

Sorry for the late reply.

> On 2016-12-07 Trevor Cordes wrote:
> > Bisected down to:
> > 870190a9ec9075205c0fa795a09fa931694a3ff1
> > 7c9664351980aaa6a4b8837a314360b3a4ad382a
> 
> Oh!  I forgot to mention the most important point: iptable_nat module
> MUST be loaded for the bug to show up!
> 
> modprobe iptable_nat
> 
> If you rmmod it, the bug goes away.  Interestingly, the bug occurs even
> if you have every iptables table (including -t nat) completely empty
> (no rules).  All that is required is iptable_nat simply to be loaded.

Pablo, I think stable should revert both patches.

The alternative is for stable to pick up the fixes from the 4.10 tree, but
that requires pulling in rhashtable's new rhlist interface too...

So I think revert is the way to go.

Should I take care of that?
