netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dexuan Cui <decui@microsoft.com>
To: "netfilter-devel@vger.kernel.org"
	<netfilter-devel@vger.kernel.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>
Cc: Stephen Hemminger <sthemmin@microsoft.com>,
	Haiyang Zhang <haiyangz@microsoft.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: netfilter: iptables-restore: setsockopt(3, SOL_IP, IPT_SO_SET_REPLACE, "security...", ...) return -EAGAIN
Date: Wed, 12 May 2021 22:17:22 +0000	[thread overview]
Message-ID: <MW2PR2101MB0892FC0F67BD25661CDCE149BF529@MW2PR2101MB0892.namprd21.prod.outlook.com> (raw)

Hi,
I'm debugging an iptables-restore failure, which happens about 5% of the
time when I keep stopping and starting the Linux VM. The VM has only 1
CPU, and kernel version is 4.15.0-1098-azure, but I suspect the issue may
also exist in the mainline Linux kernel.

When the failure happens, it's always caused by line 27 of the rule file:

  1 # Generated by iptables-save v1.6.0 on Fri Apr 23 09:22:59 2021
  2 *raw
  3 :PREROUTING ACCEPT [0:0]
  4 :OUTPUT ACCEPT [0:0]
  5 -A PREROUTING ! -s 168.63.129.16/32 -p tcp -j NOTRACK
  6 -A OUTPUT ! -d 168.63.129.16/32 -p tcp -j NOTRACK
  7 COMMIT
  8 # Completed on Fri Apr 23 09:22:59 2021
  9 # Generated by iptables-save v1.6.0 on Fri Apr 23 09:22:59 2021
 10 *filter
 11 :INPUT ACCEPT [2407:79190058]
 12 :FORWARD ACCEPT [0:0]
 13 :OUTPUT ACCEPT [1648:2190051]
 14 -A OUTPUT -d 169.254.169.254/32 -m owner --uid-owner 33 -j DROP
 15 COMMIT
 16 # Completed on Fri Apr 23 09:22:59 2021
 17 # Generated by iptables-save v1.6.0 on Fri Apr 23 09:22:59 2021
 18 *security
 19 :INPUT ACCEPT [2345:79155398]
 20 :FORWARD ACCEPT [0:0]
 21 :OUTPUT ACCEPT [1504:2129015]
 22 -A OUTPUT -d 168.63.129.16/32 -p tcp -m owner --uid-owner 0 -j ACCEPT
 23 -A OUTPUT -d 168.63.129.16/32 -p tcp -m conntrack --ctstate INVALID,NEW -j DROP
 24 -A OUTPUT -d 168.63.129.16/32 -p tcp -m owner --uid-owner 0 -j ACCEPT
 25 -A OUTPUT -d 168.63.129.16/32 -p tcp -m conntrack --ctstate INVALID,NEW -j DROP
 26 -A OUTPUT -d 168.63.129.16/32 -p tcp -m conntrack --ctstate INVALID,NEW -j DROP
 27 COMMIT

The related part of the strace log is:

  1 socket(PF_INET, SOCK_RAW, IPPROTO_RAW) = 3
  2 getsockopt(3, SOL_IP, IPT_SO_GET_INFO, "security\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., [84]) = 0
  3 getsockopt(3, SOL_IP, IPT_SO_GET_ENTRIES, "security\0\357B\16Z\177\0\0Pg\355\0\0\0\0\0Pg\355\0\0\0\0\0"..., [880]) = 0
  4 setsockopt(3, SOL_IP, IPT_SO_SET_REPLACE, "security\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2200) = -1 EAGAIN (Resource temporarily unavailable)
  5 close(3)                          = 0
  6 write(2, "iptables-restore: line 27 failed"..., 33) = 33

The -EAGAIN error comes from line 1240 of xt_replace_table():

  do_ipt_set_ctl
    do_replace
      __do_replace
        xt_replace_table

1216 xt_replace_table(struct xt_table *table,
1217               unsigned int num_counters,
1218               struct xt_table_info *newinfo,
1219               int *error)
1220 {
1221         struct xt_table_info *private;
1222         unsigned int cpu;
1223         int ret;
1224
1225         ret = xt_jumpstack_alloc(newinfo);
1226         if (ret < 0) {
1227                 *error = ret;
1228                 return NULL;
1229         }
1230
1231         /* Do the substitution. */
1232         local_bh_disable();
1233         private = table->private;
1234
1235         /* Check inside lock: is the old number correct? */
1236         if (num_counters != private->number) {
1237                 pr_debug("num_counters != table->private->number (%u/%u)\n",
1238                          num_counters, private->number);
1239                 local_bh_enable();
1240                 *error = -EAGAIN;
1241                 return NULL;
1242         }

When the function returns -EAGAIN, the 'num_counters' is 5 while
'private->number' is 6.

If I re-run the iptables-restore program upon the failure, the program
will succeed.

I checked the function xt_replace_table() in the recent mainline kernel and it
looks like the function is the same.

It looks like there is a race condition between iptables-restore calls
getsockopt() to get the number of table entries and iptables call
setsockopt() to replace the entries? Looks like some other program is
concurrently calling getsockopt()/setsockopt() -- but it looks like this is
not the case according to the messages I print via trace_printk() around
do_replace() in do_ipt_set_ctl(): when the -EAGAIN error happens, there is
no other program calling do_replace(); the table entry number was changed
to 5 by another program 'iptables' about 1.3 milliseconds ago, and then
this program 'iptables-restore' calls setsockopt() and the kernel sees
'num_counters' being 5 and the 'private->number' being 6 (how can this
happen??); the next setsockopt() call for the same 'security' table
happens in about 1 minute with both the numbers being 6.

Can you please shed some light on the issue? Thanks!

BTW, iptables does have a retry mechanism for getsockopt():
2f93205b375e ("Retry ruleset dump when kernel returns EAGAIN.")
(https://git.netfilter.org/iptables/commit/libiptc?id=2f93205b375e&context=10&ignorews=0&dt=0)

But it looks like this is enough? e.g. here getsockopt() returns 0, but
setsockopt() returns -EAGAIN.

Thanks,
Dexuan

             reply	other threads:[~2021-05-12 23:20 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-12 22:17 Dexuan Cui [this message]
2021-05-13  4:19 ` netfilter: iptables-restore: setsockopt(3, SOL_IP, IPT_SO_SET_REPLACE, "security...", ...) return -EAGAIN Dexuan Cui
2021-05-13  6:02   ` Dexuan Cui
2021-05-13  6:19     ` Dexuan Cui
2021-05-13  9:40       ` Pablo Neira Ayuso
2021-05-13 17:08         ` Phil Sutter
2021-05-13 20:40           ` Dexuan Cui

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=MW2PR2101MB0892FC0F67BD25661CDCE149BF529@MW2PR2101MB0892.namprd21.prod.outlook.com \
    --to=decui@microsoft.com \
    --cc=haiyangz@microsoft.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=sthemmin@microsoft.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).