netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dexuan Cui <decui@microsoft.com>
To: "'netfilter-devel@vger.kernel.org'"
	<netfilter-devel@vger.kernel.org>,
	"'netdev@vger.kernel.org'" <netdev@vger.kernel.org>
Cc: Stephen Hemminger <sthemmin@microsoft.com>,
	Haiyang Zhang <haiyangz@microsoft.com>,
	"'linux-kernel@vger.kernel.org'" <linux-kernel@vger.kernel.org>
Subject: RE: netfilter: iptables-restore: setsockopt(3, SOL_IP, IPT_SO_SET_REPLACE, "security...", ...) return -EAGAIN
Date: Thu, 13 May 2021 04:19:06 +0000	[thread overview]
Message-ID: <MW2PR2101MB0892864684CFDB096E0DBF02BF519@MW2PR2101MB0892.namprd21.prod.outlook.com> (raw)
In-Reply-To: <MW2PR2101MB0892FC0F67BD25661CDCE149BF529@MW2PR2101MB0892.namprd21.prod.outlook.com>

> From: Dexuan Cui
> Sent: Wednesday, May 12, 2021 3:17 PM
> ...
> It looks like there is a race condition between iptables-restore calls
> getsockopt() to get the number of table entries and iptables call
> setsockopt() to replace the entries? Looks like some other program is
> concurrently calling getsockopt()/setsockopt() -- but it looks like this is
> not the case according to the messages I print via trace_printk() around
> do_replace() in do_ipt_set_ctl(): when the -EAGAIN error happens, there is
> no other program calling do_replace(); the table entry number was changed
> to 5 by another program 'iptables' about 1.3 milliseconds ago, and then
> this program 'iptables-restore' calls setsockopt() and the kernel sees
> 'num_counters' being 5 and the 'private->number' being 6 (how can this
> happen??); the next setsockopt() call for the same 'security' table
> happens in about 1 minute with both the numbers being 6.

Well, after tracing the code more closely, it turns out that the program
'iptables' indeed concurrently changes the number from 5 to 6, and this
causes the 'iptable-restores' program to get EAGAIN:

1. iptables-906 (906 is the process ID) calls IPT_SO_GET_ENTRIES and the
current num_entries is 4; the process calls IPT_SO_SET_REPLACE and the
private->number becomes 5.

2. iptables-restor-895 calls IPT_SO_GET_ENTRIES, and gets num_entries==5.

3. iptables-906 calls IPT_SO_GET_ENTRIES again, and also gets num_entries==5;
the process calls IPT_SO_SET_REPLACE and the private->number becomes 6.

4. iptables-restor-895 calls IPT_SO_SET_REPLACE and the kernel function
xt_replace_table() returns -EAGAIN due to num_counters == 5 and
private->number == 6.

I think the latest mainline kernel should also have the same race.
It looks like this by-design race exists since day one?

> BTW, iptables does have a retry mechanism for getsockopt():
> 2f93205b375e ("Retry ruleset dump when kernel returns EAGAIN.")
> (https://git.netfilter.org/iptables/commit/libiptc?id=2f93205b375e&context=10
> &ignorews=0&dt=0)
> 
> But it looks like this is enough? e.g. here getsockopt() returns 0, but
> setsockopt() returns -EAGAIN.

It looks like we need to add a retry mechanism to the iptables-restore
program: iptables/iptables-restore.c: ip46tables_restore_main():

if cb->ops->commit(handle) -> ... -> setsockopt(...,IPT_SO_SET_REPLACE, ...)
fails due to EAGAIN, it should start over from the very begining, i.e. call
create_handle() -> handle = cb->ops->init(tablename) again to get the new
num_entries, and retry the commit op. But I'm not sure how to easily
re-create the context associated with the old handle (should/can it re-parse
the rules?), as I'm not really familiar with iptables.

Or, is it possible to fix the race condition from the netfilter module?

Thanks,
-- Dexuan


  reply	other threads:[~2021-05-13  4:19 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-12 22:17 netfilter: iptables-restore: setsockopt(3, SOL_IP, IPT_SO_SET_REPLACE, "security...", ...) return -EAGAIN Dexuan Cui
2021-05-13  4:19 ` Dexuan Cui [this message]
2021-05-13  6:02   ` Dexuan Cui
2021-05-13  6:19     ` Dexuan Cui
2021-05-13  9:40       ` Pablo Neira Ayuso
2021-05-13 17:08         ` Phil Sutter
2021-05-13 20:40           ` Dexuan Cui

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=MW2PR2101MB0892864684CFDB096E0DBF02BF519@MW2PR2101MB0892.namprd21.prod.outlook.com \
    --to=decui@microsoft.com \
    --cc=haiyangz@microsoft.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=sthemmin@microsoft.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).