* Concurrent iptables-restore calls clobbering each other
@ 2017-02-03 20:37 Shaun Crampton
  2017-02-03 23:47 ` Jan Engelhardt
  0 siblings, 1 reply; 4+ messages in thread
From: Shaun Crampton @ 2017-02-03 20:37 UTC
  To: netfilter-devel

Hi,

I'm trying to diagnose an incompatibility between my application
(Project Calico's Felix daemon) and another (Kubernetes' kube-proxy).
Both are (ab)using iptables-restore to do high-speed bulk updates to
iptables and they're both using --noflush so they can use
iptables-restore to edit only some chains.  Mostly, this works great
and it's many times faster than using individual iptables commands.
However, sometimes when they do an iptables-restore at the same time,
I see one of the updates get lost even though the command reported
success.  I've boiled it down to a repro script[1] that starts two
threads writing to iptables and looks for missing updates.
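
For illustration, here is a minimal sketch of what such a --noflush payload might look like, built in Python. The chain name FELIX-B, the rule, and the helper `build_restore_payload` are assumptions made up for this example; the real payloads are in the linked gists. With --noflush, only the user-defined chains declared on a ":CHAIN" line are flushed and rewritten, and the rest of the table is carried over as it was read.

```python
def build_restore_payload(table, chains):
    """Build input text for `iptables-restore --noflush`.

    `chains` maps a chain name to the rule specs it should end up with.
    Chain names and rules here are hypothetical examples.
    """
    lines = [f"*{table}"]
    # Declaring a chain marks it for rewrite; undeclared chains are untouched.
    for name in chains:
        lines.append(f":{name} - [0:0]")
    # Re-add the full desired contents of each declared chain.
    for name, rules in chains.items():
        for rule in rules:
            lines.append(f"-A {name} {rule}")
    lines.append("COMMIT")
    return "\n".join(lines) + "\n"


payload = build_restore_payload("filter", {"FELIX-B": ["-j ACCEPT"]})
print(payload, end="")
```

The resulting text would be piped to `iptables-restore --noflush` as root; the sketch only builds the string and does not touch the system.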

My understanding is that each iptables-restore call actually does a
read-modify-write of the whole table so it's not too surprising that
we could get a missed update.  However, I thought that iptables has
some sort of sequence number to prevent clobbering, making it a
compare-and-swap operation.  I've certainly seen iptables-restore
calls fail on the COMMIT line when doing concurrent updates and I have
a tweaked script[2] that exhibits that behaviour.  In script [2] I
added an extra superfluous rule update to one of the writers and
suddenly the COMMIT starts failing as I was hoping.  While the toy
example in [2] seems to work, if I add more operations, it seems to go
back to failing again so it may just be a timing window.
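
The lost update described above can be reproduced in a toy model. The sketch below is not how the kernel or iptables-restore is implemented; the `Table` class and the chain names are assumptions for illustration. It only shows why two whole-table read-modify-write cycles that interleave, with no staleness check, lose one writer's change even though both writes "succeed".

```python
import copy


class Table:
    """Toy stand-in for one iptables table: chain name -> list of rules."""

    def __init__(self):
        self.chains = {"FELIX-B": ["old"], "KUBE-A": ["old"]}

    def read(self):
        # Each writer gets its own private copy of the whole table.
        return copy.deepcopy(self.chains)

    def write(self, snapshot):
        # Whole-table replace with no check that the snapshot is still fresh.
        self.chains = snapshot


table = Table()

# Both writers read before either one writes.
felix_view = table.read()
kube_view = table.read()

# Each writer edits only its own chain in its private copy.
felix_view["FELIX-B"] = ["new"]
kube_view["KUBE-A"] = ["new"]

table.write(felix_view)  # Felix's update lands
table.write(kube_view)   # kube's stale copy overwrites it

print(table.chains)
```

After the second write, FELIX-B is back to its old contents: the writer that commits last wins for the whole table, not just for its own chain.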

Output from script [1] (it quickly fails after detecting a lost update):

$ sudo ./iptables.sh
[sudo] password for shaun:
akKkKkKkKkiptables-restore: line 4 failed
AbKkKkBKkCaKkAbKkBKkCaKkAbKkBKk
FELIX-B update was clobbered

Output from script [2] (keeps going for as long as I've let it run):

$ sudo ./iptables.sh
akKkAbKkBKkCaKkAbKkBKkKCakAbZkBZkKkCaKkAbKkBCaKkAbBZkKkKkCaKkAbKkBZkKkCaKkAbKkBKkKkCaKkKkAbKkBKkKkCaKkAbKkBZkKkCaKkAbKkBKkCaKkAbKkBZkKkCaKkAbKkBKkKkCaKkAbKkBKkKkCaAbKkKkBKkKkCaKkAbKkBKkCaKkAbBZkCaKkAbKkBKkKkCaKkAbBCa....

Where a K means that the "kube" thread successfully wrote to iptables
and a Z means it got a "COMMIT failed".

It'd be great to know if this is working as designed or a bug, or if
there's a way to make sure that I get a COMMIT failure if there's been
a concurrent update.  Without that, I'm thinking we'll have to do a
regular poll to make sure that nothing got clobbered.
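
A rough sketch of what such a poll could check, assuming the caller already has the text output of `iptables-save` for the table. The helper `missing_rules` and the sample rules are hypothetical, and the comparison assumes the expected rules are written exactly as iptables-save prints them, which may not hold if the rule text gets normalised.

```python
def missing_rules(save_output, expected):
    """Return the expected '-A ...' lines that are absent from save_output."""
    present = {
        line.strip()
        for line in save_output.splitlines()
        if line.startswith("-A ")
    }
    return [rule for rule in expected if rule not in present]


# Hypothetical iptables-save output in which the FELIX-B rule was lost.
sample = """*filter
:FELIX-B - [0:0]
:KUBE-A - [0:0]
-A KUBE-A -j ACCEPT
COMMIT
"""

lost = missing_rules(sample, ["-A FELIX-B -j ACCEPT", "-A KUBE-A -j ACCEPT"])
print(lost)
```

A non-empty result would mean the writer needs to reapply its chains.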

I'd appreciate it if you CCed me on any responses since I'm not
subscribed to the list.  Thanks,

-Shaun

[1] https://gist.github.com/fasaxc/ee443a9ef82ce2e4dab059161f095ec2
[2] https://gist.github.com/fasaxc/05a80a48211500e4f2225011a131f92e


* Re: Concurrent iptables-restore calls clobbering each other
  2017-02-03 20:37 Concurrent iptables-restore calls clobbering each other Shaun Crampton
@ 2017-02-03 23:47 ` Jan Engelhardt
  2017-02-04  8:53   ` Shaun Crampton
  0 siblings, 1 reply; 4+ messages in thread
From: Jan Engelhardt @ 2017-02-03 23:47 UTC
  To: Shaun Crampton; +Cc: netfilter-devel


On Friday 2017-02-03 21:37, Shaun Crampton wrote:
>
>I'm trying to diagnose an incompatibility between my application
>(Project Calico's Felix daemon) and another (Kubernetes' kube-proxy).
>Both are (ab)using iptables-restore to do high-speed bulk updates to
>iptables and they're both using --noflush so they can use
>iptables-restore to edit only some chains.  Mostly, this works great
>and it's many times faster than using individual iptables commands.
[...]
>My understanding is that each iptables-restore call actually does a
>read-modify-write of the whole table

This is by design; the RMW cycle in principle also affects the "slower"
iptables - which is why it is slower, because it does only one rule per cycle.


* Re: Concurrent iptables-restore calls clobbering each other
  2017-02-03 23:47 ` Jan Engelhardt
@ 2017-02-04  8:53   ` Shaun Crampton
  2017-02-09 14:39     ` Shaun Crampton
  0 siblings, 1 reply; 4+ messages in thread
From: Shaun Crampton @ 2017-02-04  8:53 UTC
  To: Jan Engelhardt; +Cc: netfilter-devel

> This is by design; the RMW cycle in principle also affects the "slower"
> iptables - which is why it is slower, because it does only one rule per cycle.

Thanks for the response. I understand that the RMW is by design. Is there
any protection built into the protocol to prevent concurrent writes from
clobbering each other?  I thought I'd read that there was a "version"
on the read that let the kernel spot if a write was stale.

My second script acts as if there is; the commits of the "kube" loop
fail reliably rather than clobbering the writes of the "felix" loop.
However, that's not the case for the first script.  I'm wondering if
there is supposed to be protection but it's bugged.


* Re: Concurrent iptables-restore calls clobbering each other
  2017-02-04  8:53   ` Shaun Crampton
@ 2017-02-09 14:39     ` Shaun Crampton
  0 siblings, 0 replies; 4+ messages in thread
From: Shaun Crampton @ 2017-02-09 14:39 UTC
  To: Jan Engelhardt; +Cc: netfilter-devel

> Is there any protection built into the protocol to prevent concurrent
> writes from clobbering each other?

Ah, I think I see the reason for the behaviour that I'm seeing.  It looks
like the kernel does check that the number of entries in the table hasn't
changed since the data was read [1].  That explains why the unrelated rule
change in my second script causes the COMMIT to fail.

[1] https://github.com/torvalds/linux/blob/master/net/ipv4/netfilter/ip_tables.c#L1033
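
To make the effect of an entry-count check concrete, here is a toy model. It is only a sketch under stated assumptions: the `Table` class, the `race` helper, the rule names, and the choice of which writer adds the extra rule are all invented for illustration, and the real check lives in the kernel code linked in [1]. The replace succeeds only if the number of entries is the same as when the writer read the table.

```python
class StaleTable(Exception):
    """Raised when the entry count changed between read and replace."""


class Table:
    def __init__(self, rules):
        self.rules = list(rules)

    def read(self):
        # Return a private copy plus the entry count seen at read time.
        return list(self.rules), len(self.rules)

    def replace(self, new_rules, count_at_read):
        # The only staleness check: has the number of entries changed?
        if count_at_read != len(self.rules):
            raise StaleTable("COMMIT failed")
        self.rules = list(new_rules)


def race(felix_edit, kube_edit):
    """Interleave two read-modify-write cycles; report what happened."""
    table = Table(["felix-old", "kube-old"])
    felix_rules, felix_n = table.read()
    kube_rules, kube_n = table.read()
    table.replace(felix_edit(felix_rules), felix_n)
    try:
        table.replace(kube_edit(kube_rules), kube_n)
    except StaleTable:
        return "commit failed"
    return "ok" if "felix-new" in table.rules else "clobbered"


def kube_edit(rules):
    return [r.replace("kube-old", "kube-new") for r in rules]


def felix_same_count(rules):
    return [r.replace("felix-old", "felix-new") for r in rules]


def felix_extra_rule(rules):
    return felix_same_count(rules) + ["felix-extra"]


print(race(felix_same_count, kube_edit))
print(race(felix_extra_rule, kube_edit))
```

When both writers keep the entry count unchanged, the stale write passes the check and silently clobbers the other update. When one writer changes the count, the other writer's stale commit is rejected, which matches the difference between the two scripts.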

On 4 February 2017 at 08:53, Shaun Crampton <shaun@cantab.net> wrote:
>> This is by design; the RMW cycle in principle also affects the "slower"
>> iptables - which is why it is slower, because it does only one rule per cycle.
>
> Thanks for the response. I understand that the RMW is by design. Is there
> any protection built into the protocol to prevent concurrent writes from
> clobbering each other?  I thought I'd read that there was a "version"
> on the read that let the kernel spot if a write was stale.
>
> My second script acts as if there is; the commits of the "kube" loop
> fail reliably rather than clobbering the writes of the "felix" loop.
> However, that's not the case for the first script.  I'm wondering if
> there is supposed to be protection but it's bugged.

