netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* tc ipt action
@ 2012-12-09 12:20 Yury Stankevich
  2012-12-13 10:58 ` Jamal Hadi Salim
  0 siblings, 1 reply; 69+ messages in thread
From: Yury Stankevich @ 2012-12-09 12:20 UTC (permalink / raw)
  To: netdev; +Cc: urykhy

Hello,

i not sure this is correct list, please advise if not.

i'm trying to use ipt action, and got a problem:

#tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0
action ipt -j CONNMARK --restore-mark action mirred egress redirect dev ifb0
-> bad action type ipt

from strace:
open("/usr/lib/tc//m_gact.so", O_RDONLY) = -1 ENOENT (No such file or
directory)
write(2, "bad action type ipt\n", 20bad action type ipt

well. i'm trying to use xt:
#tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0
action xt -j CONNMARK --restore-mark action mirred egress redirect dev ifb0
xt: unrecognized option '--restore-mark'

from strace:
open("/lib/xtables/libxt_CONNMARK.so", O_RDONLY) = 4
read(4,
"\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\200\6\0\0004\0\0\0"...,
512) = 512
fstat64(4, {st_mode=S_IFREG|0644, st_size=9756, ...}) = 0
mmap2(NULL, 12548, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 4, 0)
= 0xf76f3000
mmap2(0xf76f5000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 4, 0x1) = 0xf76f5000
close(4)                                = 0
mprotect(0xf76f5000, 4096, PROT_READ)   = 0
socket(PF_INET, SOCK_RAW, IPPROTO_RAW)  = 4
fcntl64(4, F_SETFD, FD_CLOEXEC)         = 0
lstat64("/proc/net/ip_tables_names", {st_mode=S_IFREG|0440, st_size=0,
...}) = 0
statfs64("/proc/net/ip_tables_names", 84, {f_type="PROC_SUPER_MAGIC",
f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0,
f_fsid={0, 0}, f_namelen=255, f_frsize=4096}) = 0
getsockopt(4, SOL_IP, 0x43 /* IP_??? */,
"CONNMARK\0\367\f\300\0\0\0po\367l8p\367\364/p\367:}\302\1", [30]) = 0
close(4)                                = 0
write(2, "xt: unrecognized option '--resto"..., 41xt: unrecognized
option '--restore-mark'

so... i make something wrong or this is a bug ?

ps: 3.6.8 kernel 64 bit kernel with 32 bit userspace, iproute 20121001
from debian-experimental,
module act_ipt is loaded.
pps: please, cc me in reply.


-- 
Linux registered user #402966 // pub 1024D/E99AF373 <pgp.mit.edu>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: tc ipt action
  2012-12-09 12:20 tc ipt action Yury Stankevich
@ 2012-12-13 10:58 ` Jamal Hadi Salim
  2012-12-15 21:19   ` Jamal Hadi Salim
  0 siblings, 1 reply; 69+ messages in thread
From: Jamal Hadi Salim @ 2012-12-13 10:58 UTC (permalink / raw)
  To: Yury Stankevich; +Cc: netdev, pablo

Yury,

This appears to be an ABI breakage on iptables/netfilter side.
I will look at it (and hopefully fix it) over the weekend.

cheers,
jamal

On 12-12-09 07:20 AM, Yury Stankevich wrote:
> Hello,
>
> i not sure this is correct list, please advise if not.
>
> i'm trying to use ipt action, and got a problem:
>
> #tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0
> action ipt -j CONNMARK --restore-mark action mirred egress redirect dev ifb0
> -> bad action type ipt
>
> from strace:
> open("/usr/lib/tc//m_gact.so", O_RDONLY) = -1 ENOENT (No such file or
> directory)
> write(2, "bad action type ipt\n", 20bad action type ipt
>
> well. i'm trying to use xt:
> #tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0
> action xt -j CONNMARK --restore-mark action mirred egress redirect dev ifb0
> xt: unrecognized option '--restore-mark'
>
> from strace:
> open("/lib/xtables/libxt_CONNMARK.so", O_RDONLY) = 4
> read(4,
> "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\200\6\0\0004\0\0\0"...,
> 512) = 512
> fstat64(4, {st_mode=S_IFREG|0644, st_size=9756, ...}) = 0
> mmap2(NULL, 12548, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 4, 0)
> = 0xf76f3000
> mmap2(0xf76f5000, 8192, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 4, 0x1) = 0xf76f5000
> close(4)                                = 0
> mprotect(0xf76f5000, 4096, PROT_READ)   = 0
> socket(PF_INET, SOCK_RAW, IPPROTO_RAW)  = 4
> fcntl64(4, F_SETFD, FD_CLOEXEC)         = 0
> lstat64("/proc/net/ip_tables_names", {st_mode=S_IFREG|0440, st_size=0,
> ...}) = 0
> statfs64("/proc/net/ip_tables_names", 84, {f_type="PROC_SUPER_MAGIC",
> f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0,
> f_fsid={0, 0}, f_namelen=255, f_frsize=4096}) = 0
> getsockopt(4, SOL_IP, 0x43 /* IP_??? */,
> "CONNMARK\0\367\f\300\0\0\0po\367l8p\367\364/p\367:}\302\1", [30]) = 0
> close(4)                                = 0
> write(2, "xt: unrecognized option '--resto"..., 41xt: unrecognized
> option '--restore-mark'
>
> so... i make something wrong or this is a bug ?
>
> ps: 3.6.8 kernel 64 bit kernel with 32 bit userspace, iproute 20121001
> from debian-experimental,
> module act_ipt is loaded.
> pps: please, cc me in reply.
>
>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: tc ipt action
  2012-12-13 10:58 ` Jamal Hadi Salim
@ 2012-12-15 21:19   ` Jamal Hadi Salim
  2012-12-15 23:06     ` Jan Engelhardt
  2012-12-16  0:27     ` tc ipt action Pablo Neira Ayuso
  0 siblings, 2 replies; 69+ messages in thread
From: Jamal Hadi Salim @ 2012-12-15 21:19 UTC (permalink / raw)
  To: Yury Stankevich, shemonc; +Cc: netdev, pablo, netfilter-devel

Yury,

I took a brief look and run some quick tests on ubuntu 12.04. I am going
to be lazy and try and involve the netfilter folks.
It seems that if you left out the args to CONNMARK (includes other 
targets like MARK etc) you will succeed - but you get default values.


Example, the following should work for
tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0
action ipt -j CONNMARK \
action mirred egress redirect dev ifb0

Here is what the output looks like when you dont pass the parameters.

-------
j@ubuntu:~$ sudo tc filter show dev eth0 parent ffff:
filter protocol ip pref 1 u32
filter protocol ip pref 1 u32 fh 800: ht divisor 1
filter protocol ip pref 1 u32 fh 800::800 order 2048 key ht 800 bkt 0 
flowid 1:15
   match 0a000015/ffffffff at 12
	action order 1: tablename: mangle  hook: NF_IP_PRE_ROUTING
	target  MARK and 0xffffffff
	index 2 ref 1 bind 1

filter protocol ip pref 49149 u32
filter protocol ip pref 49149 u32 fh 804: ht divisor 1
filter protocol ip pref 49149 u32 fh 804::800 order 2048 key ht 804 bkt 
0 flowid 1:12
   match 00000000/00000000 at 0
	action order 33: tablename: mangle  hook: NF_IP_PRE_ROUTING
	target  CONNMARK and 0x0
	index 123 ref 1 bind 1
----------------

Pablo, Hasan Chowdhury tells me this broke after iptable 1.4.10
Hasan also sent me a small patch to fake "xt" instead of "ipt" - but i 
think there's more than meets the eye here; some interface we are using 
to talk to xtables on user space seems to have changed.

cheers,
jamal

On 12-12-13 05:58 AM, Jamal Hadi Salim wrote:
> Yury,
>
> This appears to be an ABI breakage on iptables/netfilter side.
> I will look at it (and hopefully fix it) over the weekend.
>
> cheers,
> jamal
>
> On 12-12-09 07:20 AM, Yury Stankevich wrote:
>> Hello,
>>
>> i not sure this is correct list, please advise if not.
>>
>> i'm trying to use ipt action, and got a problem:
>>
>> #tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0
>> action ipt -j CONNMARK --restore-mark action mirred egress redirect
>> dev ifb0
>> -> bad action type ipt
>>
>> from strace:
>> open("/usr/lib/tc//m_gact.so", O_RDONLY) = -1 ENOENT (No such file or
>> directory)
>> write(2, "bad action type ipt\n", 20bad action type ipt
>>
>> well. i'm trying to use xt:
>> #tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0
>> action xt -j CONNMARK --restore-mark action mirred egress redirect dev
>> ifb0
>> xt: unrecognized option '--restore-mark'
>>
>> from strace:
>> open("/lib/xtables/libxt_CONNMARK.so", O_RDONLY) = 4
>> read(4,
>> "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\200\6\0\0004\0\0\0"...,
>> 512) = 512
>> fstat64(4, {st_mode=S_IFREG|0644, st_size=9756, ...}) = 0
>> mmap2(NULL, 12548, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 4, 0)
>> = 0xf76f3000
>> mmap2(0xf76f5000, 8192, PROT_READ|PROT_WRITE,
>> MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 4, 0x1) = 0xf76f5000
>> close(4)                                = 0
>> mprotect(0xf76f5000, 4096, PROT_READ)   = 0
>> socket(PF_INET, SOCK_RAW, IPPROTO_RAW)  = 4
>> fcntl64(4, F_SETFD, FD_CLOEXEC)         = 0
>> lstat64("/proc/net/ip_tables_names", {st_mode=S_IFREG|0440, st_size=0,
>> ...}) = 0
>> statfs64("/proc/net/ip_tables_names", 84, {f_type="PROC_SUPER_MAGIC",
>> f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0,
>> f_fsid={0, 0}, f_namelen=255, f_frsize=4096}) = 0
>> getsockopt(4, SOL_IP, 0x43 /* IP_??? */,
>> "CONNMARK\0\367\f\300\0\0\0po\367l8p\367\364/p\367:}\302\1", [30]) = 0
>> close(4)                                = 0
>> write(2, "xt: unrecognized option '--resto"..., 41xt: unrecognized
>> option '--restore-mark'
>>
>> so... i make something wrong or this is a bug ?
>>
>> ps: 3.6.8 kernel 64 bit kernel with 32 bit userspace, iproute 20121001
>> from debian-experimental,
>> module act_ipt is loaded.
>> pps: please, cc me in reply.
>>
>>
>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: tc ipt action
  2012-12-15 21:19   ` Jamal Hadi Salim
@ 2012-12-15 23:06     ` Jan Engelhardt
  2012-12-16  0:26       ` Jan Engelhardt
                         ` (2 more replies)
  2012-12-16  0:27     ` tc ipt action Pablo Neira Ayuso
  1 sibling, 3 replies; 69+ messages in thread
From: Jan Engelhardt @ 2012-12-15 23:06 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: Yury Stankevich, shemonc, netdev, pablo, netfilter-devel


On Saturday 2012-12-15 22:19, Jamal Hadi Salim wrote:
>
> Example, the following should work for
> tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0
> action ipt -j CONNMARK \
> action mirred egress redirect dev ifb0

If I try that command (substituting ipt->xt and eth0->dummy0,
ifb0->dummy1), all I get is the dreaded "Invalid argument".
So the kernel rejected the command, which could indicate that
userspace construction might have been ok.

# tc filter add dev dummy0 parent ffff: protocol ip u32 match u32 0 0 \
action xt -j CONNMARK action mirred egress redirect dev dummy1

tablename: mangle hook: NF_IP_PRE_ROUTING
        target:  CONNMARK and 0x0 index 0
RTNETLINK answers: Invalid argument
We have an error talking to the kernel

> Pablo, Hasan Chowdhury tells me this broke after iptable 1.4.10
> Hasan also sent me a small patch to fake "xt" instead of "ipt" - but i think
> there's more than meets the eye here; some interface we are using to talk to
> xtables on user space seems to have changed.

What was the last combination that worked?

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: tc ipt action
  2012-12-15 23:06     ` Jan Engelhardt
@ 2012-12-16  0:26       ` Jan Engelhardt
  2012-12-16  0:32         ` [PATCH] build: unbreak linkage of m_xt.so Jan Engelhardt
  2012-12-16 10:22       ` tc ipt action Jamal Hadi Salim
       [not found]       ` <CAASe=fQT2pVOK0uctdaKL+aOrF8nYeTMfoF15kmd-rC02+7Vnw@mail.gmail.com>
  2 siblings, 1 reply; 69+ messages in thread
From: Jan Engelhardt @ 2012-12-16  0:26 UTC (permalink / raw)
  To: vapier
  Cc: Jamal Hadi Salim, Yury Stankevich, shemonc, netdev,
	Pablo Neira Ayuso, Netfilter Developer Mailing List


On Sunday 2012-12-16 00:06, Jan Engelhardt wrote:
>On Saturday 2012-12-15 22:19, Jamal Hadi Salim wrote:
>>
>> Example, the following should work for
>> tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0
>> action ipt -j CONNMARK \
>> action mirred egress redirect dev ifb0
>
>If I try that command (substituting ipt->xt and eth0->dummy0,
>ifb0->dummy1), all I get is the dreaded "Invalid argument".
>So the kernel rejected the command, which could indicate that
>userspace construction might have been ok.
>
># tc filter add dev dummy0 parent ffff: protocol ip u32 match u32 0 0 \
>action xt -j CONNMARK action mirred egress redirect dev dummy1
>
>tablename: mangle hook: NF_IP_PRE_ROUTING
>        target:  CONNMARK and 0x0 index 0
>RTNETLINK answers: Invalid argument
>We have an error talking to the kernel
>
>> Pablo, Hasan Chowdhury tells me this broke after iptable 1.4.10
>> Hasan also sent me a small patch to fake "xt" instead of "ipt" - but i think
>> there's more than meets the eye here; some interface we are using to talk to
>> xtables on user space seems to have changed.
>
>What was the last combination that worked?

For added fun, it works even less in iproute2-3.7.0.

commit e4fc4ada3317ea94452576add25981279d705b14
Author: Mike Frysinger <vapier@gentoo.org>
Date:   Thu Nov 8 11:41:17 2012 -0500

    allow pkg-config to be customized
    
    Rather than hard coding `pkg-config`, use ${PKG_CONFIG} so people can
    override it to their specific version (like when cross-compiling).
    
    This is the same way the upstream pkg-config code works.
    
    Signed-off-by: Mike Frysinger <vapier@gentoo.org>


broke it by causing tc/m_xt.so to no longer link against libxtables.so,
leading to:

# tc [above parameters]
tc: symbol lookup error: /usr/lib64/tc//m_xt.so: undefined symbol:
xtables_init_all


(Makefiles being simpler than $other_buildsys? A distant reality!)

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: tc ipt action
  2012-12-15 21:19   ` Jamal Hadi Salim
  2012-12-15 23:06     ` Jan Engelhardt
@ 2012-12-16  0:27     ` Pablo Neira Ayuso
  2012-12-16  0:59       ` Jan Engelhardt
  2012-12-16 10:26       ` Jamal Hadi Salim
  1 sibling, 2 replies; 69+ messages in thread
From: Pablo Neira Ayuso @ 2012-12-16  0:27 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: Yury Stankevich, shemonc, netdev, netfilter-devel

Hi Jamal!

On Sat, Dec 15, 2012 at 04:19:29PM -0500, Jamal Hadi Salim wrote:
> Yury,
> 
> I took a brief look and run some quick tests on ubuntu 12.04. I am going
> to be lazy and try and involve the netfilter folks.
> It seems that if you left out the args to CONNMARK (includes other
> targets like MARK etc) you will succeed - but you get default
> values.
> 
> 
> Example, the following should work for
> tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0
> action ipt -j CONNMARK \
> action mirred egress redirect dev ifb0
> 
> Here is what the output looks like when you dont pass the parameters.
> 
> -------
> j@ubuntu:~$ sudo tc filter show dev eth0 parent ffff:
> filter protocol ip pref 1 u32
> filter protocol ip pref 1 u32 fh 800: ht divisor 1
> filter protocol ip pref 1 u32 fh 800::800 order 2048 key ht 800 bkt
> 0 flowid 1:15
>   match 0a000015/ffffffff at 12
> 	action order 1: tablename: mangle  hook: NF_IP_PRE_ROUTING
> 	target  MARK and 0xffffffff
> 	index 2 ref 1 bind 1
> 
> filter protocol ip pref 49149 u32
> filter protocol ip pref 49149 u32 fh 804: ht divisor 1
> filter protocol ip pref 49149 u32 fh 804::800 order 2048 key ht 804
> bkt 0 flowid 1:12
>   match 00000000/00000000 at 0
> 	action order 33: tablename: mangle  hook: NF_IP_PRE_ROUTING
> 	target  CONNMARK and 0x0
> 	index 123 ref 1 bind 1
> ----------------
> 
> Pablo, Hasan Chowdhury tells me this broke after iptable 1.4.10
> Hasan also sent me a small patch to fake "xt" instead of "ipt" - but
> i think there's more than meets the eye here; some interface we are
> using to talk to xtables on user space seems to have changed.

The binary interface was broken in 1.4.11 with the guided option
parser:

commit 7299fa4b615d7f7ee12cde444266f6b31f667f9f
Author: Jan Engelhardt <jengelh@medozas.de>
Date:   Sun Mar 6 15:54:58 2011 +0100

    libxt_CONNMARK: use guided option parser

You need a patch to use the new interface to stay in sync with current
iptables libraries. I'll make it for tc and send it to you.

BTW, I think it would be good if we find the way to check for
libxtables current version (see iptables/configure.ac), so you can
know that we broke binary compatibility again.

Cheers,
Pablo

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH] build: unbreak linkage of m_xt.so
  2012-12-16  0:26       ` Jan Engelhardt
@ 2012-12-16  0:32         ` Jan Engelhardt
  2012-12-16 10:30           ` Jamal Hadi Salim
                             ` (2 more replies)
  0 siblings, 3 replies; 69+ messages in thread
From: Jan Engelhardt @ 2012-12-16  0:32 UTC (permalink / raw)
  To: stephen.hemminger
  Cc: vapier, netdev, jhs, urykhy, shemonc, pablo, netfilter-devel

Commit v3.7.0~10 caused the variable new PKG_CONFIG variable never
to be present at the time of calling make, leading to tc/m_xt.so
not linked with -lxtables (result from pkg-config xtables --libs),
that in turn leading to

tc: symbol lookup error: /usr/lib64/tc//m_xt.so: undefined symbol:
xtables_init_all

Fixing that.

Signed-off-by: Jan Engelhardt <jengelh@inai.de>
---
 configure |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure b/configure
index 9912114..573ee55 100755
--- a/configure
+++ b/configure
@@ -4,7 +4,6 @@
 INCLUDE=${1:-"$PWD/include"}
 : ${PKG_CONFIG:=pkg-config}
 : ${CC=gcc}
-echo "PKG_CONFIG:=${PKG_CONFIG}" >>Config
 
 # Make a temp directory in build tree.
 TMPDIR=$(mktemp -d config.XXXXXX)
@@ -224,6 +223,7 @@ rm -f $TMPDIR/ipsettest.c $TMPDIR/ipsettest
 }
 
 echo "# Generated config based on" $INCLUDE >Config
+echo "PKG_CONFIG:=${PKG_CONFIG}" >>Config
 
 echo "TC schedulers"
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* Re: tc ipt action
  2012-12-16  0:27     ` tc ipt action Pablo Neira Ayuso
@ 2012-12-16  0:59       ` Jan Engelhardt
  2012-12-16 10:43         ` Jamal Hadi Salim
  2012-12-16 10:26       ` Jamal Hadi Salim
  1 sibling, 1 reply; 69+ messages in thread
From: Jan Engelhardt @ 2012-12-16  0:59 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Jamal Hadi Salim, Yury Stankevich, shemonc, netdev, netfilter-devel


On Sunday 2012-12-16 01:27, Pablo Neira Ayuso wrote:
>On Sat, Dec 15, 2012 at 04:19:29PM -0500, Jamal Hadi Salim wrote:
>> Example, the following should work for
>> tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0
>> action ipt -j CONNMARK \
>> action mirred egress redirect dev ifb0
>> 
>commit 7299fa4b615d7f7ee12cde444266f6b31f667f9f
>    libxt_CONNMARK: use guided option parser
>
>BTW, I think it would be good if we find the way to check for
>libxtables current version (see iptables/configure.ac), so you can
>know that we broke binary compatibility again.

For the C level, there is XTABLES_VERSION_CODE.

#if XTABLES_VERSION_CODE >= 6
	if (m != NULL && m->x6_parse != NULL)
		m->x6_parse(...)
#else
	else if (m != NULL && m->parse != NULL)
		m->parse(...)
	...

We can also export this through pkgconfig, similar to how
downstream users are to discover the plugin dir
(`pkg-config xtables --variable libdir`).

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: tc ipt action
  2012-12-15 23:06     ` Jan Engelhardt
  2012-12-16  0:26       ` Jan Engelhardt
@ 2012-12-16 10:22       ` Jamal Hadi Salim
       [not found]       ` <CAASe=fQT2pVOK0uctdaKL+aOrF8nYeTMfoF15kmd-rC02+7Vnw@mail.gmail.com>
  2 siblings, 0 replies; 69+ messages in thread
From: Jamal Hadi Salim @ 2012-12-16 10:22 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Yury Stankevich, shemonc, netdev, pablo, netfilter-devel

On 12-12-15 06:06 PM, Jan Engelhardt wrote:

> If I try that command (substituting ipt->xt and eth0->dummy0,
> ifb0->dummy1), all I get is the dreaded "Invalid argument".
> So the kernel rejected the command, which could indicate that
> userspace construction might have been ok.
>
> # tc filter add dev dummy0 parent ffff: protocol ip u32 match u32 0 0 \
> action xt -j CONNMARK action mirred egress redirect dev dummy1
>
> tablename: mangle hook: NF_IP_PRE_ROUTING
>          target:  CONNMARK and 0x0 index 0
> RTNETLINK answers: Invalid argument
> We have an error talking to the kernel
>

No problem sending it to the kernel here on ubuntu 12.04.
I also upgraded to current linus git tree, same result.
The problem is the parameters are not accepted in user space as
you can see for connmark and what gets sent (eg CONNMARK and 0x0)
doesnt seem sensible.

> What was the last combination that worked?

First time this got reported to me (or i got CCed on the problem) - I am 
told it broke after iptables 1.4.11.

cheers,
jamal

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: tc ipt action
  2012-12-16  0:27     ` tc ipt action Pablo Neira Ayuso
  2012-12-16  0:59       ` Jan Engelhardt
@ 2012-12-16 10:26       ` Jamal Hadi Salim
  1 sibling, 0 replies; 69+ messages in thread
From: Jamal Hadi Salim @ 2012-12-16 10:26 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Yury Stankevich, shemonc, netdev, netfilter-devel

Hi Pablo,

On 12-12-15 07:27 PM, Pablo Neira Ayuso wrote:

> The binary interface was broken in 1.4.11 with the guided option
> parser:

Ah. Thanks that would explain it.


> You need a patch to use the new interface to stay in sync with current
> iptables libraries. I'll make it for tc and send it to you.
>

Much thanks. I just scanned it and things have changed; old way used to 
take multiparams. New one a single struct, so would have taken much 
longer for me to resolve.

> BTW, I think it would be good if we find the way to check for
> libxtables current version (see iptables/configure.ac), so you can
> know that we broke binary compatibility again.
>

will do.

cheers,
jamal

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] build: unbreak linkage of m_xt.so
  2012-12-16  0:32         ` [PATCH] build: unbreak linkage of m_xt.so Jan Engelhardt
@ 2012-12-16 10:30           ` Jamal Hadi Salim
  2012-12-16 17:03             ` Jamal Hadi Salim
  2012-12-16 22:02           ` Mike Frysinger
  2012-12-18 17:21           ` Stephen Hemminger
  2 siblings, 1 reply; 69+ messages in thread
From: Jamal Hadi Salim @ 2012-12-16 10:30 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: stephen.hemminger, vapier, netdev, urykhy, shemonc, pablo,
	netfilter-devel

On 12-12-15 07:32 PM, Jan Engelhardt wrote:
> Commit v3.7.0~10 caused the variable new PKG_CONFIG variable never
> to be present at the time of calling make, leading to tc/m_xt.so
> not linked with -lxtables (result from pkg-config xtables --libs),
> that in turn leading to
>
> tc: symbol lookup error: /usr/lib64/tc//m_xt.so: undefined symbol:
> xtables_init_all
>

Yep - run into this problem, scratching my head thinking something
wrong with my environment with latest iproute2 git tree. I hacked
mine to just always include xtables in LDLIBS.

> Fixing that.
>
> Signed-off-by: Jan Engelhardt <jengelh@inai.de>

I can confirm it builds fine for me now if i take out the hack I had and 
use this patch.
Acked-by: Jamal Hadi Salim <hadi@mojatatu.com>

Stephen, I think this patch would be equivalent of "stable fix".

cheers,
jamal

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: tc ipt action
  2012-12-16  0:59       ` Jan Engelhardt
@ 2012-12-16 10:43         ` Jamal Hadi Salim
  2012-12-16 17:21           ` Jan Engelhardt
  0 siblings, 1 reply; 69+ messages in thread
From: Jamal Hadi Salim @ 2012-12-16 10:43 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Pablo Neira Ayuso, Yury Stankevich, shemonc, netdev, netfilter-devel

On 12-12-15 07:59 PM, Jan Engelhardt wrote:
>

> For the C level, there is XTABLES_VERSION_CODE.
>
> #if XTABLES_VERSION_CODE >= 6
> 	if (m != NULL && m->x6_parse != NULL)
> 		m->x6_parse(...)
> #else
> 	else if (m != NULL && m->parse != NULL)
> 		m->parse(...)
> 	...
>

I think you are suggesting this to be done in tc. That would make it 
easier to fix.
IMO, it is easier to keep backward compat if you left the old
APIs around for a period of time and maybe log a warning that they
will be deprecated over a period of time (sort of like kernel approach 
to changing APIs).

BTW: another interface that seems to have changed that we
need is m->final_check().

cheers,
jamal

> We can also export this through pkgconfig, similar to how
> downstream users are to discover the plugin dir
> (`pkg-config xtables --variable libdir`).
>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: tc ipt action
       [not found]       ` <CAASe=fQT2pVOK0uctdaKL+aOrF8nYeTMfoF15kmd-rC02+7Vnw@mail.gmail.com>
@ 2012-12-16 16:48         ` Jamal Hadi Salim
  2012-12-16 18:59           ` Jamal Hadi Salim
  0 siblings, 1 reply; 69+ messages in thread
From: Jamal Hadi Salim @ 2012-12-16 16:48 UTC (permalink / raw)
  To: Hasan Chowdhury
  Cc: Jan Engelhardt, Yury Stankevich, netdev, pablo, netfilter-devel

On 12-12-16 10:17 AM, Hasan Chowdhury wrote:
>
>
> [Hasan**] I thought  "xt"  is a supported action kind  for
> iproute2-3.6.0. Besides  with a default compilation  it compiled m_xt.c
> (not m_ipt.c ) linked with shared object  m_xt.so
>

It is - but my hope is not to change the interface to existing scripts.
One approach is rename "xt" to "ipt" and make the old vs new ways 
mutually exclusive based on Config options. But that will add more to
baggage of all sorts of workarounds for 13 versions of iptables changing 
interfaces. My goal was to have a staged way to kill the old way but 
maintain the same command interface. I was hoping not to change
the kernel but based on your patch, that may be the best place to place
warnings about deprecating APIs (so maybe i will add support for "xt" 
and warn about "ipt"). Would you be able to test that kernel patch?

>   the workaround exits in the patch  for file m_action (see the changes
> there ) as netlink reply from kernel for  this tc  u32 action xt command
>   comes as .kind = "ipt" instead of xt (assumed  act_ipt.c   in kernle
> is not aware of new xt extensions .)

The most important part of your patch that i missed is you took
care of some of the new API changes Pablo mentioned. I am suspicious of
one of them: why call xtables_options_xfrm(). Pablo/Jan, could you 
please look at Hasan's patch in m_xt.c?

Also, your patch doesnt compile for me. Can you please provide a version 
against the latest iproute2 git tree?

cheers,
jamal

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] build: unbreak linkage of m_xt.so
  2012-12-16 10:30           ` Jamal Hadi Salim
@ 2012-12-16 17:03             ` Jamal Hadi Salim
  2012-12-16 17:43               ` Jan Engelhardt
  0 siblings, 1 reply; 69+ messages in thread
From: Jamal Hadi Salim @ 2012-12-16 17:03 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: stephen.hemminger, vapier, netdev, urykhy, shemonc, pablo,
	netfilter-devel

On 12-12-16 05:30 AM, Jamal Hadi Salim wrote:

> I can confirm it builds fine for me now if i take out the hack I had and
> use this patch.


Sorry, I take what i said back and went back to explicitly adding -l 
xtables. The problem is still the intepretation of tc/Makefile. Here's 
the compile output.
----
gcc -Wall -Wstrict-prototypes -O2 -I../include -DRESOLVE_HOSTNAMES 
-DLIBDIR=\"/usr/lib\" -DCONFDIR=\"/etc/iproute2\" -D_GNU_SOURCE 
-DCONFIG_GACT -DCONFIG_GACT_PROB -DIPT_LIB_DIR=\"/lib/xtables\" 
-DYY_NO_INPUT -Wl,-export-dynamic -shared -fpic -o m_xt.so m_xt.c 
$(pkg-config xtables --cflags --libs)
----

Note the missing expansion.

cheers,
jamal

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: tc ipt action
  2012-12-16 10:43         ` Jamal Hadi Salim
@ 2012-12-16 17:21           ` Jan Engelhardt
  2012-12-16 17:47             ` Jamal Hadi Salim
  0 siblings, 1 reply; 69+ messages in thread
From: Jan Engelhardt @ 2012-12-16 17:21 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Pablo Neira Ayuso, Yury Stankevich, shemonc, netdev, netfilter-devel


On Sunday 2012-12-16 11:43, Jamal Hadi Salim wrote:

> On 12-12-15 07:59 PM, Jan Engelhardt wrote:
>>
>
>> For the C level, there is XTABLES_VERSION_CODE.
>>
>> #if XTABLES_VERSION_CODE >= 6
>> 	if (m != NULL && m->x6_parse != NULL)
>> 		m->x6_parse(...)
>> #else
>> 	else if (m != NULL && m->parse != NULL)
>> 		m->parse(...)
>> 	...
>>
>
> I think you are suggesting this to be done in tc. That would make it easier to
> fix.
> IMO, it is easier to keep backward compat if you left the old
> APIs around for a period of time

As you can see, the old ->parse() that goes back to libxtables.so.0
still remains. It's just that... only 5 out of 99 plugins still come
with an old parse function.

	[m_xt] -> [libxtables] <- (plugins: libxt_*.so)

> and maybe log a warning that they
> will be deprecated over a period of time (sort of like kernel approach to
> changing APIs).

old parse has not entered any deprecation stage yet, since there are still
plugins out there (both the 5 and external ones) that make use of it.
Right now, both parse and x6_parse are valid.

> BTW: another interface that seems to have changed that we
> need is m->final_check().

Yes, all those with an x6_ prefix are new.
Mh, I already dream of plans reducing m_xt to something that
does not require libxtables.so anymore.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] build: unbreak linkage of m_xt.so
  2012-12-16 17:03             ` Jamal Hadi Salim
@ 2012-12-16 17:43               ` Jan Engelhardt
  2012-12-16 18:05                 ` Jamal Hadi Salim
  0 siblings, 1 reply; 69+ messages in thread
From: Jan Engelhardt @ 2012-12-16 17:43 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: stephen.hemminger, vapier, netdev, urykhy, shemonc, pablo,
	netfilter-devel

On Sunday 2012-12-16 18:03, Jamal Hadi Salim wrote:

> On 12-12-16 05:30 AM, Jamal Hadi Salim wrote:
>
>> I can confirm it builds fine for me now if i take out the hack I had and
>> use this patch.
>
>
> Sorry, I take what i said back and went back to explicitly adding -l xtables.
> The problem is still the intepretation of tc/Makefile. Here's the compile
> output.
> ----
> gcc -Wall -Wstrict-prototypes -O2 -I../include -DRESOLVE_HOSTNAMES
> -DLIBDIR=\"/usr/lib\" -DCONFDIR=\"/etc/iproute2\" -D_GNU_SOURCE -DCONFIG_GACT
> -DCONFIG_GACT_PROB -DIPT_LIB_DIR=\"/lib/xtables\" -DYY_NO_INPUT
> -Wl,-export-dynamic -shared -fpic -o m_xt.so m_xt.c $(pkg-config xtables
> --cflags --libs)
> ----

I saw the same during make, _but_, on running `ldd tc/m_xt.so`, I got a
libxtables.so entry, so I thought I was fine.



"$() "is something for the shell to expand, not make. See this testcase.

$ make
echo $(pkg-config xtables --cflags --libs)
-I/usr/include/iptables-1.4.16.3 -lxtables
$ cat Makefile 
a:
        echo $$(pkg-config xtables --cflags --libs)

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: tc ipt action
  2012-12-16 17:21           ` Jan Engelhardt
@ 2012-12-16 17:47             ` Jamal Hadi Salim
  2012-12-16 18:59               ` Jan Engelhardt
  0 siblings, 1 reply; 69+ messages in thread
From: Jamal Hadi Salim @ 2012-12-16 17:47 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Pablo Neira Ayuso, Yury Stankevich, shemonc, netdev, netfilter-devel

On 12-12-16 12:21 PM, Jan Engelhardt wrote:
>
>
> As you can see, the old ->parse() that goes back to libxtables.so.0
> still remains. It's just that... only 5 out of 99 plugins still come
> with an old parse function.
>

I see.
So calling m->XXX may not be a wise long term solution.
Hasan's patch has a call to xtables_option_tpcall(), if that is the 
right interface I hope that going forward if any of the m->parseXX
changes you will take care of hiding all that stuff.

> old parse has not entered any deprecation stage yet, since there are still
> plugins out there (both the 5 and external ones) that make use of it.
> Right now, both parse and x6_parse are valid.
>

True - but we are getting broken because we are using a call that only 5 
or so users provide. It would have saved us time if we got the
a good log message instead of checking for if !m->parse()

> Yes, all those with an x6_ prefix are new.
> Mh, I already dream of plans reducing m_xt to something that
> does not require libxtables.so anymore.
>

That would be nice - but someone is going to have to link to libxtables, no?

cheers,
jamal

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] build: unbreak linkage of m_xt.so
  2012-12-16 17:43               ` Jan Engelhardt
@ 2012-12-16 18:05                 ` Jamal Hadi Salim
  0 siblings, 0 replies; 69+ messages in thread
From: Jamal Hadi Salim @ 2012-12-16 18:05 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: stephen.hemminger, vapier, netdev, urykhy, shemonc, pablo,
	netfilter-devel

On 12-12-16 12:43 PM, Jan Engelhardt wrote:
> On Sunday 2012-12-16 18:03, Jamal Hadi Salim wrote:
>

>
> I saw the same during make, _but_, on running `ldd tc/m_xt.so`, I got a
> libxtables.so entry, so I thought I was fine.
>

Sorry, you are right. Without your patch that doesnt happen. I had 
removed the global dlopen while debugging, so it was using the wrong
m_xt.so

So my Ack is back on;->

cheers,
jamal

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: tc ipt action
  2012-12-16 16:48         ` Jamal Hadi Salim
@ 2012-12-16 18:59           ` Jamal Hadi Salim
  2012-12-16 19:13             ` Jan Engelhardt
  0 siblings, 1 reply; 69+ messages in thread
From: Jamal Hadi Salim @ 2012-12-16 18:59 UTC (permalink / raw)
  To: Hasan Chowdhury
  Cc: Jan Engelhardt, Yury Stankevich, netdev, pablo, netfilter-devel

Hasan,

I can confirm that xtables_options_xfrm() works.
Just suspicious of that call, "xfrm" seems too specific to xfrm
subsystem but generic enough.
Dont bother resending the patch, it works right now, I am just
waiting for confirmation from Pablo/Jan and i will proceed from there.

I am also still struggling with whether to add your intermediate
solution to rename the xt->ipt.
I think i will go ahead and add a kernel change.

cheers,
jamal

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: tc ipt action
  2012-12-16 17:47             ` Jamal Hadi Salim
@ 2012-12-16 18:59               ` Jan Engelhardt
  2012-12-16 20:35                 ` Jamal Hadi Salim
  0 siblings, 1 reply; 69+ messages in thread
From: Jan Engelhardt @ 2012-12-16 18:59 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Pablo Neira Ayuso, Yury Stankevich, shemonc, netdev, netfilter-devel


On Sunday 2012-12-16 18:47, Jamal Hadi Salim wrote:
>
>> old parse has not entered any deprecation stage yet, since there are still
>> plugins out there (both the 5 and external ones) that make use of it.
>> Right now, both parse and x6_parse are valid.
>
> True - but we are getting broken because we are using a call that only 5 or so
> users provide. It would have saved us time if we got the
> a good log message instead of checking for if !m->parse()

A certainly safe bet would be to write, at the top of tc/m_xt.c,

#if XTABLES_VERSION_CODE > 9
#	error Someone call the guy who changed iptables and \
		make him fix it^W^W^W^W say you need help.
#endif

Then I would automatically notify myself of "oh I need fix that too" when I
feed any new releases of {iptables, iproute} through the Open Build Service.

>> Yes, all those with an x6_ prefix are new.
>> Mh, I already dream of plans reducing m_xt to something that
>> does not require libxtables.so anymore.
>
> That would be nice - but someone is going to have to link to libxtables, no?

I hope the complete opposite.

The following is a rough, it-compiles, untested never-run, draft of a
module. The vision here is that userspace only ever sends a chain
name to the kernel. The git tree/branch for it is

	git://git.inai.de/linux xt2-pktsched

commit 42c559c148cbbc22bf2cc29f2ad08bc330891838

    net_sched: act: new action to call into Xtables2 chains

 include/net/netfilter/xt_core.h    |    8 ++
 include/uapi/linux/tc_act/tc_ipt.h |    2 +
 net/netfilter/xt_core.c            |    3 +-
 net/sched/Kconfig                  |    9 ++
 net/sched/Makefile                 |    1 +
 net/sched/act_xtables.c            |  238 ++++++++++++++++++++++++++++++++++++
 6 files changed, 260 insertions(+), 1 deletion(-)

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: tc ipt action
  2012-12-16 18:59           ` Jamal Hadi Salim
@ 2012-12-16 19:13             ` Jan Engelhardt
  2012-12-16 20:36               ` Jamal Hadi Salim
  2012-12-16 20:41               ` [PATCH] iproute2: act_ipt fix xtables breakage Jamal Hadi Salim
  0 siblings, 2 replies; 69+ messages in thread
From: Jan Engelhardt @ 2012-12-16 19:13 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Hasan Chowdhury, Yury Stankevich, netdev, pablo, netfilter-devel

On Sunday 2012-12-16 19:59, Jamal Hadi Salim wrote:

> Hasan,
>
> I can confirm that xtables_options_xfrm() works.
> Just suspicious of that call, "xfrm" seems too specific to xfrm
> subsystem but generic enough.

Heh, nah.

"xfrm" is one of these pictogram-based abbreviations like "xing" (the 
thing they paint on roads/signs), apparently standing for "trans"form 
and "cross"ing, respectively, though 'x' is ambiguous and open to a lot 
of other interpretations.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: tc ipt action
  2012-12-16 18:59               ` Jan Engelhardt
@ 2012-12-16 20:35                 ` Jamal Hadi Salim
  2012-12-16 21:21                   ` Jan Engelhardt
  0 siblings, 1 reply; 69+ messages in thread
From: Jamal Hadi Salim @ 2012-12-16 20:35 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Pablo Neira Ayuso, Yury Stankevich, shemonc, netdev, netfilter-devel

On 12-12-16 01:59 PM, Jan Engelhardt wrote:
>
>
> A certainly safe bet would be to write, at the top of tc/m_xt.c,
>
> #if XTABLES_VERSION_CODE > 9
> #	error Someone call the guy who changed iptables and \
> 		make him fix it^W^W^W^W say you need help.
> #endif
>

Excellent idea ;->


> The following is a rough, it-compiles, untested never-run, draft of a
> module. The vision here is that userspace only ever sends a chain
> name to the kernel. The git tree/branch for it is
>
> 	git://git.inai.de/linux xt2-pktsched
>
> commit 42c559c148cbbc22bf2cc29f2ad08bc330891838
>


I'll look at it later - very slow connection at the moment so cloning 
will take a while.

cheers,
jamal

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: tc ipt action
  2012-12-16 19:13             ` Jan Engelhardt
@ 2012-12-16 20:36               ` Jamal Hadi Salim
  2012-12-16 20:41               ` [PATCH] iproute2: act_ipt fix xtables breakage Jamal Hadi Salim
  1 sibling, 0 replies; 69+ messages in thread
From: Jamal Hadi Salim @ 2012-12-16 20:36 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Hasan Chowdhury, Yury Stankevich, netdev, pablo, netfilter-devel

On 12-12-16 02:13 PM, Jan Engelhardt wrote:
>
> "xfrm" is one of these pictogram-based abbreviations like "xing" (the
> thing they paint on roads/signs), apparently standing for "trans"form
> and "cross"ing, respectively, though 'x' is ambiguous and open to a lot
> of other interpretations.

Ok, In that case i'll push half of Hasan's patch. I have a kernel change
that works with it.

cheers,
jamal

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH] iproute2:  act_ipt fix xtables breakage
  2012-12-16 19:13             ` Jan Engelhardt
  2012-12-16 20:36               ` Jamal Hadi Salim
@ 2012-12-16 20:41               ` Jamal Hadi Salim
  2012-12-17 12:30                 ` RFC [PATCH] iproute2: temporary solution to fix xt breakage Jamal Hadi Salim
  1 sibling, 1 reply; 69+ messages in thread
From: Jamal Hadi Salim @ 2012-12-16 20:41 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Hasan Chowdhury, Jan Engelhardt, Yury Stankevich, netdev, pablo,
	netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 307 bytes --]


Attached.

I am going to send a kernel patch as well.
There is an "intermediate solution" from Hasan which doesnt require
the kernel change. It changes the kernel endpoint to "ipt". I am
conflicted because it is a quick hack while otoh forcing people to
upgrade kernel is a usability issue.

cheers,
jamal

[-- Attachment #2: patch-xt --]
[-- Type: text/plain, Size: 4163 bytes --]

commit ff707eaa1fd51435958ae2afcd09d4b3600160b4
Author: Hasan Chowdhury <shemonc@gmail.com>
Date:   Sun Dec 16 14:09:38 2012 -0500

    Fixes breakage with xtables API starting with version 1.4.10
    
    Signed-off-by: Hasan Chowdhury <shemonc@gmail.com>
    Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>

diff --git a/tc/m_xt.c b/tc/m_xt.c
index bcc4d75..e1c3f38 100644
--- a/tc/m_xt.c
+++ b/tc/m_xt.c
@@ -118,6 +118,7 @@ static int parse_ipt(struct action_util *a,int *argc_p,
 	struct xtables_target *m = NULL;
 	struct ipt_entry fw;
 	struct rtattr *tail;
+
 	int c;
 	int rargc = *argc_p;
 	char **argv = *argv_p;
@@ -126,6 +127,7 @@ static int parse_ipt(struct action_util *a,int *argc_p,
 	int size = 0;
 	int iok = 0, ok = 0;
 	__u32 hook = 0, index = 0;
+	struct option *opts = NULL;
 
 	xtables_init_all(&tcipt_globals, NFPROTO_IPV4);
 	set_lib_dir();
@@ -158,14 +160,22 @@ static int parse_ipt(struct action_util *a,int *argc_p,
 					printf(" %s error \n", m->name);
 					return -1;
 				}
-				tcipt_globals.opts =
-				    xtables_merge_options(
 #if (XTABLES_VERSION_CODE >= 6)
-				        tcipt_globals.orig_opts,
+			opts = xtables_options_xfrm(tcipt_globals.orig_opts,
+						    tcipt_globals.opts, 
+						    m->x6_options,
+						    &m->option_offset);
+#else                                   
+			opts = xtables_merge_options(tcipt_globals.orig_opts,
+						     tcipt_globals.opts,
+						     m->extra_opts,
+						     &m->option_offset); 
 #endif
-				        tcipt_globals.opts,
-				        m->extra_opts,
-				        &m->option_offset);
+			if (opts == NULL) {
+				fprintf(stderr, " failed to find aditional options for target %s\n\n", optarg);
+				return -1;
+			} else
+				tcipt_globals.opts = opts;
 			} else {
 				fprintf(stderr," failed to find target %s\n\n", optarg);
 				return -1;
@@ -175,17 +185,21 @@ static int parse_ipt(struct action_util *a,int *argc_p,
 
 		default:
 			memset(&fw, 0, sizeof (fw));
-			if (m) {
-				m->parse(c - m->option_offset, argv, 0,
-					 &m->tflags, NULL, &m->t);
+#if (XTABLES_VERSION_CODE >= 6)
+		if (m != NULL && m->x6_parse != NULL ) {
+			xtables_option_tpcall(c, argv, 0 , m, NULL);
+#else
+		if (m != NULL && m->parse != NULL ) {
+			m->parse(c - m->option_offset, argv, 0, &m->tflags,
+				 NULL, &m->t);
+#endif
 			} else {
-				fprintf(stderr," failed to find target %s\n\n", optarg);
+				fprintf(stderr,"failed to find target %s\n\n", optarg);
 				return -1;
 
 			}
 			ok++;
 			break;
-
 		}
 	}
 
@@ -208,8 +222,13 @@ static int parse_ipt(struct action_util *a,int *argc_p,
 	}
 
 	/* check that we passed the correct parameters to the target */
+#if (XTABLES_VERSION_CODE >= 6)
+	if (m)
+		xtables_option_tfcall(m);
+#else
 	if (m && m->final_check)
 		m->final_check(m->tflags);
+#endif
 
 	{
 		struct tcmsg *t = NLMSG_DATA(n);
@@ -271,6 +290,7 @@ print_ipt(struct action_util *au,FILE * f, struct rtattr *arg)
 {
 	struct rtattr *tb[TCA_IPT_MAX + 1];
 	struct xt_entry_target *t = NULL;
+	struct option *opts = NULL;
 
 	if (arg == NULL)
 		return -1;
@@ -309,14 +329,22 @@ print_ipt(struct action_util *au,FILE * f, struct rtattr *arg)
 				return -1;
 			}
 
-			tcipt_globals.opts =
-			    xtables_merge_options(
 #if (XTABLES_VERSION_CODE >= 6)
-				                  tcipt_globals.orig_opts,
+		opts = xtables_options_xfrm(tcipt_globals.orig_opts,
+					    tcipt_globals.opts,
+					    m->x6_options,
+					    &m->option_offset);
+#else                                   
+		opts = xtables_merge_options(tcipt_globals.orig_opts,
+					     tcipt_globals.opts,
+					     m->extra_opts,
+					     &m->option_offset); 
 #endif
-				                  tcipt_globals.opts,
-			                          m->extra_opts,
-			                          &m->option_offset);
+	if (opts == NULL) {
+		fprintf(stderr, " failed to find aditional options for target %s\n\n", optarg);
+		return -1;
+	} else
+		tcipt_globals.opts = opts;
 		} else {
 			fprintf(stderr, " failed to find target %s\n\n",
 				t->u.user.name);
@@ -355,4 +383,3 @@ struct action_util xt_action_util = {
         .parse_aopt = parse_ipt,
         .print_aopt = print_ipt,
 };
-

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* Re: tc ipt action
  2012-12-16 20:35                 ` Jamal Hadi Salim
@ 2012-12-16 21:21                   ` Jan Engelhardt
  2012-12-17 12:58                     ` Jamal Hadi Salim
  0 siblings, 1 reply; 69+ messages in thread
From: Jan Engelhardt @ 2012-12-16 21:21 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Pablo Neira Ayuso, Yury Stankevich, shemonc, netdev, netfilter-devel

On Sunday 2012-12-16 21:35, Jamal Hadi Salim wrote:

>> 	git://git.inai.de/linux xt2-pktsched
>> commit 42c559c148cbbc22bf2cc29f2ad08bc330891838
>
> I'll look at it later - very slow connection at the moment so cloning will take
> a while.

If you have a preexisting clone of any linux tree, you can utilize
`git remote add ...` to only grab the deltas.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] build: unbreak linkage of m_xt.so
  2012-12-16  0:32         ` [PATCH] build: unbreak linkage of m_xt.so Jan Engelhardt
  2012-12-16 10:30           ` Jamal Hadi Salim
@ 2012-12-16 22:02           ` Mike Frysinger
  2012-12-18 17:21           ` Stephen Hemminger
  2 siblings, 0 replies; 69+ messages in thread
From: Mike Frysinger @ 2012-12-16 22:02 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: stephen.hemminger, netdev, jhs, urykhy, shemonc, pablo, netfilter-devel

[-- Attachment #1: Type: Text/Plain, Size: 792 bytes --]

On Saturday 15 December 2012 19:32:48 Jan Engelhardt wrote:
> --- a/configure
> +++ b/configure
> @@ -4,7 +4,6 @@
>  INCLUDE=${1:-"$PWD/include"}
> 
>  : ${PKG_CONFIG:=pkg-config}
>  : ${CC=gcc}
> 
> -echo "PKG_CONFIG:=${PKG_CONFIG}" >>Config
> 
>  # Make a temp directory in build tree.
>  TMPDIR=$(mktemp -d config.XXXXXX)
> @@ -224,6 +223,7 @@ rm -f $TMPDIR/ipsettest.c $TMPDIR/ipsettest
>  }
> 
>  echo "# Generated config based on" $INCLUDE >Config
> +echo "PKG_CONFIG:=${PKG_CONFIG}" >>Config
> 
>  echo "TC schedulers"

the use of un-indented shell functions makes the code read in a way it doesn't 
actually execute.  i'd suggest moving this logic into a function to match 
existing style rather than simply moving the Config write.  i'll post a patch.
-mike

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 69+ messages in thread

* RFC  [PATCH] iproute2:  temporary solution to fix xt breakage
  2012-12-16 20:41               ` [PATCH] iproute2: act_ipt fix xtables breakage Jamal Hadi Salim
@ 2012-12-17 12:30                 ` Jamal Hadi Salim
  2012-12-17 16:12                   ` Stephen Hemminger
       [not found]                   ` <CAASe=fRuJdtisEvp7uo=PHwN3nKHqsYDW4Om1gk2MK-vyNvBrA@mail.gmail.com>
  0 siblings, 2 replies; 69+ messages in thread
From: Jamal Hadi Salim @ 2012-12-17 12:30 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Hasan Chowdhury, Jan Engelhardt, Yury Stankevich, netdev, pablo,
	netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 704 bytes --]

On 12-12-16 03:41 PM, Jamal Hadi Salim wrote:
>
> There is an "intermediate solution" from Hasan which doesnt require
> the kernel change. It changes the kernel endpoint to "ipt". I am
> conflicted because it is a quick hack while otoh forcing people to
> upgrade kernel is a usability issue.
>


Attached. Author is Hasan - I didnt sign it because i am looking for
feedback and i find it distasteful but it solves the problem.
This is needed until we have a proper fix in the kernel propagated.
Once that kernel change is ubiquitous this change is noise and a
maintanance pain. I am making it hard to even turn it on
(i.e someone knowledgeable will have to compile with CONFIG_XT_HACK)

cheers,
jamal



[-- Attachment #2: p1 --]
[-- Type: text/plain, Size: 1092 bytes --]

diff --git a/tc/m_action.c b/tc/m_action.c
index 1fe2431..fa9a7c8 100644
--- a/tc/m_action.c
+++ b/tc/m_action.c
@@ -209,10 +209,17 @@ done0:
 
 			tail = NLMSG_TAIL(n);
 			addattr_l(n, MAX_MSG, ++prio, NULL, 0);
+			/*XXX: hack to work around old kernels, newer xtables */
+#ifdef CONFIG_XT_HACK
+			if (strncmp(k,"xt",2)==0)
+				addattr_l(n, MAX_MSG, TCA_ACT_KIND, "ipt" , strlen("ipt") + 1);
+			else 
+				addattr_l(n, MAX_MSG, TCA_ACT_KIND, k, strlen(k) + 1);
+#else
 			addattr_l(n, MAX_MSG, TCA_ACT_KIND, k, strlen(k) + 1);
+#endif
 
 			ret = a->parse_aopt(a,&argc, &argv, TCA_ACT_OPTIONS, n);
-
 			if (ret < 0) {
 				fprintf(stderr,"bad action parsing\n");
 				goto bad_val;
@@ -259,7 +266,15 @@ tc_print_one_action(FILE * f, struct rtattr *arg)
 	}
 
 
+	/*XXX: hack to work around old kernels, newer xtables */
+#ifdef CONFIG_XT_HACK
+	if (strcmp(RTA_DATA(tb[TCA_ACT_KIND]), "ipt")==0)
+		a = get_action_kind("xt");
+	else 
+		a = get_action_kind(RTA_DATA(tb[TCA_ACT_KIND]));
+#else
 	a = get_action_kind(RTA_DATA(tb[TCA_ACT_KIND]));
+#endif
 	if (NULL == a)
 		return err;
 

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* Re: tc ipt action
  2012-12-16 21:21                   ` Jan Engelhardt
@ 2012-12-17 12:58                     ` Jamal Hadi Salim
  2012-12-17 13:28                       ` Jan Engelhardt
  0 siblings, 1 reply; 69+ messages in thread
From: Jamal Hadi Salim @ 2012-12-17 12:58 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Pablo Neira Ayuso, Yury Stankevich, shemonc, netdev, netfilter-devel

On 12-12-16 04:21 PM, Jan Engelhardt wrote:

> If you have a preexisting clone of any linux tree, you can utilize
> `git remote add ...` to only grab the deltas.

It downloaded eventually.
So looking at this quickly, basic question: is xtables2 different API 
wise from what we do today in act_ipt?
Second: Are chain names unique system wide? i.e at the moment we send
a hook and table selection?
The patch i have currently for the kernel tries to pursue an approach 
that maximizes code reuse - depending on your answer I may go the 
approach of having a separate act_xt and hope you can build on top of that.

cheers,
jamal


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: tc ipt action
  2012-12-17 12:58                     ` Jamal Hadi Salim
@ 2012-12-17 13:28                       ` Jan Engelhardt
  2012-12-18 13:23                         ` Jamal Hadi Salim
  0 siblings, 1 reply; 69+ messages in thread
From: Jan Engelhardt @ 2012-12-17 13:28 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Pablo Neira Ayuso, Yury Stankevich, shemonc, netdev, netfilter-devel


On Monday 2012-12-17 13:58, Jamal Hadi Salim wrote:
> On 12-12-16 04:21 PM, Jan Engelhardt wrote:
>
>> If you have a preexisting clone of any linux tree, you can utilize
>> `git remote add ...` to only grab the deltas.
>
>It downloaded eventually. So looking at this quickly, basic
>question: is xtables2 different API wise from what we do today in
>act_ipt?

AFAICS, (one instance of) act_ipt today directly invokes (exactly one
instance of) a target. With act_xt2 as drafted, it instead invokes a
chain, which would

1. leave the construction of the target data and calling it
   to the subsystems they conceptually belong to - the packet filter
2. lets you do matches, jumps and all that.

>Second: Are chain names unique system wide?

Good thing you ask. Chain names are unique within a netns, and this
act_xtables.c draft looks at the packet to get to know its netns, so
that seems fine.

However, your question also leads to looking at whether TC Actions
themselves are sufficiently netns-ified, and it seems this is _not_
the case. Am I right in the observation that variables like
"tcf_ipt_ht" are in fact global rather tha per-netns?


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: RFC  [PATCH] iproute2:  temporary solution to fix xt breakage
  2012-12-17 12:30                 ` RFC [PATCH] iproute2: temporary solution to fix xt breakage Jamal Hadi Salim
@ 2012-12-17 16:12                   ` Stephen Hemminger
  2012-12-19 11:36                     ` Jamal Hadi Salim
       [not found]                   ` <CAASe=fRuJdtisEvp7uo=PHwN3nKHqsYDW4Om1gk2MK-vyNvBrA@mail.gmail.com>
  1 sibling, 1 reply; 69+ messages in thread
From: Stephen Hemminger @ 2012-12-17 16:12 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Hasan Chowdhury, Jan Engelhardt, Yury Stankevich, netdev, pablo,
	netfilter-devel

On Mon, 17 Dec 2012 07:30:41 -0500
Jamal Hadi Salim <jhs@mojatatu.com> wrote:

> On 12-12-16 03:41 PM, Jamal Hadi Salim wrote:
> >
> > There is an "intermediate solution" from Hasan which doesnt require
> > the kernel change. It changes the kernel endpoint to "ipt". I am
> > conflicted because it is a quick hack while otoh forcing people to
> > upgrade kernel is a usability issue.
> >
> 
> 
> Attached. Author is Hasan - I didnt sign it because i am looking for
> feedback and i find it distasteful but it solves the problem.
> This is needed until we have a proper fix in the kernel propagated.
> Once that kernel change is ubiquitous this change is noise and a
> maintanance pain. I am making it hard to even turn it on
> (i.e someone knowledgeable will have to compile with CONFIG_XT_HACK)
> 
> cheers,
> jamal
> 
> 

Maybe xtables should have stable API/ABI and use shim routines there?

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: RFC [PATCH] iproute2: temporary solution to fix xt breakage
       [not found]                   ` <CAASe=fRuJdtisEvp7uo=PHwN3nKHqsYDW4Om1gk2MK-vyNvBrA@mail.gmail.com>
@ 2012-12-18 12:28                     ` Jamal Hadi Salim
       [not found]                       ` <CAASe=fR6Hm2dxp=1wDchtrzqnaH6qacHpg2wrsqLfmGpPbQ9Fg@mail.gmail.com>
  0 siblings, 1 reply; 69+ messages in thread
From: Jamal Hadi Salim @ 2012-12-18 12:28 UTC (permalink / raw)
  To: Hasan Chowdhury
  Cc: Stephen Hemminger, Jan Engelhardt, Yury Stankevich, netdev,
	pablo, netfilter-devel

On 12-12-17 11:10 AM, Hasan Chowdhury wrote:
> Juts noticed , the attached patch does not have the modification for
> m_xt.c , without it   ,I guess this patch  is not going  to work.


Thats in the first patch i sent out which I hope Stephen will apply
right away.
This one depends on that. So you must apply that patch, then this.

cheers,
jamal



^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: tc ipt action
  2012-12-17 13:28                       ` Jan Engelhardt
@ 2012-12-18 13:23                         ` Jamal Hadi Salim
  2012-12-18 13:58                           ` Jan Engelhardt
  0 siblings, 1 reply; 69+ messages in thread
From: Jamal Hadi Salim @ 2012-12-18 13:23 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Pablo Neira Ayuso, Yury Stankevich, shemonc, netdev, netfilter-devel

On 12-12-17 08:28 AM, Jan Engelhardt wrote:
>
> On Monday 2012-12-17 13:58, Jamal Hadi Salim wrote:

> AFAICS, (one instance of) act_ipt today directly invokes (exactly one
> instance of) a target.

Design intent.
You can have the same target instance used by specifying the same index
on the command line.

>With act_xt2 as drafted, it instead invokes a chain, which would
>
> 1. leave the construction of the target data and calling it
>     to the subsystems they conceptually belong to - the packet filter
> 2. lets you do matches, jumps and all that.
>

I like #2. For #1 as long as it doesnt deviate from desire to have one 
or more instances of targets, we should be fine.

> Good thing you ask. Chain names are unique within a netns, and this
> act_xtables.c draft looks at the packet to get to know its netns, so
> that seems fine.

My motivation for that question:
Is it possible to ignore the hook and tablename and just use the chain
name?

> However, your question also leads to looking at whether TC Actions
> themselves are sufficiently netns-ified, and it seems this is _not_
> the case. Am I right in the observation that variables like
> "tcf_ipt_ht" are in fact global rather tha per-netns?

In general we dont need to worry about netns since actions are attached 
to the filters which are dependent on qdiscs which are dependent on 
netdevs which are per netns.
I believe actions (not filters or qdiscs) have a way where this can
be circumvented in one scenario (I can configure them bypassing the 
filter interface). Thanks for bringing this up - I will look at it.

cheers,
jamal

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: tc ipt action
  2012-12-18 13:23                         ` Jamal Hadi Salim
@ 2012-12-18 13:58                           ` Jan Engelhardt
  2012-12-19 11:43                             ` Jamal Hadi Salim
  0 siblings, 1 reply; 69+ messages in thread
From: Jan Engelhardt @ 2012-12-18 13:58 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Pablo Neira Ayuso, Yury Stankevich, shemonc, netdev, netfilter-devel


On Tuesday 2012-12-18 14:23, Jamal Hadi Salim wrote:
> On 12-12-17 08:28 AM, Jan Engelhardt wrote:
>>
>> With act_xt2 as drafted, it instead invokes a chain, which would
>>
>> 1. leave the construction of the target data and calling it
>>    to the subsystems they conceptually belong to - the packet filter
>> 2. lets you do matches, jumps and all that.
>
>I like #2. For #1 as long as it doesnt deviate from desire to have
>one or more instances of targets, we should be fine.

Chains can store multiple targets, so no loss.

>> Good thing you ask. Chain names are unique within a netns, and this
>> act_xtables.c draft looks at the packet to get to know its netns, so
>> that seems fine.
>
> My motivation for that question:
> Is it possible to ignore the hook and tablename and just use the chain
> name?

1. table

First, I think some targets need to relax their restrictions, such as
with xt_DSCP.

Then, only a handful of extensions remain: CT, <all NATs>,
TPROXY and REJECT. Would anyone want to call these from act_ipt?
I doubt it. :)

2. hooks

Extensions with hook limit: <NAT>, TPROXY, REJECT, CLASSIFY.
Again, I don't quite see the value of attempting to NAT from act_ipt.
CLASSIFY {c|sh?}ould be relaxed, unless I am missing something.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] build: unbreak linkage of m_xt.so
  2012-12-16  0:32         ` [PATCH] build: unbreak linkage of m_xt.so Jan Engelhardt
  2012-12-16 10:30           ` Jamal Hadi Salim
  2012-12-16 22:02           ` Mike Frysinger
@ 2012-12-18 17:21           ` Stephen Hemminger
  2012-12-18 18:47             ` Mike Frysinger
  2 siblings, 1 reply; 69+ messages in thread
From: Stephen Hemminger @ 2012-12-18 17:21 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: stephen.hemminger, vapier, netdev, jhs, urykhy, shemonc, pablo,
	netfilter-devel

On Sun, 16 Dec 2012 01:32:48 +0100
Jan Engelhardt <jengelh@inai.de> wrote:

> Commit v3.7.0~10 caused the variable new PKG_CONFIG variable never
> to be present at the time of calling make, leading to tc/m_xt.so
> not linked with -lxtables (result from pkg-config xtables --libs),
> that in turn leading to
> 
> tc: symbol lookup error: /usr/lib64/tc//m_xt.so: undefined symbol:
> xtables_init_all
> 
> Fixing that.
> 
> Signed-off-by: Jan Engelhardt <jengelh@inai.de>
> ---
>  configure |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/configure b/configure
> index 9912114..573ee55 100755
> --- a/configure
> +++ b/configure
> @@ -4,7 +4,6 @@
>  INCLUDE=${1:-"$PWD/include"}
>  : ${PKG_CONFIG:=pkg-config}
>  : ${CC=gcc}
> -echo "PKG_CONFIG:=${PKG_CONFIG}" >>Config
>  
>  # Make a temp directory in build tree.
>  TMPDIR=$(mktemp -d config.XXXXXX)
> @@ -224,6 +223,7 @@ rm -f $TMPDIR/ipsettest.c $TMPDIR/ipsettest
>  }
>  
>  echo "# Generated config based on" $INCLUDE >Config
> +echo "PKG_CONFIG:=${PKG_CONFIG}" >>Config
>  
>  echo "TC schedulers"
>  

Ok, manually did the diff (conflicted with other previous changes).

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] build: unbreak linkage of m_xt.so
  2012-12-18 17:21           ` Stephen Hemminger
@ 2012-12-18 18:47             ` Mike Frysinger
  2012-12-20  0:03               ` Stephen Hemminger
  0 siblings, 1 reply; 69+ messages in thread
From: Mike Frysinger @ 2012-12-18 18:47 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jan Engelhardt, stephen.hemminger, netdev, jhs, urykhy, shemonc,
	pablo, netfilter-devel

[-- Attachment #1: Type: Text/Plain, Size: 1346 bytes --]

On Tuesday 18 December 2012 12:21:30 Stephen Hemminger wrote:
> On Sun, 16 Dec 2012 01:32:48 +0100 Jan Engelhardt wrote:
> > Commit v3.7.0~10 caused the variable new PKG_CONFIG variable never
> > to be present at the time of calling make, leading to tc/m_xt.so
> > not linked with -lxtables (result from pkg-config xtables --libs),
> > that in turn leading to
> > 
> > tc: symbol lookup error: /usr/lib64/tc//m_xt.so: undefined symbol:
> > xtables_init_all
> > 
> > Fixing that.
> > 
> > --- a/configure
> > +++ b/configure
> > @@ -4,7 +4,6 @@
> >  INCLUDE=${1:-"$PWD/include"}
> >  : ${PKG_CONFIG:=pkg-config}
> >  : ${CC=gcc}
> > -echo "PKG_CONFIG:=${PKG_CONFIG}" >>Config
> > 
> >  # Make a temp directory in build tree.
> >  TMPDIR=$(mktemp -d config.XXXXXX)
> > @@ -224,6 +223,7 @@ rm -f $TMPDIR/ipsettest.c $TMPDIR/ipsettest
> >  }
> >  
> >  echo "# Generated config based on" $INCLUDE >Config
> > +echo "PKG_CONFIG:=${PKG_CONFIG}" >>Config
> >  echo "TC schedulers"
> 
> Ok, manually did the diff (conflicted with other previous changes).

this patch is no longer necessary one you merged my:
	configure: move toolchain init to a function

it's actually undesirable to apply this after that since it makes the configure 
script less clear again ...

sorry if my commit message wasn't obvious.
-mike

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: RFC  [PATCH] iproute2:  temporary solution to fix xt breakage
  2012-12-17 16:12                   ` Stephen Hemminger
@ 2012-12-19 11:36                     ` Jamal Hadi Salim
  0 siblings, 0 replies; 69+ messages in thread
From: Jamal Hadi Salim @ 2012-12-19 11:36 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Hasan Chowdhury, Jan Engelhardt, Yury Stankevich, netdev, pablo,
	netfilter-devel

On 12-12-17 11:12 AM, Stephen Hemminger wrote:

>
> Maybe xtables should have stable API/ABI and use shim routines there?

Thats the general direction being taken now with this last changes...

cheers,
jamal



^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: tc ipt action
  2012-12-18 13:58                           ` Jan Engelhardt
@ 2012-12-19 11:43                             ` Jamal Hadi Salim
  0 siblings, 0 replies; 69+ messages in thread
From: Jamal Hadi Salim @ 2012-12-19 11:43 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Pablo Neira Ayuso, Yury Stankevich, shemonc, netdev, netfilter-devel

On 12-12-18 08:58 AM, Jan Engelhardt wrote:
>

> Chains can store multiple targets, so no loss.

Nice.

> 1. table
>
> First, I think some targets need to relax their restrictions, such as
> with xt_DSCP.

Saw your other patch to get rid of mangle hardcoding.

> Then, only a handful of extensions remain: CT, <all NATs>,
> TPROXY and REJECT. Would anyone want to call these from act_ipt?
> I doubt it. :)
>

Tempted to say tproxy.

> 2. hooks
>
> Extensions with hook limit: <NAT>, TPROXY, REJECT, CLASSIFY.
> Again, I don't quite see the value of attempting to NAT from act_ipt.
> CLASSIFY {c|sh?}ould be relaxed, unless I am missing something.
>


I could live with that. It would be an improvement over whats there 
today. I would prefer however for this to be an improvement over
act_xt.c i posted as opposed to have even more interfaces for xt.
We've suffered enough already ;-> i.e add your patches on top.

cheers,
jamal


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: RFC [PATCH] iproute2: temporary solution to fix xt breakage
       [not found]                       ` <CAASe=fR6Hm2dxp=1wDchtrzqnaH6qacHpg2wrsqLfmGpPbQ9Fg@mail.gmail.com>
@ 2012-12-19 11:44                         ` Jamal Hadi Salim
  2012-12-19 11:56                           ` [PATCH] pkt_sched: act_xt support new Xtables interface Jamal Hadi Salim
  0 siblings, 1 reply; 69+ messages in thread
From: Jamal Hadi Salim @ 2012-12-19 11:44 UTC (permalink / raw)
  To: Hasan Chowdhury
  Cc: Stephen Hemminger, Jan Engelhardt, Yury Stankevich, netdev,
	pablo, netfilter-devel

On 12-12-18 09:45 AM, Hasan Chowdhury wrote:
> Hi Jamal,
>
> Thanks for all the help and the information. I will keep tune myself so
> when the proper path from kernel side will show up I will integrate it
> into my system to test it.
>

Yikes. I guess i never posted that? Will do it shortly.

cheers,
jamal


^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH] pkt_sched: act_xt support new Xtables interface
  2012-12-19 11:44                         ` Jamal Hadi Salim
@ 2012-12-19 11:56                           ` Jamal Hadi Salim
  2012-12-19 15:52                             ` Jan Engelhardt
                                               ` (2 more replies)
  0 siblings, 3 replies; 69+ messages in thread
From: Jamal Hadi Salim @ 2012-12-19 11:56 UTC (permalink / raw)
  To: Hasan Chowdhury
  Cc: Stephen Hemminger, Jan Engelhardt, Yury Stankevich, netdev,
	pablo, netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 485 bytes --]


To be applied pending more testing.

Attached. Sorry, I thought I had sent this out over the weekend.
I have done basic testing with a single mark and sending pings to
update stats which can then displayed for the mark.

Hasan/Yury, if you test this please use the latest iproute2 with only 
the first patch I posted (originally from Hasan). Hasan please use that
patch not your version - if theres anything wrong we can find out sooner
before the patch becomes final.

cheers,
jamal

[-- Attachment #2: xt-p1 --]
[-- Type: text/plain, Size: 10173 bytes --]

commit 82330cc874429c63bd0e476e413a79ebab3da350
Author: Jamal Hadi Salim <hadi@mojatatu.com>
Date:   Wed Dec 19 06:23:28 2012 -0500

    Fix iptables/xtables ABI changes. We will eventually replace
    act_ipt with act_xt since only very few targets still support the
    old xtables interface
    
    Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>

diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 235e01a..1693973 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -578,12 +578,25 @@ config NET_ACT_MIRRED
 config NET_ACT_IPT
         tristate "IPtables targets"
         depends on NET_CLS_ACT && NETFILTER && IP_NF_IPTABLES
+	select NET_ACT_XT
         ---help---
 	  Say Y here to be able to invoke iptables targets after successful
-	  classification.
+	  classification. Better yet choose NET_ACT_XT since this version
+	  will eventually be obsoleted.
 
 	  To compile this code as a module, choose M here: the
 	  module will be called act_ipt.
+config NET_ACT_XT
+        tristate "New IPtables targets"
+        depends on NET_CLS_ACT && NETFILTER && IP_NF_IPTABLES
+        ---help---
+	  Say Y here to be able to invoke iptables targets after successful
+	  classification using the new xtables mechanism. This mechanism
+	  will eventually replace NET_ACT_IPT
+
+	  To compile this code as a module, choose M here: the
+	  module will be called act_xt.
+
 
 config NET_ACT_NAT
         tristate "Stateless NAT"
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 978cbf0..10a1136 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -11,6 +11,7 @@ obj-$(CONFIG_NET_ACT_POLICE)	+= act_police.o
 obj-$(CONFIG_NET_ACT_GACT)	+= act_gact.o
 obj-$(CONFIG_NET_ACT_MIRRED)	+= act_mirred.o
 obj-$(CONFIG_NET_ACT_IPT)	+= act_ipt.o
+obj-$(CONFIG_NET_ACT_XT)	+= act_xt.o
 obj-$(CONFIG_NET_ACT_NAT)	+= act_nat.o
 obj-$(CONFIG_NET_ACT_PEDIT)	+= act_pedit.o
 obj-$(CONFIG_NET_ACT_SIMP)	+= act_simple.o
diff --git a/net/sched/act_xt.c b/net/sched/act_xt.c
new file mode 100644
index 0000000..589cfe6
--- /dev/null
+++ b/net/sched/act_xt.c
@@ -0,0 +1,324 @@
+/*
+ * net/sched/act_xt.c     iptables target interface
+ *
+ *TODO: Add other tables. For now we only support the ipv4 table targets
+ *
+ *		This program is free software; you can redistribute it and/or
+ *		modify it under the terms of the GNU General Public License
+ *		as published by the Free Software Foundation; either version
+ *		2 of the License, or (at your option) any later version.
+ *
+ * Copyright:	Jamal Hadi Salim (2002-12)
+ */
+
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/skbuff.h>
+#include <linux/rtnetlink.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/slab.h>
+#include <net/netlink.h>
+#include <net/pkt_sched.h>
+#include <linux/tc_act/tc_ipt.h>
+#include <net/tc_act/tc_ipt.h>
+
+#include <linux/netfilter_ipv4/ip_tables.h>
+
+#define IPT_TAB_MASK     15
+static struct tcf_common *tcf_ipt_ht[IPT_TAB_MASK + 1];
+static u32 ipt_idx_gen;
+static DEFINE_RWLOCK(ipt_lock);
+
+static struct tcf_hashinfo ipt_hash_info = {
+	.htab = tcf_ipt_ht,
+	.hmask = IPT_TAB_MASK,
+	.lock = &ipt_lock,
+};
+
+static int ipt_init_target(struct xt_entry_target *t, char *table,
+			   unsigned int hook)
+{
+	struct xt_tgchk_param par;
+	struct xt_target *target;
+	int ret = 0;
+
+	target = xt_request_find_target(AF_INET, t->u.user.name,
+					t->u.user.revision);
+	if (IS_ERR(target))
+		return PTR_ERR(target);
+
+	t->u.kernel.target = target;
+	par.table = table;
+	par.entryinfo = NULL;
+	par.target = target;
+	par.targinfo = t->data;
+	par.hook_mask = hook;
+	par.family = NFPROTO_IPV4;
+
+	ret = xt_check_target(&par, t->u.target_size - sizeof(*t), 0, false);
+	if (ret < 0) {
+		module_put(t->u.kernel.target->me);
+		return ret;
+	}
+	return 0;
+}
+
+static void ipt_destroy_target(struct xt_entry_target *t)
+{
+	struct xt_tgdtor_param par = {
+		.target = t->u.kernel.target,
+		.targinfo = t->data,
+	};
+	if (par.target->destroy != NULL)
+		par.target->destroy(&par);
+	module_put(par.target->me);
+}
+
+static int tcf_ipt_release(struct tcf_ipt *ipt, int bind)
+{
+	int ret = 0;
+	if (ipt) {
+		if (bind)
+			ipt->tcf_bindcnt--;
+		ipt->tcf_refcnt--;
+		if (ipt->tcf_bindcnt <= 0 && ipt->tcf_refcnt <= 0) {
+			ipt_destroy_target(ipt->tcfi_t);
+			kfree(ipt->tcfi_tname);
+			kfree(ipt->tcfi_t);
+			tcf_hash_destroy(&ipt->common, &ipt_hash_info);
+			ret = ACT_P_DELETED;
+		}
+	}
+	return ret;
+}
+
+static const struct nla_policy ipt_policy[TCA_IPT_MAX + 1] = {
+	[TCA_IPT_TABLE] = {.type = NLA_STRING,.len = IFNAMSIZ},
+	[TCA_IPT_HOOK] = {.type = NLA_U32},
+	[TCA_IPT_INDEX] = {.type = NLA_U32},
+	[TCA_IPT_TARG] = {.len = sizeof(struct xt_entry_target)},
+};
+
+static int tcf_ipt_init(struct nlattr *nla, struct nlattr *est,
+			struct tc_action *a, int ovr, int bind)
+{
+	struct nlattr *tb[TCA_IPT_MAX + 1];
+	struct tcf_ipt *ipt;
+	struct tcf_common *pc;
+	struct xt_entry_target *td, *t;
+	char *tname;
+	int ret = 0, err;
+	u32 hook = 0;
+	u32 index = 0;
+
+	if (nla == NULL)
+		return -EINVAL;
+
+	err = nla_parse_nested(tb, TCA_IPT_MAX, nla, ipt_policy);
+	if (err < 0)
+		return err;
+
+	if (tb[TCA_IPT_HOOK] == NULL)
+		return -EINVAL;
+	if (tb[TCA_IPT_TARG] == NULL)
+		return -EINVAL;
+
+	td = (struct xt_entry_target *)nla_data(tb[TCA_IPT_TARG]);
+	if (nla_len(tb[TCA_IPT_TARG]) < td->u.target_size)
+		return -EINVAL;
+
+	if (tb[TCA_IPT_INDEX] != NULL)
+		index = nla_get_u32(tb[TCA_IPT_INDEX]);
+
+	pc = tcf_hash_check(index, a, bind, &ipt_hash_info);
+	if (!pc) {
+		pc = tcf_hash_create(index, est, a, sizeof(*ipt), bind,
+				     &ipt_idx_gen, &ipt_hash_info);
+		if (IS_ERR(pc))
+			return PTR_ERR(pc);
+		ret = ACT_P_CREATED;
+	} else {
+		if (!ovr) {
+			tcf_ipt_release(to_ipt(pc), bind);
+			return -EEXIST;
+		}
+	}
+	ipt = to_ipt(pc);
+
+	hook = nla_get_u32(tb[TCA_IPT_HOOK]);
+
+	err = -ENOMEM;
+	tname = kmalloc(IFNAMSIZ, GFP_KERNEL);
+	if (unlikely(!tname))
+		goto err1;
+	if (tb[TCA_IPT_TABLE] == NULL ||
+	    nla_strlcpy(tname, tb[TCA_IPT_TABLE], IFNAMSIZ) >= IFNAMSIZ)
+		strcpy(tname, "mangle");
+
+	t = kmemdup(td, td->u.target_size, GFP_KERNEL);
+	if (unlikely(!t))
+		goto err2;
+
+	err = ipt_init_target(t, tname, hook);
+	if (err < 0)
+		goto err3;
+
+	spin_lock_bh(&ipt->tcf_lock);
+	if (ret != ACT_P_CREATED) {
+		ipt_destroy_target(ipt->tcfi_t);
+		kfree(ipt->tcfi_tname);
+		kfree(ipt->tcfi_t);
+	}
+	ipt->tcfi_tname = tname;
+	ipt->tcfi_t = t;
+	ipt->tcfi_hook = hook;
+	spin_unlock_bh(&ipt->tcf_lock);
+	if (ret == ACT_P_CREATED)
+		tcf_hash_insert(pc, &ipt_hash_info);
+	return ret;
+
+err3:
+	kfree(t);
+err2:
+	kfree(tname);
+err1:
+	if (ret == ACT_P_CREATED) {
+		if (est)
+			gen_kill_estimator(&pc->tcfc_bstats,
+					   &pc->tcfc_rate_est);
+		kfree_rcu(pc, tcfc_rcu);
+	}
+	return err;
+}
+
+static int tcf_ipt_cleanup(struct tc_action *a, int bind)
+{
+	struct tcf_ipt *ipt = a->priv;
+	return tcf_ipt_release(ipt, bind);
+}
+
+static int tcf_ipt(struct sk_buff *skb, const struct tc_action *a,
+		   struct tcf_result *res)
+{
+	int ret = 0, result = 0;
+	struct tcf_ipt *ipt = a->priv;
+	struct xt_action_param par;
+
+	if (skb_cloned(skb)) {
+		if (pskb_expand_head(skb, 0, 0, GFP_ATOMIC))
+			return TC_ACT_UNSPEC;
+	}
+
+	spin_lock(&ipt->tcf_lock);
+
+	ipt->tcf_tm.lastuse = jiffies;
+	bstats_update(&ipt->tcf_bstats, skb);
+
+	/* yes, we have to worry about both in and out dev
+	 * worry later - danger - this API seems to have changed
+	 * from earlier kernels
+	 */
+	par.in = skb->dev;
+	par.out = NULL;
+	par.hooknum = ipt->tcfi_hook;
+	par.target = ipt->tcfi_t->u.kernel.target;
+	par.targinfo = ipt->tcfi_t->data;
+	ret = par.target->target(skb, &par);
+
+	switch (ret) {
+	case NF_ACCEPT:
+		result = TC_ACT_OK;
+		break;
+	case NF_DROP:
+		result = TC_ACT_SHOT;
+		ipt->tcf_qstats.drops++;
+		break;
+	case XT_CONTINUE:
+		result = TC_ACT_PIPE;
+		break;
+	default:
+		net_notice_ratelimited
+		    ("tc filter: Bogus netfilter code %d assume ACCEPT\n", ret);
+		result = TC_POLICE_OK;
+		break;
+	}
+	spin_unlock(&ipt->tcf_lock);
+	return result;
+
+}
+
+static int tcf_ipt_dump(struct sk_buff *skb, struct tc_action *a, int bind,
+			int ref)
+{
+	unsigned char *b = skb_tail_pointer(skb);
+	struct tcf_ipt *ipt = a->priv;
+	struct xt_entry_target *t;
+	struct tcf_t tm;
+	struct tc_cnt c;
+
+	/* for simple targets kernel size == user size
+	 * user name = target name
+	 * for foolproof you need to not assume this
+	 */
+
+	t = kmemdup(ipt->tcfi_t, ipt->tcfi_t->u.user.target_size, GFP_ATOMIC);
+	if (unlikely(!t))
+		goto nla_put_failure;
+
+	c.bindcnt = ipt->tcf_bindcnt - bind;
+	c.refcnt = ipt->tcf_refcnt - ref;
+	strcpy(t->u.user.name, ipt->tcfi_t->u.kernel.target->name);
+
+	if (nla_put(skb, TCA_IPT_TARG, ipt->tcfi_t->u.user.target_size, t) ||
+	    nla_put_u32(skb, TCA_IPT_INDEX, ipt->tcf_index) ||
+	    nla_put_u32(skb, TCA_IPT_HOOK, ipt->tcfi_hook) ||
+	    nla_put(skb, TCA_IPT_CNT, sizeof(struct tc_cnt), &c) ||
+	    nla_put_string(skb, TCA_IPT_TABLE, ipt->tcfi_tname))
+		goto nla_put_failure;
+	tm.install = jiffies_to_clock_t(jiffies - ipt->tcf_tm.install);
+	tm.lastuse = jiffies_to_clock_t(jiffies - ipt->tcf_tm.lastuse);
+	tm.expires = jiffies_to_clock_t(ipt->tcf_tm.expires);
+	if (nla_put(skb, TCA_IPT_TM, sizeof(tm), &tm))
+		goto nla_put_failure;
+	kfree(t);
+	return skb->len;
+
+nla_put_failure:
+	nlmsg_trim(skb, b);
+	kfree(t);
+	return -1;
+}
+
+static struct tc_action_ops act_ipt_ops = {
+	.kind = "xt",
+	.hinfo = &ipt_hash_info,
+	.type = TCA_ACT_IPT,
+	.capab = TCA_CAP_NONE,
+	.owner = THIS_MODULE,
+	.act = tcf_ipt,
+	.dump = tcf_ipt_dump,
+	.cleanup = tcf_ipt_cleanup,
+	.lookup = tcf_hash_search,
+	.init = tcf_ipt_init,
+	.walk = tcf_generic_walker
+};
+
+MODULE_AUTHOR("Jamal Hadi Salim(2002-12)");
+MODULE_DESCRIPTION("New Iptables target actions");
+MODULE_LICENSE("GPL");
+
+static int __init ipt_init_module(void)
+{
+	return tcf_register_action(&act_ipt_ops);
+}
+
+static void __exit ipt_cleanup_module(void)
+{
+	tcf_unregister_action(&act_ipt_ops);
+}
+
+module_init(ipt_init_module);
+module_exit(ipt_cleanup_module);

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* Re: [PATCH] pkt_sched: act_xt support new Xtables interface
  2012-12-19 11:56                           ` [PATCH] pkt_sched: act_xt support new Xtables interface Jamal Hadi Salim
@ 2012-12-19 15:52                             ` Jan Engelhardt
  2012-12-19 23:05                               ` Jamal Hadi Salim
       [not found]                             ` <CAASe=fQZGwjM_2PStRE0tje33Doi6TuwJJ3p7x-SRcwq3mQvRg@mail.gmail.com>
  2012-12-20  8:54                             ` Yury Stankevich
  2 siblings, 1 reply; 69+ messages in thread
From: Jan Engelhardt @ 2012-12-19 15:52 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Hasan Chowdhury, Stephen Hemminger, Yury Stankevich, netdev,
	pablo, netfilter-devel


On Wednesday 2012-12-19 12:56, Jamal Hadi Salim wrote:
>
> To be applied pending more testing.
>
> Attached. Sorry, I thought I had sent this out over the weekend.
> I have done basic testing with a single mark and sending pings to
> update stats which can then displayed for the mark.
>
> diffstat xt-p1
> Kconfig  |   15 ++
> Makefile |    1 
> act_xt.c |  324 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 339 insertions(+), 1 deletion(-)

Humm... that's a huge patch for what seems to be equal to act_ipt.c
Let's do a cross-diff:

--- act_ipt.c	2012-10-25 19:49:25.372191795 +0200
+++ act_xt.c	2012-12-19 16:48:22.052419730 +0100
@@ -2 +2 @@
- * net/sched/ipt.c     iptables target interface
+ * net/sched/act_xt.c     iptables target interface
@@ -11 +11 @@
- * Copyright:	Jamal Hadi Salim (2002-4)
+ * Copyright:	Jamal Hadi Salim (2002-12)
@@ -30 +29,0 @@
-
@@ -42 +41,2 @@ static struct tcf_hashinfo ipt_hash_info
-static int ipt_init_target(struct xt_entry_target *t, char *table, unsigned int hook)
+static int ipt_init_target(struct xt_entry_target *t, char *table,
+			   unsigned int hook)
@@ -243,2 +243,2 @@ static int tcf_ipt(struct sk_buff *skb,
-		net_notice_ratelimited("tc filter: Bogus netfilter code %d assume ACCEPT\n",
-				       ret);
+		net_notice_ratelimited
+		    ("tc filter: Bogus netfilter code %d assume ACCEPT\n", ret);
@@ -253 +253,2 @@ static int tcf_ipt(struct sk_buff *skb,
-static int tcf_ipt_dump(struct sk_buff *skb, struct tc_action *a, int bind, int ref)
+static int tcf_ipt_dump(struct sk_buff *skb, struct tc_action *a, int bind,
+			int ref)
@@ -295 +296 @@ static struct tc_action_ops act_ipt_ops
-	.kind		=	"ipt",
+	.kind = "xt",
@@ -308,2 +309,2 @@ static struct tc_action_ops act_ipt_ops
-MODULE_AUTHOR("Jamal Hadi Salim(2002-4)");
-MODULE_DESCRIPTION("Iptables target actions");
+MODULE_AUTHOR("Jamal Hadi Salim(2002-12)");
+MODULE_DESCRIPTION("New Iptables target actions");


Is that [the set of hunks] all? Then I would instead suggest
something like:


diff --git a/net/sched/act_ipt.c b/net/sched/act_ipt.c
index 58fb3c7..f92a007 100644
--- a/net/sched/act_ipt.c
+++ b/net/sched/act_ipt.c
@@ -305,18 +305,43 @@ static struct tc_action_ops act_ipt_ops = {
 	.walk		=	tcf_generic_walker
 };
 
+static struct tc_action_ops act_xt_ops = {
+	.kind		=	"xt",
+	.hinfo		=	&ipt_hash_info,
+	.type		=	TCA_ACT_IPT,
+	.capab		=	TCA_CAP_NONE,
+	.owner		=	THIS_MODULE,
+	.act		=	tcf_ipt,
+	.dump		=	tcf_ipt_dump,
+	.cleanup	=	tcf_ipt_cleanup,
+	.lookup		=	tcf_hash_search,
+	.init		=	tcf_ipt_init,
+	.walk		=	tcf_generic_walker
+};
+
 MODULE_AUTHOR("Jamal Hadi Salim(2002-4)");
 MODULE_DESCRIPTION("Iptables target actions");
 MODULE_LICENSE("GPL");
+MODULE_ALIAS("act_xt");
 
 static int __init ipt_init_module(void)
 {
-	return tcf_register_action(&act_ipt_ops);
+	int ret;
+	ret = tcf_register_action(&act_ipt_ops);
+	if (ret < 0)
+		return ret;
+	ret = tcf_register_action(&xt_ipt_ops);
+	if (ret < 0) {
+		tcf_unregister_action(&act_ipt_ops);
+		return ret;
+	}
+	return 0;
 }
 
 static void __exit ipt_cleanup_module(void)
 {
 	tcf_unregister_action(&act_ipt_ops);
+	tcf_unregister_action(&act_xt_ops);
 }
 
 module_init(ipt_init_module);

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* Re: [PATCH] pkt_sched: act_xt support new Xtables interface
       [not found]                             ` <CAASe=fQZGwjM_2PStRE0tje33Doi6TuwJJ3p7x-SRcwq3mQvRg@mail.gmail.com>
@ 2012-12-19 23:00                               ` Jamal Hadi Salim
  0 siblings, 0 replies; 69+ messages in thread
From: Jamal Hadi Salim @ 2012-12-19 23:00 UTC (permalink / raw)
  To: Hasan Chowdhury
  Cc: Stephen Hemminger, Jan Engelhardt, Yury Stankevich, netdev,
	pablo, netfilter-devel


On 12-12-19 10:51 AM, Hasan Chowdhury wrote:
> Hi Jamal,
> I will test it once I get some opportunity , but think I like to know
> even before any  testing
>
> 1. What will be the new procedure to compile iproute2 after the patch
> apply (any new library or any configuration that needs to be adjusted )

git pull Stephens latest tree.
Apply patch 1 and compile.
When you are done compiling an m_xt.so will sit in the tc directory;
unfortunate that we are putting out this shared libs. Backup your distro
version and copy this over to that location. On ubuntu 12.04:
sudo cp tc/m_xt.so /usr/lib/tc/m_xt.so
will do it.

> 2.  tc filter add dev eth0 protocol ip parent 1: prio 3 u32 match ip src
> 192.168.0.0/16 <http://192.168.0.0/16>  flowid 1:1  action xt  -j MARK
> --set-mark 3
>
>
> is this still a valid command ?
>

Indeed it is.

cheers,
jamal

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] pkt_sched: act_xt support new Xtables interface
  2012-12-19 15:52                             ` Jan Engelhardt
@ 2012-12-19 23:05                               ` Jamal Hadi Salim
  0 siblings, 0 replies; 69+ messages in thread
From: Jamal Hadi Salim @ 2012-12-19 23:05 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Hasan Chowdhury, Stephen Hemminger, Yury Stankevich, netdev,
	pablo, netfilter-devel

On 12-12-19 10:52 AM, Jan Engelhardt wrote:
>
>
> Humm... that's a huge patch for what seems to be equal to act_ipt.c
> Let's do a cross-diff:
>

I was thinking of our little discussion when doing that.
The one reason i separated the two is so when the time is right you
can patch on top of only act_xt.c and eventually act_ipt.c will die..
Does changes on top of act_xt.c sound palatable to you?
Otherwise, you are right - it is overkill

cheers,
jamal

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] build: unbreak linkage of m_xt.so
  2012-12-18 18:47             ` Mike Frysinger
@ 2012-12-20  0:03               ` Stephen Hemminger
  0 siblings, 0 replies; 69+ messages in thread
From: Stephen Hemminger @ 2012-12-20  0:03 UTC (permalink / raw)
  To: Mike Frysinger
  Cc: Jan Engelhardt, stephen.hemminger, netdev, jhs, urykhy, shemonc,
	pablo, netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 393 bytes --]

On Tue, 18 Dec 2012 13:47:58 -0500
Mike Frysinger <vapier@gentoo.org> wrote:

> this patch is no longer necessary one you merged my:
> 	configure: move toolchain init to a function
> 
> it's actually undesirable to apply this after that since it makes the configure 
> script less clear again ...
> 
> sorry if my commit message wasn't obvious.
> -mike

ok, went back to old way.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] pkt_sched: act_xt support new Xtables interface
  2012-12-19 11:56                           ` [PATCH] pkt_sched: act_xt support new Xtables interface Jamal Hadi Salim
  2012-12-19 15:52                             ` Jan Engelhardt
       [not found]                             ` <CAASe=fQZGwjM_2PStRE0tje33Doi6TuwJJ3p7x-SRcwq3mQvRg@mail.gmail.com>
@ 2012-12-20  8:54                             ` Yury Stankevich
  2012-12-20 12:35                               ` Jamal Hadi Salim
  2 siblings, 1 reply; 69+ messages in thread
From: Yury Stankevich @ 2012-12-20  8:54 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Hasan Chowdhury, Stephen Hemminger, Jan Engelhardt, netdev,
	pablo, netfilter-devel

19.12.2012 15:56, Jamal Hadi Salim пишет:
> Hasan/Yury, if you test this please use the latest iproute2 with only
> the first patch I posted (originally from Hasan). Hasan please use that
> patch not your version - if theres anything wrong we can find out sooner
> before the patch becomes final.

Hello,
3.7.1 kernel with 3.7.0 iproute,
patch-xt, xt-p1 + linkage fix was applyed
command successfully performed, but actually doesn't work.

command:
tc filter add dev $dev parent ffff: protocol ip u32 match u32 0 0 \
            action xt -j CONNMARK --restore-mark \
            action mirred egress redirect dev ifb0
then i use filter:

tc filter add dev ifb0 protocol ip parent 1: prio 2 handle 0xa fw flowid
1:102

iptables line:
iptable -t mangle -A POSTROUTING -p tcp --dport 80 -m connmark --mark 0
-m connbytes --connbytes 204800: --connbytes-dir both --connbytes-mode
bytes -j CONNMARK --set-mark 0xa

once i run a test to download 300K file,
from iptables counters i can see that rule in POSTROUTING is triggered,
but from `tc -s qdisc show dev ifb0` i see that no packets was sent to
1:102 flow.

btw,
tc -p -s filter show dev ifb0 parent 1:
do not show stats `(rule hit 416 success 0)` for this (filter protocol
ip pref 2 fw handle 0xa classid 1:102) rule.



-- 
Linux registered user #402966 // pub 1024D/E99AF373 <pgp.mit.edu>
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] pkt_sched: act_xt support new Xtables interface
  2012-12-20  8:54                             ` Yury Stankevich
@ 2012-12-20 12:35                               ` Jamal Hadi Salim
  2012-12-20 14:59                                 ` Yury Stankevich
  0 siblings, 1 reply; 69+ messages in thread
From: Jamal Hadi Salim @ 2012-12-20 12:35 UTC (permalink / raw)
  To: Yury Stankevich
  Cc: Hasan Chowdhury, Stephen Hemminger, Jan Engelhardt, netdev,
	pablo, netfilter-devel


Could be your setup. I didnt do a lot of testing but
from my notes (running different kernel at the moment):

#try to point to everything (no iptables setup)
tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0 flowid 
23:23 action xt -j CONNMARK --restore-mark
#let it run for a 1 sec then display with
tc -s filter show dev eth0 parent ffff:

----
filter protocol ip pref 49152 u32
filter protocol ip pref 49152 u32 fh 800: ht divisor 1
filter protocol ip pref 49152 u32 fh 800::800 order 2048 key ht 800 bkt 
0 flowid 23:23
   match 00000000/00000000 at 0
	action order 1: tablename: mangle  hook: NF_IP_PRE_ROUTING
	target  CONNMARK restore
	index 1 ref 1 bind 1 installed 3 sec used 1 sec
	Action statistics:
	Sent 280 bytes 4 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0
----

cheers,
jamal

On 12-12-20 03:54 AM, Yury Stankevich wrote:
> 19.12.2012 15:56, Jamal Hadi Salim пишет:
>> Hasan/Yury, if you test this please use the latest iproute2 with only
>> the first patch I posted (originally from Hasan). Hasan please use that
>> patch not your version - if theres anything wrong we can find out sooner
>> before the patch becomes final.
>
> Hello,
> 3.7.1 kernel with 3.7.0 iproute,
> patch-xt, xt-p1 + linkage fix was applyed
> command successfully performed, but actually doesn't work.
>
> command:
> tc filter add dev $dev parent ffff: protocol ip u32 match u32 0 0 \
>              action xt -j CONNMARK --restore-mark \
>              action mirred egress redirect dev ifb0
> then i use filter:
>
> tc filter add dev ifb0 protocol ip parent 1: prio 2 handle 0xa fw flowid
> 1:102
>
> iptables line:
> iptable -t mangle -A POSTROUTING -p tcp --dport 80 -m connmark --mark 0
> -m connbytes --connbytes 204800: --connbytes-dir both --connbytes-mode
> bytes -j CONNMARK --set-mark 0xa
>
> once i run a test to download 300K file,
> from iptables counters i can see that rule in POSTROUTING is triggered,
> but from `tc -s qdisc show dev ifb0` i see that no packets was sent to
> 1:102 flow.
>
> btw,
> tc -p -s filter show dev ifb0 parent 1:
> do not show stats `(rule hit 416 success 0)` for this (filter protocol
> ip pref 2 fw handle 0xa classid 1:102) rule.
>
>
>

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] pkt_sched: act_xt support new Xtables interface
  2012-12-20 12:35                               ` Jamal Hadi Salim
@ 2012-12-20 14:59                                 ` Yury Stankevich
  2012-12-21 13:03                                   ` Jamal Hadi Salim
  0 siblings, 1 reply; 69+ messages in thread
From: Yury Stankevich @ 2012-12-20 14:59 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Hasan Chowdhury, Stephen Hemminger, Jan Engelhardt, netdev,
	pablo, netfilter-devel

interesting,

#tc -s filter show dev usb0 parent ffff:
filter protocol ip pref 49152 u32
filter protocol ip pref 49152 u32 fh 800: ht divisor 1
filter protocol ip pref 49152 u32 fh 800::800 order 2048 key ht 800 bkt
0 terminal flowid ???  (rule hit 707 success 707)
  match 00000000/00000000 at 0 (success 707 )
	action order 1: tablename: mangle  hook: NF_IP_PRE_ROUTING
	target  CONNMARK restore
	index 5 ref 1 bind 1 installed 394 sec used 11 sec
	Action statistics:
	Sent 783783 bytes 707 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

	action order 2: mirred (Egress Redirect to device ifb0) stolen
 	index 5 ref 1 bind 1 installed 394 sec used 11 sec
 	Action statistics:
	Sent 783783 bytes 707 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

so, looks like packets was sent to CONNMARK target.

but...
i make a iptables rule to log packets with 0xa mark:

Chain PREROUTING (policy ACCEPT 1308 packets, 848K bytes)
 pkts bytes target     prot opt in     out     source
destination
    0     0 NFLOG      all  --  *      *       0.0.0.0/0
0.0.0.0/0            mark match 0xa nflog-group 1

Chain POSTROUTING (policy ACCEPT 1240 packets, 550K bytes)
 pkts bytes target     prot opt in     out     source
destination
    1    40 CONNMARK   tcp  --  *      *       0.0.0.0/0
0.0.0.0/0            tcp dpt:80 connmark match  0x0 connbytes 204800
connbytes mode bytes connbytes direction both CONNMARK set 0xa

idea is:
i run downloading, rule from POSTROUTING must fire if i download more
than ~200K,
tc filter call to CONNMARK restore, must restore mark (0xa) for packets
belong to this connection.
so i expect, that PREROUTING rule must notice the restored mark, but it
doesn't.
maybe i miss something ?


20.12.2012 16:35, Jamal Hadi Salim пишет:
> 
> Could be your setup. I didnt do a lot of testing but
> from my notes (running different kernel at the moment):
> 
> #try to point to everything (no iptables setup)
> tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0 flowid
> 23:23 action xt -j CONNMARK --restore-mark
> #let it run for a 1 sec then display with
> tc -s filter show dev eth0 parent ffff:
> 
> ----
> filter protocol ip pref 49152 u32
> filter protocol ip pref 49152 u32 fh 800: ht divisor 1
> filter protocol ip pref 49152 u32 fh 800::800 order 2048 key ht 800 bkt
> 0 flowid 23:23
>   match 00000000/00000000 at 0
>     action order 1: tablename: mangle  hook: NF_IP_PRE_ROUTING
>     target  CONNMARK restore
>     index 1 ref 1 bind 1 installed 3 sec used 1 sec
>     Action statistics:
>     Sent 280 bytes 4 pkt (dropped 0, overlimits 0 requeues 0)
>     backlog 0b 0p requeues 0
> ----
> 
> cheers,
> jamal
> 
> On 12-12-20 03:54 AM, Yury Stankevich wrote:
>> 19.12.2012 15:56, Jamal Hadi Salim пишет:
>>> Hasan/Yury, if you test this please use the latest iproute2 with only
>>> the first patch I posted (originally from Hasan). Hasan please use that
>>> patch not your version - if theres anything wrong we can find out sooner
>>> before the patch becomes final.
>>
>> Hello,
>> 3.7.1 kernel with 3.7.0 iproute,
>> patch-xt, xt-p1 + linkage fix was applyed
>> command successfully performed, but actually doesn't work.
>>
>> command:
>> tc filter add dev $dev parent ffff: protocol ip u32 match u32 0 0 \
>>              action xt -j CONNMARK --restore-mark \
>>              action mirred egress redirect dev ifb0
>> then i use filter:
>>
>> tc filter add dev ifb0 protocol ip parent 1: prio 2 handle 0xa fw flowid
>> 1:102
>>
>> iptables line:
>> iptable -t mangle -A POSTROUTING -p tcp --dport 80 -m connmark --mark 0
>> -m connbytes --connbytes 204800: --connbytes-dir both --connbytes-mode
>> bytes -j CONNMARK --set-mark 0xa
>>
>> once i run a test to download 300K file,
>> from iptables counters i can see that rule in POSTROUTING is triggered,
>> but from `tc -s qdisc show dev ifb0` i see that no packets was sent to
>> 1:102 flow.
>>
>> btw,
>> tc -p -s filter show dev ifb0 parent 1:
>> do not show stats `(rule hit 416 success 0)` for this (filter protocol
>> ip pref 2 fw handle 0xa classid 1:102) rule.
>>
>>
>>
> 


-- 
Linux registered user #402966 // pub 1024D/E99AF373 <pgp.mit.edu>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] pkt_sched: act_xt support new Xtables interface
  2012-12-20 14:59                                 ` Yury Stankevich
@ 2012-12-21 13:03                                   ` Jamal Hadi Salim
  2012-12-21 13:13                                     ` Yury Stankevich
  0 siblings, 1 reply; 69+ messages in thread
From: Jamal Hadi Salim @ 2012-12-21 13:03 UTC (permalink / raw)
  To: Yury Stankevich
  Cc: Hasan Chowdhury, Stephen Hemminger, Jan Engelhardt, netdev,
	pablo, netfilter-devel

On 12-12-20 09:59 AM, Yury Stankevich wrote:
> interesting,
>
> #tc -s filter show dev usb0 parent ffff:


Given you are adding this on ingress - the settings you have will
happen before pre-routing hook.
If you did things at egress - the setting will take effect after
post-routing. So take a closer look at those details they look
like your source of issues..

cheers,
jamal

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] pkt_sched: act_xt support new Xtables interface
  2012-12-21 13:03                                   ` Jamal Hadi Salim
@ 2012-12-21 13:13                                     ` Yury Stankevich
  2012-12-21 13:50                                       ` Jamal Hadi Salim
  0 siblings, 1 reply; 69+ messages in thread
From: Yury Stankevich @ 2012-12-21 13:13 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Hasan Chowdhury, Stephen Hemminger, Jan Engelhardt, netdev,
	pablo, netfilter-devel

21.12.2012 17:03, Jamal Hadi Salim пишет:
> On 12-12-20 09:59 AM, Yury Stankevich wrote:
>> interesting,
>>
>> #tc -s filter show dev usb0 parent ffff:
> 
> 
> Given you are adding this on ingress - the settings you have will
> happen before pre-routing hook.
> If you did things at egress - the setting will take effect after
> post-routing. So take a closer look at those details they look
> like your source of issues..

sure,
i use it ingress,
so, i need to use tc xt action
to get mark on the packet, before filter on ifb will run.

prerouting rule, in turn, used to test if mark was actually restored.

in practice:
1. prerouting rule - is not fired. so, no packets with mark was seen.
2. filter on ifb - do not pass traffic to flow configured.
looks like `CONNMARK --restore` is not really called.


-- 
Linux registered user #402966 // pub 1024D/E99AF373 <pgp.mit.edu>
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] pkt_sched: act_xt support new Xtables interface
  2012-12-21 13:13                                     ` Yury Stankevich
@ 2012-12-21 13:50                                       ` Jamal Hadi Salim
  2012-12-21 14:14                                         ` Yury Stankevich
  2012-12-21 14:35                                         ` Jan Engelhardt
  0 siblings, 2 replies; 69+ messages in thread
From: Jamal Hadi Salim @ 2012-12-21 13:50 UTC (permalink / raw)
  To: Yury Stankevich
  Cc: Hasan Chowdhury, Stephen Hemminger, Jan Engelhardt, netdev,
	pablo, netfilter-devel

On 12-12-21 08:13 AM, Yury Stankevich wrote:

> sure,
> i use it ingress,
> so, i need to use tc xt action
> to get mark on the packet, before filter on ifb will run.

Ok. So does ifb see it?

> prerouting rule, in turn, used to test if mark was actually restored.

No experience with connmark, but - in order to restore something has
to store it, correct?

> in practice:
> 1. prerouting rule - is not fired. so, no packets with mark was seen.
> 2. filter on ifb - do not pass traffic to flow configured.
> looks like `CONNMARK --restore` is not really called.
>

My suspicion is that it is not set to begin with...

cheers,
jamal


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] pkt_sched: act_xt support new Xtables interface
  2012-12-21 13:50                                       ` Jamal Hadi Salim
@ 2012-12-21 14:14                                         ` Yury Stankevich
  2012-12-22 13:19                                           ` Jamal Hadi Salim
  2012-12-21 14:35                                         ` Jan Engelhardt
  1 sibling, 1 reply; 69+ messages in thread
From: Yury Stankevich @ 2012-12-21 14:14 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Hasan Chowdhury, Stephen Hemminger, Jan Engelhardt, netdev,
	pablo, netfilter-devel


well.
let me describe whole picture i want to achieve

1. use htb/sfq on ingress.

i got a traffic, and use few u32 filters to direct it to 3 flows,
priority, interactive, and bulk.
as http normally pass to interactive flow, i want to move long donwloads
to the bulk one.

how i trying to find these downloads:

iptables -t mangle -A POSTROUTING -p tcp --dport 80 -m connmark --mark 0
-m connbytes --connbytes 204800: --connbytes-dir both --connbytes-mode
bytes -j CONNMARK --set-mark 0xa
so, http connection where more than 200K downloaded, must got a
connection mark.

since ingress traffic hits qos before netfilter,
i use xt action, to copy connection mark, to a packet.
(action xt -j CONNMARK --restore-mark )
from this moment, i expect packet must have a restored mark

after this, i can use high priority tc filter .. handle 0xa fw flowid 1:102
to direct packet with mark 0xa to 1:102 flow (bulk).

now about a problem.

1. i run http download, once i get 200K - i can see that rule in
POSTROUTING is triggered and connection mark is installed (iptables -L
-n -v mangle -- can show number of packets matched by rule)

2. i see to tc stats for my flows, and i see, that packets still going
to interactive flow, not bulk as i expect.

3. from tc -s filter show dev usb0 parent ffff:
filter protocol ip pref 49152 u32
filter protocol ip pref 49152 u32 fh 800: ht divisor 1
filter protocol ip pref 49152 u32 fh 800::800 order 2048 key ht 800 bkt
0 terminal flowid ???  (rule hit 707 success 707)
  match 00000000/00000000 at 0 (success 707 )
	action order 1: tablename: mangle  hook: NF_IP_PRE_ROUTING
	target  CONNMARK restore
	index 5 ref 1 bind 1 installed 394 sec used 11 sec
	Action statistics:
	Sent 783783 bytes 707 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

	action order 2: mirred (Egress Redirect to device ifb0) stolen
 	index 5 ref 1 bind 1 installed 394 sec used 11 sec
 	Action statistics:
	Sent 783783 bytes 707 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

i can see that packets must reach xt action.

4. lets try to check packets mark with iptables,
if mark restored by xt action - i must be able to match it in prerouting
rule.
iptables -t mangle -A PREROUTING -m mark --mark 0xa -j NFLOG --nflog-group 1

but this rule not macthesd - so, no mark is restored by xt action.

maybe im completely wrong here, and such mode can't work for some reason ?

21.12.2012 17:50, Jamal Hadi Salim пишет:
> On 12-12-21 08:13 AM, Yury Stankevich wrote:
> 
>> sure,
>> i use it ingress,
>> so, i need to use tc xt action
>> to get mark on the packet, before filter on ifb will run.
> 
> Ok. So does ifb see it?
> 
>> prerouting rule, in turn, used to test if mark was actually restored.
> 
> No experience with connmark, but - in order to restore something has
> to store it, correct?
> 
>> in practice:
>> 1. prerouting rule - is not fired. so, no packets with mark was seen.
>> 2. filter on ifb - do not pass traffic to flow configured.
>> looks like `CONNMARK --restore` is not really called.
>>
> 
> My suspicion is that it is not set to begin with...
> 
> cheers,
> jamal
> 


-- 
Linux registered user #402966 // pub 1024D/E99AF373 <pgp.mit.edu>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] pkt_sched: act_xt support new Xtables interface
  2012-12-21 13:50                                       ` Jamal Hadi Salim
  2012-12-21 14:14                                         ` Yury Stankevich
@ 2012-12-21 14:35                                         ` Jan Engelhardt
  2012-12-21 15:45                                           ` Eric Dumazet
  1 sibling, 1 reply; 69+ messages in thread
From: Jan Engelhardt @ 2012-12-21 14:35 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Yury Stankevich, Hasan Chowdhury, Stephen Hemminger, netdev,
	pablo, netfilter-devel


On Friday 2012-12-21 14:50, Jamal Hadi Salim wrote:
> On 12-12-21 08:13 AM, Yury Stankevich wrote:

>> i use it ingress,
>> so, i need to use tc xt action
>> to get mark on the packet, before filter on ifb will run.
>> prerouting rule, in turn, used to test if mark was actually restored.
>
> No experience with connmark, but - in order to restore something has
> to store it, correct?

The bigger problem here, if I see __netif_receive_skb right, is that
when ingress rules run, skb->nfct is still unset, thereby the
CONNMARK action is a no-op.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] pkt_sched: act_xt support new Xtables interface
  2012-12-21 14:35                                         ` Jan Engelhardt
@ 2012-12-21 15:45                                           ` Eric Dumazet
  2012-12-22 13:42                                             ` Jamal Hadi Salim
  0 siblings, 1 reply; 69+ messages in thread
From: Eric Dumazet @ 2012-12-21 15:45 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Jamal Hadi Salim, Yury Stankevich, Hasan Chowdhury,
	Stephen Hemminger, netdev, pablo, netfilter-devel

On Fri, 2012-12-21 at 15:35 +0100, Jan Engelhardt wrote:

> 
> The bigger problem here, if I see __netif_receive_skb right, is that
> when ingress rules run, skb->nfct is still unset, thereby the
> CONNMARK action is a no-op.

Right, ingress is performed before IP/netfilter stack.

This reminds me this might be the reason we have
skb_reset_transport_header(skb);
in __netif_receive_skb(), while its not very logical.

(Yes, sorry for being off topic, but I am referring to 
http://www.spinics.net/lists/netdev/msg214662.html )

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] pkt_sched: act_xt support new Xtables interface
  2012-12-21 14:14                                         ` Yury Stankevich
@ 2012-12-22 13:19                                           ` Jamal Hadi Salim
  2012-12-22 13:43                                             ` Jan Engelhardt
  2012-12-22 13:58                                             ` Yury Stankevich
  0 siblings, 2 replies; 69+ messages in thread
From: Jamal Hadi Salim @ 2012-12-22 13:19 UTC (permalink / raw)
  To: Yury Stankevich
  Cc: Hasan Chowdhury, Stephen Hemminger, Jan Engelhardt, netdev,
	pablo, netfilter-devel

On 12-12-21 09:14 AM, Yury Stankevich wrote:
>
> well.
> let me describe whole picture i want to achieve
>

I think i got what you are trying to do Yury. Clever.
 From the description Jan provided in his response, I dont
think this used to work at all. Are you saying it worked before?

Having said that, what you are doing sounds so useful
that we need to make it work ;-> But it appears like
we need a brand new action for it, something like
GetMarkFromConntrack. Jan, I am assuming (on ingress only)
we need to call "something" to give us the nfct then
grab the skb->mark from nfct. On egress,
I am assuming the skb->mark is already set if connmark
is to be used... Am i correct?
If yes, then this action will only be useful at ingress.

cheers,
jamal

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] pkt_sched: act_xt support new Xtables interface
  2012-12-21 15:45                                           ` Eric Dumazet
@ 2012-12-22 13:42                                             ` Jamal Hadi Salim
  2013-01-07 19:28                                               ` [PATCH net-next] net: introduce skb_transport_header_was_set() Eric Dumazet
  0 siblings, 1 reply; 69+ messages in thread
From: Jamal Hadi Salim @ 2012-12-22 13:42 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jan Engelhardt, Yury Stankevich, Hasan Chowdhury,
	Stephen Hemminger, netdev, pablo, netfilter-devel

On 12-12-21 10:45 AM, Eric Dumazet wrote:
> On Fri, 2012-12-21 at 15:35 +0100, Jan Engelhardt wrote:
>

> This reminds me this might be the reason we have
> skb_reset_transport_header(skb);
> in __netif_receive_skb(), while its not very logical.
>

You seem to have nailed the egress part finally. That has
been a constant battle. At one point the standard answer
was "turn off TSO" ;->

> (Yes, sorry for being off topic, but I am referring to
> http://www.spinics.net/lists/netdev/msg214662.html )


I think the skb_reset_transport_header() when Acme made
a major overhaul to replace direct pointer access.
For this reason i think your second option seems preferable.

cheers,
jamal


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] pkt_sched: act_xt support new Xtables interface
  2012-12-22 13:19                                           ` Jamal Hadi Salim
@ 2012-12-22 13:43                                             ` Jan Engelhardt
  2012-12-22 13:56                                               ` Jamal Hadi Salim
  2012-12-22 13:58                                             ` Yury Stankevich
  1 sibling, 1 reply; 69+ messages in thread
From: Jan Engelhardt @ 2012-12-22 13:43 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Yury Stankevich, Hasan Chowdhury, Stephen Hemminger, netdev,
	pablo, netfilter-devel


On Saturday 2012-12-22 14:19, Jamal Hadi Salim wrote:
>
> Having said that, what you are doing sounds so useful
> that we need to make it work ;-> But it appears like
> we need a brand new action for it, something like
> GetMarkFromConntrack. Jan, I am assuming (on ingress only)
> we need to call "something" to give us the nfct then
> grab the skb->mark from nfct.

Looking up CT before ingress would mean the entire "raw"
table needs to be moved before ingress. But with classic
ip_tables, calling a table requires a lot of setup
(basically ip_rcv).

> On egress,
> I am assuming the skb->mark is already set if connmark
> is to be used... Am i correct?

All new skbs (i.e. those that did not loop due to IPsec, for example)
received through __netif_receive_skb should start out with
skb->mark=0, which is why CONNMARK --restore-mark is needed
to copy skb->mark=ct->mark.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] pkt_sched: act_xt support new Xtables interface
  2012-12-22 13:43                                             ` Jan Engelhardt
@ 2012-12-22 13:56                                               ` Jamal Hadi Salim
  0 siblings, 0 replies; 69+ messages in thread
From: Jamal Hadi Salim @ 2012-12-22 13:56 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Yury Stankevich, Hasan Chowdhury, Stephen Hemminger, netdev,
	pablo, netfilter-devel

On 12-12-22 08:43 AM, Jan Engelhardt wrote:

>
> Looking up CT before ingress would mean the entire "raw"
> table needs to be moved before ingress. But with classic
> ip_tables, calling a table requires a lot of setup
> (basically ip_rcv).

Scanning the code:
Would it not work if i only passed it IP packets (the tc
classifier can check) and then for v4 i do something like
ipv4_conntrack_in() with pre-routing as the hook to update
the skb?

> All new skbs (i.e. those that did not loop due to IPsec, for example)
> received through __netif_receive_skb should start out with
> skb->mark=0, which is why CONNMARK --restore-mark is needed
> to copy skb->mark=ct->mark.

I  may be overthinking this: are you saying connmark should do the
copying to skb->mark instead of some action? Earlier you said
conmark depends on presence of skb->nfct.

cheers,
jamal

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] pkt_sched: act_xt support new Xtables interface
  2012-12-22 13:19                                           ` Jamal Hadi Salim
  2012-12-22 13:43                                             ` Jan Engelhardt
@ 2012-12-22 13:58                                             ` Yury Stankevich
  2012-12-22 14:04                                               ` Florian Westphal
  2012-12-22 14:09                                               ` Jamal Hadi Salim
  1 sibling, 2 replies; 69+ messages in thread
From: Yury Stankevich @ 2012-12-22 13:58 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Hasan Chowdhury, Stephen Hemminger, Jan Engelhardt, netdev,
	pablo, netfilter-devel

22.12.2012 17:19, Jamal Hadi Salim пишет:
> From the description Jan provided in his response, I dont
> think this used to work at all. Are you saying it worked before?

no.
i'm trying if this can work, alas. it can't.

> Having said that, what you are doing sounds so useful
> that we need to make it work ;-> But it appears like
> we need a brand new action for it, something like
> GetMarkFromConntrack. 

maybe ifb device can be made more friendly to iptables ?
for a sample, run some (or all?) nefilter hooks before qdisc, like on a
normal interface ?


-- 
Linux registered user #402966 // pub 1024D/E99AF373 <pgp.mit.edu>
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] pkt_sched: act_xt support new Xtables interface
  2012-12-22 13:58                                             ` Yury Stankevich
@ 2012-12-22 14:04                                               ` Florian Westphal
  2012-12-22 14:09                                               ` Jamal Hadi Salim
  1 sibling, 0 replies; 69+ messages in thread
From: Florian Westphal @ 2012-12-22 14:04 UTC (permalink / raw)
  To: Yury Stankevich
  Cc: Jamal Hadi Salim, Hasan Chowdhury, Stephen Hemminger,
	Jan Engelhardt, netdev, pablo, netfilter-devel

Yury Stankevich <urykhy@gmail.com> wrote:
> 22.12.2012 17:19, Jamal Hadi Salim пишет:
> > From the description Jan provided in his response, I dont
> > think this used to work at all. Are you saying it worked before?
> 
> no.
> i'm trying if this can work, alas. it can't.

Yury, what are you trying to accomplish?
Is there a particular reason why you want to use ingress shaping
instead of pure policing?

I ask, because you could try to use hashlimit match to do
rate policing via netfilter.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] pkt_sched: act_xt support new Xtables interface
  2012-12-22 13:58                                             ` Yury Stankevich
  2012-12-22 14:04                                               ` Florian Westphal
@ 2012-12-22 14:09                                               ` Jamal Hadi Salim
  2012-12-24 11:34                                                 ` Jamal Hadi Salim
  1 sibling, 1 reply; 69+ messages in thread
From: Jamal Hadi Salim @ 2012-12-22 14:09 UTC (permalink / raw)
  To: Yury Stankevich
  Cc: Hasan Chowdhury, Stephen Hemminger, Jan Engelhardt, netdev,
	pablo, netfilter-devel

On 12-12-22 08:58 AM, Yury Stankevich wrote:

> i'm trying if this can work, alas. it can't.

Now i want it to work ;-> So dont give up yet.


cheers,
jamal


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] pkt_sched: act_xt support new Xtables interface
  2012-12-22 14:09                                               ` Jamal Hadi Salim
@ 2012-12-24 11:34                                                 ` Jamal Hadi Salim
  2012-12-24 11:49                                                   ` Felix Fietkau
  0 siblings, 1 reply; 69+ messages in thread
From: Jamal Hadi Salim @ 2012-12-24 11:34 UTC (permalink / raw)
  To: Yury Stankevich, Felix Fietkau
  Cc: Hasan Chowdhury, Stephen Hemminger, Jan Engelhardt, netdev,
	pablo, netfilter-devel


Some good news Yury.
I am told Felix Fietkau <nbd@openwrt.org> (on CC) actually
already solved this issue and it is a feature in openwrt. I
cant find the code.

Felix - Yury is trying to retrieve skb->mark fields from
netfilter connmark. My understanding is you have written
such an action. Can you please point us to it - and any
reason you havent submitted this for inclusion in kernel
proper?

cheers,
jamal

On 12-12-22 09:09 AM, Jamal Hadi Salim wrote:
> On 12-12-22 08:58 AM, Yury Stankevich wrote:
>
>> i'm trying if this can work, alas. it can't.
>
> Now i want it to work ;-> So dont give up yet.
>
>
> cheers,
> jamal
>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] pkt_sched: act_xt support new Xtables interface
  2012-12-24 11:34                                                 ` Jamal Hadi Salim
@ 2012-12-24 11:49                                                   ` Felix Fietkau
  2012-12-24 12:19                                                     ` Jamal Hadi Salim
  2012-12-24 13:12                                                     ` Pablo Neira Ayuso
  0 siblings, 2 replies; 69+ messages in thread
From: Felix Fietkau @ 2012-12-24 11:49 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Yury Stankevich, Hasan Chowdhury, Stephen Hemminger,
	Jan Engelhardt, netdev, pablo, netfilter-devel

On 2012-12-24 12:34 PM, Jamal Hadi Salim wrote:
> 
> Some good news Yury.
> I am told Felix Fietkau <nbd@openwrt.org> (on CC) actually
> already solved this issue and it is a feature in openwrt. I
> cant find the code.
> 
> Felix - Yury is trying to retrieve skb->mark fields from
> netfilter connmark. My understanding is you have written
> such an action. Can you please point us to it - and any
> reason you havent submitted this for inclusion in kernel
> proper?
After I added it as an experiment, I got distracted with other projects
again and forgot about submitting it. Take a look at the code - if the
approach is reasonable, I'll submit this thing for inclusion soon.

- Felix

--- /dev/null
+++ b/net/sched/act_connmark.c
@@ -0,0 +1,137 @@
+/*
+ * Copyright (c) 2011 Felix Fietkau <nbd@openwrt.org>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/skbuff.h>
+#include <linux/rtnetlink.h>
+#include <linux/pkt_cls.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <net/netlink.h>
+#include <net/pkt_sched.h>
+#include <net/act_api.h>
+
+#include <net/netfilter/nf_conntrack.h>
+#include <net/netfilter/nf_conntrack_core.h>
+
+#define TCA_ACT_CONNMARK	20
+
+#define CONNMARK_TAB_MASK     3
+static struct tcf_common *tcf_connmark_ht[CONNMARK_TAB_MASK + 1];
+static u32 connmark_idx_gen;
+static DEFINE_RWLOCK(connmark_lock);
+
+static struct tcf_hashinfo connmark_hash_info = {
+	.htab	=	tcf_connmark_ht,
+	.hmask	=	CONNMARK_TAB_MASK,
+	.lock	=	&connmark_lock,
+};
+
+static int tcf_connmark(struct sk_buff *skb, const struct tc_action *a,
+		       struct tcf_result *res)
+{
+	struct nf_conn *c;
+	enum ip_conntrack_info ctinfo;
+	int proto;
+	int r;
+
+	if (skb->protocol == htons(ETH_P_IP)) {
+		if (skb->len < sizeof(struct iphdr))
+			goto out;
+		proto = PF_INET;
+	} else if (skb->protocol == htons(ETH_P_IPV6)) {
+		if (skb->len < sizeof(struct ipv6hdr))
+			goto out;
+		proto = PF_INET6;
+	} else
+		goto out;
+
+	r = nf_conntrack_in(dev_net(skb->dev), proto, NF_INET_PRE_ROUTING, skb);
+	if (r != NF_ACCEPT)
+		goto out;
+
+	c = nf_ct_get(skb, &ctinfo);
+	if (!c)
+		goto out;
+
+	skb->mark = c->mark;
+	nf_conntrack_put(skb->nfct);
+	skb->nfct = NULL;
+
+out:
+	return TC_ACT_PIPE;
+}
+
+static int tcf_connmark_init(struct nlattr *nla, struct nlattr *est,
+			 struct tc_action *a, int ovr, int bind)
+{
+	struct tcf_common *pc;
+
+	pc = tcf_hash_create(0, est, a, sizeof(*pc), bind,
+			     &connmark_idx_gen, &connmark_hash_info);
+	if (IS_ERR(pc))
+	    return PTR_ERR(pc);
+
+	tcf_hash_insert(pc, &connmark_hash_info);
+
+	return ACT_P_CREATED;
+}
+
+static inline int tcf_connmark_cleanup(struct tc_action *a, int bind)
+{
+	if (a->priv)
+		return tcf_hash_release(a->priv, bind, &connmark_hash_info);
+	return 0;
+}
+
+static inline int tcf_connmark_dump(struct sk_buff *skb, struct tc_action *a,
+				int bind, int ref)
+{
+	return skb->len;
+}
+
+static struct tc_action_ops act_connmark_ops = {
+	.kind		=	"connmark",
+	.hinfo		=	&connmark_hash_info,
+	.type		=	TCA_ACT_CONNMARK,
+	.capab		=	TCA_CAP_NONE,
+	.owner		=	THIS_MODULE,
+	.act		=	tcf_connmark,
+	.dump		=	tcf_connmark_dump,
+	.cleanup	=	tcf_connmark_cleanup,
+	.init		=	tcf_connmark_init,
+	.walk		=	tcf_generic_walker,
+};
+
+MODULE_AUTHOR("Felix Fietkau <nbd@openwrt.org>");
+MODULE_DESCRIPTION("Connection tracking mark restoring");
+MODULE_LICENSE("GPL");
+
+static int __init connmark_init_module(void)
+{
+	return tcf_register_action(&act_connmark_ops);
+}
+
+static void __exit connmark_cleanup_module(void)
+{
+	tcf_unregister_action(&act_connmark_ops);
+}
+
+module_init(connmark_init_module);
+module_exit(connmark_cleanup_module);
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -670,6 +670,19 @@ config NET_ACT_CSUM
 	  To compile this code as a module, choose M here: the
 	  module will be called act_csum.
 
+config NET_ACT_CONNMARK
+        tristate "Connection Tracking Marking"
+        depends on NET_CLS_ACT
+        depends on NF_CONNTRACK
+	 depends on NF_CONNTRACK_MARK
+        ---help---
+	  Say Y here to restore the connmark from a scheduler action
+
+	  If unsure, say N.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called act_connmark.
+
 config NET_CLS_IND
 	bool "Incoming device classification"
 	depends on NET_CLS_U32 || NET_CLS_FW
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -16,6 +16,7 @@ obj-$(CONFIG_NET_ACT_PEDIT)	+= act_pedit
 obj-$(CONFIG_NET_ACT_SIMP)	+= act_simple.o
 obj-$(CONFIG_NET_ACT_SKBEDIT)	+= act_skbedit.o
 obj-$(CONFIG_NET_ACT_CSUM)	+= act_csum.o
+obj-$(CONFIG_NET_ACT_CONNMARK)	+= act_connmark.o
 obj-$(CONFIG_NET_SCH_FIFO)	+= sch_fifo.o
 obj-$(CONFIG_NET_SCH_CBQ)	+= sch_cbq.o
 obj-$(CONFIG_NET_SCH_HTB)	+= sch_htb.o

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] pkt_sched: act_xt support new Xtables interface
  2012-12-24 11:49                                                   ` Felix Fietkau
@ 2012-12-24 12:19                                                     ` Jamal Hadi Salim
  2012-12-24 13:12                                                     ` Pablo Neira Ayuso
  1 sibling, 0 replies; 69+ messages in thread
From: Jamal Hadi Salim @ 2012-12-24 12:19 UTC (permalink / raw)
  To: Felix Fietkau
  Cc: Yury Stankevich, Hasan Chowdhury, Stephen Hemminger,
	Jan Engelhardt, netdev, pablo, netfilter-devel

On 12-12-24 06:49 AM, Felix Fietkau wrote:

>
> After I added it as an experiment, I got distracted with other projects
> again and forgot about submitting it. Take a look at the code - if the
> approach is reasonable, I'll submit this thing for inclusion soon.
>

Excellent ;-> Simple and elegant.

Usable as is  - some minor comments.
First nitpick: The name is not very reflective, how about:
GetMarkFromConntrack or something along those lines?


> +static int tcf_connmark(struct sk_buff *skb, const struct tc_action *a,
> +		       struct tcf_result *res)
> +{
> +	struct nf_conn *c;
> +	enum ip_conntrack_info ctinfo;
> +	int proto;
> +	int r;
> +
> +	if (skb->protocol == htons(ETH_P_IP)) {
> +		if (skb->len < sizeof(struct iphdr))
> +			goto out;
> +		proto = PF_INET;
> +	} else if (skb->protocol == htons(ETH_P_IPV6)) {
> +		if (skb->len < sizeof(struct ipv6hdr))
> +			goto out;
> +		proto = PF_INET6;
> +	} else
> +		goto out;
> +

I would have said that this action is probably also not useful for 
egress qdisc path since skb->mark would already be set. It maybe worth 
checking skb->tc_verd and skipping overhead of nf_conntrack_in() call.
Look at act_mirred for such a check.

> +	r = nf_conntrack_in(dev_net(skb->dev), proto, NF_INET_PRE_ROUTING, skb);
> +	if (r != NF_ACCEPT)
> +		goto out;
> +
> +	c = nf_ct_get(skb, &ctinfo);
> +	if (!c)
> +		goto out;
> +
> +	skb->mark = c->mark;
> +	nf_conntrack_put(skb->nfct);
> +	skb->nfct = NULL;
> +
> +out:
> +	return TC_ACT_PIPE;

Ok, perhaps set tcf_action in (iproute2) user space to TC_ACT_PIPE then 
just return policy->tcf_action here.

Even better is to have a different TC_ACT_XXX returned for failure
vs success... Your success path becomes TC_ACT_PIPE and let the
user program the failure branch optionally. This would allow for 
branching to different actions if success/failure, example:
if mark is found {
    if mark is 0xa redirect to ifb0
    else
      redirect to ifb1
} else
       set mark to 3 then redirect to ifb9

etc.

Not sure if that made sense. I am under the influence of nyquil ;->

cheers,
jamal



^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] pkt_sched: act_xt support new Xtables interface
  2012-12-24 11:49                                                   ` Felix Fietkau
  2012-12-24 12:19                                                     ` Jamal Hadi Salim
@ 2012-12-24 13:12                                                     ` Pablo Neira Ayuso
  2012-12-24 14:05                                                       ` Jamal Hadi Salim
  1 sibling, 1 reply; 69+ messages in thread
From: Pablo Neira Ayuso @ 2012-12-24 13:12 UTC (permalink / raw)
  To: Felix Fietkau
  Cc: Jamal Hadi Salim, Yury Stankevich, Hasan Chowdhury,
	Stephen Hemminger, Jan Engelhardt, netdev, netfilter-devel

Hi Felix,

On Mon, Dec 24, 2012 at 12:49:16PM +0100, Felix Fietkau wrote:
> On 2012-12-24 12:34 PM, Jamal Hadi Salim wrote:
> > 
> > Some good news Yury.
> > I am told Felix Fietkau <nbd@openwrt.org> (on CC) actually
> > already solved this issue and it is a feature in openwrt. I
> > cant find the code.
> > 
> > Felix - Yury is trying to retrieve skb->mark fields from
> > netfilter connmark. My understanding is you have written
> > such an action. Can you please point us to it - and any
> > reason you havent submitted this for inclusion in kernel
> > proper?
> After I added it as an experiment, I got distracted with other projects
> again and forgot about submitting it. Take a look at the code - if the
> approach is reasonable, I'll submit this thing for inclusion soon.
> 
> - Felix
> 
> --- /dev/null
> +++ b/net/sched/act_connmark.c
> @@ -0,0 +1,137 @@
> +/*
> + * Copyright (c) 2011 Felix Fietkau <nbd@openwrt.org>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
> + * Place - Suite 330, Boston, MA 02111-1307 USA.
> + */
> +
> +#include <linux/module.h>
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/skbuff.h>
> +#include <linux/rtnetlink.h>
> +#include <linux/pkt_cls.h>
> +#include <linux/ip.h>
> +#include <linux/ipv6.h>
> +#include <net/netlink.h>
> +#include <net/pkt_sched.h>
> +#include <net/act_api.h>
> +
> +#include <net/netfilter/nf_conntrack.h>
> +#include <net/netfilter/nf_conntrack_core.h>
> +
> +#define TCA_ACT_CONNMARK	20
> +
> +#define CONNMARK_TAB_MASK     3
> +static struct tcf_common *tcf_connmark_ht[CONNMARK_TAB_MASK + 1];
> +static u32 connmark_idx_gen;
> +static DEFINE_RWLOCK(connmark_lock);
> +
> +static struct tcf_hashinfo connmark_hash_info = {
> +	.htab	=	tcf_connmark_ht,
> +	.hmask	=	CONNMARK_TAB_MASK,
> +	.lock	=	&connmark_lock,
> +};
> +
> +static int tcf_connmark(struct sk_buff *skb, const struct tc_action *a,
> +		       struct tcf_result *res)
> +{
> +	struct nf_conn *c;
> +	enum ip_conntrack_info ctinfo;
> +	int proto;
> +	int r;
> +
> +	if (skb->protocol == htons(ETH_P_IP)) {
> +		if (skb->len < sizeof(struct iphdr))
> +			goto out;
> +		proto = PF_INET;
> +	} else if (skb->protocol == htons(ETH_P_IPV6)) {
> +		if (skb->len < sizeof(struct ipv6hdr))
> +			goto out;
> +		proto = PF_INET6;
> +	} else
> +		goto out;
> +
> +	r = nf_conntrack_in(dev_net(skb->dev), proto, NF_INET_PRE_ROUTING, skb);

conntrack needs to see defragmented packets, you have to call
nf_defrag_ipv4 / _ipv6 respectively before that.

This also changes the semantics of the raw table in iptables since it
will now see packet with conntrack already attached. So this would
also break -j CT --notrack.

This needs more thinking. I can appreciate the value of calling
conntrack from different points of the packet traversal, but there are
a couple of thing we have to resolve before allowing that.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] pkt_sched: act_xt support new Xtables interface
  2012-12-24 13:12                                                     ` Pablo Neira Ayuso
@ 2012-12-24 14:05                                                       ` Jamal Hadi Salim
  2012-12-24 18:19                                                         ` Pablo Neira Ayuso
  0 siblings, 1 reply; 69+ messages in thread
From: Jamal Hadi Salim @ 2012-12-24 14:05 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Felix Fietkau, Yury Stankevich, Hasan Chowdhury,
	Stephen Hemminger, Jan Engelhardt, netdev, netfilter-devel

Hi Pablo,

On 12-12-24 08:12 AM, Pablo Neira Ayuso wrote:

>
> conntrack needs to see defragmented packets, you have to call
> nf_defrag_ipv4 / _ipv6 respectively before that.
>

This should not be too hard to do - although my thinking says this
should be a separate action.

> This also changes the semantics of the raw table in iptables since it
> will now see packet with conntrack already attached. So this would
> also break -j CT --notrack.
>

Is there a flag we can check which says a flow is not to be tracked?
Doesnt nf_conntrack_in() fail if --no track is set?

> This needs more thinking. I can appreciate the value of calling
> conntrack from different points of the packet traversal, but there are
> a couple of thing we have to resolve before allowing that.

There is user need for this Pablo - as you can see from what Felix
deployed it seems to be used a lot more wider audience dependency.
What do we need to do to get this to work properly?

cheers,
jamal

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] pkt_sched: act_xt support new Xtables interface
  2012-12-24 14:05                                                       ` Jamal Hadi Salim
@ 2012-12-24 18:19                                                         ` Pablo Neira Ayuso
  2012-12-26 23:10                                                           ` Pablo Neira Ayuso
  0 siblings, 1 reply; 69+ messages in thread
From: Pablo Neira Ayuso @ 2012-12-24 18:19 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Felix Fietkau, Yury Stankevich, Hasan Chowdhury,
	Stephen Hemminger, Jan Engelhardt, netdev, netfilter-devel

Hi Jamal,

On Mon, Dec 24, 2012 at 09:05:42AM -0500, Jamal Hadi Salim wrote:
> On 12-12-24 08:12 AM, Pablo Neira Ayuso wrote:
> >
> >conntrack needs to see defragmented packets, you have to call
> >nf_defrag_ipv4 / _ipv6 respectively before that.
> >
> 
> This should not be too hard to do - although my thinking says this
> should be a separate action.
> 
> >This also changes the semantics of the raw table in iptables since it
> >will now see packet with conntrack already attached. So this would
> >also break -j CT --notrack.
> 
> Is there a flag we can check which says a flow is not to be tracked?
> Doesnt nf_conntrack_in() fail if --no track is set?

The notrack dummy conntrack (consider it a flag) is attached in
prerouting raw table. By attaching conntracks at ingress, the notrack
flag will be ignored. Note that this also breaks conntrack templates
via -j CT, that allows us to set custom conntrack timeouts, zones and
helpers at prerouting raw.

Basically, ct templates are attached via -j CT, this template is
munched by nf_conntrack_in, which adds the corresponding ct features
based on the template information.

> >This needs more thinking. I can appreciate the value of calling
> >conntrack from different points of the packet traversal, but there are
> >a couple of thing we have to resolve before allowing that.
> 
> There is user need for this Pablo - as you can see from what Felix
> deployed it seems to be used a lot more wider audience dependency.
> What do we need to do to get this to work properly?

The conntrack code needs to be generalized to allow creating conntrack
with features all at once (so we can remove the template
infrastructure). Even after that, we'll still have that -j CT rules
will be ignored if you're using, let's name it, act_ct from ingress to
attach the conntrack to it.

With the current approach you're using, people will see conntracks in
the iptables raw table, that breaks the current semantics.

We'll have the netfilter workshop by Q1/Q2 2013 (still TBA), I think
this is material for discussion in it.

cheers,
Pablo

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] pkt_sched: act_xt support new Xtables interface
  2012-12-24 18:19                                                         ` Pablo Neira Ayuso
@ 2012-12-26 23:10                                                           ` Pablo Neira Ayuso
  0 siblings, 0 replies; 69+ messages in thread
From: Pablo Neira Ayuso @ 2012-12-26 23:10 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Felix Fietkau, Yury Stankevich, Hasan Chowdhury,
	Stephen Hemminger, Jan Engelhardt, netdev, netfilter-devel

On Mon, Dec 24, 2012 at 07:19:43PM +0100, Pablo Neira Ayuso wrote:
> Hi Jamal,
> 
> On Mon, Dec 24, 2012 at 09:05:42AM -0500, Jamal Hadi Salim wrote:
> > On 12-12-24 08:12 AM, Pablo Neira Ayuso wrote:
> > >
> > >conntrack needs to see defragmented packets, you have to call
> > >nf_defrag_ipv4 / _ipv6 respectively before that.
> > >
> > 
> > This should not be too hard to do - although my thinking says this
> > should be a separate action.
> > 
> > >This also changes the semantics of the raw table in iptables since it
> > >will now see packet with conntrack already attached. So this would
> > >also break -j CT --notrack.
> > 
> > Is there a flag we can check which says a flow is not to be tracked?
> > Doesnt nf_conntrack_in() fail if --no track is set?
> 
> The notrack dummy conntrack (consider it a flag) is attached in
> prerouting raw table. By attaching conntracks at ingress, the notrack
> flag will be ignored. Note that this also breaks conntrack templates
> via -j CT, that allows us to set custom conntrack timeouts, zones and
> helpers at prerouting raw.
> 
> Basically, ct templates are attached via -j CT, this template is
> munched by nf_conntrack_in, which adds the corresponding ct features
> based on the template information.

I'm still spinning around this and I don't come with some easy
solution that doesn't break the existing semantics. One possibility
can be to drop the ct reference after leaving ingress, so the lookup
happens again in prerouting after the raw table to attach it again and
no ct is seen in the raw table but:

1) it's suboptimal in case users have rules using ct at ingress and in
   iptables.

2) the conntrack template infrastructure needs to be reworked/replaced
   by something more flexible to attach features to conntracks, so we
   can still attach features for conntrack entries that were created
   at ingress (so helpers / custom timeouts / notrack don't break).

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH net-next] net: introduce skb_transport_header_was_set()
  2012-12-22 13:42                                             ` Jamal Hadi Salim
@ 2013-01-07 19:28                                               ` Eric Dumazet
  2013-01-08 14:05                                                 ` Jamal Hadi Salim
  2013-01-09  1:52                                                 ` David Miller
  0 siblings, 2 replies; 69+ messages in thread
From: Eric Dumazet @ 2013-01-07 19:28 UTC (permalink / raw)
  To: Jamal Hadi Salim, David Miller; +Cc: netdev

From: Eric Dumazet <edumazet@google.com>

On Sat, 2012-12-22 at 08:42 -0500, Jamal Hadi Salim wrote:
> On 12-12-21 10:45 AM, Eric Dumazet wrote:
> > On Fri, 2012-12-21 at 15:35 +0100, Jan Engelhardt wrote:
> >
> 
> > This reminds me this might be the reason we have
> > skb_reset_transport_header(skb);
> > in __netif_receive_skb(), while its not very logical.
> >
> 
> You seem to have nailed the egress part finally. That has
> been a constant battle. At one point the standard answer
> was "turn off TSO" ;->
> 
> > (Yes, sorry for being off topic, but I am referring to
> > http://www.spinics.net/lists/netdev/msg214662.html )
> 
> 
> I think the skb_reset_transport_header() when Acme made
> a major overhaul to replace direct pointer access.
> For this reason i think your second option seems preferable.

It seems we already have a special case for mac_header, with the
skb_mac_header_was_set() helper.

We could have same logic for transport_header

Something like :

[PATCH net-next] net: introduce skb_transport_header_was_set()

We have skb_mac_header_was_set() helper to tell if mac_header
was set on a skb. We would like the same for transport_header.

__netif_receive_skb() doesn't reset the transport header if already
set by GRO layer.

Note that network stacks usually reset the transport header anyway,
after pulling the network header, so this change only allows
a followup patch to have more precise qdisc pkt_len computation
for GSO packets at ingress side.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
---
 include/linux/skbuff.h |   10 ++++++++++
 net/core/dev.c         |    3 ++-
 net/core/skbuff.c      |    2 ++
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 320e976..8b2256e 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1492,6 +1492,11 @@ static inline void skb_set_inner_network_header(struct sk_buff *skb,
 	skb->inner_network_header += offset;
 }
 
+static inline bool skb_transport_header_was_set(const struct sk_buff *skb)
+{
+	return skb->transport_header != ~0U;
+}
+
 static inline unsigned char *skb_transport_header(const struct sk_buff *skb)
 {
 	return skb->head + skb->transport_header;
@@ -1580,6 +1585,11 @@ static inline void skb_set_inner_network_header(struct sk_buff *skb,
 	skb->inner_network_header = skb->data + offset;
 }
 
+static inline bool skb_transport_header_was_set(const struct sk_buff *skb)
+{
+	return skb->transport_header != NULL;
+}
+
 static inline unsigned char *skb_transport_header(const struct sk_buff *skb)
 {
 	return skb->transport_header;
diff --git a/net/core/dev.c b/net/core/dev.c
index a51ccf4..2e24482 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3352,7 +3352,8 @@ static int __netif_receive_skb(struct sk_buff *skb)
 	orig_dev = skb->dev;
 
 	skb_reset_network_header(skb);
-	skb_reset_transport_header(skb);
+	if (!skb_transport_header_was_set(skb))
+		skb_reset_transport_header(skb);
 	skb_reset_mac_len(skb);
 
 	pt_prev = NULL;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index b03fc0c..1e1b9ea 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -260,6 +260,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 	skb->end = skb->tail + size;
 #ifdef NET_SKBUFF_DATA_USES_OFFSET
 	skb->mac_header = ~0U;
+	skb->transport_header = ~0U;
 #endif
 
 	/* make sure we initialize shinfo sequentially */
@@ -328,6 +329,7 @@ struct sk_buff *build_skb(void *data, unsigned int frag_size)
 	skb->end = skb->tail + size;
 #ifdef NET_SKBUFF_DATA_USES_OFFSET
 	skb->mac_header = ~0U;
+	skb->transport_header = ~0U;
 #endif
 
 	/* make sure we initialize shinfo sequentially */

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* Re: [PATCH net-next] net: introduce skb_transport_header_was_set()
  2013-01-07 19:28                                               ` [PATCH net-next] net: introduce skb_transport_header_was_set() Eric Dumazet
@ 2013-01-08 14:05                                                 ` Jamal Hadi Salim
  2013-01-09  1:52                                                 ` David Miller
  1 sibling, 0 replies; 69+ messages in thread
From: Jamal Hadi Salim @ 2013-01-08 14:05 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev

On 13-01-07 02:28 PM, Eric Dumazet wrote:

> Note that network stacks usually reset the transport header anyway,
> after pulling the network header, so this change only allows
> a followup patch to have more precise qdisc pkt_len computation
> for GSO packets at ingress side.
>

Looks good to me.

cheers,
jamal

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH net-next] net: introduce skb_transport_header_was_set()
  2013-01-07 19:28                                               ` [PATCH net-next] net: introduce skb_transport_header_was_set() Eric Dumazet
  2013-01-08 14:05                                                 ` Jamal Hadi Salim
@ 2013-01-09  1:52                                                 ` David Miller
  1 sibling, 0 replies; 69+ messages in thread
From: David Miller @ 2013-01-09  1:52 UTC (permalink / raw)
  To: eric.dumazet; +Cc: jhs, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 07 Jan 2013 11:28:21 -0800

> [PATCH net-next] net: introduce skb_transport_header_was_set()
> 
> We have skb_mac_header_was_set() helper to tell if mac_header
> was set on a skb. We would like the same for transport_header.
> 
> __netif_receive_skb() doesn't reset the transport header if already
> set by GRO layer.
> 
> Note that network stacks usually reset the transport header anyway,
> after pulling the network header, so this change only allows
> a followup patch to have more precise qdisc pkt_len computation
> for GSO packets at ingress side.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied.

^ permalink raw reply	[flat|nested] 69+ messages in thread

end of thread, other threads:[~2013-01-09  1:52 UTC | newest]

Thread overview: 69+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-12-09 12:20 tc ipt action Yury Stankevich
2012-12-13 10:58 ` Jamal Hadi Salim
2012-12-15 21:19   ` Jamal Hadi Salim
2012-12-15 23:06     ` Jan Engelhardt
2012-12-16  0:26       ` Jan Engelhardt
2012-12-16  0:32         ` [PATCH] build: unbreak linkage of m_xt.so Jan Engelhardt
2012-12-16 10:30           ` Jamal Hadi Salim
2012-12-16 17:03             ` Jamal Hadi Salim
2012-12-16 17:43               ` Jan Engelhardt
2012-12-16 18:05                 ` Jamal Hadi Salim
2012-12-16 22:02           ` Mike Frysinger
2012-12-18 17:21           ` Stephen Hemminger
2012-12-18 18:47             ` Mike Frysinger
2012-12-20  0:03               ` Stephen Hemminger
2012-12-16 10:22       ` tc ipt action Jamal Hadi Salim
     [not found]       ` <CAASe=fQT2pVOK0uctdaKL+aOrF8nYeTMfoF15kmd-rC02+7Vnw@mail.gmail.com>
2012-12-16 16:48         ` Jamal Hadi Salim
2012-12-16 18:59           ` Jamal Hadi Salim
2012-12-16 19:13             ` Jan Engelhardt
2012-12-16 20:36               ` Jamal Hadi Salim
2012-12-16 20:41               ` [PATCH] iproute2: act_ipt fix xtables breakage Jamal Hadi Salim
2012-12-17 12:30                 ` RFC [PATCH] iproute2: temporary solution to fix xt breakage Jamal Hadi Salim
2012-12-17 16:12                   ` Stephen Hemminger
2012-12-19 11:36                     ` Jamal Hadi Salim
     [not found]                   ` <CAASe=fRuJdtisEvp7uo=PHwN3nKHqsYDW4Om1gk2MK-vyNvBrA@mail.gmail.com>
2012-12-18 12:28                     ` Jamal Hadi Salim
     [not found]                       ` <CAASe=fR6Hm2dxp=1wDchtrzqnaH6qacHpg2wrsqLfmGpPbQ9Fg@mail.gmail.com>
2012-12-19 11:44                         ` Jamal Hadi Salim
2012-12-19 11:56                           ` [PATCH] pkt_sched: act_xt support new Xtables interface Jamal Hadi Salim
2012-12-19 15:52                             ` Jan Engelhardt
2012-12-19 23:05                               ` Jamal Hadi Salim
     [not found]                             ` <CAASe=fQZGwjM_2PStRE0tje33Doi6TuwJJ3p7x-SRcwq3mQvRg@mail.gmail.com>
2012-12-19 23:00                               ` Jamal Hadi Salim
2012-12-20  8:54                             ` Yury Stankevich
2012-12-20 12:35                               ` Jamal Hadi Salim
2012-12-20 14:59                                 ` Yury Stankevich
2012-12-21 13:03                                   ` Jamal Hadi Salim
2012-12-21 13:13                                     ` Yury Stankevich
2012-12-21 13:50                                       ` Jamal Hadi Salim
2012-12-21 14:14                                         ` Yury Stankevich
2012-12-22 13:19                                           ` Jamal Hadi Salim
2012-12-22 13:43                                             ` Jan Engelhardt
2012-12-22 13:56                                               ` Jamal Hadi Salim
2012-12-22 13:58                                             ` Yury Stankevich
2012-12-22 14:04                                               ` Florian Westphal
2012-12-22 14:09                                               ` Jamal Hadi Salim
2012-12-24 11:34                                                 ` Jamal Hadi Salim
2012-12-24 11:49                                                   ` Felix Fietkau
2012-12-24 12:19                                                     ` Jamal Hadi Salim
2012-12-24 13:12                                                     ` Pablo Neira Ayuso
2012-12-24 14:05                                                       ` Jamal Hadi Salim
2012-12-24 18:19                                                         ` Pablo Neira Ayuso
2012-12-26 23:10                                                           ` Pablo Neira Ayuso
2012-12-21 14:35                                         ` Jan Engelhardt
2012-12-21 15:45                                           ` Eric Dumazet
2012-12-22 13:42                                             ` Jamal Hadi Salim
2013-01-07 19:28                                               ` [PATCH net-next] net: introduce skb_transport_header_was_set() Eric Dumazet
2013-01-08 14:05                                                 ` Jamal Hadi Salim
2013-01-09  1:52                                                 ` David Miller
2012-12-16  0:27     ` tc ipt action Pablo Neira Ayuso
2012-12-16  0:59       ` Jan Engelhardt
2012-12-16 10:43         ` Jamal Hadi Salim
2012-12-16 17:21           ` Jan Engelhardt
2012-12-16 17:47             ` Jamal Hadi Salim
2012-12-16 18:59               ` Jan Engelhardt
2012-12-16 20:35                 ` Jamal Hadi Salim
2012-12-16 21:21                   ` Jan Engelhardt
2012-12-17 12:58                     ` Jamal Hadi Salim
2012-12-17 13:28                       ` Jan Engelhardt
2012-12-18 13:23                         ` Jamal Hadi Salim
2012-12-18 13:58                           ` Jan Engelhardt
2012-12-19 11:43                             ` Jamal Hadi Salim
2012-12-16 10:26       ` Jamal Hadi Salim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).