All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 1/1] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
@ 2017-04-16 12:34 Jamal Hadi Salim
  2017-04-17  8:19 ` Jiri Pirko
  0 siblings, 1 reply; 17+ messages in thread
From: Jamal Hadi Salim @ 2017-04-16 12:34 UTC (permalink / raw)
  To: davem; +Cc: netdev, eric.dumazet, xiyou.wangcong, jiri, Jamal Hadi Salim

From: Jamal Hadi Salim <jhs@mojatatu.com>

When you dump hundreds of thousands of actions, getting only 32 per
dump batch even when the socket buffer and memory allocations allow
is inefficient.

With this change, the user will get as many as possibly fitting
within the given constraints.

We reuse the pad fields in tcamsg. pad1 is used as a flag space.
User explicitly requests for the large dump in order to maintain
backwards compatibility with user space by setting bit
ACT_LARGE_DUMP_ON; older user space which doesnt set this flag
doesnt get the large (than 32) batches - so continues to work
as before. Older kernels ignore this flag, so legacy behavior
is maintained.
The kernel uses pad2 to tell the user how many actions are put in
a single batch. As such user space app(like tc) knows how long
to iterate instead of hardcoded maximum of 32.

Some results dumping 1.5M actions, first unpatched tc which the
kernel doesnt help:

prompt$ time -p tc actions ls action gact | grep index | wc -l
1500000
real 1388.43
user 2.07
sys 1386.79

Now patched tc which sets the correct flags when requesting a dump:

prompt$ time -p updatedtc actions ls action gact | grep index | wc -l
1500000
real 178.13
user 2.02
sys 176.96

Improvement: Dump time from about 20 minutes to about 2 minutes.

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
---
 include/uapi/linux/rtnetlink.h |  7 +++++++
 net/sched/act_api.c            | 17 ++++++++++++++---
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index cce0613..3434d98 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -678,6 +678,13 @@ struct tcamsg {
 #define TA_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct tcamsg))
 #define TCA_ACT_TAB 1 /* attr type must be >=1 */	
 #define TCAA_MAX 1
+/* tcamsg flags stored in tca__pad
+ *
+ * ACT_LARGE_DUMP_ON user->kernel to request for larger than TCA_ACT_MAX_PRIO
+ * actions in a dump. All dump responses will contain the number of actions
+ * being dumped stored in tca__pad2 for user app's info
+ */
+#define ACT_LARGE_DUMP_ON		(1 << 0)
 
 /* New extended info filters for IFLA_EXT_MASK */
 #define RTEXT_FILTER_VF		(1 << 0)
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 79d875c..90cc774 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -83,6 +83,7 @@ static int tcf_dump_walker(struct tcf_hashinfo *hinfo, struct sk_buff *skb,
 			   struct netlink_callback *cb)
 {
 	int err = 0, index = -1, i = 0, s_i = 0, n_i = 0;
+	unsigned short act_flags = cb->args[2];
 	struct nlattr *nest;
 
 	spin_lock_bh(&hinfo->lock);
@@ -111,14 +112,18 @@ static int tcf_dump_walker(struct tcf_hashinfo *hinfo, struct sk_buff *skb,
 			}
 			nla_nest_end(skb, nest);
 			n_i++;
-			if (n_i >= TCA_ACT_MAX_PRIO)
+			if (!(act_flags & ACT_LARGE_DUMP_ON) &&
+			    n_i >= TCA_ACT_MAX_PRIO)
 				goto done;
 		}
 	}
 done:
 	spin_unlock_bh(&hinfo->lock);
-	if (n_i)
+	if (n_i) {
 		cb->args[0] += n_i;
+		if (act_flags & ACT_LARGE_DUMP_ON)
+			cb->args[1] = n_i;
+	}
 	return n_i;
 
 nla_put_failure:
@@ -1081,6 +1086,7 @@ static int tc_dump_action(struct sk_buff *skb, struct netlink_callback *cb)
 	struct tc_action_ops *a_o;
 	int ret = 0;
 	struct tcamsg *t = (struct tcamsg *) nlmsg_data(cb->nlh);
+	unsigned char act_flags = t->tca__pad1;
 	struct nlattr *kind = find_dump_kind(cb->nlh);
 
 	if (kind == NULL) {
@@ -1096,9 +1102,12 @@ static int tc_dump_action(struct sk_buff *skb, struct netlink_callback *cb)
 			cb->nlh->nlmsg_type, sizeof(*t), 0);
 	if (!nlh)
 		goto out_module_put;
+
+	cb->args[2] = act_flags;
+
 	t = nlmsg_data(nlh);
 	t->tca_family = AF_UNSPEC;
-	t->tca__pad1 = 0;
+	t->tca__pad1 = act_flags;
 	t->tca__pad2 = 0;
 
 	nest = nla_nest_start(skb, TCA_ACT_TAB);
@@ -1112,6 +1121,8 @@ static int tc_dump_action(struct sk_buff *skb, struct netlink_callback *cb)
 	if (ret > 0) {
 		nla_nest_end(skb, nest);
 		ret = skb->len;
+		t->tca__pad2 = cb->args[1];
+		cb->args[1] = 0;
 	} else
 		nlmsg_trim(skb, b);
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 1/1] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
  2017-04-16 12:34 [PATCH net-next 1/1] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch Jamal Hadi Salim
@ 2017-04-17  8:19 ` Jiri Pirko
  2017-04-17 11:01   ` Jamal Hadi Salim
  0 siblings, 1 reply; 17+ messages in thread
From: Jiri Pirko @ 2017-04-17  8:19 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: davem, netdev, eric.dumazet, xiyou.wangcong

Sun, Apr 16, 2017 at 02:34:30PM CEST, jhs@mojatatu.com wrote:
>From: Jamal Hadi Salim <jhs@mojatatu.com>
>
>When you dump hundreds of thousands of actions, getting only 32 per
>dump batch even when the socket buffer and memory allocations allow
>is inefficient.
>
>With this change, the user will get as many as possibly fitting
>within the given constraints.
>
>We reuse the pad fields in tcamsg. pad1 is used as a flag space.
>User explicitly requests for the large dump in order to maintain
>backwards compatibility with user space by setting bit
>ACT_LARGE_DUMP_ON; older user space which doesnt set this flag
>doesnt get the large (than 32) batches - so continues to work
>as before. Older kernels ignore this flag, so legacy behavior
>is maintained.
>The kernel uses pad2 to tell the user how many actions are put in
>a single batch. As such user space app(like tc) knows how long
>to iterate instead of hardcoded maximum of 32.
>
>Some results dumping 1.5M actions, first unpatched tc which the
>kernel doesnt help:
>
>prompt$ time -p tc actions ls action gact | grep index | wc -l
>1500000
>real 1388.43
>user 2.07
>sys 1386.79
>
>Now patched tc which sets the correct flags when requesting a dump:
>
>prompt$ time -p updatedtc actions ls action gact | grep index | wc -l
>1500000
>real 178.13
>user 2.02
>sys 176.96
>
>Improvement: Dump time from about 20 minutes to about 2 minutes.
>
>Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
>---
> include/uapi/linux/rtnetlink.h |  7 +++++++
> net/sched/act_api.c            | 17 ++++++++++++++---
> 2 files changed, 21 insertions(+), 3 deletions(-)
>
>diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
>index cce0613..3434d98 100644
>--- a/include/uapi/linux/rtnetlink.h
>+++ b/include/uapi/linux/rtnetlink.h
>@@ -678,6 +678,13 @@ struct tcamsg {
> #define TA_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct tcamsg))
> #define TCA_ACT_TAB 1 /* attr type must be >=1 */	
> #define TCAA_MAX 1
>+/* tcamsg flags stored in tca__pad
>+ *
>+ * ACT_LARGE_DUMP_ON user->kernel to request for larger than TCA_ACT_MAX_PRIO
>+ * actions in a dump. All dump responses will contain the number of actions
>+ * being dumped stored in tca__pad2 for user app's info
>+ */
>+#define ACT_LARGE_DUMP_ON		(1 << 0)
> 
> /* New extended info filters for IFLA_EXT_MASK */
> #define RTEXT_FILTER_VF		(1 << 0)
>diff --git a/net/sched/act_api.c b/net/sched/act_api.c
>index 79d875c..90cc774 100644
>--- a/net/sched/act_api.c
>+++ b/net/sched/act_api.c
>@@ -83,6 +83,7 @@ static int tcf_dump_walker(struct tcf_hashinfo *hinfo, struct sk_buff *skb,
> 			   struct netlink_callback *cb)
> {
> 	int err = 0, index = -1, i = 0, s_i = 0, n_i = 0;
>+	unsigned short act_flags = cb->args[2];
> 	struct nlattr *nest;
> 
> 	spin_lock_bh(&hinfo->lock);
>@@ -111,14 +112,18 @@ static int tcf_dump_walker(struct tcf_hashinfo *hinfo, struct sk_buff *skb,
> 			}
> 			nla_nest_end(skb, nest);
> 			n_i++;
>-			if (n_i >= TCA_ACT_MAX_PRIO)
>+			if (!(act_flags & ACT_LARGE_DUMP_ON) &&
>+			    n_i >= TCA_ACT_MAX_PRIO)
> 				goto done;
> 		}
> 	}
> done:
> 	spin_unlock_bh(&hinfo->lock);
>-	if (n_i)
>+	if (n_i) {
> 		cb->args[0] += n_i;
>+		if (act_flags & ACT_LARGE_DUMP_ON)
>+			cb->args[1] = n_i;
>+	}
> 	return n_i;
> 
> nla_put_failure:
>@@ -1081,6 +1086,7 @@ static int tc_dump_action(struct sk_buff *skb, struct netlink_callback *cb)
> 	struct tc_action_ops *a_o;
> 	int ret = 0;
> 	struct tcamsg *t = (struct tcamsg *) nlmsg_data(cb->nlh);
>+	unsigned char act_flags = t->tca__pad1;
> 	struct nlattr *kind = find_dump_kind(cb->nlh);
> 
> 	if (kind == NULL) {
>@@ -1096,9 +1102,12 @@ static int tc_dump_action(struct sk_buff *skb, struct netlink_callback *cb)
> 			cb->nlh->nlmsg_type, sizeof(*t), 0);
> 	if (!nlh)
> 		goto out_module_put;
>+
>+	cb->args[2] = act_flags;
>+
> 	t = nlmsg_data(nlh);
> 	t->tca_family = AF_UNSPEC;
>-	t->tca__pad1 = 0;
>+	t->tca__pad1 = act_flags;

I don't like this one bit. Pad is no longer a pad but by name it still
is. Why don't you introduce new attributes for this?



> 	t->tca__pad2 = 0;
> 
> 	nest = nla_nest_start(skb, TCA_ACT_TAB);
>@@ -1112,6 +1121,8 @@ static int tc_dump_action(struct sk_buff *skb, struct netlink_callback *cb)
> 	if (ret > 0) {
> 		nla_nest_end(skb, nest);
> 		ret = skb->len;
>+		t->tca__pad2 = cb->args[1];
>+		cb->args[1] = 0;
> 	} else
> 		nlmsg_trim(skb, b);
> 
>-- 
>1.9.1
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 1/1] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
  2017-04-17  8:19 ` Jiri Pirko
@ 2017-04-17 11:01   ` Jamal Hadi Salim
  2017-04-17 13:10     ` Eric Dumazet
  0 siblings, 1 reply; 17+ messages in thread
From: Jamal Hadi Salim @ 2017-04-17 11:01 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: davem, netdev, eric.dumazet, xiyou.wangcong

On 17-04-17 04:19 AM, Jiri Pirko wrote:
> Sun, Apr 16, 2017 at 02:34:30PM CEST, jhs@mojatatu.com wrote:
>> From: Jamal Hadi Salim <jhs@mojatatu.com>
>>
>> When you dump hundreds of thousands of actions, getting only 32 per
>> dump batch even when the socket buffer and memory allocations allow
>> is inefficient.
>>
>> With this change, the user will get as many as possibly fitting
>> within the given constraints.
>>
>> We reuse the pad fields in tcamsg. pad1 is used as a flag space.
>> User explicitly requests for the large dump in order to maintain
>> backwards compatibility with user space by setting bit
>> ACT_LARGE_DUMP_ON; older user space which doesnt set this flag
>> doesnt get the large (than 32) batches - so continues to work
>> as before. Older kernels ignore this flag, so legacy behavior
>> is maintained.
>> The kernel uses pad2 to tell the user how many actions are put in
>> a single batch. As such user space app(like tc) knows how long
>> to iterate instead of hardcoded maximum of 32.
>>
>> Some results dumping 1.5M actions, first unpatched tc which the
>> kernel doesnt help:
>>
>> prompt$ time -p tc actions ls action gact | grep index | wc -l
>> 1500000
>> real 1388.43
>> user 2.07
>> sys 1386.79
>>
>> Now patched tc which sets the correct flags when requesting a dump:
>>
>> prompt$ time -p updatedtc actions ls action gact | grep index | wc -l
>> 1500000
>> real 178.13
>> user 2.02
>> sys 176.96
>>
>> Improvement: Dump time from about 20 minutes to about 2 minutes.
>>
>> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
>> ---
>> include/uapi/linux/rtnetlink.h |  7 +++++++
>> net/sched/act_api.c            | 17 ++++++++++++++---
>> 2 files changed, 21 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
>> index cce0613..3434d98 100644
>> --- a/include/uapi/linux/rtnetlink.h
>> +++ b/include/uapi/linux/rtnetlink.h
>> @@ -678,6 +678,13 @@ struct tcamsg {
>> #define TA_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct tcamsg))
>> #define TCA_ACT_TAB 1 /* attr type must be >=1 */	
>> #define TCAA_MAX 1
>> +/* tcamsg flags stored in tca__pad
>> + *
>> + * ACT_LARGE_DUMP_ON user->kernel to request for larger than TCA_ACT_MAX_PRIO
>> + * actions in a dump. All dump responses will contain the number of actions
>> + * being dumped stored in tca__pad2 for user app's info
>> + */
>> +#define ACT_LARGE_DUMP_ON		(1 << 0)
>>
>> /* New extended info filters for IFLA_EXT_MASK */
>> #define RTEXT_FILTER_VF		(1 << 0)
>> diff --git a/net/sched/act_api.c b/net/sched/act_api.c
>> index 79d875c..90cc774 100644
>> --- a/net/sched/act_api.c
>> +++ b/net/sched/act_api.c
>> @@ -83,6 +83,7 @@ static int tcf_dump_walker(struct tcf_hashinfo *hinfo, struct sk_buff *skb,
>> 			   struct netlink_callback *cb)
>> {
>> 	int err = 0, index = -1, i = 0, s_i = 0, n_i = 0;
>> +	unsigned short act_flags = cb->args[2];
>> 	struct nlattr *nest;
>>
>> 	spin_lock_bh(&hinfo->lock);
>> @@ -111,14 +112,18 @@ static int tcf_dump_walker(struct tcf_hashinfo *hinfo, struct sk_buff *skb,
>> 			}
>> 			nla_nest_end(skb, nest);
>> 			n_i++;
>> -			if (n_i >= TCA_ACT_MAX_PRIO)
>> +			if (!(act_flags & ACT_LARGE_DUMP_ON) &&
>> +			    n_i >= TCA_ACT_MAX_PRIO)
>> 				goto done;
>> 		}
>> 	}
>> done:
>> 	spin_unlock_bh(&hinfo->lock);
>> -	if (n_i)
>> +	if (n_i) {
>> 		cb->args[0] += n_i;
>> +		if (act_flags & ACT_LARGE_DUMP_ON)
>> +			cb->args[1] = n_i;
>> +	}
>> 	return n_i;
>>
>> nla_put_failure:
>> @@ -1081,6 +1086,7 @@ static int tc_dump_action(struct sk_buff *skb, struct netlink_callback *cb)
>> 	struct tc_action_ops *a_o;
>> 	int ret = 0;
>> 	struct tcamsg *t = (struct tcamsg *) nlmsg_data(cb->nlh);
>> +	unsigned char act_flags = t->tca__pad1;
>> 	struct nlattr *kind = find_dump_kind(cb->nlh);
>>
>> 	if (kind == NULL) {
>> @@ -1096,9 +1102,12 @@ static int tc_dump_action(struct sk_buff *skb, struct netlink_callback *cb)
>> 			cb->nlh->nlmsg_type, sizeof(*t), 0);
>> 	if (!nlh)
>> 		goto out_module_put;
>> +
>> +	cb->args[2] = act_flags;
>> +
>> 	t = nlmsg_data(nlh);
>> 	t->tca_family = AF_UNSPEC;
>> -	t->tca__pad1 = 0;
>> +	t->tca__pad1 = act_flags;
>
> I don't like this one bit. Pad is no longer a pad but by name it still
> is. Why don't you introduce new attributes for this?
>

The name "pad" is ugly - but _we need to put these reserved spaces
to good use_. We cant keep declaring pads and say they should never
be used in the future.
We dont need more than 2-3 bits for the flags for example and i dont
see anyone dumping 64K actions in one message.
An attribute is a big waste of space. I cant change the name pad - 
perhaps a union with a new name? We had a similar discussion a while 
back on some netlink header, i just dont remember the details.
Suggestions?

cheers,
jamal

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 1/1] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
  2017-04-17 11:01   ` Jamal Hadi Salim
@ 2017-04-17 13:10     ` Eric Dumazet
  2017-04-17 14:02       ` Jamal Hadi Salim
  2017-04-17 15:31       ` Jiri Pirko
  0 siblings, 2 replies; 17+ messages in thread
From: Eric Dumazet @ 2017-04-17 13:10 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: Jiri Pirko, davem, netdev, xiyou.wangcong

On Mon, 2017-04-17 at 07:01 -0400, Jamal Hadi Salim wrote:

> The name "pad" is ugly - but _we need to put these reserved spaces
> to good use_. We cant keep declaring pads and say they should never
> be used in the future.
> We dont need more than 2-3 bits for the flags for example and i dont
> see anyone dumping 64K actions in one message.
> An attribute is a big waste of space. I cant change the name pad - 
> perhaps a union with a new name? We had a similar discussion a while 
> back on some netlink header, i just dont remember the details.
> Suggestions?

We can not assume user programs properly cleared the paddings anyway.

Using them for 'new features' is risky, since it might break programs.

So the safe way is using new attributes really.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 1/1] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
  2017-04-17 13:10     ` Eric Dumazet
@ 2017-04-17 14:02       ` Jamal Hadi Salim
  2017-04-17 14:58         ` Eric Dumazet
  2017-04-17 15:31       ` Jiri Pirko
  1 sibling, 1 reply; 17+ messages in thread
From: Jamal Hadi Salim @ 2017-04-17 14:02 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jiri Pirko, davem, netdev, xiyou.wangcong

On 17-04-17 09:10 AM, Eric Dumazet wrote:

[..]
>
> We can not assume user programs properly cleared the paddings anyway.
>
> Using them for 'new features' is risky, since it might break programs.
>
> So the safe way is using new attributes really.

Since we agreed to have longer discussions on uapis
when we met I'd like to digress:
Can we talk about what it means to define pads in data
structures and then never using them?
As an example, no-one is setting these fields anywhere on any
app i know of. Would a union not be good enough for new name
vs old name? old binaries should continue to work.
Breakage with any app during compile should be fixable
within the breaking app (since whoever it is would have source).
Maybe by breaking some weird app we can experiment on finding
out.

My contention is that it is not nice to continue to define uapi
pads and  then say they cant be used ever.

cheers,
jamal

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 1/1] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
  2017-04-17 14:02       ` Jamal Hadi Salim
@ 2017-04-17 14:58         ` Eric Dumazet
  2017-04-17 16:40           ` Jamal Hadi Salim
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2017-04-17 14:58 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: Jiri Pirko, davem, netdev, xiyou.wangcong

On Mon, 2017-04-17 at 10:02 -0400, Jamal Hadi Salim wrote:
> On 17-04-17 09:10 AM, Eric Dumazet wrote:
> 
> [..]
> >
> > We can not assume user programs properly cleared the paddings anyway.
> >
> > Using them for 'new features' is risky, since it might break programs.
> >
> > So the safe way is using new attributes really.
> 
> Since we agreed to have longer discussions on uapis
> when we met I'd like to digress:
> Can we talk about what it means to define pads in data
> structures and then never using them?
> As an example, no-one is setting these fields anywhere on any
> app i know of. Would a union not be good enough for new name
> vs old name? old binaries should continue to work.
> Breakage with any app during compile should be fixable
> within the breaking app (since whoever it is would have source).
> Maybe by breaking some weird app we can experiment on finding
> out.
> 
> My contention is that it is not nice to continue to define uapi
> pads and  then say they cant be used ever.

Very often, pads are there because of ABI constraints.

We 'name' them to make clear to developers that they are there,
and avoid security issues, because of say few bytes from kernel stack
are copied to user space.

struct foo {
  __u32 a;
  __u16 b;
};


Note that the 16bit padding is there, even if you do not name it.

Once this structure had been exported to some include file and in a
kernel, there is little point trying to 'reuse' the padding, unless for
very specific cases.

If you name paddings, then developers might think about it.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 1/1] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
  2017-04-17 13:10     ` Eric Dumazet
  2017-04-17 14:02       ` Jamal Hadi Salim
@ 2017-04-17 15:31       ` Jiri Pirko
  2017-04-17 16:46         ` Jamal Hadi Salim
  1 sibling, 1 reply; 17+ messages in thread
From: Jiri Pirko @ 2017-04-17 15:31 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jamal Hadi Salim, davem, netdev, xiyou.wangcong

Mon, Apr 17, 2017 at 03:10:59PM CEST, eric.dumazet@gmail.com wrote:
>On Mon, 2017-04-17 at 07:01 -0400, Jamal Hadi Salim wrote:
>
>> The name "pad" is ugly - but _we need to put these reserved spaces
>> to good use_. We cant keep declaring pads and say they should never
>> be used in the future.
>> We dont need more than 2-3 bits for the flags for example and i dont
>> see anyone dumping 64K actions in one message.
>> An attribute is a big waste of space. I cant change the name pad - 
>> perhaps a union with a new name? We had a similar discussion a while 
>> back on some netlink header, i just dont remember the details.
>> Suggestions?
>
>We can not assume user programs properly cleared the paddings anyway.
>
>Using them for 'new features' is risky, since it might break programs.
>
>So the safe way is using new attributes really.

Agreed.

Plus the argument that attributes are "a big waste" sounds to me really
silly. What is couple of bytes? Please do this properly, as it should
be done.


>
>
>
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 1/1] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
  2017-04-17 14:58         ` Eric Dumazet
@ 2017-04-17 16:40           ` Jamal Hadi Salim
  0 siblings, 0 replies; 17+ messages in thread
From: Jamal Hadi Salim @ 2017-04-17 16:40 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jiri Pirko, davem, netdev, xiyou.wangcong

On 17-04-17 10:58 AM, Eric Dumazet wrote:
[..]
> Very often, pads are there because of ABI constraints.
>
> We 'name' them to make clear to developers that they are there,
> and avoid security issues, because of say few bytes from kernel stack
> are copied to user space.
>
> struct foo {
>   __u32 a;
>   __u16 b;
> };
>
>
> Note that the 16bit padding is there, even if you do not name it.
>

Agreed. But note netlink is defined as "a wire protocol" which
has explicit requirement to pad/align to 32 bit boundary. Therefore
we _always_ explicitly name the pads.

> Once this structure had been exported to some include file and in a
> kernel, there is little point trying to 'reuse' the padding, unless for
> very specific cases.
>
> If you name paddings, then developers might think about it.
>

We always name them for netlink. The challenge is a few months later
we are not allowed to use the fields we name. I see these netlink
struct pads in the same manner as say reserved packet header fields.

cheers,
jamal

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 1/1] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
  2017-04-17 15:31       ` Jiri Pirko
@ 2017-04-17 16:46         ` Jamal Hadi Salim
  2017-04-17 17:04           ` Jiri Pirko
  2017-04-17 17:08           ` Eric Dumazet
  0 siblings, 2 replies; 17+ messages in thread
From: Jamal Hadi Salim @ 2017-04-17 16:46 UTC (permalink / raw)
  To: Jiri Pirko, Eric Dumazet; +Cc: davem, netdev, xiyou.wangcong

On 17-04-17 11:31 AM, Jiri Pirko wrote:
> Mon, Apr 17, 2017 at 03:10:59PM CEST, eric.dumazet@gmail.com wrote:
>> On Mon, 2017-04-17 at 07:01 -0400, Jamal Hadi Salim wrote:

> Agreed.
>
> Plus the argument that attributes are "a big waste" sounds to me really
> silly. What is couple of bytes?Please do this properly, as it should
> be done.

Jiri - you wanted to have these uapi discussions, right? ;->

Of course it is trivial to add this as attributes and 32 bits
for this case is not a big deal because it is done once. I want to talk
about the pads instead ;-> What do you suggest we do with pads?

cheers,
jaaml

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 1/1] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
  2017-04-17 16:46         ` Jamal Hadi Salim
@ 2017-04-17 17:04           ` Jiri Pirko
  2017-04-17 17:08           ` Eric Dumazet
  1 sibling, 0 replies; 17+ messages in thread
From: Jiri Pirko @ 2017-04-17 17:04 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: Eric Dumazet, davem, netdev, xiyou.wangcong

Mon, Apr 17, 2017 at 06:46:17PM CEST, jhs@mojatatu.com wrote:
>On 17-04-17 11:31 AM, Jiri Pirko wrote:
>> Mon, Apr 17, 2017 at 03:10:59PM CEST, eric.dumazet@gmail.com wrote:
>> > On Mon, 2017-04-17 at 07:01 -0400, Jamal Hadi Salim wrote:
>
>> Agreed.
>> 
>> Plus the argument that attributes are "a big waste" sounds to me really
>> silly. What is couple of bytes?Please do this properly, as it should
>> be done.
>
>Jiri - you wanted to have these uapi discussions, right? ;->
>
>Of course it is trivial to add this as attributes and 32 bits
>for this case is not a big deal because it is done once. I want to talk
>about the pads instead ;-> What do you suggest we do with pads?

I believe we should leave them as they are.


>
>cheers,
>jaaml
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 1/1] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
  2017-04-17 16:46         ` Jamal Hadi Salim
  2017-04-17 17:04           ` Jiri Pirko
@ 2017-04-17 17:08           ` Eric Dumazet
  2017-04-17 17:51             ` Roman Mashak
  2017-04-18 12:48             ` Case for reusing netlink PADs WAS(Re: " Jamal Hadi Salim
  1 sibling, 2 replies; 17+ messages in thread
From: Eric Dumazet @ 2017-04-17 17:08 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: Jiri Pirko, davem, netdev, xiyou.wangcong

On Mon, 2017-04-17 at 12:46 -0400, Jamal Hadi Salim wrote:

> Of course it is trivial to add this as attributes and 32 bits
> for this case is not a big deal because it is done once. I want to talk
> about the pads instead ;-> What do you suggest we do with pads?

We do nothing with pads. Just leave them.

Or, you need something to make sure to not break old user applications.

Something like a version.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 1/1] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
  2017-04-17 17:08           ` Eric Dumazet
@ 2017-04-17 17:51             ` Roman Mashak
  2017-04-18 12:48             ` Case for reusing netlink PADs WAS(Re: " Jamal Hadi Salim
  1 sibling, 0 replies; 17+ messages in thread
From: Roman Mashak @ 2017-04-17 17:51 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jamal Hadi Salim, Jiri Pirko, davem, netdev, xiyou.wangcong

Eric Dumazet <eric.dumazet@gmail.com> writes:

> On Mon, 2017-04-17 at 12:46 -0400, Jamal Hadi Salim wrote:
>
>> Of course it is trivial to add this as attributes and 32 bits
>> for this case is not a big deal because it is done once. I want to talk
>> about the pads instead ;-> What do you suggest we do with pads?
>
> We do nothing with pads. Just leave them.
>
> Or, you need something to make sure to not break old user applications.
>
> Something like a version.

Would be a new NLM_F_* modifier for GET request an option for large
dumps implementation? Looks like there are extract bits and Johannes'
patches for extended ACK do not use it.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Case for reusing netlink PADs WAS(Re: [PATCH net-next 1/1] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
  2017-04-17 17:08           ` Eric Dumazet
  2017-04-17 17:51             ` Roman Mashak
@ 2017-04-18 12:48             ` Jamal Hadi Salim
  2017-04-18 13:16               ` Jiri Pirko
                                 ` (2 more replies)
  1 sibling, 3 replies; 17+ messages in thread
From: Jamal Hadi Salim @ 2017-04-18 12:48 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jiri Pirko, davem, netdev, xiyou.wangcong

On 17-04-17 01:08 PM, Eric Dumazet wrote:
> On Mon, 2017-04-17 at 12:46 -0400, Jamal Hadi Salim wrote:
>
>> Of course it is trivial to add this as attributes and 32 bits
>> for this case is not a big deal because it is done once. I want to talk
>> about the pads instead ;-> What do you suggest we do with pads?
>
> We do nothing with pads. Just leave them.
>
> Or, you need something to make sure to not break old user applications.
>
> Something like a version.
>
>

Dave is not fond of version fields - they tend to be abused
although looking at the idiag and tpacket stuff there is a case
to be made for versioning ;->


For the patches I posted, I will work on getting an attribute based
variant of the patches out - but i wanted to have this discussion a
little more if you bear with me.

Netlink is a wire protocol. When a protocol is defined with rules such
as alignment (which lead to explicit padding) then those are equivalent
to "reserved bits" in standard wire protocols. Good practise is:
all sender zero those bits(MBZ); and all receivers must ignore them
unless they wish to interpret them. Not everyone follows these rules
(I remember the havoc ECN caused when TCP/IP started using the different
reserved fields).

For our case it is _very sad_ that someone actually explicitly defined
pads - in my opinion for no other purpose other than reuse and then
we say we cant use them after.

I believe you understand what I was saying, but to clarify, here is
what i meant:
---
struct tcamsg {
         unsigned char   tca_family;
         union {
                 unsigned char   tca__pad1;
                 unsigned char   tca_flags;
         };
         union {
                 unsigned short  tca__pad2;
                 unsigned char   tca_foobar;
         };
};
---

This should work with old binaries and kernels.
It will work with someone referencing sizeof tcamsg.
It will work with unchanged source that memsets the struct.
It will work with things that explicitly set the pad1 and pad2.

What is the downside?

cheers,
jamal

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Case for reusing netlink PADs WAS(Re: [PATCH net-next 1/1] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
  2017-04-18 12:48             ` Case for reusing netlink PADs WAS(Re: " Jamal Hadi Salim
@ 2017-04-18 13:16               ` Jiri Pirko
  2017-04-18 15:29                 ` David Miller
  2017-04-18 14:19               ` Eric Dumazet
  2017-04-18 15:25               ` David Miller
  2 siblings, 1 reply; 17+ messages in thread
From: Jiri Pirko @ 2017-04-18 13:16 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: Eric Dumazet, davem, netdev, xiyou.wangcong

Tue, Apr 18, 2017 at 02:48:57PM CEST, jhs@mojatatu.com wrote:
>On 17-04-17 01:08 PM, Eric Dumazet wrote:
>> On Mon, 2017-04-17 at 12:46 -0400, Jamal Hadi Salim wrote:
>> 
>> > Of course it is trivial to add this as attributes and 32 bits
>> > for this case is not a big deal because it is done once. I want to talk
>> > about the pads instead ;-> What do you suggest we do with pads?
>> 
>> We do nothing with pads. Just leave them.
>> 
>> Or, you need something to make sure to not break old user applications.
>> 
>> Something like a version.
>> 
>> 
>
>Dave is not fond of version fields - they tend to be abused
>although looking at the idiag and tpacket stuff there is a case
>to be made for versioning ;->
>
>
>For the patches I posted, I will work on getting an attribute based
>variant of the patches out - but i wanted to have this discussion a
>little more if you bear with me.
>
>Netlink is a wire protocol. When a protocol is defined with rules such
>as alignment (which lead to explicit padding) then those are equivalent
>to "reserved bits" in standard wire protocols. Good practise is:
>all sender zero those bits(MBZ); and all receivers must ignore them
>unless they wish to interpret them. Not everyone follows these rules
>(I remember the havoc ECN caused when TCP/IP started using the different
>reserved fields).
>
>For our case it is _very sad_ that someone actually explicitly defined
>pads - in my opinion for no other purpose other than reuse and then
>we say we cant use them after.
>
>I believe you understand what I was saying, but to clarify, here is
>what i meant:
>---
>struct tcamsg {
>        unsigned char   tca_family;
>        union {
>                unsigned char   tca__pad1;
>                unsigned char   tca_flags;
>        };
>        union {
>                unsigned short  tca__pad2;
>                unsigned char   tca_foobar;
>        };
>};
>---
>
>This should work with old binaries and kernels.
>It will work with someone referencing sizeof tcamsg.
>It will work with unchanged source that memsets the struct.
>It will work with things that explicitly set the pad1 and pad2.
>
>What is the downside?

It's ugly. Plus user may pass garbage in pads from current apps.
Why you don't just use new attributes? I don't get it.
I still don't understand why people feel need to do struct style message
passing in TLV-designed Netlink interface...

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Case for reusing netlink PADs WAS(Re: [PATCH net-next 1/1] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
  2017-04-18 12:48             ` Case for reusing netlink PADs WAS(Re: " Jamal Hadi Salim
  2017-04-18 13:16               ` Jiri Pirko
@ 2017-04-18 14:19               ` Eric Dumazet
  2017-04-18 15:25               ` David Miller
  2 siblings, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2017-04-18 14:19 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: Jiri Pirko, davem, netdev, xiyou.wangcong

On Tue, 2017-04-18 at 08:48 -0400, Jamal Hadi Salim wrote:

> For our case it is _very sad_ that someone actually explicitly defined
> pads - in my opinion for no other purpose other than reuse and then
> we say we cant use them after.


Again, there is nothing you can do to avoid some pads _once_ they are
there.

Even if you leave the pad here, the 'wire protocol' will add padding
anyway ( NLMSG_ALIGN() and friends)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Case for reusing netlink PADs WAS(Re: [PATCH net-next 1/1] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
  2017-04-18 12:48             ` Case for reusing netlink PADs WAS(Re: " Jamal Hadi Salim
  2017-04-18 13:16               ` Jiri Pirko
  2017-04-18 14:19               ` Eric Dumazet
@ 2017-04-18 15:25               ` David Miller
  2 siblings, 0 replies; 17+ messages in thread
From: David Miller @ 2017-04-18 15:25 UTC (permalink / raw)
  To: jhs; +Cc: eric.dumazet, jiri, netdev, xiyou.wangcong

From: Jamal Hadi Salim <jhs@mojatatu.com>
Date: Tue, 18 Apr 2017 08:48:57 -0400

> For the patches I posted, I will work on getting an attribute based
> variant of the patches out - but i wanted to have this discussion a
> little more if you bear with me.
> 
> Netlink is a wire protocol. When a protocol is defined with rules such
> as alignment (which lead to explicit padding) then those are
> equivalent
> to "reserved bits" in standard wire protocols. Good practise is:
> all sender zero those bits(MBZ); and all receivers must ignore them
> unless they wish to interpret them. Not everyone follows these rules
> (I remember the havoc ECN caused when TCP/IP started using the
> different
> reserved fields).
> 
> For our case it is _very sad_ that someone actually explicitly defined
> pads - in my opinion for no other purpose other than reuse and then
> we say we cant use them after.

Unless you define the field to have meaning from the beginning and
truly _ENFORCE_ that meaning from the start, you cannot reuse the
field later.

So, for example, if we enforced the padding fields to be zero from day
one, and the kernel rejected non-zero values, then you could start to
consider reusing them later.  Because you have %100 certainty that
existing applications fill the field in with zero.

But that is not the case here.

All of your "on the wire protocol" talk is meaningless because we
didn't do that.  On the wire protocols enforce undefined and reserved
fields to meet certain requirements.  We do not, in general, do that
with netlink.

This is why it is important to very carefully think ahead and define
the initial netlink operation structures fully.

If you don't get it right, and later need to add something, just take
the safe path and just add attributes and don't even think about
messing with the existing structure.

Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Case for reusing netlink PADs WAS(Re: [PATCH net-next 1/1] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
  2017-04-18 13:16               ` Jiri Pirko
@ 2017-04-18 15:29                 ` David Miller
  0 siblings, 0 replies; 17+ messages in thread
From: David Miller @ 2017-04-18 15:29 UTC (permalink / raw)
  To: jiri; +Cc: jhs, eric.dumazet, netdev, xiyou.wangcong

From: Jiri Pirko <jiri@resnulli.us>
Date: Tue, 18 Apr 2017 15:16:56 +0200

> It's ugly. Plus user may pass garbage in pads from current apps.

%100 agreed.

> Why you don't just use new attributes? I don't get it.  I still
> don't understand why people feel need to do struct style message
> passing in TLV-designed Netlink interface...

Also agreed, I don't see why it's such a big deal.

Once we define a structure, let's set it in stone and don't try to
modify it's layout or meaning.  It is the only safe approach.

We made netlink have attributes exactly for this reason.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2017-04-18 15:29 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-16 12:34 [PATCH net-next 1/1] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch Jamal Hadi Salim
2017-04-17  8:19 ` Jiri Pirko
2017-04-17 11:01   ` Jamal Hadi Salim
2017-04-17 13:10     ` Eric Dumazet
2017-04-17 14:02       ` Jamal Hadi Salim
2017-04-17 14:58         ` Eric Dumazet
2017-04-17 16:40           ` Jamal Hadi Salim
2017-04-17 15:31       ` Jiri Pirko
2017-04-17 16:46         ` Jamal Hadi Salim
2017-04-17 17:04           ` Jiri Pirko
2017-04-17 17:08           ` Eric Dumazet
2017-04-17 17:51             ` Roman Mashak
2017-04-18 12:48             ` Case for reusing netlink PADs WAS(Re: " Jamal Hadi Salim
2017-04-18 13:16               ` Jiri Pirko
2017-04-18 15:29                 ` David Miller
2017-04-18 14:19               ` Eric Dumazet
2017-04-18 15:25               ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.