* LPM6 next hop size
@ 2016-09-19 20:04 Shyam Sundar Govindaraj
  2016-09-19 21:18 ` Nikita Kozlov
  0 siblings, 1 reply; 12+ messages in thread
From: Shyam Sundar Govindaraj @ 2016-09-19 20:04 UTC (permalink / raw)
  To: dev

Hi

In the IPv4 LPM implementation, the next hop size was increased from 8 bits to 24 bits. Is there a plan to increase the IPv6 LPM next hop size as well?

Also, the next hop size in this document needs to be updated, since it is no longer 1 byte:
http://dpdk.org/doc/guides/prog_guide/lpm_lib.html

Thanks
Shyam

* Re: LPM6 next hop size
  2016-09-19 20:04 LPM6 next hop size Shyam Sundar Govindaraj
@ 2016-09-19 21:18 ` Nikita Kozlov
  2016-09-19 21:22   ` Matthew Hall
  0 siblings, 1 reply; 12+ messages in thread
From: Nikita Kozlov @ 2016-09-19 21:18 UTC (permalink / raw)
  To: dev

On 09/19/2016 22:04, Shyam Sundar Govindaraj wrote:
> Hi
>
> In the IPv4 LPM implementation, the next hop size was increased from 8 bits to 24 bits. Is there a plan to increase the IPv6 LPM next hop size as well?
>
> Also, the next hop size in this document needs to be updated, since it is no longer 1 byte:
> http://dpdk.org/doc/guides/prog_guide/lpm_lib.html
>
> Thanks
> Shyam
I have submitted a patch that, among other things, increases this size,
but it still needs review: http://dpdk.org/dev/patchwork/patch/15295/

* Re: LPM6 next hop size
  2016-09-19 21:18 ` Nikita Kozlov
@ 2016-09-19 21:22   ` Matthew Hall
  2016-09-20 20:11     ` Thomas Monjalon
  0 siblings, 1 reply; 12+ messages in thread
From: Matthew Hall @ 2016-09-19 21:22 UTC (permalink / raw)
  To: Nikita Kozlov; +Cc: dev

On Mon, Sep 19, 2016 at 11:18:48PM +0200, Nikita Kozlov wrote:
> I have submitted a patch that, among other things, increases this size,
> but it still needs review: http://dpdk.org/dev/patchwork/patch/15295/

A whole ton of us submitted patches to fix LPM lately.

But fixing LPM6 was dropped from the roadmap and never rescheduled for a future release.

So we don't get any upstream support to consolidate the patches and land them in an official release.

Matthew.

* Re: LPM6 next hop size
  2016-09-19 21:22   ` Matthew Hall
@ 2016-09-20 20:11     ` Thomas Monjalon
  2016-09-21 17:29       ` Matthew Hall
  0 siblings, 1 reply; 12+ messages in thread
From: Thomas Monjalon @ 2016-09-20 20:11 UTC (permalink / raw)
  To: Matthew Hall; +Cc: dev, Nikita Kozlov

2016-09-19 14:22, Matthew Hall:
> On Mon, Sep 19, 2016 at 11:18:48PM +0200, Nikita Kozlov wrote:
> > I have submitted a patch that, among other things, increases this size,
> > but it still needs review: http://dpdk.org/dev/patchwork/patch/15295/
> 
> A whole ton of us submitted patches to fix LPM lately.
> 
> But fixing LPM6 was dropped from the roadmap and never rescheduled for a future release.

We do not need to have it in the roadmap to accept a patch.

> So we don't get any upstream support to consolidate the patches and land them in an official release.

Please, will you help review this patch?

* Re: LPM6 next hop size
  2016-09-20 20:11     ` Thomas Monjalon
@ 2016-09-21 17:29       ` Matthew Hall
  2016-09-21 19:37         ` Thomas Monjalon
                           ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Matthew Hall @ 2016-09-21 17:29 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Nikita Kozlov

On Tue, Sep 20, 2016 at 10:11:04PM +0200, Thomas Monjalon wrote:
> Please, will you help review this patch?

Sure.

1. It adds a dependency on libbsd on Linux: bsd/sys/tree.h. Is this already
an expected dependency of DPDK? I use it in my own code, but I'm not sure
it's expected for everybody else.

2. Some of the tabbing in the new code seems off. You probably need to
recheck your indentation settings.

3. The comment formatting of the new code is a bit odd. Continuation lines
inside /* ... */ comments in DPDK should start with a *, I think.

4. It also contains some vendor-specific comments like "/* Vyatta change to
use red-black tree */". Those are probably not needed if it's going into
mainline DPDK.

5. It uses "malloc" instead of the standard DPDK allocators. That's a problem
for me because I don't want to use libc malloc in my code, only the DPDK
allocators and jemalloc.

6. I don't see any updates to the unit tests. Don't we need new tests for
the changes to deletion and re-insertion of rules and the changes in tbl8
allocation?

7. Some of us previously submitted code to expand the LPM4 and LPM6 next hop
to 24 bits, but the current code only does 16 bits. This is not pleasant for
me because I really need the 24-bit support. When can we make this happen?

Matthew.

* Re: LPM6 next hop size
  2016-09-21 17:29       ` Matthew Hall
@ 2016-09-21 19:37         ` Thomas Monjalon
  2016-09-21 23:42         ` Stephen Hemminger
  2016-09-22 20:32         ` Nikita Kozlov
  2 siblings, 0 replies; 12+ messages in thread
From: Thomas Monjalon @ 2016-09-21 19:37 UTC (permalink / raw)
  To: Matthew Hall, Nikita Kozlov; +Cc: dev

2016-09-21 10:29, Matthew Hall:
> On Tue, Sep 20, 2016 at 10:11:04PM +0200, Thomas Monjalon wrote:
> > Please, will you help review this patch?
> 
> Sure.
> 
> 1. It adds a dependency on libbsd on Linux: bsd/sys/tree.h. Is this already
> an expected dependency of DPDK? I use it in my own code, but I'm not sure
> it's expected for everybody else.

No, we don't use libbsd yet.
Is there a clean way to avoid this new dependency?
What are the pros/cons?

[...]
> 7. Some of us previously submitted code to expand the LPM4 and LPM6 next hop
> to 24 bits, but the current code only does 16 bits. This is not pleasant for
> me because I really need the 24-bit support. When can we make this happen?

It will happen as soon as a patch is approved.
Matthew, what happened to your own patch for a 24-bit next hop?
Please do not hesitate to work together if it can accelerate landing
this feature.

* Re: LPM6 next hop size
  2016-09-21 17:29       ` Matthew Hall
  2016-09-21 19:37         ` Thomas Monjalon
@ 2016-09-21 23:42         ` Stephen Hemminger
  2016-09-22  1:50           ` Matthew Hall
  2016-09-22 20:32         ` Nikita Kozlov
  2 siblings, 1 reply; 12+ messages in thread
From: Stephen Hemminger @ 2016-09-21 23:42 UTC (permalink / raw)
  To: Matthew Hall; +Cc: Thomas Monjalon, dev, Nikita Kozlov

On Wed, 21 Sep 2016 10:29:05 -0700
Matthew Hall <mhall@mhcomputing.net> wrote:

> 5. It uses "malloc" instead of the standard DPDK allocators. That's a problem
> for me because I don't want to use libc malloc in my code, only the DPDK
> allocators and jemalloc.

This was intentional: rte_malloc() comes out of the huge page area, and that
resource is critical and limited. It could use rte_malloc(), but that would
make it more likely to break when doing policy-based routing or VRF.

The BSD tree was used because it provides a space- and time-efficient lookup
and is available on both BSD and Linux.
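
As an illustration of that split (sketch only, not the patch code -- the
struct and function names here are invented), rule bookkeeping can live in
ordinary pageable heap memory while the lookup tables come out of the pinned
huge-page area:

#include <stdint.h>
#include <stdlib.h>

#include <rte_malloc.h>

/* Invented rule record, just to show the two allocation paths. */
struct rule_node {
	uint8_t ip[16];
	uint8_t depth;
	uint16_t next_hop;
};

/* Control-plane bookkeeping: plain libc heap, pageable, does not
 * consume huge pages. */
static struct rule_node *
rule_alloc(void)
{
	return malloc(sizeof(struct rule_node));
}

/* Data-plane tables: pinned huge-page memory via the DPDK allocator
 * (align 0 = default alignment). */
static void *
table_alloc(size_t size, int socket_id)
{
	return rte_zmalloc_socket("lpm6_tbl", size, 0, socket_id);
}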

* Re: LPM6 next hop size
  2016-09-21 23:42         ` Stephen Hemminger
@ 2016-09-22  1:50           ` Matthew Hall
  2016-09-22  5:07             ` Stephen Hemminger
  0 siblings, 1 reply; 12+ messages in thread
From: Matthew Hall @ 2016-09-22  1:50 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Thomas Monjalon, dev, Nikita Kozlov

On Wed, Sep 21, 2016 at 04:42:05PM -0700, Stephen Hemminger wrote:
> This was intentional: rte_malloc() comes out of the huge page area, and that
> resource is critical and limited. It could use rte_malloc(), but that would
> make it more likely to break when doing policy-based routing or VRF.

Can we get more clarity on why PBR or VRF would break it? The performance and
fragmentation behavior of the default glibc allocator are quite bad, so I am
trying to avoid it in my app, for example.

Matthew.

* Re: LPM6 next hop size
  2016-09-22  1:50           ` Matthew Hall
@ 2016-09-22  5:07             ` Stephen Hemminger
  2016-09-22 17:55               ` Matthew Hall
  0 siblings, 1 reply; 12+ messages in thread
From: Stephen Hemminger @ 2016-09-22  5:07 UTC (permalink / raw)
  To: Matthew Hall; +Cc: Thomas Monjalon, dev, Nikita Kozlov

If you have 2 GB of huge page memory and 16M routes, then the rules start to
kill the application.
Since huge page memory is unpageable (pinned), it is limited.
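
To put rough numbers on that (my estimate, not figures from the patch): if
the rules were kept in huge-page memory via rte_malloc(), at something like
64 bytes per stored IPv6 rule (16-byte prefix, depth, next hop, red-black
tree linkage, padding) you would get

    16M rules * 64 B/rule = 2^24 * 2^6 B = 2^30 B = 1 GiB

i.e. half of a 2 GB huge-page pool gone to rule bookkeeping alone, before the
tbl24/tbl8 lookup tables are even counted.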

On Wed, Sep 21, 2016 at 6:50 PM, Matthew Hall <mhall@mhcomputing.net> wrote:

> On Wed, Sep 21, 2016 at 04:42:05PM -0700, Stephen Hemminger wrote:
> > This was intentional: rte_malloc() comes out of the huge page area, and
> > that resource is critical and limited. It could use rte_malloc(), but that
> > would make it more likely to break when doing policy-based routing or VRF.
>
> Can we get more clarity on why PBR or VRF would break it? The performance
> and fragmentation behavior of the default glibc allocator are quite bad, so
> I am trying to avoid it in my app, for example.
>
> Matthew.
>

* Re: LPM6 next hop size
  2016-09-22  5:07             ` Stephen Hemminger
@ 2016-09-22 17:55               ` Matthew Hall
  2016-09-22 18:09                 ` Stephen Hemminger
  0 siblings, 1 reply; 12+ messages in thread
From: Matthew Hall @ 2016-09-22 17:55 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Thomas Monjalon, dev, Nikita Kozlov

On Wed, Sep 21, 2016 at 10:07:46PM -0700, Stephen Hemminger wrote:
> If you have 2 GB of huge page memory and 16M routes, then the rules start to
> kill the application.
> Since huge page memory is unpageable (pinned), it is limited.

Won't paging out routes result in very poor network performance?

* Re: LPM6 next hop size
  2016-09-22 17:55               ` Matthew Hall
@ 2016-09-22 18:09                 ` Stephen Hemminger
  0 siblings, 0 replies; 12+ messages in thread
From: Stephen Hemminger @ 2016-09-22 18:09 UTC (permalink / raw)
  To: Matthew Hall; +Cc: Thomas Monjalon, dev, Nikita Kozlov

On Thu, 22 Sep 2016 10:55:43 -0700
Matthew Hall <mhall@mhcomputing.net> wrote:

> On Wed, Sep 21, 2016 at 10:07:46PM -0700, Stephen Hemminger wrote:
> > If you have 2 GB of huge page memory and 16M routes, then the rules start to
> > kill the application.
> > Since huge page memory is unpageable (pinned), it is limited.
> 
> Won't paging out routes result in very poor network performance?

These are the rules, not the LPM table itself. The rules are only used when
recomputing the LPM after a route add/delete.

* Re: LPM6 next hop size
  2016-09-21 17:29       ` Matthew Hall
  2016-09-21 19:37         ` Thomas Monjalon
  2016-09-21 23:42         ` Stephen Hemminger
@ 2016-09-22 20:32         ` Nikita Kozlov
  2 siblings, 0 replies; 12+ messages in thread
From: Nikita Kozlov @ 2016-09-22 20:32 UTC (permalink / raw)
  To: Matthew Hall, Thomas Monjalon; +Cc: dev, Stephen Hemminger



On 09/21/2016 19:29, Matthew Hall wrote:
> On Tue, Sep 20, 2016 at 10:11:04PM +0200, Thomas Monjalon wrote:
> > Please, will you help review this patch?
> Sure.
Sorry for being late to reply, I was a bit busy.
>
> 1. It adds a dependency on libbsd on Linux: bsd/sys/tree.h. Is this already
> an expected dependency of DPDK? I use it in my own code, but I'm not sure
> it's expected for everybody else.
In the v1 I was simply importing bsd/tree.h into the DPDK sources, but I was
told it was better to depend on libbsd on Linux. That seemed a good choice,
and the idea was taken from a previous patch submission by Stephen Hemminger.
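
For anyone not familiar with it, a minimal sketch of what the rules tree
looks like with bsd/sys/tree.h (the struct and field names here are invented,
not the ones from the patch):

#include <stdint.h>
#include <string.h>

#include <bsd/sys/tree.h>	/* plain <sys/tree.h> on the BSDs */

/* Invented rule record for illustration. */
struct lpm6_rule {
	uint8_t ip[16];
	uint8_t depth;
	uint16_t next_hop;
	RB_ENTRY(lpm6_rule) link;	/* red-black tree linkage */
};

/* Order rules by (prefix, depth). */
static int
rule_cmp(struct lpm6_rule *a, struct lpm6_rule *b)
{
	int r = memcmp(a->ip, b->ip, sizeof(a->ip));

	return r != 0 ? r : (int)a->depth - (int)b->depth;
}

RB_HEAD(lpm6_rules, lpm6_rule);
RB_GENERATE(lpm6_rules, lpm6_rule, link, rule_cmp)

/* An add then does RB_INSERT(lpm6_rules, &head, rule), and a delete does
 * RB_FIND() followed by RB_REMOVE(). */
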
>
> 2. Some of the tabbing in the new code seems off. You probably need to
> recheck your indentation settings.
Ok, I will fix that.
>
> 3. The comment formatting of the new code is a bit odd. Continuation lines
> inside /* ... */ comments in DPDK should start with a *, I think.
Ok, I will fix that also.
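
For reference, I believe the expected formatting is:

/*
 * Multi-line comments in DPDK continue with
 * a leading asterisk on each line like this.
 */
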
>
> 4. It also contains some vendor-specific comments like "/* Vyatta change to
> use red-black tree */". Those are probably not needed if it's going into
> mainline DPDK.
That was because I copy/pasted some code from another patch; I guess we
can drop those as well.
>
> 5. It uses "malloc" instead of the standard DPDK allocators. That's a problem
> for me because I don't want to use libc malloc in my code, only the DPDK
> allocators and jemalloc.
As Stephen explained, the tree of rules is only there for storing the
inserted rules; lookups still go through the modified DIR-24-8 tables
themselves.

So you should not see any impact on the rte_lpm6_lookup() function from the
malloc(), since I assume you are not doing any rte_lpm6_add()/rte_lpm6_delete()
calls in your fast path.
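
To illustrate the split I mean (sketch only, with invented function names and
the pre-patch prototypes where the next hop is still a uint8_t; the patch
widens it to 16 bits):

#include <stdint.h>

#include <rte_lpm6.h>

/* Control plane: adding a route may malloc() a rule-tree node; it is
 * not latency critical. */
static int
control_plane_add(struct rte_lpm6 *lpm, uint8_t ip[16], uint8_t depth,
		  uint8_t next_hop)
{
	return rte_lpm6_add(lpm, ip, depth, next_hop);
}

/* Fast path: only rte_lpm6_lookup(), which walks the tbl24/tbl8 tables
 * and never touches the rules tree. */
static int
fast_path_next_hop(struct rte_lpm6 *lpm, uint8_t ip[16])
{
	uint8_t next_hop;

	if (rte_lpm6_lookup(lpm, ip, &next_hop) == 0)
		return next_hop;
	return -1;	/* no route */
}
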
>
> 6. I don't see any updates to the unit tests. Don't we need new tests for
> the changes to deletion and re-insertion of rules and the changes in tbl8
> allocation?
I have slightly modified test11 in test_lpm6.c
(http://dpdk.org/dev/patchwork/patch/15294/). I have added some calls to
rte_lpm6_tbl8_count() to verify that the number of tbl8s is consistent with
the previous implementation, which recreated the full DIR-24-8 on each
rte_lpm6_delete() call.

During development I was also often comparing memory dumps of the tbl8 and
tbl24 tables. If needed, I can resubmit the patches with a test that does a
memcmp() between an LPM6 table created with only rte_lpm6_add() calls and an
LPM6 table that contains the same rules but also went through some
rte_lpm6_delete() calls in addition to the adds.

Do you feel that those two tests are enough, or do you have some more tests
to suggest?
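
If it helps, the second comparison could also be expressed through the public
API instead of a raw memcmp() of the tables, for example something like this
(sketch only, pre-patch prototypes, names invented):

#include <stdint.h>

#include <rte_lpm6.h>

/* Return 1 if both tables answer every probe identically: lpm_a is built
 * with adds only, lpm_b holds the same final rule set but went through
 * extra add/delete cycles. */
static int
lpm6_tables_equivalent(struct rte_lpm6 *lpm_a, struct rte_lpm6 *lpm_b,
		       uint8_t probes[][16], unsigned int n_probes)
{
	unsigned int i;

	for (i = 0; i < n_probes; i++) {
		uint8_t nh_a, nh_b;
		int ret_a = rte_lpm6_lookup(lpm_a, probes[i], &nh_a);
		int ret_b = rte_lpm6_lookup(lpm_b, probes[i], &nh_b);

		if (ret_a != ret_b || (ret_a == 0 && nh_a != nh_b))
			return 0;
	}
	return 1;
}
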
>
> 7. Some of us previously submitted code to expand the LPM4 and LPM6 next hop
> to 24 bits, but the current code only does 16 bits. This is not pleasant for
> me because I really need the 24-bit support. When can we make this happen?
I chose to increase it to 16 bits only because I wanted to keep the
rte_lpm6_tbl_entry structure at 4 bytes. The next hop field is actually 21
bits wide in the struct, but the API returns only 16 bits. I may resubmit a
patch that permits using the whole 21 bits, but the 16-bit version seemed
better to me, and it was enough for my needs.

About increasing the size of rte_lpm6_tbl_entry, I'm not sure it would come
without a performance impact, and I don't really have the time right now to
test it carefully enough, so any input on that is welcome.
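
For context, the entry layout I am talking about is roughly this (from
memory; rte_lpm6.c has the authoritative definition):

#include <stdint.h>

/* One 4-byte tbl24/tbl8 entry (approximate). */
struct rte_lpm6_tbl_entry {
	uint32_t next_hop:    21; /* next hop, or index of the next tbl8 */
	uint32_t depth:        8; /* rule depth */
	uint32_t valid:        1; /* validation flag */
	uint32_t valid_group:  1; /* group validation flag */
	uint32_t ext_entry:    1; /* 1 = points to a tbl8, 0 = a next hop */
};
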
>
> Matthew.
The part of the patch on which I would ideally have wanted a good review is
delete_step() and delete_expand_step().

I have the feeling that the code there could be improved in several ways
concerning thread safety. You cannot expect the same atomicity as in LPM4,
since a delete may have to modify several tbl8s, but if you do a lookup in
your fast path while an rte_lpm6_delete() call is running in a different
thread, the lookup should fail gracefully. In my tests that is the case, but
I may have missed some corner cases.
