From: "Timo Teräs" <timo.teras@iki.fi>
To: Herbert Xu <herbert@gondor.apana.org.au>
Cc: netdev@vger.kernel.org, "David S. Miller" <davem@davemloft.net>
Subject: Re: [PATCH] xfrm: cache bundle lookup results in flow cache
Date: Sat, 20 Mar 2010 18:26:00 +0200
Message-ID: <4BA4F718.3020700@iki.fi>
In-Reply-To: <20100320151751.GB2950@gondor.apana.org.au>

Herbert Xu wrote:
> On Fri, Mar 19, 2010 at 11:53:44AM +0200, Timo Teräs wrote:
>> But now, thinking more, it would probably make more sense to
>> just cache xfrm_dst's and keep a ref to the policy on them. So
>> in general I agree with your recommendation. The only immediate
>> problem I can think of now is that the resolver would need to
>> atomically check whether the xfrm_dst is valid and, if not,
>> resolve a new one. But creating a new xfrm_dst involves taking
>> locks and can sleep, so it cannot be inside the main resolver.
> 
> OK this brings out my favourite topic :)
> 
> The reason we have to sleep is to resolve the template.  So if
> we had queueing for pending xfrm_dst objects, we wouldn't have
> to sleep at all when creating the top-level xfrm_dst.
> 
> Packets using that xfrm_dst can wait in the queue until it is
> fully resolved.  Now obviously this code doesn't exist so this
> is all just a wet dream.

Right. That sounds useful.

> Setting my favourite topic aside, I have to come to the conclusion
> that your patch still doesn't fully resolve the problem you set out
> to fix.
> 
> The crux of the issue is the linked list of all bundles in a
> policy and the obvious problems stemming from walking a linked
> list that is unbounded.
> 
> The reason I think it doesn't fully resolve this is because of
> the flow cache.  Being a per-cpu cache, when you create the xfrm
> dst the first time around, you'll at most put it in one CPU's
> cache.
> 
> The next CPU that comes along will still have to walk that same
> bundle linked list.  So we're back to square one.

Not exactly: each CPU does one slow lookup, after which it
finds the bundle fast. But yes, it's not a perfect solution,
especially if the CPU happens to get switched between the
initial lookup and the cache update.

> Now Dave, my impression is that we picked the per-cpu design
> because it was the best data structure we had back in 2002,
> right?
> 
> If so I'd like us to think about the possibility of switching
> over to a different design, in particular, an RCU-based hash
> table, similar to the one I just used for bridge multicasting.
> 
> This would eliminate the need for walking the bundle list apart
> from the case when we're destroying the policy, which can be
> done in process context.

Right. This would speed up the bundle lookup in all cases.

Except... we can have an override policy on a per-socket basis.
We should include the per-socket override in the flow lookups
so that those sockets also get a boost from the cache. Though
the usual use case is to disable all policies (so that e.g. IKE
can be negotiated without policies applying).
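
For what it's worth, a minimal sketch of what the fast path of such
an RCU-based bundle lookup could look like (bundle_hash_find() and
the surrounding naming are invented here purely for illustration,
not taken from any posted patch):

	/* Sketch only: assumes a hypothetical per-netns hash of bundles
	 * whose read side is protected by RCU. */
	static struct xfrm_dst *bundle_lookup(struct net *net,
					      const struct flowi *fl,
					      unsigned short family)
	{
		struct xfrm_dst *xdst;

		rcu_read_lock();
		/* bundle_hash_find() is an invented helper that walks one
		 * hash chain with hlist_for_each_entry_rcu() and returns
		 * the bundle whose selector matches the flow, or NULL. */
		xdst = bundle_hash_find(net, fl, family);
		if (xdst)
			dst_hold(&xdst->u.dst);	/* take a ref before leaving RCU */
		rcu_read_unlock();

		return xdst;
	}

Real code would of course also have to validate the bundle and cope
with an entry being freed concurrently, but the point is that the
read side takes no locks at all.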

> Actually I just realised that the other way we can fix this is
> to make xfrm_dst objects per-cpu just like IPv4 routes.  That
> is, when you fail to find an xfrm_dst object in the per-cpu
> cache, you don't bother calling xfrm_find_bundle but just make
> a new bundle.
> 
> This is probably much easier than replacing the whole flow cache.
> Can any one think of any problems with duplicate xfrm_dst objects?

Sounds like a very good idea. If we instantiate a new xfrm_dst,
all that it shares with the others is the xfrm_state and xfrm_policy
(the inner objects will be unique). Since those are shared anyway,
I don't see any problem with this.

So we should go ahead and:
1. modify the flow cache to be more generic (have virtual get and
   put for each object, and remove the atomic_t pointer; see the
   sketch below)
2. modify the flow cache to have slow and fast resolvers so we can
   cope with the current sleeping requirement
3. cache bundles instead of policies for outgoing traffic
4. kill find_bundle and just instantiate a new bundle if we get a
   cache miss
5. put all bundles on a global hlist (since the only place that
   walks through them is GC, and a stale bundle can be dst_free'd
   right away); use genids on the policy to flush old bundles
6. dst_free and unlink a bundle immediately if it is found to be
   stale
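
As a rough illustration of step 1, the cache could store a small
embedded object with its own ops instead of a raw policy pointer.
The naming below is my own, just to show the shape of the interface,
not code from an actual patch:

	/* Generic flow cache entry (step 1, sketch only). Whatever the
	 * cache points to - a policy or a bundle - embeds this struct
	 * and supplies the ops. */
	struct flow_cache_object;

	struct flow_cache_ops {
		/* return the object with a reference taken, or NULL if
		 * it is stale so the caller falls back to the resolver */
		struct flow_cache_object *(*get)(struct flow_cache_object *flo);
		/* release the reference taken by ->get() */
		void (*put)(struct flow_cache_object *flo);
	};

	struct flow_cache_object {
		const struct flow_cache_ops *ops;
	};

The fast path would then just call ->get() on a cache hit; when that
returns NULL (stale entry or miss), the sleeping slow resolver of
step 2 builds a new bundle and the per-cpu slot is refreshed.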

- Timo

