All of lore.kernel.org
 help / color / mirror / Atom feed
* APIC lookups
@ 2011-09-02 17:55 Sasha Levin
  2011-09-02 18:07 ` Sasha Levin
  2011-09-02 18:13 ` Gleb Natapov
  0 siblings, 2 replies; 6+ messages in thread
From: Sasha Levin @ 2011-09-02 17:55 UTC (permalink / raw)
  To: kvm

Hi,

I've noticed that kvm_irq_delivery_to_apic() is locating the destination
APIC by running through kvm_for_each_vcpu() which becomes a scalability
issue with a large number if vcpus.

I'm thinking about speeding that up using a radix tree for lookups, and
was wondering if it sounds right.

-- 

Sasha.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: APIC lookups
  2011-09-02 17:55 APIC lookups Sasha Levin
@ 2011-09-02 18:07 ` Sasha Levin
  2011-09-02 18:13 ` Gleb Natapov
  1 sibling, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2011-09-02 18:07 UTC (permalink / raw)
  To: kvm

On Fri, 2011-09-02 at 20:55 +0300, Sasha Levin wrote:
> Hi,
> 
> I've noticed that kvm_irq_delivery_to_apic() is locating the destination
> APIC by running through kvm_for_each_vcpu() which becomes a scalability
> issue with a large number if vcpus.
> 
> I'm thinking about speeding that up using a radix tree for lookups, and
> was wondering if it sounds right.
> 

Can also go with a simple sorted array since APIC id is limited to 8
bits, but it might need to change if we plan on supporting more than 254
vcpus in the future.

-- 

Sasha.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: APIC lookups
  2011-09-02 17:55 APIC lookups Sasha Levin
  2011-09-02 18:07 ` Sasha Levin
@ 2011-09-02 18:13 ` Gleb Natapov
  2011-09-02 19:08   ` Sasha Levin
  1 sibling, 1 reply; 6+ messages in thread
From: Gleb Natapov @ 2011-09-02 18:13 UTC (permalink / raw)
  To: Sasha Levin; +Cc: kvm

On Fri, Sep 02, 2011 at 08:55:55PM +0300, Sasha Levin wrote:
> Hi,
> 
> I've noticed that kvm_irq_delivery_to_apic() is locating the destination
> APIC by running through kvm_for_each_vcpu() which becomes a scalability
> issue with a large number if vcpus.
> 
> I'm thinking about speeding that up using a radix tree for lookups, and
> was wondering if it sounds right.
> 
We have to call kvm_apic_match_dest() on each apic to see if it should
get the message. Single message can be sent to more than one apic. It is
likely possible to optimize common case of physical addressing fixed
destination, but then just use array of 256 elements, no need for a tree.
Do you see this function in profiling?

--
			Gleb.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: APIC lookups
  2011-09-02 18:13 ` Gleb Natapov
@ 2011-09-02 19:08   ` Sasha Levin
  2011-09-03  7:32     ` Gleb Natapov
  0 siblings, 1 reply; 6+ messages in thread
From: Sasha Levin @ 2011-09-02 19:08 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kvm

On Fri, 2011-09-02 at 21:13 +0300, Gleb Natapov wrote:
> On Fri, Sep 02, 2011 at 08:55:55PM +0300, Sasha Levin wrote:
> > Hi,
> > 
> > I've noticed that kvm_irq_delivery_to_apic() is locating the destination
> > APIC by running through kvm_for_each_vcpu() which becomes a scalability
> > issue with a large number if vcpus.
> > 
> > I'm thinking about speeding that up using a radix tree for lookups, and
> > was wondering if it sounds right.
> > 
> We have to call kvm_apic_match_dest() on each apic to see if it should
> get the message. Single message can be sent to more than one apic. It is
> likely possible to optimize common case of physical addressing fixed
> destination, but then just use array of 256 elements, no need for a tree.

I think it's also possible to handle it for logical addressing as well,
instead of a simple compare we just need to go through all the IDs that
would 'and' with the dest.

> Do you see this function in profiling?

I was running profiling to see which functions get much slower during
regular operation (not boot) when you run with large amount of vcpus,
and this was one of them.

Though this is probably due to the method we use to find lowest priority
and not the lookups themselves.

-- 

Sasha.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: APIC lookups
  2011-09-02 19:08   ` Sasha Levin
@ 2011-09-03  7:32     ` Gleb Natapov
  2011-09-03  7:42       ` Sasha Levin
  0 siblings, 1 reply; 6+ messages in thread
From: Gleb Natapov @ 2011-09-03  7:32 UTC (permalink / raw)
  To: Sasha Levin; +Cc: kvm

On Fri, Sep 02, 2011 at 10:08:42PM +0300, Sasha Levin wrote:
> On Fri, 2011-09-02 at 21:13 +0300, Gleb Natapov wrote:
> > On Fri, Sep 02, 2011 at 08:55:55PM +0300, Sasha Levin wrote:
> > > Hi,
> > > 
> > > I've noticed that kvm_irq_delivery_to_apic() is locating the destination
> > > APIC by running through kvm_for_each_vcpu() which becomes a scalability
> > > issue with a large number if vcpus.
> > > 
> > > I'm thinking about speeding that up using a radix tree for lookups, and
> > > was wondering if it sounds right.
> > > 
> > We have to call kvm_apic_match_dest() on each apic to see if it should
> > get the message. Single message can be sent to more than one apic. It is
> > likely possible to optimize common case of physical addressing fixed
> > destination, but then just use array of 256 elements, no need for a tree.
> 
> I think it's also possible to handle it for logical addressing as well,
> instead of a simple compare we just need to go through all the IDs that
> would 'and' with the dest.
> 
There are two kinds of logical addressing: flat and cluster. And
I see nothing that prevents different CPUs be in different mode.

It is better to cache lookup result in irq routing entry to speedup
following interrupts.

> > Do you see this function in profiling?
> 
> I was running profiling to see which functions get much slower during
> regular operation (not boot) when you run with large amount of vcpus,
> and this was one of them.
> 
> Though this is probably due to the method we use to find lowest priority
> and not the lookups themselves.
> 
Currently we round robin between all cpus on each interrupt when lowest priority
delivery is used. We should do it on each N interrupts where N >> 1.

--
			Gleb.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: APIC lookups
  2011-09-03  7:32     ` Gleb Natapov
@ 2011-09-03  7:42       ` Sasha Levin
  0 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2011-09-03  7:42 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kvm

On Sat, 2011-09-03 at 10:32 +0300, Gleb Natapov wrote:
> On Fri, Sep 02, 2011 at 10:08:42PM +0300, Sasha Levin wrote:
> > On Fri, 2011-09-02 at 21:13 +0300, Gleb Natapov wrote:
> > > On Fri, Sep 02, 2011 at 08:55:55PM +0300, Sasha Levin wrote:
> > > > Hi,
> > > > 
> > > > I've noticed that kvm_irq_delivery_to_apic() is locating the destination
> > > > APIC by running through kvm_for_each_vcpu() which becomes a scalability
> > > > issue with a large number if vcpus.
> > > > 
> > > > I'm thinking about speeding that up using a radix tree for lookups, and
> > > > was wondering if it sounds right.
> > > > 
> > > We have to call kvm_apic_match_dest() on each apic to see if it should
> > > get the message. Single message can be sent to more than one apic. It is
> > > likely possible to optimize common case of physical addressing fixed
> > > destination, but then just use array of 256 elements, no need for a tree.
> > 
> > I think it's also possible to handle it for logical addressing as well,
> > instead of a simple compare we just need to go through all the IDs that
> > would 'and' with the dest.
> > 
> There are two kinds of logical addressing: flat and cluster. And
> I see nothing that prevents different CPUs be in different mode.
> 

Hm... I thought that when using logical addressing it's either flat or
cluster, not both.

In that case - yes, let's skip that.

> It is better to cache lookup result in irq routing entry to speedup
> following interrupts.
> 
> > > Do you see this function in profiling?
> > 
> > I was running profiling to see which functions get much slower during
> > regular operation (not boot) when you run with large amount of vcpus,
> > and this was one of them.
> > 
> > Though this is probably due to the method we use to find lowest priority
> > and not the lookups themselves.
> > 
> Currently we round robin between all cpus on each interrupt when lowest priority
> delivery is used. We should do it on each N interrupts where N >> 1.

I'll try that and see how it improves performance.

-- 

Sasha.


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2011-09-03  7:42 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-09-02 17:55 APIC lookups Sasha Levin
2011-09-02 18:07 ` Sasha Levin
2011-09-02 18:13 ` Gleb Natapov
2011-09-02 19:08   ` Sasha Levin
2011-09-03  7:32     ` Gleb Natapov
2011-09-03  7:42       ` Sasha Levin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.