* [PATCH 00/12] cpumask handling scalability improvements
From: Jan Beulich @ 2011-10-20 13:36 UTC
  To: xen-devel

This patch set makes some first steps towards eliminating the old cpumask
accessors, replacing them with ones that don't require the full NR_CPUS
bits to be allocated (which obviously can be pretty wasteful when
NR_CPUS is high, but the actual number is low or moderate).

01: introduce and use nr_cpu_ids and nr_cpumask_bits
02: eliminate cpumask accessors referencing NR_CPUS
03: eliminate direct assignments of CPU masks
04: x86: allocate IRQ actions' cpu_eoi_map dynamically
05: allocate CPU sibling and core maps dynamically
06: allow efficient allocation of multiple CPU masks at once
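
To make the above concrete, here is a minimal sketch of the allocation
pattern the series moves to (modelled on the Linux cpumask API these bits
are ported from; this standalone stub is illustrative only, not the
patched Xen code):

    #include <limits.h>
    #include <stdlib.h>

    #define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))

    /* Set at boot from the number of CPUs actually usable, instead of
       relying on the compile-time NR_CPUS ceiling. */
    static unsigned int nr_cpu_ids;

    typedef unsigned long *cpumask_var_t;

    /* Allocate only enough longs to cover nr_cpu_ids bits. */
    static int zalloc_cpumask_var(cpumask_var_t *mask)
    {
        size_t longs = (nr_cpu_ids + BITS_PER_LONG - 1) / BITS_PER_LONG;

        *mask = calloc(longs, sizeof(unsigned long));
        return *mask != NULL;
    }

    static void free_cpumask_var(cpumask_var_t mask)
    {
        free(mask);
    }

With NR_CPUS=4096 but only 32 CPUs present, each mask shrinks from 512
bytes to 8.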

One reason I put the following ones together was to reduce the
differences between the disassembly of hypervisors built for 4095
and 2047 CPUs, which I looked at to determine the places where
cpumask_t variables get copied without using cpumask_copy() (a
job where grep is of no help). Hence consider these patches optional,
but recommended.

07: cpufreq: allocate CPU masks dynamically
08: x86/p2m: allocate CPU masks dynamically
09: cpupools: allocate CPU masks dynamically
10: credit: allocate CPU masks dynamically
11: x86/hpet: allocate CPU masks dynamically
12: cpumask <=> xenctl_cpumap: allocate CPU masks and byte maps dynamically

Signed-off-by: Jan Beulich <jbeulich@suse.com>

* Re: [PATCH 00/12] cpumask handling scalability improvements
From: Keir Fraser @ 2011-10-20 15:09 UTC
  To: Jan Beulich, xen-devel

On 20/10/2011 14:36, "Jan Beulich" <JBeulich@suse.com> wrote:

> This patch set makes some first steps towards eliminating the old cpumask
> accessors, replacing them with ones that don't require the full NR_CPUS
> bits to be allocated (which obviously can be pretty wasteful when
> NR_CPUS is high, but the actual number is low or moderate).
> 
> 01: introduce and use nr_cpu_ids and nr_cpumask_bits
> 02: eliminate cpumask accessors referencing NR_CPUS
> 03: eliminate direct assignments of CPU masks
> 04: x86: allocate IRQ actions' cpu_eoi_map dynamically
> 05: allocate CPU sibling and core maps dynamically

I'm not sure about this. We can save ~500 bytes per cpumask_t when
NR_CPUS=4096 and actual nr_cpus<64. But how many cpumask_t's do we typically
have dynamically allocated all at once? Let's say we waste 2kB per VCPU and
per IRQ, and we have a massive system with ~1k VCPUs and ~1k IRQs -- we'd
save ~4MB in that extreme case. But such a large system will probably have
a lot of CPUs anyway -- and also a lot of memory, such that 4MB is quite
insignificant.
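
(To spell the arithmetic out: 4096 bits is 512 bytes per mask, of which a
sub-64-CPU system uses only 8 -- hence ~500 bytes wasted per mask; 2kB per
VCPU or IRQ then corresponds to roughly four masks each, and
(1k VCPUs + 1k IRQs) * 2kB ~= 4MB.)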

I suppose there is a second argument that it shrinks the containing
structures (struct domain, struct vcpu, struct irq_desc, ...) and maybe
helps reduce our order!=0 allocations?

By the way, I think we could avoid the NR_CPUS copying overhead everywhere
by having the cpumask.h functions respect nr_cpu_ids, but continuing to
return NR_CPUS as the sentinel value (e.g., end of loop, or no bit found)? This
would not need to change tonnes of code. It only gets part of the benefit
(reducing CPU time overhead) but is more palatable?
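
A sketch of what such an accessor could look like (assumed shape only,
not committed code):

    /* Scan only bits that can ever be set, but keep NR_CPUS as the
       "no CPU found" return so existing callers need no change. */
    static inline int cpumask_next(int n, const cpumask_t *srcp)
    {
        int cpu = find_next_bit(srcp->bits, nr_cpu_ids, n + 1);

        return cpu < (int)nr_cpu_ids ? cpu : NR_CPUS;
    }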

> 06: allow efficient allocation of multiple CPU masks at once

That is utterly hideous and for insignificant saving.

> One reason I put the following ones together was to reduce the
> differences between the disassembly of hypervisors built for 4095
> and 2047 CPUs, which I looked at to determine the places where
> cpumask_t variables get copied without using cpumask_copy() (a
> job where grep is of no help). Hence consider these patches optional,
> but recommended.
> 
> 07: cpufreq: allocate CPU masks dynamically
> 08: x86/p2m: allocate CPU masks dynamically
> 09: cpupools: allocate CPU masks dynamically
> 10: credit: allocate CPU masks dynamically
> 11: x86/hpet: allocate CPU masks dynamically
> 12: cpumask <=> xenctl_cpumap: allocate CPU masks and byte maps dynamically

Questionable. Any subsystem that allocates no more than a handful of
cpumask_t's is possibly just as well left alone... I'm not dead set against
them if we decide that 01-05 are actually worth pursuing, however.

 -- Keir

> Signed-off-by: Jan Beulich <jbeulich@suse.com>

* Re: [PATCH 00/12] cpumask handling scalability improvements
From: Jan Beulich @ 2011-10-20 15:19 UTC
  To: Keir Fraser; +Cc: xen-devel

>>> On 20.10.11 at 17:09, Keir Fraser <keir.xen@gmail.com> wrote:
> On 20/10/2011 14:36, "Jan Beulich" <JBeulich@suse.com> wrote:
> 
>> This patch set makes some first steps towards eliminating the old cpumask
>> accessors, replacing them with ones that don't require the full NR_CPUS
>> bits to be allocated (which obviously can be pretty wasteful when
>> NR_CPUS is high, but the actual number is low or moderate).
>> 
>> 01: introduce and use nr_cpu_ids and nr_cpumask_bits
>> 02: eliminate cpumask accessors referencing NR_CPUS
>> 03: eliminate direct assignments of CPU masks
>> 04: x86: allocate IRQ actions' cpu_eoi_map dynamically
>> 05: allocate CPU sibling and core maps dynamically
> 
> I'm not sure about this. We can save ~500 bytes per cpumask_t when
> NR_CPUS=4096 and actual nr_cpus<64. But how many cpumask_t's do we typically
> have dynamically allocated all at once? Let's say we waste 2kB per VCPU and
> per IRQ, and we have a massive system with ~1k VCPUs and ~1k IRQs -- we'd
> save ~4MB in that extreme case. But such a large system will probably have
> a lot of CPUs anyway -- and also a lot of memory, such that 4MB is quite
> insignificant.

It's not only the memory savings, but also the time saved by manipulating
less data.

> I suppose there is a second argument that it shrinks the containing
> structures (struct domain, struct vcpu, struct irq_desc, ...) and maybe
> helps reduce our order!=0 allocations?

Yes - that's what made me start taking over these Linux bits. What I
sent here just continues on that route. I was really hoping that we
wouldn't leave this in a half-baked state.

> By the way, I think we could avoid the NR_CPUS copying overhead everywhere
> by having the cpumask.h functions respect nr_cpu_ids, but continuing to
> return NR_CPUS as the sentinel value (e.g., end of loop, or no bit found)? This
> would not need to change tonnes of code. It only gets part of the benefit
> (reducing CPU time overhead) but is more palatable?

That would be possible, but would again leave us in a somewhat
incomplete state. (Note that I did leave NR_CPUS in the stop-
machine logic.)

>> 06: allow efficient allocation of multiple CPU masks at once
> 
> That is utterly hideous and for insignificant saving.

I was afraid you would say that, and I'm not fully convinced
either. But I wanted to give it a try to see how bad it is. The
more significant saving here really comes from not allocating
the CPU masks at all for unused irq_desc-s.

Jan

* Re: [PATCH 00/12] cpumask handling scalability improvements
From: Jan Beulich @ 2011-10-20 15:25 UTC
  To: Keir Fraser; +Cc: xen-devel

>>> On 20.10.11 at 17:19, "Jan Beulich" <JBeulich@suse.com> wrote:
>>>> On 20.10.11 at 17:09, Keir Fraser <keir.xen@gmail.com> wrote:
>> I suppose there is a second argument that it shrinks the containing
>> structures (struct domain, struct vcpu, struct irq_desc, ...) and maybe
>> helps reduce our order!=0 allocations?
> 
> Yes - that's what made me start taking over these Linux bits. What I
> sent here just continues on that route. I was really hoping that we
> wouldn't leave this in a half baked state.

I forgot to say: The new accessors produce a little more legible code
with cpumask_var_t and cpumask_t * (the latter of which we already
converted to wherever possible) than with plain cpumask_t, because of
the need to use & on the latter (and as you will see they also allow
removing a number of ugly-looking * operators, which I always considered
only temporary).
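
A contrived example of the legibility difference (illustrative only,
not taken from the patches):

    cpumask_t m;                            /* embedded mask */
    cpumask_and(&m, &m, &cpu_online_map);   /* & needed at every use */

    cpumask_var_t v;                        /* already pointer-like */
    cpumask_and(v, v, &cpu_online_map);     /* no & operator */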

Jan

* Re: [PATCH 00/12] cpumask handling scalability improvements
From: Andrew Cooper @ 2011-10-20 15:49 UTC
  To: xen-devel

On 20/10/11 16:19, Jan Beulich wrote:
>>>> On 20.10.11 at 17:09, Keir Fraser <keir.xen@gmail.com> wrote:
>> On 20/10/2011 14:36, "Jan Beulich" <JBeulich@suse.com> wrote:
>>
>>> This patch set makes some first steps towards eliminating the old cpumask
>>> accessors, replacing them with ones that don't require the full NR_CPUS
>>> bits to be allocated (which obviously can be pretty wasteful when
>>> NR_CPUS is high, but the actual number is low or moderate).
>>>
>>> 01: introduce and use nr_cpu_ids and nr_cpumask_bits
>>> 02: eliminate cpumask accessors referencing NR_CPUS
>>> 03: eliminate direct assignments of CPU masks
>>> 04: x86: allocate IRQ actions' cpu_eoi_map dynamically
>>> 05: allocate CPU sibling and core maps dynamically
>> I'm not sure about this. We can save ~500 bytes per cpumask_t when
>> NR_CPUS=4096 and actual nr_cpus<64. But how many cpumask_t's do we typically
>> have dynamically allocated all at once? Let's say we waste 2kB per VCPU and
>> per IRQ, and we have a massive system with ~1k VCPUs and ~1k IRQs -- we'd
>> save ~4MB in that extreme case. But such a large system will probably have
>> a lot of CPUs anyway -- and also a lot of memory, such that 4MB is quite
>> insignificant.
> It's not only the memory savings, but also the time saved by manipulating
> less data.
>
>> I suppose there is a second argument that it shrinks the containing
>> structures (struct domain, struct vcpu, struct irq_desc, ...) and maybe
>> helps reduce our order!=0 allocations?
> Yes - that's what made me start taking over these Linux bits. What I
> sent here just continues on that route. I was really hoping that we
> wouldn't leave this in a half-baked state.
>
>> By the way, I think we could avoid the NR_CPUS copying overhead everywhere
>> by having the cpumask.h functions respect nr_cpu_ids, but continuing to
>> return NR_CPUS as the sentinel value (e.g., end of loop, or no bit found)? This
>> would not need to change tonnes of code. It only gets part of the benefit
>> (reducing CPU time overhead) but is more palatable?
> That would be possible, but would again leave us in a somewhat
> incomplete state. (Note that I did leave NR_CPUS in the stop-
> machine logic.)
>
>>> 06: allow efficient allocation of multiple CPU masks at once
>> That is utterly hideous and for insignificant saving.
> I was afraid you would say that, and I'm not fully convinced
> either. But I wanted to give it a try to see how bad it is. The
> more significant saving here really comes from not allocating
> the CPU masks at all for unused irq_desc-s.
>
> Jan

The saving of not allocating masks for unused irq_desc's (and irq_cfg's)
will be significant in the general case: 3 * NR_UNUSED_IRQs *
sizeof(mask), where the average system leaves most of its 224 IRQs per CPU
unused.
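
(With Keir's NR_CPUS=4096 figure of 512 bytes per mask, that is
3 * 224 * 512 bytes, i.e. roughly 336kB of masks per CPU for descriptors
that are never used.)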

However, I am against moving the masks out of irq_desc (perhaps this is
the C++ coder inside me).

Would an acceptable alternative be to change irq_desc to use
cpumask_var_t's and allocate them on first use? (I have not spent long
thinking about this, so it is possible that the extra checks for NULL
pointers on the irq path might be counterproductive?)

~Andrew


-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

* Re: [PATCH 00/12] cpumask handling scalability improvements
From: Jan Beulich @ 2011-10-20 15:57 UTC
  To: Andrew Cooper; +Cc: xen-devel

>>> On 20.10.11 at 17:49, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> The saving of not allocating masks for unused irq_desc's (and irq_cfg's)
> will be significant in the general case.  (3 * NR_UNUSED_IRQs *
> sizeof(mask)) where the average system is wasting most of 224 IRQs per CPU.
> 
> However, I am against moving the masks out of irq_desc (perhaps this is
> the C++ coder inside me).

I didn't propose moving them out of there.

> Would an acceptable alternative be to change irq_desc to use
> cpumask_var_t's and allocate them on first use? (I have not spent long
> thinking about this, so it is possible that the extra checks for Null
> pointers on the irq path might be counter productive?)

That's what it does (or really it allocates them for all GSIs at boot, and
then in create_irq() for the dynamically allocated ones). There's no
reason to worry about NULL checks - any access to a never-initialized
irq_desc is a bug anyway.
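
Roughly (a sketch of the shape just described, with assumed names):

    unsigned int irq;

    /* At boot: give every GSI's descriptor its masks up front. */
    for ( irq = 0; irq < nr_irqs_gsi; irq++ )
        if ( !alloc_cpumask_var(&irq_to_desc(irq)->affinity) )
            panic("not enough memory for IRQ CPU masks");

    /* Dynamically allocated IRQs get theirs in create_irq() instead,
       so descriptors that are never handed out cost nothing. */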

Jan

* Re: [PATCH 00/12] cpumask handling scalability improvements
From: Keir Fraser @ 2011-10-20 16:04 UTC
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel

On 20/10/2011 16:19, "Jan Beulich" <JBeulich@suse.com> wrote:

>>> 06: allow efficient allocation of multiple CPU masks at once
>> 
>> That is utterly hideous and for insignificant saving.
> 
> I was afraid you would say that, and I'm not fully convinced
> either. But I wanted to give it a try to see how bad it is. The
> more significant saving here really comes from not allocating
> the CPU masks at all for unused irq_desc-s.

Aren't we planning to dynamically allocate irq_desc-s? That would seem the
nicer solution here.

 -- Keir

* Re: [PATCH 00/12] cpumask handling scalability improvements
From: Keir Fraser @ 2011-10-20 16:06 UTC
  To: Jan Beulich; +Cc: xen-devel

On 20/10/2011 16:19, "Jan Beulich" <JBeulich@suse.com> wrote:

>> I suppose there is a second argument that it shrinks the containing
>> structures (struct domain, struct vcpu, struct irq_desc, ...) and maybe
>> helps reduce our order!=0 allocations?
> 
> Yes - that's what made me start taking over these Linux bits. What I
> sent here just continues on that route. I was really hoping that we
>> wouldn't leave this in a half-baked state.

Has Linux gone wholesale down this route, then?

 -- Keir

* Re: [PATCH 00/12] cpumask handling scalability improvements
From: Jan Beulich @ 2011-10-21  7:01 UTC
  To: Keir Fraser; +Cc: xen-devel

>>> On 20.10.11 at 18:06, Keir Fraser <keir@xen.org> wrote:
> On 20/10/2011 16:19, "Jan Beulich" <JBeulich@suse.com> wrote:
> 
>>> I suppose there is a second argument that it shrinks the containing
>>> structures (struct domain, struct vcpu, struct irq_desc, ...) and maybe
>>> helps reduce our order!=0 allocations?
>> 
>> Yes - that's what made me start taking over these Linux bits. What I
>> sent here just continues on that route. I was really hoping that we
>> wouldn't leave this in a half-baked state.
> 
> Has Linux gone wholesale down this route, then?

Yes - the old ones are left in there unless something forces
CONFIG_DISABLE_OBSOLETE_CPUMASK_FUNCTIONS (which
we can avoid altogether).

Jan

* Re: [PATCH 00/12] cpumask handling scalability improvements
From: Keir Fraser @ 2011-10-21  7:08 UTC
  To: Jan Beulich, xen-devel

On 20/10/2011 14:36, "Jan Beulich" <JBeulich@suse.com> wrote:

> This patch set makes some first steps towards eliminating the old cpumask
> accessors, replacing them with ones that don't require the full NR_CPUS
> bits to be allocated (which obviously can be pretty wasteful when
> NR_CPUS is high, but the actual number is low or moderate).
> 
> 01: introduce and use nr_cpu_ids and nr_cpumask_bits
> 02: eliminate cpumask accessors referencing NR_CPUS
> 03: eliminate direct assignments of CPU masks
> 04: x86: allocate IRQ actions' cpu_eoi_map dynamically
> 05: allocate CPU sibling and core maps dynamically

01-05/07-12:
Acked-by: Keir Fraser <keir@xen.org>

> 06: allow efficient allocation of multiple CPU masks at once

Not this one.

 -- Keir

> One reason I put the following ones together was to reduce the
> differences between the disassembly of hypervisors built for 4095
> and 2047 CPUs, which I looked at to determine the places where
> cpumask_t variables get copied without using cpumask_copy() (a
> job where grep is of no help). Hence consider these patches optional,
> but recommended.
> 
> 07: cpufreq: allocate CPU masks dynamically
> 08: x86/p2m: allocate CPU masks dynamically
> 09: cpupools: allocate CPU masks dynamically
> 10: credit: allocate CPU masks dynamically
> 11: x86/hpet: allocate CPU masks dynamically
> 12: cpumask <=> xenctl_cpumap: allocate CPU masks and byte maps dynamically
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

* Re: [PATCH 00/12] cpumask handling scalability improvements
From: Jan Beulich @ 2011-10-21  7:10 UTC
  To: Keir Fraser; +Cc: Andrew Cooper, xen-devel

>>> On 20.10.11 at 18:04, Keir Fraser <keir@xen.org> wrote:
> On 20/10/2011 16:19, "Jan Beulich" <JBeulich@suse.com> wrote:
> 
>>>> 06: allow efficient allocation of multiple CPU masks at once
>>> 
>>> That is utterly hideous and for insignificant saving.
>> 
>> I was afraid you would say that, and I'm not fully convinced
>> either. But I wanted to give it a try to see how bad it is. The
>> more significant saving here really comes from not allocating
>> the CPU masks at all for unused irq_desc-s.
> 
> Aren't we planning to dynamically allocate irq_desc-s? That would seem the
> nicer solution here.

Yes, I would hope so. But irrespective of that, allocating e.g. 512 bits
(times 4) just to use, say, 20-30 of them is bad - again, not so much
from a memory wasting pov, but rather from the fact that this
needlessly causes a larger cache and TLB footprint.

I actually think that ultimately we should try to remove all
non-dynamically allocated CPU masks (including statics, per-CPU
ones, and local variables - the latter being particularly important as
they cause pretty big stack frames, despite there now being at
most one [with the rare exception of two] of them per function,
which will continue to grow with higher NR_CPUS values).
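
For instance, the stack-frame concern looks like this (hypothetical
function for illustration):

    void some_function(void)
    {
        cpumask_t mask;    /* NR_CPUS=4096 -> 512 bytes of stack frame,
                              however few CPUs are actually present */

        cpumask_copy(&mask, &cpu_online_map);
        /* ... use mask ... */
    }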

Jan

* Re: [PATCH 00/12] cpumask handling scalability improvements
From: Jan Beulich @ 2011-10-21  7:14 UTC
  To: Keir Fraser; +Cc: xen-devel

>>> On 21.10.11 at 09:08, Keir Fraser <keir.xen@gmail.com> wrote:
> On 20/10/2011 14:36, "Jan Beulich" <JBeulich@suse.com> wrote:
> 
>> This patch set makes some first steps towards eliminating the old cpumask
>> accessors, replacing them with ones that don't require the full NR_CPUS
>> bits to be allocated (which obviously can be pretty wasteful when
>> NR_CPUS is high, but the actual number is low or moderate).
>> 
>> 01: introduce and use nr_cpu_ids and nr_cpumask_bits
>> 02: eliminate cpumask accessors referencing NR_CPUS
>> 03: eliminate direct assignments of CPU masks
>> 04: x86: allocate IRQ actions' cpu_eoi_map dynamically
>> 05: allocate CPU sibling and core maps dynamically
> 
> 01-05/07-12:
> Acked-by: Keir Fraser <keir@xen.org>
> 
>> 06: allow efficient allocation of multiple CPU masks at once
> 
> Not this one.

Okay, I'll re-work (and re-send) it then to use non-array allocations.

Jan

>> One reason I put the following ones together was to reduce the
>> differences between the disassembly of hypervisors built for 4095
>> and 2047 CPUs, which I looked at to determine the places where
>> cpumask_t variables get copied without using cpumask_copy() (a
>> job where grep is of no help). Hence consider these patches optional,
>> but recommended.
>> 
>> 07: cpufreq: allocate CPU masks dynamically
>> 08: x86/p2m: allocate CPU masks dynamically
>> 09: cpupools: allocate CPU masks dynamically
>> 10: credit: allocate CPU masks dynamically
>> 11: x86/hpet: allocate CPU masks dynamically
>> 12: cpumask <=> xenctl_cpumap: allocate CPU masks and byte maps dynamically
>> 
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>

* Re: [PATCH 00/12] cpumask handling scalability improvements
From: Jan Beulich @ 2011-10-21  7:49 UTC
  To: mark.langsdorf, Keir Fraser; +Cc: xen-devel

>>> On 21.10.11 at 09:08, Keir Fraser <keir.xen@gmail.com> wrote:
> On 20/10/2011 14:36, "Jan Beulich" <JBeulich@suse.com> wrote:
> 01-05/07-12:
> Acked-by: Keir Fraser <keir@xen.org>
>
>> 07: cpufreq: allocate CPU masks dynamically

I didn't apply this one, as it wants
http://lists.xensource.com/archives/html/xen-devel/2011-10/msg01180.html
to be applied first, but there has been no reaction from AMD so far. Mark?

Jan

* Re: [PATCH 00/12] cpumask handling scalability improvements
From: Keir Fraser @ 2011-10-21  8:00 UTC
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel

On 21/10/2011 08:10, "Jan Beulich" <JBeulich@suse.com> wrote:

>> Aren't we planning to dynamically allocate irq_desc-s? That would seem the
>> nicer solution here.
> 
> Yes, I would hope so. But irrespective of that, allocating e.g. 512 bits
> (times 4) just to use, say, 20-30 of them is bad - again, not so much
> from a memory wasting pov, but rather from the fact that this
> needlessly causes a larger cache and TLB footprint.

Oh, I don't mind you changing irq_desc to use cpumask_var_t-s. It's the
extra step of using an array to save a few pointers that I reject.

> I actually think that ultimately we should try to remove all
> non-dynamically allocated CPU masks (including statics, per-CPU
> ones, and local variables - the latter being particularly important as
> they cause pretty big stack frames, despite there now being at
> most one [with the rare exception of two] of them per function,
> which will continue to grow with higher NR_CPUS values).

OTOH they are probably in some cases used in contexts where dynamic
allocation (and possibility of failure) is not nice? And we can compare this
effort with just increasing per-cpu stack sizes, for example?

I'm not particularly against moving in this direction, but there might be
cases where it isn't worth the effort, or there are better solutions.

 -- Keir

* Re: [PATCH 00/12] cpumask handling scalability improvements
From: Jan Beulich @ 2011-10-21  8:23 UTC
  To: Keir Fraser; +Cc: Andrew Cooper, xen-devel

>>> On 21.10.11 at 10:00, Keir Fraser <keir.xen@gmail.com> wrote:
> On 21/10/2011 08:10, "Jan Beulich" <JBeulich@suse.com> wrote:
> 
>>> Aren't we planning to dynamically allocate irq_desc-s? That would seem the
>>> nicer solution here.
>> 
>> Yes, I would hope so. But irrespective of that, allocating e.g. 512 bits
>> (times 4) just to use, say, 20-30 of them is bad - again, not so much
>> from a memory wasting pov, but rather from the fact that this
>> needlessly causes a larger cache and TLB footprint.
> 
> Oh, I don't mind you changing irq_desc to use cpumask_var_t-s. It's the
> extra step of using an array to save a few pointers that I reject.

That's what I understood.

>> I actually think that ultimately we should try to remove all
>> non-dynamically allocated CPU masks (including statics, per-CPU
>> ones, and local variables - the latter being particularly important as
>> they cause pretty big stack frames, despite there now being at
>> most one [with the rare exception of two] of them per function,
>> which will continue to grow with higher NR_CPUS values).
> 
> OTOH they are probably in some cases used in contexts where dynamic
> allocation (and possibility of failure) is not nice? And we can compare this
> effort with just increasing per-cpu stack sizes, for example?
> 
> I'm not particularly against moving in this direction, but there might be
> cases where it isn't worth the effort, or there are better solutions.

Indeed, in some cases it may mean moving from stack variables to
e.g. per-CPU pre-allocated ones.
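
E.g. (a sketch of that pattern; the names here are invented):

    /* Pre-allocated per-CPU scratch mask replacing an on-stack one. */
    static DEFINE_PER_CPU(cpumask_var_t, scratch_cpumask);

    void some_ipi_path(void)
    {
        cpumask_t *scratch = this_cpu(scratch_cpumask);

        cpumask_and(scratch, scratch, &cpu_online_map);
        /* ... no allocation here, and no 512-byte stack frame ... */
    }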

Jan
