All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] KVM: x86: lapic: Fix the misuse of likely() in find_highest_vector()
@ 2012-08-24  9:15 Takuya Yoshikawa
  2012-08-27 20:25 ` Marcelo Tosatti
  0 siblings, 1 reply; 22+ messages in thread
From: Takuya Yoshikawa @ 2012-08-24  9:15 UTC (permalink / raw)
  To: avi, mtosatti; +Cc: kvm

Although returning -1 should be likely according to the likely(),
the ASSERT in apic_find_highest_irr() will be triggered in such a case.
It seems that this optimization is not working as expected.

This patch simplifies the logic to mitigate this issue: search for the
first non-zero word in a for loop and then use __fls() if found.  When
nothing found, we are out of the loop, so we can just return -1.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
---
 arch/x86/kvm/lapic.c |   18 ++++++++++--------
 1 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 18d149d..5eb4dde 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -208,16 +208,18 @@ static unsigned int apic_lvt_mask[APIC_LVT_NUM] = {
 
 static int find_highest_vector(void *bitmap)
 {
-	u32 *word = bitmap;
-	int word_offset = MAX_APIC_VECTOR >> 5;
+	u32 *p = bitmap;
+	u32 word;
+	int word_offset;
 
-	while ((word_offset != 0) && (word[(--word_offset) << 2] == 0))
-		continue;
+	for (word_offset = (MAX_APIC_VECTOR >> 5) - 1;
+	     word_offset >= 0; --word_offset) {
+		word = p[word_offset << 2];
+		if (word)
+			return __fls(word) + (word_offset << 5);
+	}
 
-	if (likely(!word_offset && !word[0]))
-		return -1;
-	else
-		return fls(word[word_offset << 2]) - 1 + (word_offset << 5);
+	return -1;
 }
 
 static u8 count_vectors(void *bitmap)
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] KVM: x86: lapic: Fix the misuse of likely() in find_highest_vector()
  2012-08-24  9:15 [PATCH] KVM: x86: lapic: Fix the misuse of likely() in find_highest_vector() Takuya Yoshikawa
@ 2012-08-27 20:25 ` Marcelo Tosatti
  2012-08-28  9:57   ` Takuya Yoshikawa
  0 siblings, 1 reply; 22+ messages in thread
From: Marcelo Tosatti @ 2012-08-27 20:25 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: avi, kvm

On Fri, Aug 24, 2012 at 06:15:49PM +0900, Takuya Yoshikawa wrote:
> Although returning -1 should be likely according to the likely(),
> the ASSERT in apic_find_highest_irr() will be triggered in such a case.
> It seems that this optimization is not working as expected.
> 
> This patch simplifies the logic to mitigate this issue: search for the
> first non-zero word in a for loop and then use __fls() if found.  When
> nothing found, we are out of the loop, so we can just return -1.

Numbers please?

> Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
> ---
>  arch/x86/kvm/lapic.c |   18 ++++++++++--------
>  1 files changed, 10 insertions(+), 8 deletions(-)


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] KVM: x86: lapic: Fix the misuse of likely() in find_highest_vector()
  2012-08-27 20:25 ` Marcelo Tosatti
@ 2012-08-28  9:57   ` Takuya Yoshikawa
  2012-08-29 19:10     ` Marcelo Tosatti
  0 siblings, 1 reply; 22+ messages in thread
From: Takuya Yoshikawa @ 2012-08-28  9:57 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: avi, kvm

On Mon, 27 Aug 2012 17:25:42 -0300
Marcelo Tosatti <mtosatti@redhat.com> wrote:

> On Fri, Aug 24, 2012 at 06:15:49PM +0900, Takuya Yoshikawa wrote:
> > Although returning -1 should be likely according to the likely(),
> > the ASSERT in apic_find_highest_irr() will be triggered in such a case.
> > It seems that this optimization is not working as expected.
> > 
> > This patch simplifies the logic to mitigate this issue: search for the
> > first non-zero word in a for loop and then use __fls() if found.  When
> > nothing found, we are out of the loop, so we can just return -1.
> 
> Numbers please?


How about this?
(It would be great if someone help me understand why I got the numbers.)


Subject: [PATCH -v2] KVM: x86: lapic: Fix the misuse of likely() in find_highest_vector()

Although returning -1 should be likely according to the likely(), not
all callers expect such a result:

Call sites:
 - apic_search_irr()
   - apic_find_highest_irr()  --> ASSERT(result == -1 || result >= 16);
   - apic_clear_irr()
 - apic_find_highest_isr()    --> ASSERT(result == -1 || result >= 16);

The likely() might have made sense if returning -1 in apic_clear_irr()
dominated the usage.  But what I saw, by inserting counters in
find_highest_vector(), was not like that:

  When the guest was being booted up, the counter for the "likely" case
  alone increased rapidly and exceeded 1,000,000.  Then, after the guest
  booted up, the other cases started to happen and the "likely"/"others"
  ratio was between 1/4 to 1/3.  Especially when ping from the guest to
  host was running, this was very clear:

    * NOT FOUND : "likely" (return -1)
    * WORD FOUND: "others"
    * printed when counter % 1000 == 0

    [ 3566.796755] KVM: find_highest_vector: WORD FOUND 1707000
    [ 3573.716623] KVM: find_highest_vector: WORD FOUND 1708000
    [ 3575.666378] KVM: find_highest_vector: WORD FOUND 1709000
    [ 3580.514253] KVM: find_highest_vector: NOT FOUND  1574000
    [ 3586.763738] KVM: find_highest_vector: WORD FOUND 1710000
    [ 3593.506674] KVM: find_highest_vector: WORD FOUND 1711000
    [ 3595.766714] KVM: find_highest_vector: WORD FOUND 1712000
    [ 3600.523654] KVM: find_highest_vector: NOT FOUND  1575000
    [ 3607.485739] KVM: find_highest_vector: WORD FOUND 1713000
    [ 3614.009400] KVM: find_highest_vector: WORD FOUND 1714000
    [ 3616.669787] KVM: find_highest_vector: WORD FOUND 1715000
    [ 3620.971443] KVM: find_highest_vector: NOT FOUND  1576000

This result shows that the likely() was not likely at all after the
guest booted up.

This patch simplifies the code to mitigate this issue: search for the
first non-zero word in a for loop and then use __fls() if found.  When
nothing found, we are out of the loop, so we can just return -1.

With this patch applied, even when we see successive "not found", CPU
will predict things much better than us.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
---
 arch/x86/kvm/lapic.c |   18 ++++++++++--------
 1 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 18d149d..5eb4dde 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -208,16 +208,18 @@ static unsigned int apic_lvt_mask[APIC_LVT_NUM] = {
 
 static int find_highest_vector(void *bitmap)
 {
-	u32 *word = bitmap;
-	int word_offset = MAX_APIC_VECTOR >> 5;
+	u32 *p = bitmap;
+	u32 word;
+	int word_offset;
 
-	while ((word_offset != 0) && (word[(--word_offset) << 2] == 0))
-		continue;
+	for (word_offset = (MAX_APIC_VECTOR >> 5) - 1;
+	     word_offset >= 0; --word_offset) {
+		word = p[word_offset << 2];
+		if (word)
+			return __fls(word) + (word_offset << 5);
+	}
 
-	if (likely(!word_offset && !word[0]))
-		return -1;
-	else
-		return fls(word[word_offset << 2]) - 1 + (word_offset << 5);
+	return -1;
 }
 
 static u8 count_vectors(void *bitmap)
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] KVM: x86: lapic: Fix the misuse of likely() in find_highest_vector()
  2012-08-28  9:57   ` Takuya Yoshikawa
@ 2012-08-29 19:10     ` Marcelo Tosatti
  2012-08-29 22:51       ` Michael S. Tsirkin
  0 siblings, 1 reply; 22+ messages in thread
From: Marcelo Tosatti @ 2012-08-29 19:10 UTC (permalink / raw)
  To: Takuya Yoshikawa, Michael S. Tsirkin; +Cc: avi, kvm

On Tue, Aug 28, 2012 at 06:57:56PM +0900, Takuya Yoshikawa wrote:
> On Mon, 27 Aug 2012 17:25:42 -0300
> Marcelo Tosatti <mtosatti@redhat.com> wrote:
> 
> > On Fri, Aug 24, 2012 at 06:15:49PM +0900, Takuya Yoshikawa wrote:
> > > Although returning -1 should be likely according to the likely(),
> > > the ASSERT in apic_find_highest_irr() will be triggered in such a case.
> > > It seems that this optimization is not working as expected.
> > > 
> > > This patch simplifies the logic to mitigate this issue: search for the
> > > first non-zero word in a for loop and then use __fls() if found.  When
> > > nothing found, we are out of the loop, so we can just return -1.
> > 
> > Numbers please?
> 
> 
> How about this?
> (It would be great if someone help me understand why I got the numbers.)

Michael, can you explain the reasoning?

> Subject: [PATCH -v2] KVM: x86: lapic: Fix the misuse of likely() in find_highest_vector()
> 
> Although returning -1 should be likely according to the likely(), not
> all callers expect such a result:
> 
> Call sites:
>  - apic_search_irr()
>    - apic_find_highest_irr()  --> ASSERT(result == -1 || result >= 16);
>    - apic_clear_irr()
>  - apic_find_highest_isr()    --> ASSERT(result == -1 || result >= 16);
> 
> The likely() might have made sense if returning -1 in apic_clear_irr()
> dominated the usage.  But what I saw, by inserting counters in
> find_highest_vector(), was not like that:
> 
>   When the guest was being booted up, the counter for the "likely" case
>   alone increased rapidly and exceeded 1,000,000.  Then, after the guest
>   booted up, the other cases started to happen and the "likely"/"others"
>   ratio was between 1/4 to 1/3.  Especially when ping from the guest to
>   host was running, this was very clear:
> 
>     * NOT FOUND : "likely" (return -1)
>     * WORD FOUND: "others"
>     * printed when counter % 1000 == 0
> 
>     [ 3566.796755] KVM: find_highest_vector: WORD FOUND 1707000
>     [ 3573.716623] KVM: find_highest_vector: WORD FOUND 1708000
>     [ 3575.666378] KVM: find_highest_vector: WORD FOUND 1709000
>     [ 3580.514253] KVM: find_highest_vector: NOT FOUND  1574000
>     [ 3586.763738] KVM: find_highest_vector: WORD FOUND 1710000
>     [ 3593.506674] KVM: find_highest_vector: WORD FOUND 1711000
>     [ 3595.766714] KVM: find_highest_vector: WORD FOUND 1712000
>     [ 3600.523654] KVM: find_highest_vector: NOT FOUND  1575000
>     [ 3607.485739] KVM: find_highest_vector: WORD FOUND 1713000
>     [ 3614.009400] KVM: find_highest_vector: WORD FOUND 1714000
>     [ 3616.669787] KVM: find_highest_vector: WORD FOUND 1715000
>     [ 3620.971443] KVM: find_highest_vector: NOT FOUND  1576000
> 
> This result shows that the likely() was not likely at all after the
> guest booted up.
> 
> This patch simplifies the code to mitigate this issue: search for the
> first non-zero word in a for loop and then use __fls() if found.  When
> nothing found, we are out of the loop, so we can just return -1.
> 
> With this patch applied, even when we see successive "not found", CPU
> will predict things much better than us.
> 
> Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
> ---
>  arch/x86/kvm/lapic.c |   18 ++++++++++--------
>  1 files changed, 10 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 18d149d..5eb4dde 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -208,16 +208,18 @@ static unsigned int apic_lvt_mask[APIC_LVT_NUM] = {
>  
>  static int find_highest_vector(void *bitmap)
>  {
> -	u32 *word = bitmap;
> -	int word_offset = MAX_APIC_VECTOR >> 5;
> +	u32 *p = bitmap;
> +	u32 word;
> +	int word_offset;
>  
> -	while ((word_offset != 0) && (word[(--word_offset) << 2] == 0))
> -		continue;
> +	for (word_offset = (MAX_APIC_VECTOR >> 5) - 1;
> +	     word_offset >= 0; --word_offset) {
> +		word = p[word_offset << 2];
> +		if (word)
> +			return __fls(word) + (word_offset << 5);
> +	}
>  
> -	if (likely(!word_offset && !word[0]))
> -		return -1;
> -	else
> -		return fls(word[word_offset << 2]) - 1 + (word_offset << 5);
> +	return -1;
>  }
>  
>  static u8 count_vectors(void *bitmap)
> -- 
> 1.7.5.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] KVM: x86: lapic: Fix the misuse of likely() in find_highest_vector()
  2012-08-29 19:10     ` Marcelo Tosatti
@ 2012-08-29 22:51       ` Michael S. Tsirkin
  2012-08-30  1:09         ` Takuya Yoshikawa
  0 siblings, 1 reply; 22+ messages in thread
From: Michael S. Tsirkin @ 2012-08-29 22:51 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Takuya Yoshikawa, avi, kvm

On Wed, Aug 29, 2012 at 04:10:57PM -0300, Marcelo Tosatti wrote:
> On Tue, Aug 28, 2012 at 06:57:56PM +0900, Takuya Yoshikawa wrote:
> > On Mon, 27 Aug 2012 17:25:42 -0300
> > Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > 
> > > On Fri, Aug 24, 2012 at 06:15:49PM +0900, Takuya Yoshikawa wrote:
> > > > Although returning -1 should be likely according to the likely(),
> > > > the ASSERT in apic_find_highest_irr() will be triggered in such a case.
> > > > It seems that this optimization is not working as expected.
> > > > 
> > > > This patch simplifies the logic to mitigate this issue: search for the
> > > > first non-zero word in a for loop and then use __fls() if found.  When
> > > > nothing found, we are out of the loop, so we can just return -1.
> > > 
> > > Numbers please?
> > 
> > 
> > How about this?
> > (It would be great if someone help me understand why I got the numbers.)
> 
> Michael, can you explain the reasoning?

This text:
+       if (likely(!word_offset && !word[0]))
+               return -1;
is a left-over from the original implementation.

There we did a ton of gratitious calls to interrupt
injection so it was important to streamline that path:
lookups in emoty ISR and IRR.

This is less common nowdays, since we have kvm_make_request,
additionally, 8680b94b0e6046af2644c17313287ec0cb5843dc
means for ISR lookups we do not need to scan
the vector at all, and
33e4c68656a2e461b296ce714ec322978de85412
means we never need to scan IRR if it is empty.

So I think likely() here really became bogus by now.

But optimizing the vector lookups is a wrong thing to do
I suspect: these basically should almost never trigger.

Besides a single possible case: a single bit in IRR might
still be somewhat common I think.

If we want to optimize the case of a single bit set in IRR.
my guess is the best thing is to replace irr_pending with
generalization of isr_count/highest_isr_cache so we can have
a highest IRR cache. This will avoid scans completely.


> > Subject: [PATCH -v2] KVM: x86: lapic: Fix the misuse of likely() in find_highest_vector()
> > 
> > Although returning -1 should be likely according to the likely(), not
> > all callers expect such a result:
> > 
> > Call sites:
> >  - apic_search_irr()
> >    - apic_find_highest_irr()  --> ASSERT(result == -1 || result >= 16);
> >    - apic_clear_irr()
> >  - apic_find_highest_isr()    --> ASSERT(result == -1 || result >= 16);
> > 
> > The likely() might have made sense if returning -1 in apic_clear_irr()
> > dominated the usage.  But what I saw, by inserting counters in
> > find_highest_vector(), was not like that:
> > 
> >   When the guest was being booted up, the counter for the "likely" case
> >   alone increased rapidly and exceeded 1,000,000.  Then, after the guest
> >   booted up, the other cases started to happen and the "likely"/"others"
> >   ratio was between 1/4 to 1/3.  Especially when ping from the guest to
> >   host was running, this was very clear:
> > 
> >     * NOT FOUND : "likely" (return -1)
> >     * WORD FOUND: "others"
> >     * printed when counter % 1000 == 0
> > 
> >     [ 3566.796755] KVM: find_highest_vector: WORD FOUND 1707000
> >     [ 3573.716623] KVM: find_highest_vector: WORD FOUND 1708000
> >     [ 3575.666378] KVM: find_highest_vector: WORD FOUND 1709000
> >     [ 3580.514253] KVM: find_highest_vector: NOT FOUND  1574000
> >     [ 3586.763738] KVM: find_highest_vector: WORD FOUND 1710000
> >     [ 3593.506674] KVM: find_highest_vector: WORD FOUND 1711000
> >     [ 3595.766714] KVM: find_highest_vector: WORD FOUND 1712000
> >     [ 3600.523654] KVM: find_highest_vector: NOT FOUND  1575000
> >     [ 3607.485739] KVM: find_highest_vector: WORD FOUND 1713000
> >     [ 3614.009400] KVM: find_highest_vector: WORD FOUND 1714000
> >     [ 3616.669787] KVM: find_highest_vector: WORD FOUND 1715000
> >     [ 3620.971443] KVM: find_highest_vector: NOT FOUND  1576000
> > 
> > This result shows that the likely() was not likely at all after the
> > guest booted up.
> > 
> > This patch simplifies the code to mitigate this issue: search for the
> > first non-zero word in a for loop and then use __fls() if found.  When
> > nothing found, we are out of the loop, so we can just return -1.
> > 
> > With this patch applied, even when we see successive "not found", CPU
> > will predict things much better than us.
> > 
> > Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
> > ---
> >  arch/x86/kvm/lapic.c |   18 ++++++++++--------
> >  1 files changed, 10 insertions(+), 8 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index 18d149d..5eb4dde 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -208,16 +208,18 @@ static unsigned int apic_lvt_mask[APIC_LVT_NUM] = {
> >  
> >  static int find_highest_vector(void *bitmap)
> >  {
> > -	u32 *word = bitmap;
> > -	int word_offset = MAX_APIC_VECTOR >> 5;
> > +	u32 *p = bitmap;
> > +	u32 word;
> > +	int word_offset;
> >  
> > -	while ((word_offset != 0) && (word[(--word_offset) << 2] == 0))
> > -		continue;
> > +	for (word_offset = (MAX_APIC_VECTOR >> 5) - 1;
> > +	     word_offset >= 0; --word_offset) {
> > +		word = p[word_offset << 2];
> > +		if (word)
> > +			return __fls(word) + (word_offset << 5);
> > +	}
> >  
> > -	if (likely(!word_offset && !word[0]))
> > -		return -1;
> > -	else
> > -		return fls(word[word_offset << 2]) - 1 + (word_offset << 5);
> > +	return -1;
> >  }
> >  
> >  static u8 count_vectors(void *bitmap)
> > -- 
> > 1.7.5.4
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] KVM: x86: lapic: Fix the misuse of likely() in find_highest_vector()
  2012-08-29 22:51       ` Michael S. Tsirkin
@ 2012-08-30  1:09         ` Takuya Yoshikawa
  2012-08-30  6:37           ` Michael S. Tsirkin
  0 siblings, 1 reply; 22+ messages in thread
From: Takuya Yoshikawa @ 2012-08-30  1:09 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Marcelo Tosatti, avi, kvm

On Thu, 30 Aug 2012 01:51:20 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> This text:
> +       if (likely(!word_offset && !word[0]))
> +               return -1;
> is a left-over from the original implementation.
> 
> There we did a ton of gratitious calls to interrupt
> injection so it was important to streamline that path:
> lookups in emoty ISR and IRR.
> 
> This is less common nowdays, since we have kvm_make_request,
> additionally, 8680b94b0e6046af2644c17313287ec0cb5843dc
> means for ISR lookups we do not need to scan
> the vector at all, and
> 33e4c68656a2e461b296ce714ec322978de85412
> means we never need to scan IRR if it is empty.
> 
> So I think likely() here really became bogus by now.

Yes, thank you for the explanation.

> But optimizing the vector lookups is a wrong thing to do
> I suspect: these basically should almost never trigger.

This patch is not optimizing things at all, just removing
unnatural logic which might be justified only for using that
bogus likely().

I explain this below.

> Besides a single possible case: a single bit in IRR might
> still be somewhat common I think.

Then, the current code is doing very bad thing:

  while ((word_offset != 0) && (word[(--word_offset) << 2] == 0))
      continue;

  if (likely(!word_offset && !word[0]))
      return -1;
  else
      return fls(word[word_offset << 2]) - 1 + (word_offset << 5);

- checking word[0] separately does not make sense
- using fls(), not __fls(), means we doubly check (word[i] == 0)

Actually, how can this likely() work so effectively even when we
return -1?  If we do (word[0] == 0) check in the same loop, CPU
should naturally predict the result:

  for (word_offset = (MAX_APIC_VECTOR >> 5) - 1;
       word_offset >= 0; --word_offset) {
      word = p[word_offset << 2];
      if (word)
          return __fls(word) + (word_offset << 5);
  }

  return -1;

> If we want to optimize the case of a single bit set in IRR.
> my guess is the best thing is to replace irr_pending with
> generalization of isr_count/highest_isr_cache so we can have
> a highest IRR cache. This will avoid scans completely.

Yes, but let's first fix the wrong code.

Thanks,
	Takuya

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] KVM: x86: lapic: Fix the misuse of likely() in find_highest_vector()
  2012-08-30  1:09         ` Takuya Yoshikawa
@ 2012-08-30  6:37           ` Michael S. Tsirkin
  2012-08-30  9:50             ` Takuya Yoshikawa
  0 siblings, 1 reply; 22+ messages in thread
From: Michael S. Tsirkin @ 2012-08-30  6:37 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: Marcelo Tosatti, avi, kvm

On Thu, Aug 30, 2012 at 10:09:06AM +0900, Takuya Yoshikawa wrote:
> On Thu, 30 Aug 2012 01:51:20 +0300
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > This text:
> > +       if (likely(!word_offset && !word[0]))
> > +               return -1;
> > is a left-over from the original implementation.
> > 
> > There we did a ton of gratitious calls to interrupt
> > injection so it was important to streamline that path:
> > lookups in emoty ISR and IRR.
> > 
> > This is less common nowdays, since we have kvm_make_request,
> > additionally, 8680b94b0e6046af2644c17313287ec0cb5843dc
> > means for ISR lookups we do not need to scan
> > the vector at all, and
> > 33e4c68656a2e461b296ce714ec322978de85412
> > means we never need to scan IRR if it is empty.
> > 
> > So I think likely() here really became bogus by now.
> 
> Yes, thank you for the explanation.
> 
> > But optimizing the vector lookups is a wrong thing to do
> > I suspect: these basically should almost never trigger.
> 
> This patch is not optimizing things at all, just removing
> unnatural logic which might be justified only for using that
> bogus likely().
> 
> I explain this below.
> 
> > Besides a single possible case: a single bit in IRR might
> > still be somewhat common I think.
> 
> Then, the current code is doing very bad thing:
> 
>   while ((word_offset != 0) && (word[(--word_offset) << 2] == 0))
>       continue;
> 
>   if (likely(!word_offset && !word[0]))
>       return -1;
>   else
>       return fls(word[word_offset << 2]) - 1 + (word_offset << 5);
> 
> - checking word[0] separately does not make sense
> - using fls(), not __fls(), means we doubly check (word[i] == 0)
> 
> Actually, how can this likely() work so effectively even when we
> return -1?  If we do (word[0] == 0) check in the same loop, CPU
> should naturally predict the result:
> 
>   for (word_offset = (MAX_APIC_VECTOR >> 5) - 1;
>        word_offset >= 0; --word_offset) {
>       word = p[word_offset << 2];
>       if (word)
>           return __fls(word) + (word_offset << 5);
>   }
> 
>   return -1;
> 
> > If we want to optimize the case of a single bit set in IRR.
> > my guess is the best thing is to replace irr_pending with
> > generalization of isr_count/highest_isr_cache so we can have
> > a highest IRR cache. This will avoid scans completely.
> 
> Yes, but let's first fix the wrong code.
> 
> Thanks,
> 	Takuya


After staring at your code for a while it does appear to
do the right thing, and looks cleaner than what
we have now. commit log could be clearer.
It should state something like:


Clean up code in find_highest_vector:
 - likely() is there for historical reasons, it is no longer
   clear it's optimizing for the correct branch,
   and find_highest_vector might not be worth optimizing at all
 - checking word[0] separately does not make sense:
   if (word[word_offset << 2]) would be clearer
 - since we test word[...] != 0 beforehand, we can use __fls
   instead of fls()
 - for loop iterating over an array is clearer than while

Since you are going for cleanups, maybe we could also add:
- get rid of ugly >> 5 << 2, switch to using REG_POS instead?

Something like the below pseudo code would do this I think?

#define APIC_VECTORS_PER_REG 32

	int vec;
	for (vec = MAX_APIC_VECTOR - APIC_VECTORS_PER_REG;
	     vec -= APIC_VECTORS_PER_REG; vec >= 0) {
		u32 *reg = bitmap + REG_POS(vec);
		if (*reg)
			return __fls(*reg) - 1 + vec;
	}
	return -1

count_vectors similar:

        for (vec = 0; vec < MAX_APIC_VECTOR; vec += APIC_VECTORS_PER_REG) {
		u32 *reg = bitmap + REG_POS(vec);
                count += hweight32(*reg);
	}

hmm?

-- 
MST

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] KVM: x86: lapic: Fix the misuse of likely() in find_highest_vector()
  2012-08-30  6:37           ` Michael S. Tsirkin
@ 2012-08-30  9:50             ` Takuya Yoshikawa
  2012-08-30 10:10               ` Michael S. Tsirkin
  0 siblings, 1 reply; 22+ messages in thread
From: Takuya Yoshikawa @ 2012-08-30  9:50 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Marcelo Tosatti, avi, kvm

On Thu, 30 Aug 2012 09:37:02 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> After staring at your code for a while it does appear to
> do the right thing, and looks cleaner than what
> we have now. commit log could be clearer.
> It should state something like:
> 
> 
> Clean up code in find_highest_vector:
>  - likely() is there for historical reasons, it is no longer
>    clear it's optimizing for the correct branch,
>    and find_highest_vector might not be worth optimizing at all
>  - checking word[0] separately does not make sense:
>    if (word[word_offset << 2]) would be clearer
>  - since we test word[...] != 0 beforehand, we can use __fls
>    instead of fls()
>  - for loop iterating over an array is clearer than while

Yes, I'll update the patch.

> Since you are going for cleanups, maybe we could also add:
> - get rid of ugly >> 5 << 2, switch to using REG_POS instead?

OK, I'll do these on top of this patch.

> Something like the below pseudo code would do this I think?
> 
> #define APIC_VECTORS_PER_REG 32
> 
> 	int vec;
> 	for (vec = MAX_APIC_VECTOR - APIC_VECTORS_PER_REG;
> 	     vec -= APIC_VECTORS_PER_REG; vec >= 0) {
> 		u32 *reg = bitmap + REG_POS(vec);

We want to introduce apic_read_register(bitmap, reg) instead.
u32 reg = apic_read_register(bitmap, REG_POS(vec));

> 		if (*reg)
> 			return __fls(*reg) - 1 + vec;

Because it is not clear that this *reg is the same value
tested before. 

> 	}
> 	return -1
> 
> count_vectors similar:
> 
>         for (vec = 0; vec < MAX_APIC_VECTOR; vec += APIC_VECTORS_PER_REG) {
> 		u32 *reg = bitmap + REG_POS(vec);

Same here.

>                 count += hweight32(*reg);
> 	}
> 
> hmm?

Looks very good!

Thanks,
	Takuya

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] KVM: x86: lapic: Fix the misuse of likely() in find_highest_vector()
  2012-08-30  9:50             ` Takuya Yoshikawa
@ 2012-08-30 10:10               ` Michael S. Tsirkin
  2012-08-30 10:24                 ` Takuya Yoshikawa
  0 siblings, 1 reply; 22+ messages in thread
From: Michael S. Tsirkin @ 2012-08-30 10:10 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: Marcelo Tosatti, avi, kvm

On Thu, Aug 30, 2012 at 06:50:52PM +0900, Takuya Yoshikawa wrote:
> On Thu, 30 Aug 2012 09:37:02 +0300
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > After staring at your code for a while it does appear to
> > do the right thing, and looks cleaner than what
> > we have now. commit log could be clearer.
> > It should state something like:
> > 
> > 
> > Clean up code in find_highest_vector:
> >  - likely() is there for historical reasons, it is no longer
> >    clear it's optimizing for the correct branch,
> >    and find_highest_vector might not be worth optimizing at all
> >  - checking word[0] separately does not make sense:
> >    if (word[word_offset << 2]) would be clearer
> >  - since we test word[...] != 0 beforehand, we can use __fls
> >    instead of fls()
> >  - for loop iterating over an array is clearer than while
> 
> Yes, I'll update the patch.
> 
> > Since you are going for cleanups, maybe we could also add:
> > - get rid of ugly >> 5 << 2, switch to using REG_POS instead?
> 
> OK, I'll do these on top of this patch.

Tweaking these 5 lines for readability across multiple
patches is just not worth it.
As long as we do random cleanups of this function it's probably easier
to just do them all in one patch.

> > Something like the below pseudo code would do this I think?
> > 
> > #define APIC_VECTORS_PER_REG 32
> > 
> > 	int vec;
> > 	for (vec = MAX_APIC_VECTOR - APIC_VECTORS_PER_REG;
> > 	     vec -= APIC_VECTORS_PER_REG; vec >= 0) {
> > 		u32 *reg = bitmap + REG_POS(vec);
> 
> We want to introduce apic_read_register(bitmap, reg) instead.
> u32 reg = apic_read_register(bitmap, REG_POS(vec));

If Marcelo takes it, I don't mind :)

> > 		if (*reg)
> > 			return __fls(*reg) - 1 + vec;
> 
> Because it is not clear that this *reg is the same value
> tested before. 

Before - where? Looks like this is the only place where
*reg is used.

> > 	}
> > 	return -1
> > 
> > count_vectors similar:
> > 
> >         for (vec = 0; vec < MAX_APIC_VECTOR; vec += APIC_VECTORS_PER_REG) {
> > 		u32 *reg = bitmap + REG_POS(vec);
> 
> Same here.

Same question :)

> >                 count += hweight32(*reg);
> > 	}
> > 
> > hmm?
> 
> Looks very good!
> 
> Thanks,
> 	Takuya

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] KVM: x86: lapic: Fix the misuse of likely() in find_highest_vector()
  2012-08-30 10:10               ` Michael S. Tsirkin
@ 2012-08-30 10:24                 ` Takuya Yoshikawa
  2012-08-30 10:44                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 22+ messages in thread
From: Takuya Yoshikawa @ 2012-08-30 10:24 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Marcelo Tosatti, avi, kvm

On Thu, 30 Aug 2012 13:10:33 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> > OK, I'll do these on top of this patch.
> 
> Tweaking these 5 lines for readability across multiple
> patches is just not worth it.
> As long as we do random cleanups of this function it's probably easier
> to just do them all in one patch.

OK.

> > > Something like the below pseudo code would do this I think?
> > > 
> > > #define APIC_VECTORS_PER_REG 32
> > > 
> > > 	int vec;
> > > 	for (vec = MAX_APIC_VECTOR - APIC_VECTORS_PER_REG;
> > > 	     vec -= APIC_VECTORS_PER_REG; vec >= 0) {
> > > 		u32 *reg = bitmap + REG_POS(vec);
> > 
> > We want to introduce apic_read_register(bitmap, reg) instead.
> > u32 reg = apic_read_register(bitmap, REG_POS(vec));
> 
> If Marcelo takes it, I don't mind :)
> 
> > > 		if (*reg)
> > > 			return __fls(*reg) - 1 + vec;
> > 
> > Because it is not clear that this *reg is the same value
> > tested before. 
> 
> Before - where? Looks like this is the only place where
> *reg is used.

  if (*reg)   // BEFORE
      return __fls(*reg) - 1 + vec;    // AFTER

Unless the value pointed to by a pointer cannot be updated
concurrently, it seems a good practice to use a local variable
explicitely in C level.

I know that this will not change anything actually, but many
bitops functions do similar things.

Takuya

> > > 	}
> > > 	return -1
> > > 
> > > count_vectors similar:
> > > 
> > >         for (vec = 0; vec < MAX_APIC_VECTOR; vec += APIC_VECTORS_PER_REG) {
> > > 		u32 *reg = bitmap + REG_POS(vec);
> > 
> > Same here.
> 
> Same question :)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] KVM: x86: lapic: Fix the misuse of likely() in find_highest_vector()
  2012-08-30 10:24                 ` Takuya Yoshikawa
@ 2012-08-30 10:44                   ` Michael S. Tsirkin
  2012-08-30 12:30                     ` [PATCH -v3] KVM: x86: lapic: Clean up find_highest_vector() and count_vectors() Takuya Yoshikawa
  0 siblings, 1 reply; 22+ messages in thread
From: Michael S. Tsirkin @ 2012-08-30 10:44 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: Marcelo Tosatti, avi, kvm

On Thu, Aug 30, 2012 at 07:24:39PM +0900, Takuya Yoshikawa wrote:
> On Thu, 30 Aug 2012 13:10:33 +0300
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > > OK, I'll do these on top of this patch.
> > 
> > Tweaking these 5 lines for readability across multiple
> > patches is just not worth it.
> > As long as we do random cleanups of this function it's probably easier
> > to just do them all in one patch.
> 
> OK.
> 
> > > > Something like the below pseudo code would do this I think?
> > > > 
> > > > #define APIC_VECTORS_PER_REG 32
> > > > 
> > > > 	int vec;
> > > > 	for (vec = MAX_APIC_VECTOR - APIC_VECTORS_PER_REG;
> > > > 	     vec -= APIC_VECTORS_PER_REG; vec >= 0) {
> > > > 		u32 *reg = bitmap + REG_POS(vec);
> > > 
> > > We want to introduce apic_read_register(bitmap, reg) instead.
> > > u32 reg = apic_read_register(bitmap, REG_POS(vec));
> > 
> > If Marcelo takes it, I don't mind :)
> > 
> > > > 		if (*reg)
> > > > 			return __fls(*reg) - 1 + vec;
> > > 
> > > Because it is not clear that this *reg is the same value
> > > tested before. 
> > 
> > Before - where? Looks like this is the only place where
> > *reg is used.
> 
>   if (*reg)   // BEFORE
>       return __fls(*reg) - 1 + vec;    // AFTER
> 
> Unless the value pointed to by a pointer cannot be updated
> concurrently, it seems a good practice to use a local variable
> explicitely in C level.

This last statement is very wrong.  If you are trying to address concurrent
access on smp, using a varible will never fix it. You need ACCESS_ONCE,
barriers and all that jazz.

If instead you are talking about readability, using a wrapper just to do
'+' looks like a bit of an overkill to me: you almost literally do
#define plus(a,b) (a+b).

> I know that this will not change anything actually, but many
> bitops functions do similar things.
> 
> Takuya

They do a bit more: they avoid duplication by calling
VEC_POS and REG_POS with the same paramater.
But let's not argue theoretically, send a patch and maintainers
will judge it.

> > > > 	}
> > > > 	return -1
> > > > 
> > > > count_vectors similar:
> > > > 
> > > >         for (vec = 0; vec < MAX_APIC_VECTOR; vec += APIC_VECTORS_PER_REG) {
> > > > 		u32 *reg = bitmap + REG_POS(vec);
> > > 
> > > Same here.
> > 
> > Same question :)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH -v3] KVM: x86: lapic: Clean up find_highest_vector() and count_vectors()
  2012-08-30 10:44                   ` Michael S. Tsirkin
@ 2012-08-30 12:30                     ` Takuya Yoshikawa
  2012-08-30 13:21                       ` Michael S. Tsirkin
  0 siblings, 1 reply; 22+ messages in thread
From: Takuya Yoshikawa @ 2012-08-30 12:30 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Marcelo Tosatti, avi, kvm

find_highest_vector() and count_vectors():
 - Instead of using magic values, define and use proper interfaces
   to access registers.

find_highest_vector():
 - Remove likely() which is there only for historical reasons and not
   doing correct branch predictions anymore.  Using such heuristics
   to optimize this function is not worth it now.  Let CPUs predict
   things instead.

 - Stop checking word[0] separately.  This was only needed for doing
   likely() optimization.

 - Use __fls(), not fls(), since we have checked the value passed to it
   is not zero.

 - Use for loop, not while, to iterate over the register array to make
   the code clearer.

Note that we actually confirmed that the likely() did wrong predictions
by inserting debug code.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Cc: Michael S. Tsirkin <mst@redhat.com>
---
 arch/x86/kvm/lapic.c |   35 +++++++++++++++++++++++------------
 1 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 18d149d..e44b8ab 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -66,6 +66,7 @@
 #define APIC_DEST_NOSHORT		0x0
 #define APIC_DEST_MASK			0x800
 #define MAX_APIC_VECTOR			256
+#define APIC_VECTORS_PER_REG		32
 
 #define VEC_POS(v) ((v) & (32 - 1))
 #define REG_POS(v) (((v) >> 5) << 4)
@@ -78,6 +79,11 @@ static inline void apic_set_reg(struct kvm_lapic *apic, int reg_off, u32 val)
 	*((u32 *) (apic->regs + reg_off)) = val;
 }
 
+static u32 apic_read_reg(int reg_off, void *bitmap)
+{
+	return *((u32 *)(bitmap + reg_off));
+}
+
 static inline int apic_test_and_set_vector(int vec, void *bitmap)
 {
 	return test_and_set_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
@@ -208,25 +214,30 @@ static unsigned int apic_lvt_mask[APIC_LVT_NUM] = {
 
 static int find_highest_vector(void *bitmap)
 {
-	u32 *word = bitmap;
-	int word_offset = MAX_APIC_VECTOR >> 5;
+	int vec;
+	u32 reg;
 
-	while ((word_offset != 0) && (word[(--word_offset) << 2] == 0))
-		continue;
+	for (vec = MAX_APIC_VECTOR - APIC_VECTORS_PER_REG;
+	     vec >= 0; vec -= APIC_VECTORS_PER_REG) {
+		reg = apic_read_reg(REG_POS(vec), bitmap);
+		if (reg)
+			return __fls(reg) + vec;
+	}
 
-	if (likely(!word_offset && !word[0]))
-		return -1;
-	else
-		return fls(word[word_offset << 2]) - 1 + (word_offset << 5);
+	return -1;
 }
 
 static u8 count_vectors(void *bitmap)
 {
-	u32 *word = bitmap;
-	int word_offset;
+	int vec;
+	u32 reg;
 	u8 count = 0;
-	for (word_offset = 0; word_offset < MAX_APIC_VECTOR >> 5; ++word_offset)
-		count += hweight32(word[word_offset << 2]);
+
+	for (vec = 0; vec < MAX_APIC_VECTOR; vec += APIC_VECTORS_PER_REG) {
+		reg = apic_read_reg(REG_POS(vec), bitmap);
+		count += hweight32(reg);
+	}
+
 	return count;
 }
 
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH -v3] KVM: x86: lapic: Clean up find_highest_vector() and count_vectors()
  2012-08-30 12:30                     ` [PATCH -v3] KVM: x86: lapic: Clean up find_highest_vector() and count_vectors() Takuya Yoshikawa
@ 2012-08-30 13:21                       ` Michael S. Tsirkin
  2012-08-30 16:09                         ` Takuya Yoshikawa
  0 siblings, 1 reply; 22+ messages in thread
From: Michael S. Tsirkin @ 2012-08-30 13:21 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: Marcelo Tosatti, avi, kvm

On Thu, Aug 30, 2012 at 09:30:19PM +0900, Takuya Yoshikawa wrote:
> find_highest_vector() and count_vectors():
>  - Instead of using magic values, define and use proper interfaces
>    to access registers.
> 
> find_highest_vector():
>  - Remove likely() which is there only for historical reasons and not
>    doing correct branch predictions anymore.  Using such heuristics
>    to optimize this function is not worth it now.  Let CPUs predict
>    things instead.
> 
>  - Stop checking word[0] separately.  This was only needed for doing
>    likely() optimization.
> 
>  - Use __fls(), not fls(), since we have checked the value passed to it
>    is not zero.
> 
>  - Use for loop, not while, to iterate over the register array to make
>    the code clearer.
> 
> Note that we actually confirmed that the likely() did wrong predictions
> by inserting debug code.
> 
> Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> ---
>  arch/x86/kvm/lapic.c |   35 +++++++++++++++++++++++------------
>  1 files changed, 23 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 18d149d..e44b8ab 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -66,6 +66,7 @@
>  #define APIC_DEST_NOSHORT		0x0
>  #define APIC_DEST_MASK			0x800
>  #define MAX_APIC_VECTOR			256
> +#define APIC_VECTORS_PER_REG		32
>  
>  #define VEC_POS(v) ((v) & (32 - 1))
>  #define REG_POS(v) (((v) >> 5) << 4)
> @@ -78,6 +79,11 @@ static inline void apic_set_reg(struct kvm_lapic *apic, int reg_off, u32 val)
>  	*((u32 *) (apic->regs + reg_off)) = val;
>  }
>  
> +static u32 apic_read_reg(int reg_off, void *bitmap)
> +{
> +	return *((u32 *)(bitmap + reg_off));
> +}
> +

Contrast with apic_set_reg which gets apic,
add fact that all callers invoke REG_POS and you will
see this is a bad API.

I played with some APIs but in the end it's
probably better to just open-code this.

As a bonus, open-coding will avoid the need
for cast above, which is good: casts make code more
fragile.

>  static inline int apic_test_and_set_vector(int vec, void *bitmap)
>  {
>  	return test_and_set_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
> @@ -208,25 +214,30 @@ static unsigned int apic_lvt_mask[APIC_LVT_NUM] = {
>  
>  static int find_highest_vector(void *bitmap)
>  {
> -	u32 *word = bitmap;
> -	int word_offset = MAX_APIC_VECTOR >> 5;
> +	int vec;
> +	u32 reg;
>  
> -	while ((word_offset != 0) && (word[(--word_offset) << 2] == 0))
> -		continue;
> +	for (vec = MAX_APIC_VECTOR - APIC_VECTORS_PER_REG;
> +	     vec >= 0; vec -= APIC_VECTORS_PER_REG) {
> +		reg = apic_read_reg(REG_POS(vec), bitmap);
> +		if (reg)
> +			return __fls(reg) + vec;
> +	}
>  
> -	if (likely(!word_offset && !word[0]))
> -		return -1;
> -	else
> -		return fls(word[word_offset << 2]) - 1 + (word_offset << 5);
> +	return -1;
>  }
>  
>  static u8 count_vectors(void *bitmap)
>  {
> -	u32 *word = bitmap;
> -	int word_offset;
> +	int vec;
> +	u32 reg;
>  	u8 count = 0;
> -	for (word_offset = 0; word_offset < MAX_APIC_VECTOR >> 5; ++word_offset)
> -		count += hweight32(word[word_offset << 2]);
> +
> +	for (vec = 0; vec < MAX_APIC_VECTOR; vec += APIC_VECTORS_PER_REG) {
> +		reg = apic_read_reg(REG_POS(vec), bitmap);
> +		count += hweight32(reg);
> +	}
> +
>  	return count;
>  }
>  
> -- 
> 1.7.5.4

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH -v3] KVM: x86: lapic: Clean up find_highest_vector() and count_vectors()
  2012-08-30 13:21                       ` Michael S. Tsirkin
@ 2012-08-30 16:09                         ` Takuya Yoshikawa
  2012-08-30 16:49                           ` Michael S. Tsirkin
  0 siblings, 1 reply; 22+ messages in thread
From: Takuya Yoshikawa @ 2012-08-30 16:09 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Takuya Yoshikawa, Marcelo Tosatti, avi, kvm

On Thu, 30 Aug 2012 16:21:31 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:
 
> > +static u32 apic_read_reg(int reg_off, void *bitmap)
> > +{
> > +	return *((u32 *)(bitmap + reg_off));
> > +}
> > +
> 
> Contrast with apic_set_reg which gets apic,
> add fact that all callers invoke REG_POS and you will
> see this is a bad API.
> 
> I played with some APIs but in the end it's
> probably better to just open-code this.

I don't mind open-coding this.

> As a bonus, open-coding will avoid the need
> for cast above, which is good: casts make code more
> fragile.

But I still don't understand why we can eliminate casting:

  u32 reg_val;

  reg_val = *((u32 *)(bitmap + REG_POS(vec)));
  if (reg_val)
      return __fls(reg_val) + vec;

(I'm not sure compilers are allowed to push out the value and
do multiple references for this code as explained in
https://lwn.net/Articles/508991/ )


If you mean

  u32 *reg;

  reg = bitmap + REG_POS(vec);
  if (*reg)
      return __fls(*reg) + vec;

I'm still not confident if this is a good style.
I rarely see code doing

  if (*p)
      __fls(*p);

This looks like explicite multiple references: I'm not saying
this will actually be compiled to do multiple references.

Thanks,
	Takuya

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH -v3] KVM: x86: lapic: Clean up find_highest_vector() and count_vectors()
  2012-08-30 16:09                         ` Takuya Yoshikawa
@ 2012-08-30 16:49                           ` Michael S. Tsirkin
  2012-09-05  8:30                             ` Takuya Yoshikawa
  0 siblings, 1 reply; 22+ messages in thread
From: Michael S. Tsirkin @ 2012-08-30 16:49 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: Takuya Yoshikawa, Marcelo Tosatti, avi, kvm

On Fri, Aug 31, 2012 at 01:09:56AM +0900, Takuya Yoshikawa wrote:
> On Thu, 30 Aug 2012 16:21:31 +0300
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>  
> > > +static u32 apic_read_reg(int reg_off, void *bitmap)
> > > +{
> > > +	return *((u32 *)(bitmap + reg_off));
> > > +}
> > > +
> > 
> > Contrast with apic_set_reg which gets apic,
> > add fact that all callers invoke REG_POS and you will
> > see this is a bad API.
> > 
> > I played with some APIs but in the end it's
> > probably better to just open-code this.
> 
> I don't mind open-coding this.
> 
> > As a bonus, open-coding will avoid the need
> > for cast above, which is good: casts make code more
> > fragile.
> 
> But I still don't understand why we can eliminate casting:
> 
>   u32 reg_val;
> 
>   reg_val = *((u32 *)(bitmap + REG_POS(vec)));
>   if (reg_val)
>       return __fls(reg_val) + vec;
> 
> (I'm not sure compilers are allowed to push out the value and
> do multiple references for this code as explained in
> https://lwn.net/Articles/508991/

So you *were* talking about concurrency?
And you expect to solve it somehow without barriers
explicit or implicit?

> )
> 
> 
> If you mean
> 
>   u32 *reg;
> 
>   reg = bitmap + REG_POS(vec);
>   if (*reg)
>       return __fls(*reg) + vec;

yes

> I'm still not confident if this is a good style.
> I rarely see code doing
> 
>   if (*p)
>       __fls(*p);
> 
> This looks like explicite multiple references: I'm not saying
> this will actually be compiled to do multiple references.
> 
> Thanks,
> 	Takuya

It's just weird. Both versions are exactly equivalent in C.
Adding a temporary changes *nothing* so the best readability
wins. And IMHO, a version that does not cast wins hands down.
I did a small test just to give you an example:

[mst@robin ~]$ cat a.c 

int foo(void *bitmap)
{
   unsigned *reg;
 
   reg = bitmap + 4;
   if (*reg)
       return *reg + 1;

   return -1;
}
[mst@robin ~]$ cat b.c 

int foo(void *bitmap)
{
   unsigned reg;
 
   reg = *((unsigned *)(bitmap + 4));
   if (reg)
       return reg + 1;

   return -1;
}

[mst@robin ~]$ gcc -O2 -c a.c
[mst@robin ~]$ gcc -O2 -c b.c


[mst@robin ~]$ objdump -ld a.o

a.o:     file format elf32-i386


Disassembly of section .text:

00000000 <foo>:
foo():
   0:   8b 44 24 04             mov    0x4(%esp),%eax
   4:   8b 50 04                mov    0x4(%eax),%edx
   7:   b8 ff ff ff ff          mov    $0xffffffff,%eax
   c:   8d 4a 01                lea    0x1(%edx),%ecx
   f:   85 d2                   test   %edx,%edx
  11:   0f 45 c1                cmovne %ecx,%eax
  14:   c3                      ret    
[mst@robin ~]$ objdump -ld b.o

b.o:     file format elf32-i386


Disassembly of section .text:

00000000 <foo>:
foo():
   0:   8b 44 24 04             mov    0x4(%esp),%eax
   4:   8b 50 04                mov    0x4(%eax),%edx
   7:   b8 ff ff ff ff          mov    $0xffffffff,%eax
   c:   8d 4a 01                lea    0x1(%edx),%ecx
   f:   85 d2                   test   %edx,%edx
  11:   0f 45 c1                cmovne %ecx,%eax
  14:   c3                      ret    



-- 
MST

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH -v3] KVM: x86: lapic: Clean up find_highest_vector() and count_vectors()
  2012-08-30 16:49                           ` Michael S. Tsirkin
@ 2012-09-05  8:30                             ` Takuya Yoshikawa
  2012-09-05  9:26                               ` Michael S. Tsirkin
  0 siblings, 1 reply; 22+ messages in thread
From: Takuya Yoshikawa @ 2012-09-05  8:30 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Takuya Yoshikawa, Marcelo Tosatti, avi, kvm

On Thu, 30 Aug 2012 19:49:23 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Fri, Aug 31, 2012 at 01:09:56AM +0900, Takuya Yoshikawa wrote:
> > On Thu, 30 Aug 2012 16:21:31 +0300
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >  
> > > > +static u32 apic_read_reg(int reg_off, void *bitmap)
> > > > +{
> > > > +	return *((u32 *)(bitmap + reg_off));
> > > > +}
> > > > +
> > > 
> > > Contrast with apic_set_reg which gets apic,
> > > add fact that all callers invoke REG_POS and you will
> > > see this is a bad API.
> > > 
> > > I played with some APIs but in the end it's
> > > probably better to just open-code this.
> > 
> > I don't mind open-coding this.
> > 
> > > As a bonus, open-coding will avoid the need
> > > for cast above, which is good: casts make code more
> > > fragile.
> > 
> > But I still don't understand why we can eliminate casting:
> > 
> >   u32 reg_val;
> > 
> >   reg_val = *((u32 *)(bitmap + REG_POS(vec)));
> >   if (reg_val)
> >       return __fls(reg_val) + vec;
> > 
> > (I'm not sure compilers are allowed to push out the value and
> > do multiple references for this code as explained in
> > https://lwn.net/Articles/508991/
> 
> So you *were* talking about concurrency?

Yes and no, please see below.

> And you expect to solve it somehow without barriers
> explicit or implicit?

What I want to make clear is that the value we pass to
__fls() is not zero, not any more, to avoid undefined
behaviour.

So as you showed below, if the value passed to __fls() is
exactly from the register, which we did non-zero check,
that's fine.  Barriers are not related here.

But as can be seen in the last part of the article above,
that's may theoretically not be guranteed?

Anyway, I'm now thinking that we do not care about such
things here, and can just follow your advice, yes?


> 
> > )
> > 
> > 
> > If you mean
> > 
> >   u32 *reg;
> > 
> >   reg = bitmap + REG_POS(vec);
> >   if (*reg)
> >       return __fls(*reg) + vec;
> 
> yes
> 
> > I'm still not confident if this is a good style.
> > I rarely see code doing
> > 
> >   if (*p)
> >       __fls(*p);
> > 
> > This looks like explicite multiple references: I'm not saying
> > this will actually be compiled to do multiple references.
> > 
> > Thanks,
> > 	Takuya
> 
> It's just weird. Both versions are exactly equivalent in C.
> Adding a temporary changes *nothing* so the best readability
> wins. And IMHO, a version that does not cast wins hands down.
> I did a small test just to give you an example:

Thank you for the example.

What you showed is what I wanted to mean by
"I'm not saying this will actually be compiled to ..."

Thanks,
	Takuya

> 
> [mst@robin ~]$ cat a.c 
> 
> int foo(void *bitmap)
> {
>    unsigned *reg;
>  
>    reg = bitmap + 4;
>    if (*reg)
>        return *reg + 1;
> 
>    return -1;
> }
> [mst@robin ~]$ cat b.c 
> 
> int foo(void *bitmap)
> {
>    unsigned reg;
>  
>    reg = *((unsigned *)(bitmap + 4));
>    if (reg)
>        return reg + 1;
> 
>    return -1;
> }
> 
> [mst@robin ~]$ gcc -O2 -c a.c
> [mst@robin ~]$ gcc -O2 -c b.c
> 
> 
> [mst@robin ~]$ objdump -ld a.o
> 
> a.o:     file format elf32-i386
> 
> 
> Disassembly of section .text:
> 
> 00000000 <foo>:
> foo():
>    0:   8b 44 24 04             mov    0x4(%esp),%eax
>    4:   8b 50 04                mov    0x4(%eax),%edx
>    7:   b8 ff ff ff ff          mov    $0xffffffff,%eax
>    c:   8d 4a 01                lea    0x1(%edx),%ecx
>    f:   85 d2                   test   %edx,%edx
>   11:   0f 45 c1                cmovne %ecx,%eax
>   14:   c3                      ret    
> [mst@robin ~]$ objdump -ld b.o
> 
> b.o:     file format elf32-i386
> 
> 
> Disassembly of section .text:
> 
> 00000000 <foo>:
> foo():
>    0:   8b 44 24 04             mov    0x4(%esp),%eax
>    4:   8b 50 04                mov    0x4(%eax),%edx
>    7:   b8 ff ff ff ff          mov    $0xffffffff,%eax
>    c:   8d 4a 01                lea    0x1(%edx),%ecx
>    f:   85 d2                   test   %edx,%edx
>   11:   0f 45 c1                cmovne %ecx,%eax
>   14:   c3                      ret    

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH -v3] KVM: x86: lapic: Clean up find_highest_vector() and count_vectors()
  2012-09-05  8:30                             ` Takuya Yoshikawa
@ 2012-09-05  9:26                               ` Michael S. Tsirkin
  2012-09-05  9:40                                 ` Takuya Yoshikawa
  0 siblings, 1 reply; 22+ messages in thread
From: Michael S. Tsirkin @ 2012-09-05  9:26 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: Takuya Yoshikawa, Marcelo Tosatti, avi, kvm

On Wed, Sep 05, 2012 at 05:30:31PM +0900, Takuya Yoshikawa wrote:
> On Thu, 30 Aug 2012 19:49:23 +0300
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Fri, Aug 31, 2012 at 01:09:56AM +0900, Takuya Yoshikawa wrote:
> > > On Thu, 30 Aug 2012 16:21:31 +0300
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > >  
> > > > > +static u32 apic_read_reg(int reg_off, void *bitmap)
> > > > > +{
> > > > > +	return *((u32 *)(bitmap + reg_off));
> > > > > +}
> > > > > +
> > > > 
> > > > Contrast with apic_set_reg which gets apic,
> > > > add fact that all callers invoke REG_POS and you will
> > > > see this is a bad API.
> > > > 
> > > > I played with some APIs but in the end it's
> > > > probably better to just open-code this.
> > > 
> > > I don't mind open-coding this.
> > > 
> > > > As a bonus, open-coding will avoid the need
> > > > for cast above, which is good: casts make code more
> > > > fragile.
> > > 
> > > But I still don't understand why we can eliminate casting:
> > > 
> > >   u32 reg_val;
> > > 
> > >   reg_val = *((u32 *)(bitmap + REG_POS(vec)));
> > >   if (reg_val)
> > >       return __fls(reg_val) + vec;
> > > 
> > > (I'm not sure compilers are allowed to push out the value and
> > > do multiple references for this code as explained in
> > > https://lwn.net/Articles/508991/
> > 
> > So you *were* talking about concurrency?
> 
> Yes and no, please see below.
> 
> > And you expect to solve it somehow without barriers
> > explicit or implicit?
> 
> What I want to make clear is that the value we pass to
> __fls() is not zero, not any more, to avoid undefined
> behaviour.
> 
> So as you showed below, if the value passed to __fls() is
> exactly from the register, which we did non-zero check,
> that's fine.  Barriers are not related here.
> 
> But as can be seen in the last part of the article above,
> that's may theoretically not be guranteed?

It's not guaranteed if another thread can modify the bitmap.
Is this the case here? If yes we need at least ACCESS_ONCE.

> Anyway, I'm now thinking that we do not care about such
> things here, and can just follow your advice, yes?

Unless you see an issue with it ...

> > 
> > > )
> > > 
> > > 
> > > If you mean
> > > 
> > >   u32 *reg;
> > > 
> > >   reg = bitmap + REG_POS(vec);
> > >   if (*reg)
> > >       return __fls(*reg) + vec;
> > 
> > yes
> > 
> > > I'm still not confident if this is a good style.
> > > I rarely see code doing
> > > 
> > >   if (*p)
> > >       __fls(*p);
> > > 
> > > This looks like explicite multiple references: I'm not saying
> > > this will actually be compiled to do multiple references.
> > > 
> > > Thanks,
> > > 	Takuya
> > 
> > It's just weird. Both versions are exactly equivalent in C.
> > Adding a temporary changes *nothing* so the best readability
> > wins. And IMHO, a version that does not cast wins hands down.
> > I did a small test just to give you an example:
> 
> Thank you for the example.
> 
> What you showed is what I wanted to mean by
> "I'm not saying this will actually be compiled to ..."
> 
> Thanks,
> 	Takuya
> 
> > 
> > [mst@robin ~]$ cat a.c 
> > 
> > int foo(void *bitmap)
> > {
> >    unsigned *reg;
> >  
> >    reg = bitmap + 4;
> >    if (*reg)
> >        return *reg + 1;
> > 
> >    return -1;
> > }
> > [mst@robin ~]$ cat b.c 
> > 
> > int foo(void *bitmap)
> > {
> >    unsigned reg;
> >  
> >    reg = *((unsigned *)(bitmap + 4));
> >    if (reg)
> >        return reg + 1;
> > 
> >    return -1;
> > }
> > 
> > [mst@robin ~]$ gcc -O2 -c a.c
> > [mst@robin ~]$ gcc -O2 -c b.c
> > 
> > 
> > [mst@robin ~]$ objdump -ld a.o
> > 
> > a.o:     file format elf32-i386
> > 
> > 
> > Disassembly of section .text:
> > 
> > 00000000 <foo>:
> > foo():
> >    0:   8b 44 24 04             mov    0x4(%esp),%eax
> >    4:   8b 50 04                mov    0x4(%eax),%edx
> >    7:   b8 ff ff ff ff          mov    $0xffffffff,%eax
> >    c:   8d 4a 01                lea    0x1(%edx),%ecx
> >    f:   85 d2                   test   %edx,%edx
> >   11:   0f 45 c1                cmovne %ecx,%eax
> >   14:   c3                      ret    
> > [mst@robin ~]$ objdump -ld b.o
> > 
> > b.o:     file format elf32-i386
> > 
> > 
> > Disassembly of section .text:
> > 
> > 00000000 <foo>:
> > foo():
> >    0:   8b 44 24 04             mov    0x4(%esp),%eax
> >    4:   8b 50 04                mov    0x4(%eax),%edx
> >    7:   b8 ff ff ff ff          mov    $0xffffffff,%eax
> >    c:   8d 4a 01                lea    0x1(%edx),%ecx
> >    f:   85 d2                   test   %edx,%edx
> >   11:   0f 45 c1                cmovne %ecx,%eax
> >   14:   c3                      ret    

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH -v3] KVM: x86: lapic: Clean up find_highest_vector() and count_vectors()
  2012-09-05  9:26                               ` Michael S. Tsirkin
@ 2012-09-05  9:40                                 ` Takuya Yoshikawa
  2012-09-05  9:51                                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 22+ messages in thread
From: Takuya Yoshikawa @ 2012-09-05  9:40 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Takuya Yoshikawa, Marcelo Tosatti, avi, kvm

On Wed, 5 Sep 2012 12:26:49 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> It's not guaranteed if another thread can modify the bitmap.
> Is this the case here? If yes we need at least ACCESS_ONCE.

In this patch, using the wrapper function to read out a register
value forces compilers not to do bad things.  But I agree that
it is not a good API.

I would like to use fls() rather than ACCESS_ONCE if it's
really needed.

> > Anyway, I'm now thinking that we do not care about such
> > things here, and can just follow your advice, yes?
> 
> Unless you see an issue with it ...

Although I read the code, I'm not sure.

But this code is apparently not so critical for performance
that we can simply use fls().

If I can remove likely() and use proper macros, that's enough
for me.

Thanks,
	Takuya

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH -v3] KVM: x86: lapic: Clean up find_highest_vector() and count_vectors()
  2012-09-05  9:40                                 ` Takuya Yoshikawa
@ 2012-09-05  9:51                                   ` Michael S. Tsirkin
  2012-09-05 10:30                                     ` [PATCH -v4] " Takuya Yoshikawa
  0 siblings, 1 reply; 22+ messages in thread
From: Michael S. Tsirkin @ 2012-09-05  9:51 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: Takuya Yoshikawa, Marcelo Tosatti, avi, kvm

On Wed, Sep 05, 2012 at 06:40:26PM +0900, Takuya Yoshikawa wrote:
> On Wed, 5 Sep 2012 12:26:49 +0300
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > It's not guaranteed if another thread can modify the bitmap.
> > Is this the case here? If yes we need at least ACCESS_ONCE.
> 
> In this patch, using the wrapper function to read out a register
> value forces compilers not to do bad things.

It's not robust. Compiler is free to optimize it out, and it
frequently does that.

> But I agree that it is not a good API.
> 
> I would like to use fls() rather than ACCESS_ONCE if it's
> really needed.

Sure, makes sense.

> > > Anyway, I'm now thinking that we do not care about such
> > > things here, and can just follow your advice, yes?
> > 
> > Unless you see an issue with it ...
> 
> Although I read the code, I'm not sure.
> 
> But this code is apparently not so critical for performance
> that we can simply use fls().

Yes. I guess if it becomes critical we'll need to add a cache anyway.

> If I can remove likely() and use proper macros, that's enough
> for me.

Me too.

> Thanks,
> 	Takuya

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH -v4] KVM: x86: lapic: Clean up find_highest_vector() and count_vectors()
  2012-09-05  9:51                                   ` Michael S. Tsirkin
@ 2012-09-05 10:30                                     ` Takuya Yoshikawa
  2012-09-05 10:58                                       ` Michael S. Tsirkin
  2012-09-12 16:39                                       ` Marcelo Tosatti
  0 siblings, 2 replies; 22+ messages in thread
From: Takuya Yoshikawa @ 2012-09-05 10:30 UTC (permalink / raw)
  To: avi, mtosatti; +Cc: mst, kvm

find_highest_vector() and count_vectors():
 - Instead of using magic values, define and use proper macros.

find_highest_vector():
 - Remove likely() which is there only for historical reasons and not
   doing correct branch predictions anymore.  Using such heuristics
   to optimize this function is not worth it now.  Let CPUs predict
   things instead.

 - Stop checking word[0] separately.  This was only needed for doing
   likely() optimization.

 - Use for loop, not while, to iterate over the register array to make
   the code clearer.

Note that we actually confirmed that the likely() did wrong predictions
by inserting debug code.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Cc: Michael S. Tsirkin <mst@redhat.com>
---
 arch/x86/kvm/lapic.c |   30 ++++++++++++++++++------------
 1 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 18d149d..8ace252 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -66,6 +66,7 @@
 #define APIC_DEST_NOSHORT		0x0
 #define APIC_DEST_MASK			0x800
 #define MAX_APIC_VECTOR			256
+#define APIC_VECTORS_PER_REG		32
 
 #define VEC_POS(v) ((v) & (32 - 1))
 #define REG_POS(v) (((v) >> 5) << 4)
@@ -208,25 +209,30 @@ static unsigned int apic_lvt_mask[APIC_LVT_NUM] = {
 
 static int find_highest_vector(void *bitmap)
 {
-	u32 *word = bitmap;
-	int word_offset = MAX_APIC_VECTOR >> 5;
+	int vec;
+	u32 *reg;
 
-	while ((word_offset != 0) && (word[(--word_offset) << 2] == 0))
-		continue;
+	for (vec = MAX_APIC_VECTOR - APIC_VECTORS_PER_REG;
+	     vec >= 0; vec -= APIC_VECTORS_PER_REG) {
+		reg = bitmap + REG_POS(vec);
+		if (*reg)
+			return fls(*reg) - 1 + vec;
+	}
 
-	if (likely(!word_offset && !word[0]))
-		return -1;
-	else
-		return fls(word[word_offset << 2]) - 1 + (word_offset << 5);
+	return -1;
 }
 
 static u8 count_vectors(void *bitmap)
 {
-	u32 *word = bitmap;
-	int word_offset;
+	int vec;
+	u32 *reg;
 	u8 count = 0;
-	for (word_offset = 0; word_offset < MAX_APIC_VECTOR >> 5; ++word_offset)
-		count += hweight32(word[word_offset << 2]);
+
+	for (vec = 0; vec < MAX_APIC_VECTOR; vec += APIC_VECTORS_PER_REG) {
+		reg = bitmap + REG_POS(vec);
+		count += hweight32(*reg);
+	}
+
 	return count;
 }
 
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH -v4] KVM: x86: lapic: Clean up find_highest_vector() and count_vectors()
  2012-09-05 10:30                                     ` [PATCH -v4] " Takuya Yoshikawa
@ 2012-09-05 10:58                                       ` Michael S. Tsirkin
  2012-09-12 16:39                                       ` Marcelo Tosatti
  1 sibling, 0 replies; 22+ messages in thread
From: Michael S. Tsirkin @ 2012-09-05 10:58 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: avi, mtosatti, kvm

On Wed, Sep 05, 2012 at 07:30:01PM +0900, Takuya Yoshikawa wrote:
> find_highest_vector() and count_vectors():
>  - Instead of using magic values, define and use proper macros.
> 
> find_highest_vector():
>  - Remove likely() which is there only for historical reasons and not
>    doing correct branch predictions anymore.  Using such heuristics
>    to optimize this function is not worth it now.  Let CPUs predict
>    things instead.
> 
>  - Stop checking word[0] separately.  This was only needed for doing
>    likely() optimization.
> 
>  - Use for loop, not while, to iterate over the register array to make
>    the code clearer.
> 
> Note that we actually confirmed that the likely() did wrong predictions
> by inserting debug code.
> 
> Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
> Cc: Michael S. Tsirkin <mst@redhat.com>

Looks like a nice cleanup.
Acked-by: Michael S. Tsirkin <mst@redhat.com>


> ---
>  arch/x86/kvm/lapic.c |   30 ++++++++++++++++++------------
>  1 files changed, 18 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 18d149d..8ace252 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -66,6 +66,7 @@
>  #define APIC_DEST_NOSHORT		0x0
>  #define APIC_DEST_MASK			0x800
>  #define MAX_APIC_VECTOR			256
> +#define APIC_VECTORS_PER_REG		32
>  
>  #define VEC_POS(v) ((v) & (32 - 1))
>  #define REG_POS(v) (((v) >> 5) << 4)
> @@ -208,25 +209,30 @@ static unsigned int apic_lvt_mask[APIC_LVT_NUM] = {
>  
>  static int find_highest_vector(void *bitmap)
>  {
> -	u32 *word = bitmap;
> -	int word_offset = MAX_APIC_VECTOR >> 5;
> +	int vec;
> +	u32 *reg;
>  
> -	while ((word_offset != 0) && (word[(--word_offset) << 2] == 0))
> -		continue;
> +	for (vec = MAX_APIC_VECTOR - APIC_VECTORS_PER_REG;
> +	     vec >= 0; vec -= APIC_VECTORS_PER_REG) {
> +		reg = bitmap + REG_POS(vec);
> +		if (*reg)
> +			return fls(*reg) - 1 + vec;
> +	}
>  
> -	if (likely(!word_offset && !word[0]))
> -		return -1;
> -	else
> -		return fls(word[word_offset << 2]) - 1 + (word_offset << 5);
> +	return -1;
>  }
>  
>  static u8 count_vectors(void *bitmap)
>  {
> -	u32 *word = bitmap;
> -	int word_offset;
> +	int vec;
> +	u32 *reg;
>  	u8 count = 0;
> -	for (word_offset = 0; word_offset < MAX_APIC_VECTOR >> 5; ++word_offset)
> -		count += hweight32(word[word_offset << 2]);
> +
> +	for (vec = 0; vec < MAX_APIC_VECTOR; vec += APIC_VECTORS_PER_REG) {
> +		reg = bitmap + REG_POS(vec);
> +		count += hweight32(*reg);
> +	}
> +
>  	return count;
>  }
>  
> -- 
> 1.7.5.4

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH -v4] KVM: x86: lapic: Clean up find_highest_vector() and count_vectors()
  2012-09-05 10:30                                     ` [PATCH -v4] " Takuya Yoshikawa
  2012-09-05 10:58                                       ` Michael S. Tsirkin
@ 2012-09-12 16:39                                       ` Marcelo Tosatti
  1 sibling, 0 replies; 22+ messages in thread
From: Marcelo Tosatti @ 2012-09-12 16:39 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: avi, mst, kvm

On Wed, Sep 05, 2012 at 07:30:01PM +0900, Takuya Yoshikawa wrote:
> find_highest_vector() and count_vectors():
>  - Instead of using magic values, define and use proper macros.
> 
> find_highest_vector():
>  - Remove likely() which is there only for historical reasons and not
>    doing correct branch predictions anymore.  Using such heuristics
>    to optimize this function is not worth it now.  Let CPUs predict
>    things instead.
> 
>  - Stop checking word[0] separately.  This was only needed for doing
>    likely() optimization.
> 
>  - Use for loop, not while, to iterate over the register array to make
>    the code clearer.
> 
> Note that we actually confirmed that the likely() did wrong predictions
> by inserting debug code.
> 
> Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
> Cc: Michael S. Tsirkin <mst@redhat.com>

Applied, thanks.


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2012-09-12 17:05 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-24  9:15 [PATCH] KVM: x86: lapic: Fix the misuse of likely() in find_highest_vector() Takuya Yoshikawa
2012-08-27 20:25 ` Marcelo Tosatti
2012-08-28  9:57   ` Takuya Yoshikawa
2012-08-29 19:10     ` Marcelo Tosatti
2012-08-29 22:51       ` Michael S. Tsirkin
2012-08-30  1:09         ` Takuya Yoshikawa
2012-08-30  6:37           ` Michael S. Tsirkin
2012-08-30  9:50             ` Takuya Yoshikawa
2012-08-30 10:10               ` Michael S. Tsirkin
2012-08-30 10:24                 ` Takuya Yoshikawa
2012-08-30 10:44                   ` Michael S. Tsirkin
2012-08-30 12:30                     ` [PATCH -v3] KVM: x86: lapic: Clean up find_highest_vector() and count_vectors() Takuya Yoshikawa
2012-08-30 13:21                       ` Michael S. Tsirkin
2012-08-30 16:09                         ` Takuya Yoshikawa
2012-08-30 16:49                           ` Michael S. Tsirkin
2012-09-05  8:30                             ` Takuya Yoshikawa
2012-09-05  9:26                               ` Michael S. Tsirkin
2012-09-05  9:40                                 ` Takuya Yoshikawa
2012-09-05  9:51                                   ` Michael S. Tsirkin
2012-09-05 10:30                                     ` [PATCH -v4] " Takuya Yoshikawa
2012-09-05 10:58                                       ` Michael S. Tsirkin
2012-09-12 16:39                                       ` Marcelo Tosatti

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.