From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754460AbcFPNMO (ORCPT <rfc822;w@1wt.eu>);
	Thu, 16 Jun 2016 09:12:14 -0400
Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:34161 "EHLO
	mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL)
	by vger.kernel.org with ESMTP id S1754033AbcFPNMJ (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Thu, 16 Jun 2016 09:12:09 -0400
X-IBM-Helo: d23dlp01.au.ibm.com
X-IBM-MailFrom: maddy@linux.vnet.ibm.com
X-IBM-RcptTo: linux-kernel@vger.kernel.org
Subject: Re: [PATCH] tools/perf: fix the word selected in find_*_bit
To: Yury Norov <ynorov@caviumnetworks.com>
References: <1465990973-31483-1-git-send-email-maddy@linux.vnet.ibm.com>
 <20160615195127.GA6039@yury-N73SV> <20160615211104.GA6978@yury-N73SV>
Cc: linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
        Arnaldo Carvalho de Melo <acme@kernel.org>,
        Adrian Hunter <adrian.hunter@intel.com>, Borislav Petkov <bp@suse.de>,
        David Ahern <dsahern@gmail.com>, George Spelvin <linux@horizon.com>,
        Jiri Olsa <jolsa@redhat.com>, Namhyung Kim <namhyung@kernel.org>,
        Rasmus Villemoes <linux@rasmusvillemoes.dk>,
        Wang Nan <wangnan0@huawei.com>, Yury Norov <yury.norov@gmail.com>,
        Michael Ellerman <mpe@ellerman.id.au>
From: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Date: Thu, 16 Jun 2016 18:41:50 +0530
User-Agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:45.0) Gecko/20100101
 Thunderbird/45.1.1
MIME-Version: 1.0
In-Reply-To: <20160615211104.GA6978@yury-N73SV>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
X-TM-AS-MML: disable
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 16061613-0012-0000-0000-000001A00A7C
X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused
x-cbparentid: 16061613-0013-0000-0000-00000569E254
Message-Id: <632e6ce5-afc6-5507-ef91-fddf727992ff@linux.vnet.ibm.com>
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2016-06-16_06:,,
 signatures=0
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=0
 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam
 adjust=0 reason=mlx scancount=1 engine=8.0.1-1604210000
 definitions=main-1606160147
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org


On Thursday 16 June 2016 02:41 AM, Yury Norov wrote:
> On Wed, Jun 15, 2016 at 10:51:27PM +0300, Yury Norov wrote:
>> Hi Madhavan,
>>
>> On Wed, Jun 15, 2016 at 05:12:53PM +0530, Madhavan Srinivasan wrote:
>>> When decoding the perf_regs mask in regs_dump__printf(),
>>> we loop through the mask using find_first_bit and find_next_bit functions.
>>> And mask is of type "u64". But "u64" is send as a "unsigned long *" to
>>> lib functions along with sizeof().
>>>
>>> While the exisitng code works fine in most of the case, when using a 32bit perf
>>> on a 64bit kernel (Big Endian), we end reading the wrong word. In find_first_bit(),
>>> one word at a time (based on BITS_PER_LONG) is loaded and
>>> checked for any bit set. In 32bit BE userspace,
>>> BITS_PER_LONG turns out to be 32, and for a mask value of
>>> "0x00000000000000ff", find_first_bit will return 32, instead of 0.
>>> Reason for this is that, value in the word0 is all zeros and value
>>> in word1 is 0xff. Ideally, second word in the mask should be loaded
>>> and searched. Patch swaps the word to look incase of 32bit BE.
>> I think this is not a problem of find_bit() at all. You have wrong
>> typecast as the source of problem (tools/perf/util/session.c"):
>>
>> 940 static void regs_dump__printf(u64 mask, u64 *regs)
>> 941 {
>> 942         unsigned rid, i = 0;
>> 943 
>> 944         for_each_set_bit(rid, (unsigned long *) &mask, sizeof(mask) * 8) {
>>                                           ^^^^ Here ^^^^
>> 945                 u64 val = regs[i++];
>> 946 
>> 947                 printf(".... %-5s 0x%" PRIx64 "\n",
>> 948                        perf_reg_name(rid), val);
>> 949         }
>> 950 }
>>
>> But for some reason you change correct find_bit()...
>>
>> Though proper fix is like this for me:
>>
>> static void regs_dump__printf(u64 mask, u64 *regs)
>> {
>>         unsigned rid, i = 0;
>>         unsigned long _mask[sizeof(mask)/sizeof(unsigned long)];
>>
>>         _mask[0] = mask & ULONG_MAX;
>>         if (sizeof(mask) > sizeof(unsigned long))
>>                 _mask[1] = mask >> BITS_PER_LONG;
>>
>>         for_each_set_bit(rid, _mask, sizeof(mask) * BITS_PER_BYTE) {
>>                 u64 val = regs[i++];
>>
>>                 printf(".... %-5s 0x%" PRIx64 "\n",
>>                        perf_reg_name(rid), val);
>>         }
>> }
>>
>> Maybe there already is some macro doing the conversion for you...
> yes it is, cpu_to_le64() is what you want

no wait, on second look, cpu_to_le64() is not right.
Because we will end up swapping within 32bit.
But what you suggested looks to be fine. I can repost
this with one minor tweak, right shift with 32 instead
of BITS_PER_LONG (since I see compiler errors in 64bit).

Maddy

>
>> Yury.
>>
>>> Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
>>> Cc: Adrian Hunter <adrian.hunter@intel.com>
>>> Cc: Borislav Petkov <bp@suse.de>
>>> Cc: David Ahern <dsahern@gmail.com>
>>> Cc: George Spelvin <linux@horizon.com>
>>> Cc: Jiri Olsa <jolsa@redhat.com>
>>> Cc: Namhyung Kim <namhyung@kernel.org>
>>> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
>>> Cc: Wang Nan <wangnan0@huawei.com>
>>> Cc: Yury Norov <yury.norov@gmail.com>
>>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>>> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
>>> ---
>>>  tools/lib/find_bit.c | 17 +++++++++++++++++
>>>  1 file changed, 17 insertions(+)
>>>
>>> diff --git a/tools/lib/find_bit.c b/tools/lib/find_bit.c
>>> index 9122a9e80046..996b3e04324f 100644
>>> --- a/tools/lib/find_bit.c
>>> +++ b/tools/lib/find_bit.c
>>> @@ -37,7 +37,12 @@ static unsigned long _find_next_bit(const unsigned long *addr,
>>>  	if (!nbits || start >= nbits)
>>>  		return nbits;
>>>
>>> +#if (__BYTE_ORDER == __BIG_ENDIAN) && (BITS_PER_LONG != 64)
>>> +	tmp = addr[(((nbits - 1)/BITS_PER_LONG) - (start / BITS_PER_LONG))]
>>> +								^ invert;
>>> +#else
>>>  	tmp = addr[start / BITS_PER_LONG] ^ invert;
>>> +#endif
>>>
>>>  	/* Handle 1st word. */
>>>  	tmp &= BITMAP_FIRST_WORD_MASK(start);
>>> @@ -48,7 +53,12 @@ static unsigned long _find_next_bit(const unsigned long *addr,
>>>  		if (start >= nbits)
>>>  			return nbits;
>>>
>>> +#if (__BYTE_ORDER == __BIG_ENDIAN) && (BITS_PER_LONG != 64)
>>> +		tmp = addr[(((nbits - 1)/BITS_PER_LONG) - (start / BITS_PER_LONG))]
>>> +								^ invert;
>>> +#else
>>>  		tmp = addr[start / BITS_PER_LONG] ^ invert;
>>> +#endif
>>>  	}
>>>
>>>  	return min(start + __ffs(tmp), nbits);
>>> @@ -75,8 +85,15 @@ unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
>>>  	unsigned long idx;
>>>
>>>  	for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
>>> +#if (__BYTE_ORDER == __BIG_ENDIAN) && (BITS_PER_LONG != 64)
>>> +		if (addr[(((size-1)/BITS_PER_LONG) - idx)])
>>> +			return min(idx * BITS_PER_LONG +
>>> +				__ffs(addr[(((size-1)/BITS_PER_LONG) - idx)]),
>>> +									size);
>>> +#else
>>>  		if (addr[idx])
>>>  			return min(idx * BITS_PER_LONG + __ffs(addr[idx]), size);
>>> +#endif
>>>  	}
>>>
>>>  	return size;
>>> --
>>> 1.9.1