From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Laight
To: 'Fenghua Yu', Thomas Gleixner, Ingo Molnar, Borislav Petkov,
 H Peter Anvin, Peter Zijlstra, Andrew Morton, Dave Hansen,
 "Paolo Bonzini", Radim Krcmar, Christopherson Sean J, Ashok Raj,
 Tony Luck, Dan Williams, "Xiaoyao Li", "Sai Praneeth Prakhya",
 Ravi V Shankar
CC: linux-kernel, x86, "kvm@vger.kernel.org"
Subject: RE: [PATCH v9 02/17] drivers/net/b44: Align pwol_mask to unsigned long for better performance
Date: Mon, 24 Jun 2019 15:12:05 +0000
References: <1560897679-228028-1-git-send-email-fenghua.yu@intel.com>
 <1560897679-228028-3-git-send-email-fenghua.yu@intel.com>
In-Reply-To: <1560897679-228028-3-git-send-email-fenghua.yu@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Fenghua Yu
> Sent: 18 June 2019 23:41
> From: Peter Zijlstra
>
> A bit in pwol_mask is set in b44_magic_pattern() by atomic set_bit().
> But since pwol_mask is local and never exposed to concurrency, there is
> no need to set bit in pwol_mask atomically.
>
> set_bit() sets the bit in a single unsigned long location. Because
> pwol_mask may not be aligned to unsigned long, the location may cross two
> cache lines. On x86, accessing two cache lines in locked instruction in
> set_bit() is called split locked access and can cause overall performance
> degradation.
>
> So use non atomic __set_bit() to set pwol_mask bits. __set_bit() won't hit
> split lock issue on x86.
>
> Signed-off-by: Peter Zijlstra
> Signed-off-by: Fenghua Yu
> ---
>  drivers/net/ethernet/broadcom/b44.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/broadcom/b44.c b/drivers/net/ethernet/broadcom/b44.c
> index 97ab0dd25552..5738ab963dfb 100644
> --- a/drivers/net/ethernet/broadcom/b44.c
> +++ b/drivers/net/ethernet/broadcom/b44.c
> @@ -1520,7 +1520,7 @@ static int b44_magic_pattern(u8 *macaddr, u8 *ppattern, u8 *pmask, int offset)
>
>  	memset(ppattern + offset, 0xff, magicsync);
>  	for (j = 0; j < magicsync; j++)
> -		set_bit(len++, (unsigned long *) pmask);
> +		__set_bit(len++, (unsigned long *)pmask);
>
>  	for (j = 0; j < B44_MAX_PATTERNS; j++) {
>  		if ((B44_PATTERN_SIZE - len) >= ETH_ALEN)
> @@ -1532,7 +1532,7 @@ static int b44_magic_pattern(u8 *macaddr, u8 *ppattern, u8 *pmask, int offset)
>  		for (k = 0; k< ethaddr_bytes; k++) {
>  			ppattern[offset + magicsync +
>  				(j * ETH_ALEN) + k] = macaddr[k];
> -			set_bit(len++, (unsigned long *) pmask);
> +			__set_bit(len++, (unsigned long *)pmask);

Is this code expected to do anything sensible on BE systems?
Casting the bitmask[] argument to any of the set_bit() functions is dubious at best.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
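
For illustration, the big-endian concern in the reply can be sketched in
user space. This is a minimal sketch, not kernel code: it assumes the
generic bit numbering in which bit nr lives in word nr / BITS_PER_LONG at
position nr % BITS_PER_LONG, and the helper names (set_bit_as_long,
set_bit_as_bytes, first_set_byte, host_is_little_endian) are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical stand-in for a non-atomic set-bit on an unsigned long
 * bitmap applied to a byte buffer. memcpy() is used so the sketch has no
 * alignment or aliasing problems of its own.
 */
static void set_bit_as_long(unsigned int nr, unsigned char *mask)
{
	unsigned long word;
	unsigned char *p = mask + (nr / BITS_PER_LONG) * sizeof(word);

	memcpy(&word, p, sizeof(word));
	word |= 1UL << (nr % BITS_PER_LONG);
	memcpy(p, &word, sizeof(word));
}

/* Byte-wise alternative: touches the same byte on every byte order. */
static void set_bit_as_bytes(unsigned int nr, unsigned char *mask)
{
	mask[nr / CHAR_BIT] |= (unsigned char)(1u << (nr % CHAR_BIT));
}

/* Index of the first non-zero byte, or -1 if the mask is empty. */
static int first_set_byte(const unsigned char *mask, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		if (mask[i])
			return (int)i;
	return -1;
}

/* Returns 1 on a little-endian host, 0 otherwise. */
static int host_is_little_endian(void)
{
	unsigned long one = 1;
	unsigned char first;

	memcpy(&first, &one, 1);
	return first == 1;
}
```

Calling set_bit_as_bytes(0, mask) always marks byte 0 of the mask. Calling
set_bit_as_long(0, mask) marks byte 0 on a little-endian host, but byte
sizeof(unsigned long) - 1 on a big-endian host, so a byte-indexed mask
filled through an unsigned long cast would presumably end up with its bits
in the wrong bytes there.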