Date: Tue, 27 Jun 2017 00:02:52 -0700
From: Matthew Wilcox
To: Rasmus Villemoes
Cc: linux-kernel@vger.kernel.org, Andrew Morton, Martin Schwidefsky, Matthew Wilcox
Subject: Re: [PATCH 2/3] Turn bitmap_set and bitmap_clear into memset when possible
Message-ID: <20170627070252.GB27359@bombadil.infradead.org>
References: <20170607142924.28552-1-willy@infradead.org> <20170607142924.28552-3-willy@infradead.org> <87bmpzbiru.fsf@rasmusvillemoes.dk>
In-Reply-To: <87bmpzbiru.fsf@rasmusvillemoes.dk>
List-ID: X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jun 07, 2017 at 11:16:37PM +0200, Rasmus Villemoes wrote:
> On Wed, Jun 07 2017, Matthew Wilcox wrote:
> > @@ -319,6 +319,9 @@ static __always_inline void bitmap_set(unsigned long *map, unsigned int start,
> >  {
> >  	if (__builtin_constant_p(nbits) && nbits == 1)
> >  		__set_bit(start, map);
> > +	else if (__builtin_constant_p(start & 7) && IS_ALIGNED(start, 8) &&
> > +			__builtin_constant_p(nbits & 7) && IS_ALIGNED(nbits, 8))
> > +		memset(map + start / 8, 0xff, nbits / 8);
> >  	else
>
> Isn't the pointer arithmetic wrong here? I think you need to cast map to
> (char *).

Good catch. I've added a test case to test_bitmap.c and it catches the
problem.

> Do you have an example of how the generated code changes, both in the
> case of actual constants and a case where gcc can see that start and
> nbits are byte-aligned without knowing their actual values?
Here's an example of start == 0 and nbits being clearly a multiple of 8,
courtesy of btrfs's extent-io-tests.c.

Before:

      f1:	48 8b 5c 24 10       	mov    0x10(%rsp),%rbx
      f6:	4c 8b 3c 24          	mov    (%rsp),%r15
      fa:	31 f6                	xor    %esi,%esi
      fc:	44 8d 34 dd 00 00 00 	lea    0x0(,%rbx,8),%r14d
     103:	00
     104:	4c 8d 2c dd 00 00 00 	lea    0x0(,%rbx,8),%r13
     10b:	00
     10c:	4c 89 ff             	mov    %r15,%rdi
     10f:	44 89 f2             	mov    %r14d,%edx
     112:	e8 00 00 00 00       	callq  117 <__test_eb_bitmaps+0x77>
			113: R_X86_64_PC32	bitmap_set-0x4
     117:	31 d2                	xor    %edx,%edx
     119:	31 f6                	xor    %esi,%esi
     11b:	4c 89 e9             	mov    %r13,%rcx
     11e:	4c 89 e7             	mov    %r12,%rdi
     121:	e8 00 00 00 00       	callq  126 <__test_eb_bitmaps+0x86>
			122: R_X86_64_PC32	extent_buffer_bitmap_set-0x4

After:

      f1:	4c 8b 7c 24 10       	mov    0x10(%rsp),%r15
      f6:	be ff 00 00 00       	mov    $0xff,%esi
      fb:	48 89 df             	mov    %rbx,%rdi
      fe:	4d 89 fe             	mov    %r15,%r14
     101:	4e 8d 2c fd 00 00 00 	lea    0x0(,%r15,8),%r13
     108:	00
     109:	41 81 e6 ff ff ff 1f 	and    $0x1fffffff,%r14d
     110:	4c 89 f2             	mov    %r14,%rdx
     113:	e8 00 00 00 00       	callq  118 <__test_eb_bitmaps+0x78>
			114: R_X86_64_PC32	memset-0x4
     118:	48 8b 2c 24          	mov    (%rsp),%rbp
     11c:	31 d2                	xor    %edx,%edx
     11e:	31 f6                	xor    %esi,%esi
     120:	4c 89 e9             	mov    %r13,%rcx
     123:	48 89 ef             	mov    %rbp,%rdi
     126:	e8 00 00 00 00       	callq  12b <__test_eb_bitmaps+0x8b>
			127: R_X86_64_PC32	extent_buffer_bitmap_set-0x4

It's actually slightly less efficient in the caller (although obviously
memset() is going to execute faster than bitmap_set()).  Partly because
x86 made some odd choices about the behaviour of an 8-bit move
instruction (it leaves the upper 24 bits intact, rather than zeroing
them, so gcc has to use a 32-bit move instruction to put 0xff into the
second argument to memset()), and partly because gcc is seeing us do:

	unsigned long len;
	len = len * 8 / 8;

so it's (very reasonably) transforming that into

	len &= 0x1fff'ffff'ffff'ffff;

That's partly a quirk of how this particular function is written.