On Thu, 4 Apr 2019, Paolo Bonzini wrote:
> On 04/04/19 18:52, Thomas Gleixner wrote:
> > On Thu, 4 Apr 2019, David Laight wrote:
> >> From: David Laight Sent: 04 April 2019 15:45
> >>> From: Fenghua Yu Sent: 03 April 2019 22:22
> >>> That is not true.
> >>> The BTS/BTR instructions access the memory word that contains the
> >>> expected bit.
> >>> The 'operand size' only affects the size of the register used for
> >>> the bit offset.
> >>> If the 'operand size' is 16 bits wide (+/- 32k bit offset) the cpu
> >>> might do an aligned 16bit memory access, otherwise (32 or 64bit bit
> >>> offset) it might do an aligned 32 bit access.
> >>> It should never do a 64bit access and never a misaligned one (even
> >>> if the base address is misaligned).
> >>
> >> Hmmm... I may have misread things slightly.
> >> The accessed address is 'Effective Address + (4 * (BitOffset DIV 32))'.
> >> However nothing suggests that it ever does 64bit accesses.
> >>
> >> If it does do 64bit accesses when the operand size is 64 bits then
> >> the asm stubs ought to be changed to only specify 32bit operand size.
> >
> > bitops operate on unsigned long arrays, so the RMW on the affected
> > array member has to be atomic vs. other RMW operations on the same
> > array member. If we make the bitops 32bit wide on x86/64 we break
> > that.
> >
> > So selecting 64bit access (REX.W prefix) is correct and has to stay.
>
> Aren't bitops always atomic with respect to the whole cache line(s)?
> We regularly rely on cmpxchgl being atomic with respect to movb.

Yes, but if your long goes across a cacheline then you have lost due to
the requirement to lock both cachelines. Same problem as with bitops,
and I'd rather catch all of those than just some.

Thanks,

	tglx
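
For illustration, here is a minimal sketch of the kind of asm stub the
thread is about. It is loosely modeled on the x86 set_bit() in
arch/x86/include/asm/bitops.h but simplified, and sketch_set_bit() is a
hypothetical name, not the kernel's. The point tglx makes shows up in
the "q" suffix: it emits the REX.W prefix (64-bit operand size), so the
locked RMW covers the full 8-byte unsigned long array member, whereas a
"btsl" variant would shrink it to a 4-byte RMW.

/*
 * Illustrative sketch only (x86-64, GNU C), not the kernel's code.
 * The "q" suffix on bts selects the 64-bit operand size (REX.W), so
 * the locked read-modify-write spans the whole unsigned long.
 */
#include <stdio.h>

static inline void sketch_set_bit(long nr, volatile unsigned long *addr)
{
	/*
	 * "r" forces the bit offset into a register; with a register
	 * offset, BTS uses bit-string addressing, so nr may index past
	 * the first qword of the array. (The real kernel code also has
	 * a byte-wide "orb" fast path for constant nr, omitted here.)
	 */
	asm volatile("lock btsq %1, %0"
		     : "+m" (*addr)
		     : "r" (nr)
		     : "memory");
}

int main(void)
{
	unsigned long flags[2] = { 0, 0 };

	sketch_set_bit(5, flags);	/* bit 5 of flags[0] */
	sketch_set_bit(70, flags);	/* 70 = 64 + 6 -> bit 6 of flags[1] */
	printf("%#lx %#lx\n", flags[0], flags[1]);
	return 0;
}

Compiled with gcc on x86-64 this prints "0x20 0x40": each call is a
single locked 8-byte RMW on the array member containing the requested
bit, which is the atomicity guarantee being defended above, assuming,
per tglx's last paragraph, that the long itself does not straddle a
cacheline.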