From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S964971AbcCOMh1 (ORCPT <rfc822;w@1wt.eu>);
	Tue, 15 Mar 2016 08:37:27 -0400
Received: from mail-wm0-f67.google.com ([74.125.82.67]:34925 "EHLO
	mail-wm0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752925AbcCOMhT (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Tue, 15 Mar 2016 08:37:19 -0400
Date: Tue, 15 Mar 2016 13:37:14 +0100
From: Ingo Molnar <mingo@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
        =?iso-8859-1?Q?Fr=E9d=E9ric?= Weisbecker <fweisbec@gmail.com>,
        Thomas Gleixner <tglx@linutronix.de>,
        Peter Zijlstra <a.p.zijlstra@chello.nl>,
        Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH] atomic: Fix bugs in 'fetch_or()' and rename it to
 'xchg_or()'
Message-ID: <20160315123714.GA12289@gmail.com>
References: <20160314123200.GA15971@gmail.com>
 <CA+55aFyqG0xriTOus7wu527rdYHeLX3Qidt9ZzggL+MsR=_iQQ@mail.gmail.com>
 <20160315093245.GA7943@gmail.com>
 <20160315120147.GA9742@gmail.com>
 <20160315123253.GA10152@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160315123253.GA10152@gmail.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org


* Ingo Molnar <mingo@kernel.org> wrote:

> 
> * Ingo Molnar <mingo@kernel.org> wrote:
> 
> > But IMHO this really highlights a fundamental weakness of all this macro magic, 
> > it's all way too fragile.
> > 
> > Why don't we introduce a boring family of APIs:
> > 
> > 	cmpxchg_8()
> > 	cmpxchg_16()
> > 	cmpxchg_32()
> > 	cmpxchg_64()
> > 
> > 	xchg_or_32()
> > 	xchg_or_64()
> > 	...
> > 
> > ... with none of this pesky auto-typing property and none of the 
> > macro-inside-a-macro crap? We could do clean types and would write them all in 
> > proper C, not fragile CPP.
> > 
> > It's not like we migrate between the types all that frequently - and even if we 
> > do, it's trivial.
> > 
> > hm?
> 
> So if we are still on the same page at this point, we'd have to add a pointer 
> variant too I suspect:
> 
> 	cmpxchg_ptr()
> 	xchg_ptr()
> 
> ... whose bitness may differ between architectures (subarches), but it would still 
> be a single variant per architecture, i.e. still with pretty clear type 
> propagation and with a very clear notion of which architecture supports what.
> 
> It looks like a lot of work, but it's all low complexity work AFAICS that could be 
> partly automated.

Btw., if we do all this, we could still add auto-type API variants, but now they 
would be implemented at the highest level, with none of the auto-type complexity 
pushed down to the architecture level. Architectures just provide their set of 
APIs for a given list of types, and that's it.
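
A minimal sketch of that layering, as an illustration only. The names 
cmpxchg_32() and cmpxchg_64() are from the proposal above; everything else is an 
assumption: the bodies use the GCC/Clang __atomic builtins as a stand-in for real 
per-architecture code, and cmpxchg_auto() is a hypothetical name for the 
auto-type wrapper, built here on C11 _Generic:

```c
#include <stdint.h>

/*
 * "Architecture" level: one plain C function per width. The compiler
 * builtin stands in for whatever asm an architecture would provide.
 * Each returns the value found in *ptr; success means return == old.
 */
static inline uint32_t cmpxchg_32(volatile uint32_t *ptr, uint32_t old, uint32_t new)
{
	/* On failure the builtin stores the current *ptr into 'old'. */
	__atomic_compare_exchange_n(ptr, &old, new, 0,
				    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	return old;
}

static inline uint64_t cmpxchg_64(volatile uint64_t *ptr, uint64_t old, uint64_t new)
{
	__atomic_compare_exchange_n(ptr, &old, new, 0,
				    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	return old;
}

/*
 * Generic level (hypothetical name): pick the typed variant from the
 * pointer type. An unsupported type is a compile error here instead
 * of a link-time __cmpxchg_wrong_size() failure.
 */
#define cmpxchg_auto(ptr, old, new)					\
	_Generic((ptr),							\
		 uint32_t *:          cmpxchg_32,			\
		 volatile uint32_t *: cmpxchg_32,			\
		 uint64_t *:          cmpxchg_64,			\
		 volatile uint64_t *: cmpxchg_64)((ptr), (old), (new))
```

The point of the sketch is only that the type dispatch lives in one generic 
place, while each typed function has ordinary argument and return types.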

I hate to see all the auto-typing complexity pushed down to the arch assembly 
level:

/*
 * Atomic compare and exchange.  Compare OLD with MEM, if identical,
 * store NEW in MEM.  Return the initial value in MEM.  Success is
 * indicated by comparing RETURN with OLD.
 */
#define __raw_cmpxchg(ptr, old, new, size, lock)			\
({									\
	__typeof__(*(ptr)) __ret;					\
	__typeof__(*(ptr)) __old = (old);				\
	__typeof__(*(ptr)) __new = (new);				\
	switch (size) {							\
	case __X86_CASE_B:						\
	{								\
		volatile u8 *__ptr = (volatile u8 *)(ptr);		\
		asm volatile(lock "cmpxchgb %2,%1"			\
			     : "=a" (__ret), "+m" (*__ptr)		\
			     : "q" (__new), "0" (__old)			\
			     : "memory");				\
		break;							\
	}								\
	case __X86_CASE_W:						\
	{								\
		volatile u16 *__ptr = (volatile u16 *)(ptr);		\
		asm volatile(lock "cmpxchgw %2,%1"			\
			     : "=a" (__ret), "+m" (*__ptr)		\
			     : "r" (__new), "0" (__old)			\
			     : "memory");				\
		break;							\
	}								\
	case __X86_CASE_L:						\
	{								\
		volatile u32 *__ptr = (volatile u32 *)(ptr);		\
		asm volatile(lock "cmpxchgl %2,%1"			\
			     : "=a" (__ret), "+m" (*__ptr)		\
			     : "r" (__new), "0" (__old)			\
			     : "memory");				\
		break;							\
	}								\
	case __X86_CASE_Q:						\
	{								\
		volatile u64 *__ptr = (volatile u64 *)(ptr);		\
		asm volatile(lock "cmpxchgq %2,%1"			\
			     : "=a" (__ret), "+m" (*__ptr)		\
			     : "r" (__new), "0" (__old)			\
			     : "memory");				\
		break;							\
	}								\
	default:							\
		__cmpxchg_wrong_size();					\
	}								\
	__ret;								\
})

it makes things harder to read, harder to debug and harder to optimize.
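
By contrast, a fixed-width variant can be an ordinary function. A rough sketch 
of xchg_or_32(), with assumptions stated: the signature is a guess (the proposal 
above only lists the name), it is assumed to return the value seen before the OR, 
and the GCC/Clang __atomic builtins stand in for an architecture's own cmpxchg:

```c
#include <stdint.h>

/*
 * Atomically OR 'mask' into *ptr and return the previous value.
 * Assumed semantics; implemented as a compare-and-exchange retry loop.
 */
static inline uint32_t xchg_or_32(volatile uint32_t *ptr, uint32_t mask)
{
	uint32_t old = __atomic_load_n(ptr, __ATOMIC_RELAXED);

	/*
	 * If *ptr changed under us the builtin refreshes 'old' with the
	 * current value, so the next iteration retries with old | mask
	 * recomputed from it.
	 */
	while (!__atomic_compare_exchange_n(ptr, &old, old | mask, 0,
					    __ATOMIC_SEQ_CST,
					    __ATOMIC_RELAXED))
		;

	return old;
}
```

There are no temporaries typed via __typeof__ and no size switch: the argument 
and return types are fixed by the prototype, so a mismatched caller is caught 
by normal type checking.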

Thanks,

	Ingo