linux-m68k.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH] treewide: remove current_text_addr
       [not found]   ` <207784db-4fcc-85e7-a0b2-fec26b7dab81@gmx.de>
@ 2018-08-26  2:38     ` H. Peter Anvin
  2018-08-26  3:16       ` H. Peter Anvin
  0 siblings, 1 reply; 13+ messages in thread
From: H. Peter Anvin @ 2018-08-26  2:38 UTC (permalink / raw)
  To: Helge Deller, Nick Desaulniers, torvalds, akpm
  Cc: Nicolas Pitre, linux-mips, linux-sh, Benjamin Herrenschmidt,
	Will Deacon, Paul Mackerras, Michael Ellerman,
	James E.J. Bottomley, Geert Uytterhoeven, Catalin Marinas,
	Vasily Gorbik, Matt Turner, uclinux-h8-devel, Marc Zyngier,
	Ram Pai, linux-um, Nicholas Piggin, Andy Lutomirski,
	Shannon Nelson, tglx, Alex Bennée, Richard Henderson,
	Jiri Kosina, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3032 bytes --]

On 08/25/18 03:48, Helge Deller wrote:
> 
> Currently alpha, s390, sparc, sh, c6x, ia64 and parisc provide an
> inline assembly function to get the current instruction pointer. 
> As mentioned in an earlier thread, I personally would *prefer* if 
> _THIS_IP_ would use those inline assembly instructions on those
> architectures instead of the (currently used) higher C-level
> implementation.
> 

Older ones have as well, e.g. x86.

The only reason to retain the use of an assembly function would be in
the case where either:

a) the C implementation produces bad or invalid code on certain
   architectures;
b) there is a specific requirement that either an absolute or a relative
   value is used in the binary, e.g. due to constraints on relocation.
   The latter particularly comes to mind since the x86-64 implementation
   in assembly will produce movq $.,%reg (which requires relocation)
   instead of the more natural leaq .(%rip),%reg.

In the case (a) those architectures ought to be able to simply

#undef _THIS_IP_
#define _THIS_IP_ blah...

and in case (b) *those specific instances* should be using some kind of
specially flagged function e.g. current_true_ip() vs.
current_linktime_ip() or somesuch.

I also note that a lot of those functions are not marked
__always_inline, which is a serious error should the compiler ever get
the idea to out-of-line these functions, which could potentially happen
as gcc is rather bad at assigning weight to an assembly statement.

I'm also going to throw in a perhaps ugly bomb into this discussion:

_THIS_IP_ seems to be horribly ill-defined; there is no kind of
serialization, and no reason to believe it can't be arbitrarily hoisted
inside a function. Furthermore, *most of the uses of _THIS_IP_ seem to
be either discarded or passed to a function*, and the location of a
function call, unlike _THIS_IP_ is very well defined.

In that case, the use of this mechanism is completely pointless and
ought to be replaced with _RET_IP_.  It seems like most invocations of
_THIS_IP_ can be trivially replaced with _RET_IP_ inside the function,
which would also reduce the footprint of the function call, for example:

__trace_puts() is only ever called with _THIS_IP_ as the first argument;
drop that argument and use _RET_IP_ inside the function (also,
__trace_puts() only ever uses strlen() as the third argument, which gcc
can of course optimize into a constant for the case of a consta t
string, but *is that optimization actually worth it*?  In the case of
__trace_puts(), a variable strlen() would only ever need to be called in
the case of an allocation actually happening -- otherwise str is never
examined -- and again, it increases the *code size* of the call site.
If it was worthwhile it would make more sense to at least force this
into the rodata section with the string, something like the attached
file for an example; however, I have a hunch it doesn't matter.

I wouldn't be surprised if all or nearly all instances of _THIS_IP_ can
be completely removed.

	-hpa

[-- Attachment #2: str.c --]
[-- Type: text/x-csrc, Size: 1231 bytes --]

#include <stddef.h>
#include <string.h>

#define _RET_IP_ ((unsigned long)__builtin_return_address(0))
#define no_inline __attribute__((noinline))
#define must_inline __attribute__((always_inline)) inline

struct myputs_string {
	size_t len;
	const char *str;
};

int _myputs_struct(const struct myputs_string * const strs);
int _myputs_string(const char *str);
int __myputs(unsigned long ip, const char *str, size_t len);

int no_inline _myputs_struct(const struct myputs_string * const strs)
{
	return __myputs(_RET_IP_, strs->str, strs->len);
}

int no_inline _myputs_string(const char *str)
{
	return __myputs(_RET_IP_, str, strlen(str)+1);
}

#define myputs(s)							\
({									\
	int _rv;							\
	if (__builtin_constant_p(s) &&					\
	    __builtin_constant_p(strlen(s)) &&				\
	    strlen(s)+1 == sizeof(s)) {					\
		static const struct myputs_string _mps = {		\
			.len = sizeof(s),				\
			.str = __builtin_constant_p(s) ? s : NULL,	\
		};							\
		_rv = _myputs_struct(&_mps);				\
	} else {							\
		_rv = _myputs_string(s);				\
	}								\
	_rv;								\
})

int test1(void);
int test2(const char *strx);

int test1(void)
{
	return myputs("Foobar");
}

int test2(const char *strx)
{
	return myputs(strx);
}

		

[-- Attachment #3: Type: text/plain, Size: 169 bytes --]

_______________________________________________
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] treewide: remove current_text_addr
  2018-08-26  2:38     ` [PATCH] treewide: remove current_text_addr H. Peter Anvin
@ 2018-08-26  3:16       ` H. Peter Anvin
  2018-08-26  4:56         ` H. Peter Anvin
  0 siblings, 1 reply; 13+ messages in thread
From: H. Peter Anvin @ 2018-08-26  3:16 UTC (permalink / raw)
  To: Helge Deller, Nick Desaulniers, torvalds, akpm
  Cc: Nicolas Pitre, linux-mips, linux-sh, Benjamin Herrenschmidt,
	Will Deacon, Paul Mackerras, Michael Ellerman,
	James E.J. Bottomley, Geert Uytterhoeven, Catalin Marinas,
	Vasily Gorbik, Matt Turner, uclinux-h8-devel, Marc Zyngier,
	Ram Pai, linux-um, Nicholas Piggin, Andy Lutomirski,
	Shannon Nelson, tglx, Alex Bennée, Richard Henderson,
	Jiri Kosina, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 349 bytes --]

On 08/25/18 19:38, H. Peter Anvin wrote:
>
> If it was worthwhile it would make more sense to at least force this
> into the rodata section with the string, something like the attached
> file for an example; however, I have a hunch it doesn't matter.
> 

An even nuttier version which avoids the extra pointer indirection.
Read it and fear.

	-hpa


[-- Attachment #2: str.c --]
[-- Type: text/x-csrc, Size: 1427 bytes --]

#include <stddef.h>
#include <string.h>

#define _RET_IP_ ((unsigned long)__builtin_return_address(0))
#define no_inline __attribute__((noinline))
#define must_inline __attribute__((always_inline)) inline

struct myputs_string {
	unsigned short len;
	char str[0];
};

int _myputs_struct(const struct myputs_string * const strs);
int _myputs_string(const char *str);
int __myputs(unsigned long ip, const char *str, size_t len);

int no_inline _myputs_struct(const struct myputs_string * const strs)
{
	return __myputs(_RET_IP_, strs->str, strs->len);
}

int no_inline _myputs_string(const char *str)
{
	return __myputs(_RET_IP_, str, strlen(str)+1);
}

#define ifconst(x,y)	__builtin_choose_expr(__builtin_constant_p(x),(x),(y))

#define myputs(s)							\
({									\
	int _rv;							\
	if (__builtin_constant_p(s) &&					\
	    __builtin_constant_p(strlen(s)) &&				\
	    strlen(s)+1 == sizeof(s) &&					\
	    sizeof(s) <= (size_t)65535) {				\
	static const struct {						\
		struct myputs_string _mps_hdr;				\
		char _mps_str[sizeof(s)];				\
	} _mps = {							\
		._mps_hdr.len = sizeof(s),				\
		._mps_str = ifconst(s,""),				\
	};								\
		_rv = _myputs_struct(&_mps._mps_hdr);			\
	} else {							\
		_rv = _myputs_string(s);				\
	}								\
	_rv;								\
})

int test1(void);
int test2(const char *strx);

int test1(void)
{
	return myputs("Foobar");
}

int test2(const char *strx)
{
	return myputs(strx);
}

		

[-- Attachment #3: Type: text/plain, Size: 169 bytes --]

_______________________________________________
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] treewide: remove current_text_addr
  2018-08-26  3:16       ` H. Peter Anvin
@ 2018-08-26  4:56         ` H. Peter Anvin
  2018-08-26 19:30           ` H. Peter Anvin
  0 siblings, 1 reply; 13+ messages in thread
From: H. Peter Anvin @ 2018-08-26  4:56 UTC (permalink / raw)
  To: Helge Deller, Nick Desaulniers, torvalds, akpm
  Cc: Nicolas Pitre, linux-mips, linux-sh, Benjamin Herrenschmidt,
	Will Deacon, Paul Mackerras, Michael Ellerman,
	James E.J. Bottomley, Geert Uytterhoeven, Catalin Marinas,
	Vasily Gorbik, Matt Turner, uclinux-h8-devel, Marc Zyngier,
	Ram Pai, linux-um, Nicholas Piggin, Andy Lutomirski,
	Shannon Nelson, tglx, Alex Bennée, Richard Henderson,
	Jiri Kosina, linux-kernel

On 08/25/18 20:16, H. Peter Anvin wrote:
> On 08/25/18 19:38, H. Peter Anvin wrote:
>>
>> If it was worthwhile it would make more sense to at least force this
>> into the rodata section with the string, something like the attached
>> file for an example; however, I have a hunch it doesn't matter.
>>
> 
> An even nuttier version which avoids the extra pointer indirection.
> Read it and fear.
> 
> 	-hpa
> 

OK, so one more thing, I guess: it is necessary to suppress the tailcall
optimization for _RET_IP_ to make any sense, but that should be pretty
simple:

static inline void notailcall(void)
{
	asm volatile("");
}

	-hpa

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] treewide: remove current_text_addr
  2018-08-26  4:56         ` H. Peter Anvin
@ 2018-08-26 19:30           ` H. Peter Anvin
  2018-08-26 20:25             ` Linus Torvalds
  2018-08-26 23:20             ` H. Peter Anvin
  0 siblings, 2 replies; 13+ messages in thread
From: H. Peter Anvin @ 2018-08-26 19:30 UTC (permalink / raw)
  To: Helge Deller, Nick Desaulniers, torvalds, akpm
  Cc: Nicolas Pitre, linux-mips, linux-sh, Benjamin Herrenschmidt,
	Will Deacon, Paul Mackerras, Michael Ellerman,
	James E.J. Bottomley, Geert Uytterhoeven, Catalin Marinas,
	Vasily Gorbik, Matt Turner, uclinux-h8-devel, Marc Zyngier,
	Ram Pai, linux-um, Nicholas Piggin, Andy Lutomirski,
	Shannon Nelson, tglx, Alex Bennée, Richard Henderson,
	Jiri Kosina, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 108 bytes --]

Here is a full-blown (user space) test program demonstrating the whole
technique and how to use it.

	-hpa


[-- Attachment #2: str.c --]
[-- Type: text/x-csrc, Size: 4208 bytes --]

#include <stddef.h>
#include <string.h>

#define _RET_IP_ ((unsigned long)__builtin_return_address(0))
#define noinline __attribute__((noinline))
#define used __attribute__((used))
/* __always_inline is defined in glibc already */
#define ifconst(x,y)	__builtin_choose_expr(__builtin_constant_p(x),(x),(y))
static inline void notailcall(void)
{
	asm volatile("");
}

/* Change this to a null string to make all functions global */
#define STATIC static

struct myputs_string {
	unsigned short len;
	char str[0];
};

STATIC int _myputs_struct(const struct myputs_string * const strs);
STATIC int _myputs_string(const char *str);
STATIC int __myputs(unsigned long ip, const char *str, size_t len);

#if 1

#include <stdio.h>

STATIC void dump_caller(unsigned long where)
{
	const char *opname = NULL;
	const char *wheretoname = NULL;
	char ichar;
	unsigned long whereto = 0;

#if defined(__i386__) || defined(__x86_64__)
	char opname_buf[4];
	unsigned char opcode;
	
	where -= 5;
	opcode = *(unsigned char *)where;

	switch (opcode) {
	case 0xe8:
		opname = "call";
		whereto = where + 5 + *(signed int *)(where + 1);
		break;
	case 0xe9:
		opname = "jmp";
		whereto = where + 5 + *(signed int *)(where + 1);
		break;
	default:
		snprintf(opname_buf, sizeof opname_buf, "?%02x", opcode);
		opname = opname_buf;
		break;
	}

#elif defined(__sparc__)
	const char regtype[4] = "gilo";
	unsigned int opcode, op1, op3, ibit;
	signed int simm13, simm30;
	char opname_buf[32];
	char *p;

	where -= 8;
	
	opcode = *(signed int *)where;
	op1 = opcode >> 30;
	op3 = (opcode >> 19) & 0x3f;
	ibit = (opcode >> 13) & 1;
	simm13 = (opcode & 0x1fff) << 2;
	simm30 = (opcode & 0x3fffffff) << 2;

	opname = opname_buf;
	
	if (op1 == 1) {
		opname = "call";
		whereto = where + simm30;
	} else if (op1 == 2 && op3 == 0x38) {
		if (ibit) {
			snprintf(opname_buf, sizeof opname_buf,
				 "jmpl %%%c%u %c 0x%x",
				 regtype[(opcode >> 17) & 3],
				 (opcode >> 14) & 7,
				 simm13 < 0 ? '-' : '+',
				 abs(simm13));
		} else {
			snprintf(opname_buf, sizeof opname_buf,
				 "jmpl %%%c%u + %%%c%u",
				 regtype[(opcode >> 17) & 3],
				 (opcode >> 14) & 7,
				 regtype[(opcode >> 3) & 3],
				 opcode & 7);
		}
	} else {
		snprintf(opname_buf, sizeof opname_buf,
			 "?0x08x", opcode);
	}
#else
	/* Unknown architecture */
#endif
	if (whereto == (unsigned long)_myputs_struct) {
		wheretoname = "_myputs_struct";
	} else if (whereto == (unsigned long)_myputs_string) {
		wheretoname = "_myputs_string";
	} else {
		wheretoname = "?";
	}

	ichar = '[';
	
	if (opname) {
		printf("%c%p: %s",
		       ichar, (void *)where, opname);
		ichar = ' ';
	}
	if (whereto) {
		printf("%c%p <%s>", ichar, (void *)whereto, wheretoname);
		ichar = ' ';
	}
	if (ichar != '[')
		putchar(']');
}
	
STATIC int __myputs(unsigned long where, const char *str, size_t len)
{
	size_t slen = strlen(str);
	size_t rv;
	
	len--;
	rv = printf("%p: \"%.*s\"%*s", (void *)where, (int)len, str,
		    16-(int)slen, "");
	dump_caller(where);
	if (slen != len)
		printf(" <err: strlen = %zu, len = %zu>\n", slen, len);
	else
		printf(" <ok: len = %zu>\n", len);
	
	return rv;
}

STATIC int noinline _myputs_struct(const struct myputs_string * const strs)
{
	return __myputs(_RET_IP_, strs->str, strs->len);
}

STATIC int noinline _myputs_string(const char *str)
{
	return __myputs(_RET_IP_, str, strlen(str)+1);
}
#endif

#define myputs(s)							\
({									\
	int _rv;							\
	if (__builtin_constant_p(s) &&					\
	    __builtin_constant_p(strlen(s)) &&				\
	    strlen(s)+1 == sizeof(s) &&					\
	    sizeof(s) <= (size_t)65535) {				\
	static const struct {						\
		struct myputs_string _mps_hdr;				\
		char _mps_str[sizeof(s)];				\
	} _mps = {							\
		._mps_hdr.len = sizeof(s),				\
		._mps_str = ifconst(s,""),				\
	};								\
		_rv = _myputs_struct(&_mps._mps_hdr);			\
	} else {							\
		_rv = _myputs_string(s);				\
	}								\
	notailcall();							\
	_rv;								\
})

STATIC int test1(void);
STATIC int test2(const char *strx);

STATIC int test1(void)
{
	return myputs("Foobar");
}

STATIC int test2(const char *strx)
{
	return myputs(strx);
}

int main(int argc, char *argv[])
{
	(void)argc;

	test1();
	test2(argv[0]);
	return 0;
}

[-- Attachment #3: Type: text/plain, Size: 169 bytes --]

_______________________________________________
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] treewide: remove current_text_addr
  2018-08-26 19:30           ` H. Peter Anvin
@ 2018-08-26 20:25             ` Linus Torvalds
  2018-08-27  2:52               ` Nick Desaulniers
  2018-08-27  7:43               ` Nicholas Piggin
  2018-08-26 23:20             ` H. Peter Anvin
  1 sibling, 2 replies; 13+ messages in thread
From: Linus Torvalds @ 2018-08-26 20:25 UTC (permalink / raw)
  To: Peter Anvin, Peter Zijlstra
  Cc: Nicolas Pitre, linux-mips, Linux-sh list, Benjamin Herrenschmidt,
	Will Deacon, Paul Mackerras, Michael Ellerman,
	James E.J. Bottomley, Geert Uytterhoeven, Catalin Marinas, gor,
	Matt Turner, moderated list:H8/300 ARCHITECTURE, Marc Zyngier,
	Ram Pai, linux-um, Nick Piggin, Andrew Lutomirski,
	shannon.nelson, Thomas Gleixner, alex.bennee, Richard Henderson,
	Jiri Kosina

On Sun, Aug 26, 2018 at 12:32 PM H. Peter Anvin <hpa@zytor.com> wrote:
>
> Here is a full-blown (user space) test program demonstrating the whole
> technique and how to use it.

So while I agree that some _THIS_IP_ users might be better off being
converted to __builtin_return_address(0) at the caller, I also think
that the whole "notailcall" thing shows why that can easily be more
problematic than just our currnet _THIS_IP_ solution.

Honestly, I'd suggest:

 - just do the current_text_addr() to _THIS_IP_ conversion

 - keep _THIS_IP_ and make it be the generic one, and screw the whole
"some architectures might implement is better" issue. Nobody cares.

 - try to convince people to move away from the "we want the kernel
instruction pointer for the call" model entirely, and consider this a
"legacy" issue.

The whole instruction pointer is a nasty thing. We should discourage
it and not make complex infrastructure for it.

Instead, maybe we could encourage something like

  struct kernel_loc { const char *file; const char *fn; int line; };

  #define __GEN_LOC__(n) \
        ({ static const struct kernel_loc n = { \
                __FILE__, __FUNCTION__, __LINE__  \
           }; &n; })

  #define _THIS_LOC_ __GEN_LOC__(__UNIQUE_ID(loc))

which is a hell of a lot nicer to use, and actually allows gcc to
optimize things (try it: if you pass a _THIS_LOC_ off to an inline
function, and that inline function uses the name and line number, gcc
will pick them up directly, without the extra structure dereference.

Wouldn't it be much nicer to pass these kinds of "location pointer"
around, rather than the nasty _THIS_IP_ thing?

Certainly lockdep looks like it could easily take that "const struct
kernel_loc *" instead of "unsigned long ip". Makes it easy to print
out the lockdep info.

Ok, I didn't try to convert anybody, so maybe people who currently use
_THIS_IP_ or current_text_addr() have some fundamental reason why they
want just that, but let's not male _THIS_IP_ more complex than it
needs to be.

Hmm?

             Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] treewide: remove current_text_addr
  2018-08-26 19:30           ` H. Peter Anvin
  2018-08-26 20:25             ` Linus Torvalds
@ 2018-08-26 23:20             ` H. Peter Anvin
  1 sibling, 0 replies; 13+ messages in thread
From: H. Peter Anvin @ 2018-08-26 23:20 UTC (permalink / raw)
  To: Helge Deller, Nick Desaulniers, torvalds, akpm
  Cc: Nicolas Pitre, linux-mips, linux-sh, Benjamin Herrenschmidt,
	Will Deacon, Paul Mackerras, Michael Ellerman,
	James E.J. Bottomley, Geert Uytterhoeven, Catalin Marinas,
	Vasily Gorbik, Matt Turner, uclinux-h8-devel, Marc Zyngier,
	Ram Pai, linux-um, Nicholas Piggin, Andy Lutomirski,
	Shannon Nelson, tglx, Alex Bennée, Richard Henderson,
	Jiri Kosina, linux-kernel

On 08/26/18 12:30, H. Peter Anvin wrote:
> Here is a full-blown (user space) test program demonstrating the whole
> technique and how to use it.
> 
> 	-hpa

Incidentally, it looks like _RET_IP_ really should be defined as:


/*
 * Is there any reason whatsoever to have _RET_IP_ an unsigned int
 * rather than a pointer throughout?
 */

#define _RET_IP_PTR_ \
	__builtin_extract_return_addr(__builtin_return_addr(0))
#define _RET_IP_ ((unsigned long)_RET_IP_PTR_)

On some architectures __builtin_extract_return_addr() is apparently
necessary; its a nop on x86.  Why that isn't part of
__builtin_return_addr() one can really wonder.

So, checking into all of this, the generic _THIS_IP_ DOES NOT WORK on
x86.  I have tried a tons of variants, including adding various asm
volatile(...) instructions, and no matter what I do, it will always
return the address of the surrounding function rather than any kind of
local IP.  The only way to get a localized address seems to be in
assembly, but even so, there is absolutely no guarantee that the value
of _THIS_IP_ has anything to do with where the code is otherwise
localized in the function.

>From examining the output of gcc, the fundamental problem seems to be
that *no matter what* one do with the label, unless gcc actually
produces a computed goto somewhere in the code, that it can't remove
with dead-code elimination or constant propagation, it will arbitrarily
hoist the labels all the way to the beginning of the function. Given
that, I suspect that other versions of gcc might have similar problems.

This is the closest thing to arch-neutral I have been able to find that
also works on x86, while not at the same time horribly polluting the
namespace:

#define __here(n) ___here_ ## n
#define __hereasm(n) ".L___here_" #n
#define _THIS_IP_CTR_(n)					\
	({							\
		extern const char __here(n) asm(__hereasm(n));	\
		asm volatile(__hereasm(n) ": /* _THIS_IP_ */"); \
		(unsigned long)&__here(n);			\
	})
#define _THIS_IP_ _THIS_IP_CTR_(__COUNTER__)

The use of asm volatile() to define a label means that the position in
the instruction stream is at least reasonably well-defined.

	-hpa

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] treewide: remove current_text_addr
  2018-08-26 20:25             ` Linus Torvalds
@ 2018-08-27  2:52               ` Nick Desaulniers
  2018-08-27  7:33                 ` Peter Zijlstra
  2018-08-27  7:43               ` Nicholas Piggin
  1 sibling, 1 reply; 13+ messages in thread
From: Nick Desaulniers @ 2018-08-27  2:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: nicolas.pitre, linux-mips, linux-sh, benh, Will Deacon, paulus,
	mpe, jejb, Geert Uytterhoeven, Catalin Marinas, gor, mattst88,
	uclinux-h8-devel, Marc Zyngier, linuxram, linux-um,
	Nicholas Piggin, luto, shannon.nelson, Thomas Gleixner,
	alex.bennee, rth, jkosina, LKML, ralf, rkuo, paul.burton,
	aneesh.kumar, Greg KH, Andrew Morton, Mark Rutland

On Sun, Aug 26, 2018 at 1:25 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> Honestly, I'd suggest:
>
>  - just do the current_text_addr() to _THIS_IP_ conversion
>
>  - keep _THIS_IP_ and make it be the generic one, and screw the whole
> "some architectures might implement is better" issue. Nobody cares.

And mention it to the compiler vendors as this seems like a case where
code gen can be improved.

>
>  - try to convince people to move away from the "we want the kernel
> instruction pointer for the call" model entirely, and consider this a
> "legacy" issue.
>
> The whole instruction pointer is a nasty thing. We should discourage
> it and not make complex infrastructure for it.

Yes, please.  I think we should strive for simplicity here.

>
> Instead, maybe we could encourage something like
>
>   struct kernel_loc { const char *file; const char *fn; int line; };
>
>   #define __GEN_LOC__(n) \
>         ({ static const struct kernel_loc n = { \
>                 __FILE__, __FUNCTION__, __LINE__  \
>            }; &n; })
>
>   #define _THIS_LOC_ __GEN_LOC__(__UNIQUE_ID(loc))
>
> which is a hell of a lot nicer to use, and actually allows gcc to
> optimize things (try it: if you pass a _THIS_LOC_ off to an inline
> function, and that inline function uses the name and line number, gcc
> will pick them up directly, without the extra structure dereference.
>
> Wouldn't it be much nicer to pass these kinds of "location pointer"
> around, rather than the nasty _THIS_IP_ thing?
>
> Certainly lockdep looks like it could easily take that "const struct
> kernel_loc *" instead of "unsigned long ip". Makes it easy to print
> out the lockdep info.
>
> Ok, I didn't try to convert anybody, so maybe people who currently use
> _THIS_IP_ or current_text_addr() have some fundamental reason why they
> want just that, but let's not male _THIS_IP_ more complex than it
> needs to be.
>
> Hmm?
>
>              Linus

This is extremely reasonable.  I can follow up with the lockdep folks
to see if they really need _THIS_IP_ to solve their problem, or if
there's a simpler solution that can solve their needs.  Sometimes
taking a step back and asking for clarity around the big picture
allows simpler solutions to shake out.
-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] treewide: remove current_text_addr
  2018-08-27  2:52               ` Nick Desaulniers
@ 2018-08-27  7:33                 ` Peter Zijlstra
  2018-08-27 12:26                   ` H. Peter Anvin
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2018-08-27  7:33 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: nicolas.pitre, linux-mips, linux-sh, benh, Will Deacon, paulus,
	mpe, jejb, Geert Uytterhoeven, Catalin Marinas, gor, mattst88,
	uclinux-h8-devel, Marc Zyngier, linuxram, linux-um,
	Nicholas Piggin, luto, shannon.nelson, Thomas Gleixner,
	alex.bennee, rth, jkosina, LKML, ralf, rkuo, paul.burton,
	aneesh.kumar, Greg KH, Andrew Morton, Linus Torvalds

On Sun, Aug 26, 2018 at 07:52:59PM -0700, Nick Desaulniers wrote:
> On Sun, Aug 26, 2018 at 1:25 PM Linus Torvalds

> > Instead, maybe we could encourage something like
> >
> >   struct kernel_loc { const char *file; const char *fn; int line; };
> >
> >   #define __GEN_LOC__(n) \
> >         ({ static const struct kernel_loc n = { \
> >                 __FILE__, __FUNCTION__, __LINE__  \
> >            }; &n; })
> >
> >   #define _THIS_LOC_ __GEN_LOC__(__UNIQUE_ID(loc))
> >
> > which is a hell of a lot nicer to use, and actually allows gcc to
> > optimize things (try it: if you pass a _THIS_LOC_ off to an inline
> > function, and that inline function uses the name and line number, gcc
> > will pick them up directly, without the extra structure dereference.
> >
> > Wouldn't it be much nicer to pass these kinds of "location pointer"
> > around, rather than the nasty _THIS_IP_ thing?
> >
> > Certainly lockdep looks like it could easily take that "const struct
> > kernel_loc *" instead of "unsigned long ip". Makes it easy to print
> > out the lockdep info.

> This is extremely reasonable.  I can follow up with the lockdep folks
> to see if they really need _THIS_IP_ to solve their problem, or if
> there's a simpler solution that can solve their needs.  Sometimes
> taking a step back and asking for clarity around the big picture
> allows simpler solutions to shake out.

What problem are we trying to solve? _THIS_IP_ and _RET_IP_ work fine.
We're 'good' at dealing with text addresses, we use them for call stacks
and all sorts. Why does this need changing?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] treewide: remove current_text_addr
  2018-08-26 20:25             ` Linus Torvalds
  2018-08-27  2:52               ` Nick Desaulniers
@ 2018-08-27  7:43               ` Nicholas Piggin
  1 sibling, 0 replies; 13+ messages in thread
From: Nicholas Piggin @ 2018-08-27  7:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Anvin, Helge Deller, Nick Desaulniers, linux-alpha,
	Linux Kernel Mailing List, arcml, linux-arm-kernel,
	linux-c6x-dev, moderated list:H8/300 ARCHITECTURE, linux-hexagon,
	linux-ia64, Linux/m68k, linux-mips,
	moderated list:NIOS2 ARCHITECTURE, openrisc, Parisc List,
	ppc-dev

[ Trimmed the cc list because my SMTP didn't accept that many
addresses. ]

On Sun, 26 Aug 2018 13:25:14 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Sun, Aug 26, 2018 at 12:32 PM H. Peter Anvin <hpa@zytor.com> wrote:
> >
> > Here is a full-blown (user space) test program demonstrating the whole
> > technique and how to use it.  
> 
> So while I agree that some _THIS_IP_ users might be better off being
> converted to __builtin_return_address(0) at the caller, I also think
> that the whole "notailcall" thing shows why that can easily be more
> problematic than just our currnet _THIS_IP_ solution.
> 
> Honestly, I'd suggest:
> 
>  - just do the current_text_addr() to _THIS_IP_ conversion
> 
>  - keep _THIS_IP_ and make it be the generic one, and screw the whole
> "some architectures might implement is better" issue. Nobody cares.
> 
>  - try to convince people to move away from the "we want the kernel
> instruction pointer for the call" model entirely, and consider this a
> "legacy" issue.
> 
> The whole instruction pointer is a nasty thing. We should discourage
> it and not make complex infrastructure for it.
> 
> Instead, maybe we could encourage something like
> 
>   struct kernel_loc { const char *file; const char *fn; int line; };
> 
>   #define __GEN_LOC__(n) \
>         ({ static const struct kernel_loc n = { \
>                 __FILE__, __FUNCTION__, __LINE__  \
>            }; &n; })
> 
>   #define _THIS_LOC_ __GEN_LOC__(__UNIQUE_ID(loc))
> 
> which is a hell of a lot nicer to use, and actually allows gcc to
> optimize things (try it: if you pass a _THIS_LOC_ off to an inline
> function, and that inline function uses the name and line number, gcc
> will pick them up directly, without the extra structure dereference.
> 
> Wouldn't it be much nicer to pass these kinds of "location pointer"
> around, rather than the nasty _THIS_IP_ thing?

Seems nice. Do you even need this unique ID thing? AFAIKS the name
would never really be useful.

It could perhaps go into a cold data section too, I assume the common
case is that you do not access it. Although gcc will end up putting
the file and function names into regular rodata.

Possibly we could add a printk specifier for it, pass it through to
existing BUG, etc macros that want exactly this, etc. Makes a lot of
sense.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] treewide: remove current_text_addr
  2018-08-27  7:33                 ` Peter Zijlstra
@ 2018-08-27 12:26                   ` H. Peter Anvin
  2018-08-27 13:11                     ` Peter Zijlstra
  0 siblings, 1 reply; 13+ messages in thread
From: H. Peter Anvin @ 2018-08-27 12:26 UTC (permalink / raw)
  To: Peter Zijlstra, Nick Desaulniers
  Cc: nicolas.pitre, linux-mips, linux-sh, benh, Will Deacon, paulus,
	mpe, jejb, Geert Uytterhoeven, Catalin Marinas, gor, mattst88,
	uclinux-h8-devel, Marc Zyngier, linuxram, linux-um,
	Nicholas Piggin, luto, shannon.nelson, Thomas Gleixner,
	alex.bennee, rth, jkosina, LKML, ralf, rkuo, paul.burton,
	aneesh.kumar, Greg KH, Andrew Morton, Linus Torvalds

On 08/27/18 00:33, Peter Zijlstra wrote:
> 
> What problem are we trying to solve? _THIS_IP_ and _RET_IP_ work fine.
> We're 'good' at dealing with text addresses, we use them for call stacks
> and all sorts. Why does this need changing?
> 

_RET_IP_ works fine, with the following two caveats:

1. To get a unique IP for each call site, the function call needs to be
   tailcall protected (easily done by wrapping the function in an
  __always_inline function with the notailcall() function I described
  earlier.  Alternatively, a generic macro wrapper for the same thing:

  #define notailcall(x) ({ typeof(x) _x = (x); asm volatile("");  _x; })

2. To uniformly get the return IP, it needs to be defined as:

#define _RET_IP_((unsigned long) \
__builtin_extract_return_addr(__builtin_return_address(0)))

[sorry for the line wrapping]

Using the type unsigned long instead of void * seems kind of pointless
though.


_THIS_IP_, however, is completely ill-defined, other than being an
address *somewhere* in the same global function (not even necessarily
the same function if the function is static!)  As my experiment show, in
many (nearly) cases gcc will hoist the address all the way to the top of
the function, at least for the current generic implementation.

For the case where _THIS_IP_ is passed to an out-of-line function in all
cases, it is extra pointless because all it does is increase the
footprint of every caller: _RET_IP_ is inherently passed to the function
anyway, and with tailcall protection it will uniquely identify a callsite.

For the case where _THIS_IP_ is used inline, I believe the version I
described will at the very least avoid hoisting around volatile accesses
like READ_ONCE(). Surrounding the marked code with asm volatile("");
[which should be turned into a macro or inline, obviously] might be
necessary for it to make any kind of inherent sense.

The proposed "location identifier" does have a serious problem: with
inline functions you might very well have a bunch of duplicates pointing
into the inline function, so a single callsite isn't identifiable.

	-hpa

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] treewide: remove current_text_addr
  2018-08-27 12:26                   ` H. Peter Anvin
@ 2018-08-27 13:11                     ` Peter Zijlstra
  2018-08-27 13:33                       ` H. Peter Anvin
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2018-08-27 13:11 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: nicolas.pitre, linux-mips, linux-sh, benh, Will Deacon, paulus,
	mpe, jejb, Geert Uytterhoeven, Catalin Marinas, gor, mattst88,
	uclinux-h8-devel, Marc Zyngier, linuxram, linux-um,
	Nicholas Piggin, luto, shannon.nelson, Thomas Gleixner,
	alex.bennee, rth, jkosina, Nick Desaulniers, LKML, ralf, rkuo,
	paul.burton, aneesh.kumar, Greg KH, Andrew Morton

On Mon, Aug 27, 2018 at 05:26:53AM -0700, H. Peter Anvin wrote:

> _THIS_IP_, however, is completely ill-defined, other than being an
> address *somewhere* in the same global function (not even necessarily
> the same function if the function is static!)  As my experiment show, in
> many (nearly) cases gcc will hoist the address all the way to the top of
> the function, at least for the current generic implementation.

It seems to have mostly worked so far... did anything change?

> For the case where _THIS_IP_ is passed to an out-of-line function in all
> cases, it is extra pointless because all it does is increase the
> footprint of every caller: _RET_IP_ is inherently passed to the function
> anyway, and with tailcall protection it will uniquely identify a callsite.

So I think we can convert many of the lockdep _THIS_IP_ calls to
_RET_IP_ on the other side, with a wee bit of care.

A little something like so perhaps...

---

 drivers/md/bcache/btree.c    |  2 +-
 fs/jbd2/transaction.c        |  6 +++---
 fs/super.c                   |  4 ++--
 include/linux/fs.h           |  4 ++--
 include/linux/jbd2.h         |  4 ++--
 include/linux/lockdep.h      | 21 ++++++++++-----------
 include/linux/percpu-rwsem.h | 22 ++++++++++------------
 include/linux/rcupdate.h     |  8 ++++----
 include/linux/ww_mutex.h     |  2 +-
 kernel/locking/lockdep.c     | 14 ++++++++------
 kernel/printk/printk.c       | 14 +++++++-------
 kernel/sched/core.c          |  4 ++--
 lib/locking-selftest.c       | 32 ++++++++++++++++----------------
 13 files changed, 68 insertions(+), 69 deletions(-)

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index c19f7716df88..21ede9b317de 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -940,7 +940,7 @@ static struct btree *mca_alloc(struct cache_set *c, struct btree_op *op,
 	hlist_del_init_rcu(&b->hash);
 	hlist_add_head_rcu(&b->hash, mca_hash(c, k));
 
-	lock_set_subclass(&b->lock.dep_map, level + 1, _THIS_IP_);
+	lock_set_subclass(&b->lock.dep_map, level + 1);
 	b->parent	= (void *) ~0UL;
 	b->flags	= 0;
 	b->written	= 0;
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index c0b66a7a795b..40aa71321f8a 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -382,7 +382,7 @@ static int start_this_handle(journal_t *journal, handle_t *handle,
 	read_unlock(&journal->j_state_lock);
 	current->journal_info = handle;
 
-	rwsem_acquire_read(&journal->j_trans_commit_map, 0, 0, _THIS_IP_);
+	rwsem_acquire_read(&journal->j_trans_commit_map, 0, 0);
 	jbd2_journal_free_transaction(new_transaction);
 	/*
 	 * Ensure that no allocations done while the transaction is open are
@@ -677,7 +677,7 @@ int jbd2__journal_restart(handle_t *handle, int nblocks, gfp_t gfp_mask)
 	if (need_to_start)
 		jbd2_log_start_commit(journal, tid);
 
-	rwsem_release(&journal->j_trans_commit_map, 1, _THIS_IP_);
+	rwsem_release(&journal->j_trans_commit_map, 1);
 	handle->h_buffer_credits = nblocks;
 	/*
 	 * Restore the original nofs context because the journal restart
@@ -1771,7 +1771,7 @@ int jbd2_journal_stop(handle_t *handle)
 			wake_up(&journal->j_wait_transaction_locked);
 	}
 
-	rwsem_release(&journal->j_trans_commit_map, 1, _THIS_IP_);
+	rwsem_release(&journal->j_trans_commit_map, 1);
 
 	if (wait_for_commit)
 		err = jbd2_log_wait_commit(journal, tid);
diff --git a/fs/super.c b/fs/super.c
index 50728d9c1a05..ec650a558f09 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1431,7 +1431,7 @@ static void lockdep_sb_freeze_release(struct super_block *sb)
 	int level;
 
 	for (level = SB_FREEZE_LEVELS - 1; level >= 0; level--)
-		percpu_rwsem_release(sb->s_writers.rw_sem + level, 0, _THIS_IP_);
+		percpu_rwsem_release(sb->s_writers.rw_sem + level, 0);
 }
 
 /*
@@ -1442,7 +1442,7 @@ static void lockdep_sb_freeze_acquire(struct super_block *sb)
 	int level;
 
 	for (level = 0; level < SB_FREEZE_LEVELS; ++level)
-		percpu_rwsem_acquire(sb->s_writers.rw_sem + level, 0, _THIS_IP_);
+		percpu_rwsem_acquire(sb->s_writers.rw_sem + level, 0);
 }
 
 static void sb_freeze_unlock(struct super_block *sb)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 1ec33fd0423f..2ba14e5362e4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1505,9 +1505,9 @@ void __sb_end_write(struct super_block *sb, int level);
 int __sb_start_write(struct super_block *sb, int level, bool wait);
 
 #define __sb_writers_acquired(sb, lev)	\
-	percpu_rwsem_acquire(&(sb)->s_writers.rw_sem[(lev)-1], 1, _THIS_IP_)
+	percpu_rwsem_acquire(&(sb)->s_writers.rw_sem[(lev)-1], 1)
 #define __sb_writers_release(sb, lev)	\
-	percpu_rwsem_release(&(sb)->s_writers.rw_sem[(lev)-1], 1, _THIS_IP_)
+	percpu_rwsem_release(&(sb)->s_writers.rw_sem[(lev)-1], 1)
 
 /**
  * sb_end_write - drop write access to a superblock
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index b708e5169d1d..7c31176ec8ae 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1155,8 +1155,8 @@ struct journal_s
 
 #define jbd2_might_wait_for_commit(j) \
 	do { \
-		rwsem_acquire(&j->j_trans_commit_map, 0, 0, _THIS_IP_); \
-		rwsem_release(&j->j_trans_commit_map, 1, _THIS_IP_); \
+		rwsem_acquire(&j->j_trans_commit_map, 0, 0); \
+		rwsem_release(&j->j_trans_commit_map, 1); \
 	} while (0)
 
 /* journal feature predicate functions */
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 6fc77d4dbdcd..ed3daf41ae7b 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -348,16 +348,15 @@ static inline int lock_is_held(const struct lockdep_map *lock)
 #define lockdep_is_held_type(lock, r)	lock_is_held_type(&(lock)->dep_map, (r))
 
 extern void lock_set_class(struct lockdep_map *lock, const char *name,
-			   struct lock_class_key *key, unsigned int subclass,
-			   unsigned long ip);
+			   struct lock_class_key *key, unsigned int subclass);
 
-static inline void lock_set_subclass(struct lockdep_map *lock,
-		unsigned int subclass, unsigned long ip)
+static __always_inline void
+lock_set_subclass(struct lockdep_map *lock, unsigned int subclass)
 {
-	lock_set_class(lock, lock->name, lock->key, subclass, ip);
+	lock_set_class(lock, lock->name, lock->key, subclass);
 }
 
-extern void lock_downgrade(struct lockdep_map *lock, unsigned long ip);
+extern void lock_downgrade(struct lockdep_map *lock);
 
 struct pin_cookie { unsigned int val; };
 
@@ -401,11 +400,11 @@ static inline void lockdep_on(void)
 {
 }
 
-# define lock_acquire(l, s, t, r, c, n, i)	do { } while (0)
-# define lock_release(l, n, i)			do { } while (0)
-# define lock_downgrade(l, i)			do { } while (0)
-# define lock_set_class(l, n, k, s, i)		do { } while (0)
-# define lock_set_subclass(l, s, i)		do { } while (0)
+# define lock_acquire(l, s, t, r, c, n)		do { } while (0)
+# define lock_release(l, n)			do { } while (0)
+# define lock_downgrade(l)			do { } while (0)
+# define lock_set_class(l, n, k, s)		do { } while (0)
+# define lock_set_subclass(l, s)		do { } while (0)
 # define lockdep_info()				do { } while (0)
 # define lockdep_init_map(lock, name, key, sub) \
 		do { (void)(name); (void)(key); } while (0)
diff --git a/include/linux/percpu-rwsem.h b/include/linux/percpu-rwsem.h
index 79b99d653e03..4ebf14e99034 100644
--- a/include/linux/percpu-rwsem.h
+++ b/include/linux/percpu-rwsem.h
@@ -29,11 +29,11 @@ static struct percpu_rw_semaphore name = {				\
 extern int __percpu_down_read(struct percpu_rw_semaphore *, int);
 extern void __percpu_up_read(struct percpu_rw_semaphore *);
 
-static inline void percpu_down_read_preempt_disable(struct percpu_rw_semaphore *sem)
+static __always_inline void percpu_down_read_preempt_disable(struct percpu_rw_semaphore *sem)
 {
 	might_sleep();
 
-	rwsem_acquire_read(&sem->rw_sem.dep_map, 0, 0, _RET_IP_);
+	rwsem_acquire_read(&sem->rw_sem.dep_map, 0, 0);
 
 	preempt_disable();
 	/*
@@ -60,7 +60,7 @@ static inline void percpu_down_read(struct percpu_rw_semaphore *sem)
 	preempt_enable();
 }
 
-static inline int percpu_down_read_trylock(struct percpu_rw_semaphore *sem)
+static __always_inline int percpu_down_read_trylock(struct percpu_rw_semaphore *sem)
 {
 	int ret = 1;
 
@@ -78,12 +78,12 @@ static inline int percpu_down_read_trylock(struct percpu_rw_semaphore *sem)
 	 */
 
 	if (ret)
-		rwsem_acquire_read(&sem->rw_sem.dep_map, 0, 1, _RET_IP_);
+		rwsem_acquire_read(&sem->rw_sem.dep_map, 0, 1);
 
 	return ret;
 }
 
-static inline void percpu_up_read_preempt_enable(struct percpu_rw_semaphore *sem)
+static __always_inline void percpu_up_read_preempt_enable(struct percpu_rw_semaphore *sem)
 {
 	/*
 	 * The barrier() prevents the compiler from
@@ -99,7 +99,7 @@ static inline void percpu_up_read_preempt_enable(struct percpu_rw_semaphore *sem
 		__percpu_up_read(sem); /* Unconditional memory barrier */
 	preempt_enable();
 
-	rwsem_release(&sem->rw_sem.dep_map, 1, _RET_IP_);
+	rwsem_release(&sem->rw_sem.dep_map, 1);
 }
 
 static inline void percpu_up_read(struct percpu_rw_semaphore *sem)
@@ -127,20 +127,18 @@ extern void percpu_free_rwsem(struct percpu_rw_semaphore *);
 #define percpu_rwsem_assert_held(sem)				\
 	lockdep_assert_held(&(sem)->rw_sem)
 
-static inline void percpu_rwsem_release(struct percpu_rw_semaphore *sem,
-					bool read, unsigned long ip)
+static __always_inline void percpu_rwsem_release(struct percpu_rw_semaphore *sem, bool read)
 {
-	lock_release(&sem->rw_sem.dep_map, 1, ip);
+	lock_release(&sem->rw_sem.dep_map, 1);
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
 	if (!read)
 		sem->rw_sem.owner = RWSEM_OWNER_UNKNOWN;
 #endif
 }
 
-static inline void percpu_rwsem_acquire(struct percpu_rw_semaphore *sem,
-					bool read, unsigned long ip)
+static __always_inline void percpu_rwsem_acquire(struct percpu_rw_semaphore *sem, bool read)
 {
-	lock_acquire(&sem->rw_sem.dep_map, 0, 1, read, 1, NULL, ip);
+	lock_acquire(&sem->rw_sem.dep_map, 0, 1, read, 1, NULL);
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
 	if (!read)
 		sem->rw_sem.owner = current;
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 75e5b393cf44..6c1a35555e9d 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -239,14 +239,14 @@ static inline bool rcu_lockdep_current_cpu_online(void) { return true; }
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 
-static inline void rcu_lock_acquire(struct lockdep_map *map)
+static __always_inline void rcu_lock_acquire(struct lockdep_map *map)
 {
-	lock_acquire(map, 0, 0, 2, 0, NULL, _THIS_IP_);
+	lock_acquire(map, 0, 0, 2, 0, NULL);
 }
 
-static inline void rcu_lock_release(struct lockdep_map *map)
+static __always_inline void rcu_lock_release(struct lockdep_map *map)
 {
-	lock_release(map, 1, _THIS_IP_);
+	lock_release(map, 1);
 }
 
 extern struct lockdep_map rcu_lock_map;
diff --git a/include/linux/ww_mutex.h b/include/linux/ww_mutex.h
index 3af7c0e03be5..524aa28eef33 100644
--- a/include/linux/ww_mutex.h
+++ b/include/linux/ww_mutex.h
@@ -182,7 +182,7 @@ static inline void ww_acquire_done(struct ww_acquire_ctx *ctx)
 static inline void ww_acquire_fini(struct ww_acquire_ctx *ctx)
 {
 #ifdef CONFIG_DEBUG_MUTEXES
-	mutex_release(&ctx->dep_map, 0, _THIS_IP_);
+	mutex_release(&ctx->dep_map, 0);
 
 	DEBUG_LOCKS_WARN_ON(ctx->acquired);
 	if (!IS_ENABLED(CONFIG_PROVE_LOCKING))
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 5fa4d3138bf1..0b7c4f94a7a3 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -3868,9 +3868,9 @@ static void check_flags(unsigned long flags)
 }
 
 void lock_set_class(struct lockdep_map *lock, const char *name,
-		    struct lock_class_key *key, unsigned int subclass,
-		    unsigned long ip)
+		    struct lock_class_key *key, unsigned int subclass)
 {
+	unsigned long ip = _RET_IP_;
 	unsigned long flags;
 
 	if (unlikely(current->lockdep_recursion))
@@ -3886,8 +3886,9 @@ void lock_set_class(struct lockdep_map *lock, const char *name,
 }
 EXPORT_SYMBOL_GPL(lock_set_class);
 
-void lock_downgrade(struct lockdep_map *lock, unsigned long ip)
+void lock_downgrade(struct lockdep_map *lock)
 {
+	unsigned long ip = _RET_IP_;
 	unsigned long flags;
 
 	if (unlikely(current->lockdep_recursion))
@@ -3909,8 +3910,9 @@ EXPORT_SYMBOL_GPL(lock_downgrade);
  */
 void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 			  int trylock, int read, int check,
-			  struct lockdep_map *nest_lock, unsigned long ip)
+			  struct lockdep_map *nest_lock)
 {
+	unsigned long ip = _RET_IP_;
 	unsigned long flags;
 
 	if (unlikely(current->lockdep_recursion))
@@ -3928,9 +3930,9 @@ void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 }
 EXPORT_SYMBOL_GPL(lock_acquire);
 
-void lock_release(struct lockdep_map *lock, int nested,
-			  unsigned long ip)
+void lock_release(struct lockdep_map *lock, int nested)
 {
+	unsigned long ip = _RET_IP_;
 	unsigned long flags;
 
 	if (unlikely(current->lockdep_recursion))
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 90b6ab01db59..9c8654be08bb 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1583,7 +1583,7 @@ static void console_lock_spinning_enable(void)
 	raw_spin_unlock(&console_owner_lock);
 
 	/* The waiter may spin on us after setting console_owner */
-	spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
+	spin_acquire(&console_owner_dep_map, 0, 0);
 }
 
 /**
@@ -1611,20 +1611,20 @@ static int console_lock_spinning_disable_and_check(void)
 	raw_spin_unlock(&console_owner_lock);
 
 	if (!waiter) {
-		spin_release(&console_owner_dep_map, 1, _THIS_IP_);
+		spin_release(&console_owner_dep_map, 1);
 		return 0;
 	}
 
 	/* The waiter is now free to continue */
 	WRITE_ONCE(console_waiter, false);
 
-	spin_release(&console_owner_dep_map, 1, _THIS_IP_);
+	spin_release(&console_owner_dep_map, 1);
 
 	/*
 	 * Hand off console_lock to waiter. The waiter will perform
 	 * the up(). After this, the waiter is the console_lock owner.
 	 */
-	mutex_release(&console_lock_dep_map, 1, _THIS_IP_);
+	mutex_release(&console_lock_dep_map, 1);
 	return 1;
 }
 
@@ -1674,11 +1674,11 @@ static int console_trylock_spinning(void)
 	}
 
 	/* We spin waiting for the owner to release us */
-	spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
+	spin_acquire(&console_owner_dep_map, 0, 0);
 	/* Owner will clear console_waiter on hand off */
 	while (READ_ONCE(console_waiter))
 		cpu_relax();
-	spin_release(&console_owner_dep_map, 1, _THIS_IP_);
+	spin_release(&console_owner_dep_map, 1);
 
 	printk_safe_exit_irqrestore(flags);
 	/*
@@ -1687,7 +1687,7 @@ static int console_trylock_spinning(void)
 	 * this as a trylock. Otherwise lockdep will
 	 * complain.
 	 */
-	mutex_acquire(&console_lock_dep_map, 0, 1, _THIS_IP_);
+	mutex_acquire(&console_lock_dep_map, 0, 1);
 
 	return 1;
 }
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 454adf9f8180..a3d146cc2cb9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2557,7 +2557,7 @@ prepare_lock_switch(struct rq *rq, struct task_struct *next, struct rq_flags *rf
 	 * do an early lockdep release here:
 	 */
 	rq_unpin_lock(rq, rf);
-	spin_release(&rq->lock.dep_map, 1, _THIS_IP_);
+	spin_release(&rq->lock.dep_map, 1);
 #ifdef CONFIG_DEBUG_SPINLOCK
 	/* this is a valid case when another task releases the spinlock */
 	rq->lock.owner = next;
@@ -2571,7 +2571,7 @@ static inline void finish_lock_switch(struct rq *rq)
 	 * fix up the runqueue lock - which gets 'carried over' from
 	 * prev into current:
 	 */
-	spin_acquire(&rq->lock.dep_map, 0, 0, _THIS_IP_);
+	spin_acquire(&rq->lock.dep_map, 0, 0);
 	raw_spin_unlock_irq(&rq->lock);
 }
 
diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
index 1e1bbf171eca..d9599c7d0426 100644
--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -1475,7 +1475,7 @@ static void ww_test_edeadlk_normal(void)
 
 	mutex_lock(&o2.base);
 	o2.ctx = &t2;
-	mutex_release(&o2.base.dep_map, 1, _THIS_IP_);
+	mutex_release(&o2.base.dep_map, 1);
 
 	WWAI(&t);
 	t2 = t;
@@ -1488,7 +1488,7 @@ static void ww_test_edeadlk_normal(void)
 	WARN_ON(ret != -EDEADLK);
 
 	o2.ctx = NULL;
-	mutex_acquire(&o2.base.dep_map, 0, 1, _THIS_IP_);
+	mutex_acquire(&o2.base.dep_map, 0, 1);
 	mutex_unlock(&o2.base);
 	WWU(&o);
 
@@ -1500,7 +1500,7 @@ static void ww_test_edeadlk_normal_slow(void)
 	int ret;
 
 	mutex_lock(&o2.base);
-	mutex_release(&o2.base.dep_map, 1, _THIS_IP_);
+	mutex_release(&o2.base.dep_map, 1);
 	o2.ctx = &t2;
 
 	WWAI(&t);
@@ -1514,7 +1514,7 @@ static void ww_test_edeadlk_normal_slow(void)
 	WARN_ON(ret != -EDEADLK);
 
 	o2.ctx = NULL;
-	mutex_acquire(&o2.base.dep_map, 0, 1, _THIS_IP_);
+	mutex_acquire(&o2.base.dep_map, 0, 1);
 	mutex_unlock(&o2.base);
 	WWU(&o);
 
@@ -1527,7 +1527,7 @@ static void ww_test_edeadlk_no_unlock(void)
 
 	mutex_lock(&o2.base);
 	o2.ctx = &t2;
-	mutex_release(&o2.base.dep_map, 1, _THIS_IP_);
+	mutex_release(&o2.base.dep_map, 1);
 
 	WWAI(&t);
 	t2 = t;
@@ -1540,7 +1540,7 @@ static void ww_test_edeadlk_no_unlock(void)
 	WARN_ON(ret != -EDEADLK);
 
 	o2.ctx = NULL;
-	mutex_acquire(&o2.base.dep_map, 0, 1, _THIS_IP_);
+	mutex_acquire(&o2.base.dep_map, 0, 1);
 	mutex_unlock(&o2.base);
 
 	WWL(&o2, &t);
@@ -1551,7 +1551,7 @@ static void ww_test_edeadlk_no_unlock_slow(void)
 	int ret;
 
 	mutex_lock(&o2.base);
-	mutex_release(&o2.base.dep_map, 1, _THIS_IP_);
+	mutex_release(&o2.base.dep_map, 1);
 	o2.ctx = &t2;
 
 	WWAI(&t);
@@ -1565,7 +1565,7 @@ static void ww_test_edeadlk_no_unlock_slow(void)
 	WARN_ON(ret != -EDEADLK);
 
 	o2.ctx = NULL;
-	mutex_acquire(&o2.base.dep_map, 0, 1, _THIS_IP_);
+	mutex_acquire(&o2.base.dep_map, 0, 1);
 	mutex_unlock(&o2.base);
 
 	ww_mutex_lock_slow(&o2, &t);
@@ -1576,7 +1576,7 @@ static void ww_test_edeadlk_acquire_more(void)
 	int ret;
 
 	mutex_lock(&o2.base);
-	mutex_release(&o2.base.dep_map, 1, _THIS_IP_);
+	mutex_release(&o2.base.dep_map, 1);
 	o2.ctx = &t2;
 
 	WWAI(&t);
@@ -1597,7 +1597,7 @@ static void ww_test_edeadlk_acquire_more_slow(void)
 	int ret;
 
 	mutex_lock(&o2.base);
-	mutex_release(&o2.base.dep_map, 1, _THIS_IP_);
+	mutex_release(&o2.base.dep_map, 1);
 	o2.ctx = &t2;
 
 	WWAI(&t);
@@ -1618,11 +1618,11 @@ static void ww_test_edeadlk_acquire_more_edeadlk(void)
 	int ret;
 
 	mutex_lock(&o2.base);
-	mutex_release(&o2.base.dep_map, 1, _THIS_IP_);
+	mutex_release(&o2.base.dep_map, 1);
 	o2.ctx = &t2;
 
 	mutex_lock(&o3.base);
-	mutex_release(&o3.base.dep_map, 1, _THIS_IP_);
+	mutex_release(&o3.base.dep_map, 1);
 	o3.ctx = &t2;
 
 	WWAI(&t);
@@ -1644,11 +1644,11 @@ static void ww_test_edeadlk_acquire_more_edeadlk_slow(void)
 	int ret;
 
 	mutex_lock(&o2.base);
-	mutex_release(&o2.base.dep_map, 1, _THIS_IP_);
+	mutex_release(&o2.base.dep_map, 1);
 	o2.ctx = &t2;
 
 	mutex_lock(&o3.base);
-	mutex_release(&o3.base.dep_map, 1, _THIS_IP_);
+	mutex_release(&o3.base.dep_map, 1);
 	o3.ctx = &t2;
 
 	WWAI(&t);
@@ -1669,7 +1669,7 @@ static void ww_test_edeadlk_acquire_wrong(void)
 	int ret;
 
 	mutex_lock(&o2.base);
-	mutex_release(&o2.base.dep_map, 1, _THIS_IP_);
+	mutex_release(&o2.base.dep_map, 1);
 	o2.ctx = &t2;
 
 	WWAI(&t);
@@ -1694,7 +1694,7 @@ static void ww_test_edeadlk_acquire_wrong_slow(void)
 	int ret;
 
 	mutex_lock(&o2.base);
-	mutex_release(&o2.base.dep_map, 1, _THIS_IP_);
+	mutex_release(&o2.base.dep_map, 1);
 	o2.ctx = &t2;
 
 	WWAI(&t);

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] treewide: remove current_text_addr
  2018-08-27 13:11                     ` Peter Zijlstra
@ 2018-08-27 13:33                       ` H. Peter Anvin
  2018-08-31 16:48                         ` Nick Desaulniers
  0 siblings, 1 reply; 13+ messages in thread
From: H. Peter Anvin @ 2018-08-27 13:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: nicolas.pitre, linux-mips, linux-sh, benh, Will Deacon, paulus,
	mpe, jejb, Geert Uytterhoeven, Catalin Marinas, gor, mattst88,
	uclinux-h8-devel, Marc Zyngier, linuxram, linux-um,
	Nicholas Piggin, luto, shannon.nelson, Thomas Gleixner,
	alex.bennee, rth, jkosina, Nick Desaulniers, LKML, ralf, rkuo,
	paul.burton, aneesh.kumar, Greg KH, Andrew Morton

On 08/27/18 06:11, Peter Zijlstra wrote:
> On Mon, Aug 27, 2018 at 05:26:53AM -0700, H. Peter Anvin wrote:
> 
>> _THIS_IP_, however, is completely ill-defined, other than being an
>> address *somewhere* in the same global function (not even necessarily
>> the same function if the function is static!)  As my experiment show, in
>> many (nearly) cases gcc will hoist the address all the way to the top of
>> the function, at least for the current generic implementation.
> 
> It seems to have mostly worked so far... did anything change?
> 

Most likely because the major architectures contain a arch-specific
assembly implementation.  The generic implementation used in some places
is completely broken, as my experiments show.

>> For the case where _THIS_IP_ is passed to an out-of-line function in all
>> cases, it is extra pointless because all it does is increase the
>> footprint of every caller: _RET_IP_ is inherently passed to the function
>> anyway, and with tailcall protection it will uniquely identify a callsite.
> 
> So I think we can convert many of the lockdep _THIS_IP_ calls to
> _RET_IP_ on the other side, with a wee bit of care.
> 
> A little something like so perhaps...

I don't have time to look at this right now (I'm on sabbatical, and I'm
dealing with personal legal stuff right at the moment), but I think it
is the right direction.

	-hpa

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] treewide: remove current_text_addr
  2018-08-27 13:33                       ` H. Peter Anvin
@ 2018-08-31 16:48                         ` Nick Desaulniers
  0 siblings, 0 replies; 13+ messages in thread
From: Nick Desaulniers @ 2018-08-31 16:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Nicolas Pitre, linux-mips, linux-sh, benh, Will Deacon, paulus,
	mpe, jejb, Geert Uytterhoeven, Catalin Marinas, gor, Matt Turner,
	uclinux-h8-devel, Marc Zyngier, linuxram, linux-um,
	Nicholas Piggin, luto, shannon.nelson, Thomas Gleixner,
	Alex Bennée, rth, jkosina, LKML, ralf, rkuo, paul.burton,
	aneesh.kumar, Greg KH

On Mon, Aug 27, 2018 at 6:34 AM H. Peter Anvin <hpa@zytor.com> wrote:
>
> On 08/27/18 06:11, Peter Zijlstra wrote:
> > On Mon, Aug 27, 2018 at 05:26:53AM -0700, H. Peter Anvin wrote:
> >
> >> _THIS_IP_, however, is completely ill-defined, other than being an
> >> address *somewhere* in the same global function (not even necessarily
> >> the same function if the function is static!)  As my experiment show, in
> >> many (nearly) cases gcc will hoist the address all the way to the top of
> >> the function, at least for the current generic implementation.
> >
> > It seems to have mostly worked so far... did anything change?
> >
>
> Most likely because the major architectures contain a arch-specific
> assembly implementation.  The generic implementation used in some places
> is completely broken, as my experiments show.
>
> >> For the case where _THIS_IP_ is passed to an out-of-line function in all
> >> cases, it is extra pointless because all it does is increase the
> >> footprint of every caller: _RET_IP_ is inherently passed to the function
> >> anyway, and with tailcall protection it will uniquely identify a callsite.
> >
> > So I think we can convert many of the lockdep _THIS_IP_ calls to
> > _RET_IP_ on the other side, with a wee bit of care.
> >
> > A little something like so perhaps...
>
> I don't have time to look at this right now (I'm on sabbatical, and I'm
> dealing with personal legal stuff right at the moment), but I think it
> is the right direction.
>
>         -hpa

Linus,
Can this patch please be merged?  Then we can polish off Peter's
change to lockdep to not even use _THIS_IP_.

-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2018-08-31 16:48 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CAKwvOdkWL_2yTnJqM6n6R9UCPwY4iz-9BQYGN2MDAk9EzumUvA@mail.gmail.com>
     [not found] ` <20180821202900.208417-1-ndesaulniers@google.com>
     [not found]   ` <207784db-4fcc-85e7-a0b2-fec26b7dab81@gmx.de>
2018-08-26  2:38     ` [PATCH] treewide: remove current_text_addr H. Peter Anvin
2018-08-26  3:16       ` H. Peter Anvin
2018-08-26  4:56         ` H. Peter Anvin
2018-08-26 19:30           ` H. Peter Anvin
2018-08-26 20:25             ` Linus Torvalds
2018-08-27  2:52               ` Nick Desaulniers
2018-08-27  7:33                 ` Peter Zijlstra
2018-08-27 12:26                   ` H. Peter Anvin
2018-08-27 13:11                     ` Peter Zijlstra
2018-08-27 13:33                       ` H. Peter Anvin
2018-08-31 16:48                         ` Nick Desaulniers
2018-08-27  7:43               ` Nicholas Piggin
2018-08-26 23:20             ` H. Peter Anvin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).