All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] usercopy: use unsigned long instead of uintptr_t
@ 2022-06-16 14:36 Jason A. Donenfeld
  2022-06-16 14:38 ` Matthew Wilcox
  2022-06-16 16:29 ` Kees Cook
  0 siblings, 2 replies; 17+ messages in thread
From: Jason A. Donenfeld @ 2022-06-16 14:36 UTC (permalink / raw)
  To: linux-mm, linux-xfs, linux-hardening, linux-kernel
  Cc: Jason A. Donenfeld, Matthew Wilcox, Uladzislau Rezki, Kees Cook,
	Greg Kroah-Hartman, Linus Torvalds, Joe Perches

A recent commit factored out a series of annoying (unsigned long) casts
into a single variable declaration, but made the pointer type a
`uintptr_t` rather than the usual `unsigned long`. This patch changes it
to be the integer type more typically used by the kernel to represent
addresses.

Fixes: 35fb9ae4aa2e ("usercopy: Cast pointer to an integer once")
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 mm/usercopy.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/usercopy.c b/mm/usercopy.c
index 4e1da708699b..c1ee15a98633 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -161,7 +161,7 @@ static inline void check_bogus_address(const unsigned long ptr, unsigned long n,
 static inline void check_heap_object(const void *ptr, unsigned long n,
 				     bool to_user)
 {
-	uintptr_t addr = (uintptr_t)ptr;
+	unsigned long addr = (unsigned long)ptr;
 	unsigned long offset;
 	struct folio *folio;
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] usercopy: use unsigned long instead of uintptr_t
  2022-06-16 14:36 [PATCH] usercopy: use unsigned long instead of uintptr_t Jason A. Donenfeld
@ 2022-06-16 14:38 ` Matthew Wilcox
  2022-06-16 14:51   ` Jason A. Donenfeld
  2022-06-16 16:29 ` Kees Cook
  1 sibling, 1 reply; 17+ messages in thread
From: Matthew Wilcox @ 2022-06-16 14:38 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: linux-mm, linux-xfs, linux-hardening, linux-kernel,
	Uladzislau Rezki, Kees Cook, Greg Kroah-Hartman, Linus Torvalds,
	Joe Perches

On Thu, Jun 16, 2022 at 04:36:17PM +0200, Jason A. Donenfeld wrote:
> A recent commit factored out a series of annoying (unsigned long) casts
> into a single variable declaration, but made the pointer type a
> `uintptr_t` rather than the usual `unsigned long`. This patch changes it
> to be the integer type more typically used by the kernel to represent
> addresses.

No.  I did this on purpose.  uintptr_t is the correct type to represent
a pointer that's being used as an integer.  This dinosaur approach of
using unsigned long has to stop.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] usercopy: use unsigned long instead of uintptr_t
  2022-06-16 14:38 ` Matthew Wilcox
@ 2022-06-16 14:51   ` Jason A. Donenfeld
  2022-06-16 15:11     ` Jason A. Donenfeld
  2022-06-16 15:21     ` Matthew Wilcox
  0 siblings, 2 replies; 17+ messages in thread
From: Jason A. Donenfeld @ 2022-06-16 14:51 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-mm, linux-xfs, linux-hardening, linux-kernel,
	Uladzislau Rezki, Kees Cook, Greg Kroah-Hartman, Linus Torvalds,
	Joe Perches

Hi Matthew,

On Thu, Jun 16, 2022 at 03:38:02PM +0100, Matthew Wilcox wrote:
> On Thu, Jun 16, 2022 at 04:36:17PM +0200, Jason A. Donenfeld wrote:
> > A recent commit factored out a series of annoying (unsigned long) casts
> > into a single variable declaration, but made the pointer type a
> > `uintptr_t` rather than the usual `unsigned long`. This patch changes it
> > to be the integer type more typically used by the kernel to represent
> > addresses.
> 
> No.  I did this on purpose.  uintptr_t is the correct type to represent
> a pointer that's being used as an integer.  This dinosaur approach of
> using unsigned long has to stop.

For better or for worse, I've always assumed that the kernel had its
reasons -- legitimate reasons, even -- for preferring `unsigned long` to
a userspace type like `uintptr_t`, so I've always tried to code that
way.

If that's a "dinosaur approach" that "has to stop", it'd certainly be
news to me (and I'm guessing others on the list too). I've never really
seen anybody question the kernel's `unsigned long` usage before.

So hopefully some outcome of this discussion will make it clear, and
then either this patch will go in, or I'll get to work on carefully
adjusting my code that uses `unsigned long` at the moment.

Jason

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] usercopy: use unsigned long instead of uintptr_t
  2022-06-16 14:51   ` Jason A. Donenfeld
@ 2022-06-16 15:11     ` Jason A. Donenfeld
  2022-06-16 15:21     ` Matthew Wilcox
  1 sibling, 0 replies; 17+ messages in thread
From: Jason A. Donenfeld @ 2022-06-16 15:11 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-mm, linux-xfs, linux-hardening, linux-kernel,
	Uladzislau Rezki, Kees Cook, Greg Kroah-Hartman, Linus Torvalds,
	Joe Perches

On Thu, Jun 16, 2022 at 04:51:08PM +0200, Jason A. Donenfeld wrote:
> If that's a "dinosaur approach" that "has to stop", it'd certainly be
> news to me (and I'm guessing others on the list too). I've never really
> seen anybody question the kernel's `unsigned long` usage before.
> 
> So hopefully some outcome of this discussion will make it clear, and
> then either this patch will go in, or I'll get to work on carefully
> adjusting my code that uses `unsigned long` at the moment.

Searching through list archives, there's not much, but I did find [1]
from Linus:

    PPS. And btw, the warning is unacceptable too. Cast the thing to
    "unsigned long" (or uintptr_t, but quite frankly, in the kernel I'd
    suggest "unsigned long" rather than the more obscure standard types)
    after you've fixed the macro argument problem.
 
 [1] https://lore.kernel.org/lkml/AANLkTineDxntR0ZTXdgXrc6qx6pATTORgOwFR5+w5MLN@mail.gmail.com/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] usercopy: use unsigned long instead of uintptr_t
  2022-06-16 14:51   ` Jason A. Donenfeld
  2022-06-16 15:11     ` Jason A. Donenfeld
@ 2022-06-16 15:21     ` Matthew Wilcox
  2022-06-16 15:59       ` Linus Torvalds
  1 sibling, 1 reply; 17+ messages in thread
From: Matthew Wilcox @ 2022-06-16 15:21 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: linux-mm, linux-xfs, linux-hardening, linux-kernel,
	Uladzislau Rezki, Kees Cook, Greg Kroah-Hartman, Linus Torvalds,
	Joe Perches

On Thu, Jun 16, 2022 at 04:51:08PM +0200, Jason A. Donenfeld wrote:
> For better or for worse, I've always assumed that the kernel had its
> reasons -- legitimate reasons, even -- for preferring `unsigned long` to
> a userspace type like `uintptr_t`, so I've always tried to code that
> way.

I don't know why people call uintptr_t a "userspace type".  It's a type
invented by C99 that is an integer type large enough to hold a pointer.
Which is exactly what we want here.

> If that's a "dinosaur approach" that "has to stop", it'd certainly be
> news to me (and I'm guessing others on the list too). I've never really
> seen anybody question the kernel's `unsigned long` usage before.

I've put in a proposal to ksummit-discuss that makes the case for using
uintptr_t where it fits our needs.  Here's a copy of it.

--- 8< ---

Zettalinux: It's Not Too Late To Start

The current trend in memory sizes lead me to believe that we'll need
128-bit pointers by 2035.  Hardware people are starting to think about it
[1a] [1b] [2].  We have a cultural problem in Linux where we believe that
all pointers (user or kernel) can be stuffed into an unsigned long and
newer C solutions (uintptr_t) are derided as "userspace namespace mess".

The only sane way to set up a C environment for a CPU with 128-bit
pointers is sizeof(void *) == 16, sizeof(short) == 2, sizeof(int) == 4,
sizeof(long) == 8, sizeof(long long) == 16.

That means that casting a pointer to a long will drop the upper 64
bits, and we'll have to use long long for the uintptr_t on 128-bit.
Fixing Linux to be 128-bit clean is going to be a big job, and I'm not
proposing to do it myself.  But we can at least start by not questioning
when people use uintptr_t inside the kernel to represent an address.

Getting the userspace API fixed is going to be the most important thing
(eg io_uring was just added and is definitely not 128-bit clean).
Fortunately, no 128-bit machines exist today, so we have a bit of time
to get the UAPI right.  But if not today, then we should start soon.

There are two purposes for this session:

 * Agree that we do need to start thinking about 128-bit architectures
   (even if they're not going to show up in our offices tomorrow)
 * Come to terms with needing to use uintptr_t instead of unsigned long

[1a] https://github.com/michaeljclark/rv8/blob/master/doc/src/rv128.md
[1b] https://github.com/riscv/riscv-opcodes/blob/master/unratified/rv128_i
[2] https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] usercopy: use unsigned long instead of uintptr_t
  2022-06-16 15:21     ` Matthew Wilcox
@ 2022-06-16 15:59       ` Linus Torvalds
  2022-06-16 16:12         ` Linus Torvalds
  2022-06-16 16:44         ` Matthew Wilcox
  0 siblings, 2 replies; 17+ messages in thread
From: Linus Torvalds @ 2022-06-16 15:59 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Jason A. Donenfeld, Linux-MM, linux-xfs, linux-hardening,
	Linux Kernel Mailing List, Uladzislau Rezki, Kees Cook,
	Greg Kroah-Hartman, Joe Perches

On Thu, Jun 16, 2022 at 8:21 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> I don't know why people call uintptr_t a "userspace type".  It's a type
> invented by C99 that is an integer type large enough to hold a pointer.
> Which is exactly what we want here.

On the other hand, "unsigned long" has existed since the first version
of C, and is an integer type large enough to hold a pointer.

Which is exactly what we want here, and what we use everywhere else too.

The whole "uintptr_t handles the historical odd cases with pointers
that are smaller than a 'long'" is entirely irrelevant, since those
odd cases are just not a factor.

And the "pointers are _larger_ than a 'long'" case is similarly
irrelevant, since we very much want to use arithmetic on these things,
and they are 'long' later. They aren't used as pointers, they are used
as integer indexes into the virtual address space that we do odd
operations on.

Honestly, even if you believe in the 128-bit pointer thing, changing
one cast in one random place to be different from everything else is
*not* productive. We're never going to do some "let's slowly migrate
from one to the other".

And honestly, we're never going to do that using "uintptr_t" anyway,
since it would be based on a kernel config variable and be a very
specific type, and would need to be type-safe for any sane conversion.

IOW, in a hypothetical word where the address size is larger than the
word-size, it would have to be something like out "pte_t", which is
basically wrapped in a struct so that C implicit type conversion
doesn't bite you in the arse.

So no. There is ABSOLUTELY ZERO reason to ever use 'uintptr_t' in the
kernel. It's wrong. It's wrong *even* for actual user space interfaces
where user space might use 'uintptr_t', because those need to be
specific kernel types so that we control them (think for compat
reasons etc).

We use the user-space types in a few places, and they have caused
problems, but at least they are really traditional and the compiler
actually enforces them for some really standard functions. I'm looking
at 'size_t' in particular, which caused problems exactly because it's
a type that is strictly speaking not under our control.

'size_t' is actually a great example of why 'uintptr_t' is a horrid
thing. It's effectively a integer type that is large enough to hold a
pointer difference. On unsegmented architectures, that ends up being a
type large enough to hold a pointer.

Sound familiar?

And does it sound familiar how on some architectures it's "unsigned
int", and on others it is "unsigned long"? It's very annoying, and
it's been annoying over the years. The latest annoyance was literally
less than a week ago in 1c27f1fc1549 ("iov_iter: fix build issue due
to possible type mis-match").

Again, "unsigned long" is superior.

And the only reason to migrate away from it is because you want
something *type-safe*, which uintptr_t very much is not. As
exemplified by size_t, it's the opposite of type-safe. It's actively
likely to be type-confused.

              Linus

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] usercopy: use unsigned long instead of uintptr_t
  2022-06-16 15:59       ` Linus Torvalds
@ 2022-06-16 16:12         ` Linus Torvalds
  2022-06-16 16:44         ` Matthew Wilcox
  1 sibling, 0 replies; 17+ messages in thread
From: Linus Torvalds @ 2022-06-16 16:12 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Jason A. Donenfeld, Linux-MM, linux-xfs, linux-hardening,
	Linux Kernel Mailing List, Uladzislau Rezki, Kees Cook,
	Greg Kroah-Hartman, Joe Perches

On Thu, Jun 16, 2022 at 8:59 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So no. There is ABSOLUTELY ZERO reason to ever use 'uintptr_t' in the
> kernel. It's wrong. It's wrong *even* for actual user space interfaces
> where user space might use 'uintptr_t', because those need to be
> specific kernel types so that we control them (think for compat
> reasons etc).

Ok, so I wrote that just because that particular issue has happened
before with other types.

But then I actually grepped for uintptr_t use in the kernel.

And guess what you find when you do that?

You find

  #ifdef BINDER_IPC_32BIT
  typedef __u32 binder_size_t;
  typedef __u32 binder_uintptr_t;
  #else
  typedef __u64 binder_size_t;
  typedef __u64 binder_uintptr_t;
  #endif

exactly because user space interfaces used this broken sh*t-for-brains
traditional model that we've done over and over, and that has been a
big mistake.

We have similar mistakes in things like 'off_t', where we have a
mishmash of different versions (off_t, loff_t, __kernel_loff_t,
compat_loff_t) and several duplicate interfaces due to that.

The drm people (who end up having had more of this kind of stuff than
most) actually learnt their lesson, and made things be fixed-size.
We've done that in some other places too. It turns out that "u64" is a
fairly good type, but even *that* has caused problems, because we
really should have had a special "naturally aligned" version of it so
that you don't get the odd alignment issues (x86-32: 'u64' is 4-byte
aligned. m68k: u64 is 2-byte aligned).

So yeah. size_t and uintptr_t are both disasters in the kernel.

size_t we just have to live with. But that doesn't mean we want to
deal with uintptr_t.

Another issue is that we can't always control where user space defines
their types. Which is why we really don't want to use POSIX namespace
types in any interfaces anyway. It turns out that "u8..u64" are great
types, and adding two underscores to them for the uapi headers is
simple and straightforward enough.

Because using other types ends up being really nasty from a namespace
and "core compiler header files declare them in compiler-specific
places" etc. Sometimes they are literally hardcoded *inside* the
compiler (size_t being that kind of type).

Anyway, that's more of an explanation of why the whole "just use the
standard types" is simply NOT a good argument at all. We end up often
having to actively avoid them, and the ones we do use are very *very*
core and traditional

So the whole "just use the standard type" _sounds_ sane. But it really
isn't, and has some real issues, and we actively avoid it for good
reasons.

                 Linus

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] usercopy: use unsigned long instead of uintptr_t
  2022-06-16 14:36 [PATCH] usercopy: use unsigned long instead of uintptr_t Jason A. Donenfeld
  2022-06-16 14:38 ` Matthew Wilcox
@ 2022-06-16 16:29 ` Kees Cook
  2022-06-16 16:36   ` Mark Brown
  1 sibling, 1 reply; 17+ messages in thread
From: Kees Cook @ 2022-06-16 16:29 UTC (permalink / raw)
  To: linux-kernel, linux-mm, linux-hardening, Jason, linux-xfs
  Cc: Kees Cook, urezki, joe, gregkh, torvalds, willy

On Thu, 16 Jun 2022 16:36:17 +0200, Jason A. Donenfeld wrote:
> A recent commit factored out a series of annoying (unsigned long) casts
> into a single variable declaration, but made the pointer type a
> `uintptr_t` rather than the usual `unsigned long`. This patch changes it
> to be the integer type more typically used by the kernel to represent
> addresses.
> 
> 
> [...]

Given Linus's confirmation: applied to for-next/hardening, thanks! I
do note, however, that we have almost 1700 uses of uintptr_t in the
kernel. Perhaps we need to add a section to the CodingStyle doc?

[1/1] usercopy: use unsigned long instead of uintptr_t
      https://git.kernel.org/kees/c/e230d8275da4

-- 
Kees Cook


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] usercopy: use unsigned long instead of uintptr_t
  2022-06-16 16:29 ` Kees Cook
@ 2022-06-16 16:36   ` Mark Brown
  0 siblings, 0 replies; 17+ messages in thread
From: Mark Brown @ 2022-06-16 16:36 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, linux-mm, linux-hardening, Jason, linux-xfs,
	urezki, joe, gregkh, torvalds, willy

[-- Attachment #1: Type: text/plain, Size: 285 bytes --]

On Thu, Jun 16, 2022 at 09:29:12AM -0700, Kees Cook wrote:

> Given Linus's confirmation: applied to for-next/hardening, thanks! I
> do note, however, that we have almost 1700 uses of uintptr_t in the
> kernel. Perhaps we need to add a section to the CodingStyle doc?

checkpatch too.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] usercopy: use unsigned long instead of uintptr_t
  2022-06-16 15:59       ` Linus Torvalds
  2022-06-16 16:12         ` Linus Torvalds
@ 2022-06-16 16:44         ` Matthew Wilcox
  2022-06-16 16:56           ` Linus Torvalds
  1 sibling, 1 reply; 17+ messages in thread
From: Matthew Wilcox @ 2022-06-16 16:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jason A. Donenfeld, Linux-MM, linux-xfs, linux-hardening,
	Linux Kernel Mailing List, Uladzislau Rezki, Kees Cook,
	Greg Kroah-Hartman, Joe Perches

On Thu, Jun 16, 2022 at 08:59:51AM -0700, Linus Torvalds wrote:
> On Thu, Jun 16, 2022 at 8:21 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > I don't know why people call uintptr_t a "userspace type".  It's a type
> > invented by C99 that is an integer type large enough to hold a pointer.
> > Which is exactly what we want here.
> 
> On the other hand, "unsigned long" has existed since the first version
> of C, and is an integer type large enough to hold a pointer.
> 
> Which is exactly what we want here, and what we use everywhere else too.
> 
> The whole "uintptr_t handles the historical odd cases with pointers
> that are smaller than a 'long'" is entirely irrelevant, since those
> odd cases are just not a factor.

I don't care about the odd historical cases either.

> And the "pointers are _larger_ than a 'long'" case is similarly
> irrelevant, since we very much want to use arithmetic on these things,
> and they are 'long' later. They aren't used as pointers, they are used
> as integer indexes into the virtual address space that we do odd
> operations on.
> 
> Honestly, even if you believe in the 128-bit pointer thing, changing
> one cast in one random place to be different from everything else is
> *not* productive. We're never going to do some "let's slowly migrate
> from one to the other".
> 
> And honestly, we're never going to do that using "uintptr_t" anyway,
> since it would be based on a kernel config variable and be a very
> specific type, and would need to be type-safe for any sane conversion.
> 
> IOW, in a hypothetical word where the address size is larger than the
> word-size, it would have to be something like out "pte_t", which is
> basically wrapped in a struct so that C implicit type conversion
> doesn't bite you in the arse.

I don't want to support an address space larger than word size.  I can't
imagine any CPU vendor saying "So we have these larger registers that
you can only use for pointers and then these smaller registers that you
can use for data".  We haven't had A/D register splits since the m68k.
Perhaps I haven't talked to enough crazy CPU people.  But if anyone does
propose something that bad, we should laugh at them.

So how do you think we should solve the 128-bit-word-size problem?
Leave int at 32-bit, promote long to 128-bit and get the compiler to
add __int64 to give us a 64-bit type?

> We use the user-space types in a few places, and they have caused
> problems, but at least they are really traditional and the compiler
> actually enforces them for some really standard functions. I'm looking
> at 'size_t' in particular, which caused problems exactly because it's
> a type that is strictly speaking not under our control.
> 
> 'size_t' is actually a great example of why 'uintptr_t' is a horrid
> thing. It's effectively a integer type that is large enough to hold a
> pointer difference. On unsegmented architectures, that ends up being a
> type large enough to hold a pointer.

The only reason I like size_t is that it's good _documentation_.
It says "This integer is a byte count of something that's in memory".
As opposed to being a count of sectors, blocks, pages, pointers or
turtles.

As an example:
extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
                           unsigned int, unsigned int);

What the hell are those two ints?  Based on experience, they're probably
offset & length, but who even knows what order they're in.

> And does it sound familiar how on some architectures it's "unsigned
> int", and on others it is "unsigned long"? It's very annoying, and
> it's been annoying over the years. The latest annoyance was literally
> less than a week ago in 1c27f1fc1549 ("iov_iter: fix build issue due
> to possible type mis-match").

Yes, ARM is just crazy here.  We should get the compiler people to give
us an option to make size_t unsigned long.  Like we have -mcmodel=kernel
on x86.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] usercopy: use unsigned long instead of uintptr_t
  2022-06-16 16:44         ` Matthew Wilcox
@ 2022-06-16 16:56           ` Linus Torvalds
  2022-06-16 19:14             ` Linus Torvalds
  0 siblings, 1 reply; 17+ messages in thread
From: Linus Torvalds @ 2022-06-16 16:56 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Jason A. Donenfeld, Linux-MM, linux-xfs, linux-hardening,
	Linux Kernel Mailing List, Uladzislau Rezki, Kees Cook,
	Greg Kroah-Hartman, Joe Perches

On Thu, Jun 16, 2022 at 9:44 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> I don't want to support an address space larger than word size.  I can't
> imagine any CPU vendor saying "So we have these larger registers that
> you can only use for pointers and then these smaller registers that you
> can use for data".  We haven't had A/D register splits since the m68k.
> Perhaps I haven't talked to enough crazy CPU people.  But if anyone does
> propose something that bad, we should laugh at them.

Yeah, the thing is, right now we have 'unsigned long' as the "wordsize".

And I want to point out that that is not about "pointers" at all, it's
about pretty much everything.

It shows up in some very core places like system call interface etc,
where "long" is in very real ways the expected register size.

So the 128-bit problem is actually much larger than just "uintptr_t",
and we have that "sizeof(long)" thing absolutely everywhere.

In fact, you can see it very much in things like this:

   #if BITS_PER_LONG == 64

which you'll find all over as the "is this a 64-bit architecture".

Out bitmaps and bit fields are also all about "long" - again, entirely
unrelated to pointers.

So I agree 100% that "we will have a problem with 128-bit words".

> So how do you think we should solve the 128-bit-word-size problem?
> Leave int at 32-bit, promote long to 128-bit and get the compiler to
> add __int64 to give us a 64-bit type?

That would likely be the least painful approach, but I'm not sure it's
necessarily the right one.

Maybe we might have to introduce a "word size" type.

> The only reason I like size_t is that it's good _documentation_.
> It says "This integer is a byte count of something that's in memory".
> As opposed to being a count of sectors, blocks, pages, pointers or
> turtles.

Yes.

And yes:

> extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
>                            unsigned int, unsigned int);

We should use a lot more explicit types for flags in particular.
Partly for documentation, partly for "we could type-check these".

And in declarations it might be good to encourage use of (helpful)
argument names, in case it really is just an offset or other integer
where a type makes no sense.

             Linus

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] usercopy: use unsigned long instead of uintptr_t
  2022-06-16 16:56           ` Linus Torvalds
@ 2022-06-16 19:14             ` Linus Torvalds
  2022-06-16 19:18               ` Linus Torvalds
  2022-06-17  7:58               ` Geert Uytterhoeven
  0 siblings, 2 replies; 17+ messages in thread
From: Linus Torvalds @ 2022-06-16 19:14 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Jason A. Donenfeld, Linux-MM, linux-xfs, linux-hardening,
	Linux Kernel Mailing List, Uladzislau Rezki, Kees Cook,
	Greg Kroah-Hartman, Joe Perches

On Thu, Jun 16, 2022 at 9:56 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Out bitmaps and bit fields are also all about "long" - again, entirely
> unrelated to pointers.

That, btw, has probably been a mistake. It's entirely historical. We
would have been better off had our bitmap types been defined in terms
of 32-bit chunks, because now we have the odd situation that 32-bit
and 64-bit architectures get very different sizes for some flag
fields.

It does have a technical reason: it's often better to traverse bitmaps
in maximally sized chunks (ie scanning for bits set or clear), and in
that sense defining bitmaps to always act as arrays of "long" has been
a good thing.

But it then causes pointless problems when people can't really rely on
more than 32 bits for atomic bit operations, and on 64-bit
architectures we unnecessarily use "long" and waste the upper bits.

In some places we then take advantage of that ugly difference (ie "we
have more page flags on 64-bit architectures"), but on the whole it
has probably been more of a pain than the technical gain. The bitmap
scanning is probably not worth it, and could have been done
differently.

And continuing that for some 128-bit architecture is likely just
making the same mistake even worse.

So I think we have a *lot* of things we should look at, if people
really get serious about 128-bit computing.

But I personally think it's several decades away, if ever. Looking at
historical workload growth is probably _very_ misleading - Moore's law
is dying or dead. We're not that many shrinks away from some very
fundamental physical limits.

I do expect people will want to do academic test architectures,
though. It's not clear it's going to be a "double all the widths" kind
of thing, though, and I don't think there is a lot of sense in
designing for a future architecture that is very hazy.

It's not entirely unlikely that we'll end up with a situation where we
do have access to 128-bit operations (because ALU and register width
is relatively "cheap", and it helps some loads - extended precision
arithmetic, crypto, integer vectors), but the address space will be
64-bit (because big pointers are bad for memory and cache use).

In that situation, we'd probably just see "long long" being 128-bit
("I32LP64LL128").

That's obviously what people thought back in the ILP32/LL64 data
model. They were wrong back then, and I may well be wrong now. But
Moore's law limits are fairly undisputable, and a 64-bit address space
is a *lot* bigger than a 32-bit address space was.

So I personally will take a "let's wait for reality to catch up a bit
more" approach, because it's not clear what the actual future will
hold.

                 Linus

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] usercopy: use unsigned long instead of uintptr_t
  2022-06-16 19:14             ` Linus Torvalds
@ 2022-06-16 19:18               ` Linus Torvalds
  2022-06-17  9:19                 ` David Laight
  2022-06-17  7:58               ` Geert Uytterhoeven
  1 sibling, 1 reply; 17+ messages in thread
From: Linus Torvalds @ 2022-06-16 19:18 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Jason A. Donenfeld, Linux-MM, linux-xfs, linux-hardening,
	Linux Kernel Mailing List, Uladzislau Rezki, Kees Cook,
	Greg Kroah-Hartman, Joe Perches

On Thu, Jun 16, 2022 at 12:14 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> In that situation, we'd probably just see "long long" being 128-bit
> ("I32LP64LL128").

Looking around, it looks like people prefer "long long long" (or in
the kernel, just "u128") for this, because so many have already gotten
used to "long long" being 64-bit, and 32-bit architectures (where
"long" is 32-bit and "long long" is 64-bit) are still relevant enough
that people want to keep that.

             Linus

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] usercopy: use unsigned long instead of uintptr_t
  2022-06-16 19:14             ` Linus Torvalds
  2022-06-16 19:18               ` Linus Torvalds
@ 2022-06-17  7:58               ` Geert Uytterhoeven
  2022-06-17 11:05                 ` Christophe Leroy
  1 sibling, 1 reply; 17+ messages in thread
From: Geert Uytterhoeven @ 2022-06-17  7:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Matthew Wilcox, Jason A. Donenfeld, Linux-MM, linux-xfs,
	linux-hardening, Linux Kernel Mailing List, Uladzislau Rezki,
	Kees Cook, Greg Kroah-Hartman, Joe Perches

Hi Linus,

On Thu, Jun 16, 2022 at 9:15 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Jun 16, 2022 at 9:56 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > Out bitmaps and bit fields are also all about "long" - again, entirely
> > unrelated to pointers.
>
> That, btw, has probably been a mistake. It's entirely historical. We
> would have been better off had our bitmap types been defined in terms
> of 32-bit chunks, because now we have the odd situation that 32-bit
> and 64-bit architectures get very different sizes for some flag
> fields.
>
> It does have a technical reason: it's often better to traverse bitmaps
> in maximally sized chunks (ie scanning for bits set or clear), and in
> that sense defining bitmaps to always act as arrays of "long" has been
> a good thing.

Indeed, as long is the native word size, it's assumed to be the best,
performance-wise.  For bitmaps, the actual underlying unit doesn't
matter that much to the user, as bitmaps can span multiple words.
For bit fields, you're indeed stuck with the 32-vs-64 bit difference.

> But it then causes pointless problems when people can't really rely on
> more than 32 bits for atomic bit operations, and on 64-bit
> architectures we unnecessarily use "long" and waste the upper bits.

Well, atomic works up to native word size, i.e. long.

> It's not entirely unlikely that we'll end up with a situation where we
> do have access to 128-bit operations (because ALU and register width
> is relatively "cheap", and it helps some loads - extended precision
> arithmetic, crypto, integer vectors), but the address space will be
> 64-bit (because big pointers are bad for memory and cache use).
>
> In that situation, we'd probably just see "long long" being 128-bit
> ("I32LP64LL128").

Regardless of the address space decision (we do have size_t and the
dreaded uintptr_t to cater for that), keeping long at 64-bit would
break the "long is the native word size" assumption (as used in lots
of places, e.g. for syscalls).

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH] usercopy: use unsigned long instead of uintptr_t
  2022-06-16 19:18               ` Linus Torvalds
@ 2022-06-17  9:19                 ` David Laight
  0 siblings, 0 replies; 17+ messages in thread
From: David Laight @ 2022-06-17  9:19 UTC (permalink / raw)
  To: 'Linus Torvalds', Matthew Wilcox
  Cc: Jason A. Donenfeld, Linux-MM, linux-xfs, linux-hardening,
	Linux Kernel Mailing List, Uladzislau Rezki, Kees Cook,
	Greg Kroah-Hartman, Joe Perches

From: Linus Torvalds
> Sent: 16 June 2022 20:19
> 
> On Thu, Jun 16, 2022 at 12:14 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > In that situation, we'd probably just see "long long" being 128-bit
> > ("I32LP64LL128").
> 
> Looking around, it looks like people prefer "long long long" (or in
> the kernel, just "u128") for this, because so many have already gotten
> used to "long long" being 64-bit, and 32-bit architectures (where
> "long" is 32-bit and "long long" is 64-bit) are still relevant enough
> that people want to keep that.

Changing 'long long' to 128 bits will break things.
Much like a certain OS that is IL32P64LL64 because too much
code used 'long' to mean 32bits when int was 16 bits. 

gcc already has support for 128bit integers (on 64bit systems)
emulated using two 64bit registers (__u128 ??)

Anything wanting them probably wants them explicitly and even deciding
that uintmax_t is suddenly 128 bit will probably break things!

The only place I can imagine 'long long long' being used
is as "%llld" in printf formats.

Since 'short' and 'long' are both types and qualifiers
you can have 'long char, 'long short' and 'short long'.
These could be interesting on an I64L256 system :-)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] usercopy: use unsigned long instead of uintptr_t
  2022-06-17  7:58               ` Geert Uytterhoeven
@ 2022-06-17 11:05                 ` Christophe Leroy
  2022-06-17 12:51                   ` David Laight
  0 siblings, 1 reply; 17+ messages in thread
From: Christophe Leroy @ 2022-06-17 11:05 UTC (permalink / raw)
  To: Geert Uytterhoeven, Linus Torvalds
  Cc: Matthew Wilcox, Jason A. Donenfeld, Linux-MM, linux-xfs,
	linux-hardening, Linux Kernel Mailing List, Uladzislau Rezki,
	Kees Cook, Greg Kroah-Hartman, Joe Perches



Le 17/06/2022 à 09:58, Geert Uytterhoeven a écrit :
>> But it then causes pointless problems when people can't really rely on
>> more than 32 bits for atomic bit operations, and on 64-bit
>> architectures we unnecessarily use "long" and waste the upper bits.
> 
> Well, atomic works up to native word size, i.e. long.
> 

powerpc64 has a pair of instructions to perform 128bits atomic 
operations : lqarx / stqcx.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH] usercopy: use unsigned long instead of uintptr_t
  2022-06-17 11:05                 ` Christophe Leroy
@ 2022-06-17 12:51                   ` David Laight
  0 siblings, 0 replies; 17+ messages in thread
From: David Laight @ 2022-06-17 12:51 UTC (permalink / raw)
  To: 'Christophe Leroy', Geert Uytterhoeven, Linus Torvalds
  Cc: Matthew Wilcox, Jason A. Donenfeld, Linux-MM, linux-xfs,
	linux-hardening, Linux Kernel Mailing List, Uladzislau Rezki,
	Kees Cook, Greg Kroah-Hartman, Joe Perches

From: Christophe Leroy
> Sent: 17 June 2022 12:06
> 
> Le 17/06/2022 à 09:58, Geert Uytterhoeven a écrit :
> >> But it then causes pointless problems when people can't really rely on
> >> more than 32 bits for atomic bit operations, and on 64-bit
> >> architectures we unnecessarily use "long" and waste the upper bits.
> >
> > Well, atomic works up to native word size, i.e. long.
> >
> 
> powerpc64 has a pair of instructions to perform 128bits atomic
> operations : lqarx / stqcx.

As does x86-64 (and 32bit has a 64bit atomic compare+exchange).

Annoyingly the x86-64 doesn't have 128bit read/write register
pair instructions that would generate a 128bit PCIe TLP.
You can use AVX instructions to generate large TLP - but not
in the linux kernel - you want 1 big register.

Even the humble 68020 has a cas2 instruction that will do a
64bit atomic operation.
I did manage to use it once, but it is easier to disable interrupts.
I'm not sure how many SMP 68020 systems were ever built.
You'd need a matched pair of cpu (or extreme care) since it
tends to stack microcode data on mid-instruction faults.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2022-06-17 12:51 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-16 14:36 [PATCH] usercopy: use unsigned long instead of uintptr_t Jason A. Donenfeld
2022-06-16 14:38 ` Matthew Wilcox
2022-06-16 14:51   ` Jason A. Donenfeld
2022-06-16 15:11     ` Jason A. Donenfeld
2022-06-16 15:21     ` Matthew Wilcox
2022-06-16 15:59       ` Linus Torvalds
2022-06-16 16:12         ` Linus Torvalds
2022-06-16 16:44         ` Matthew Wilcox
2022-06-16 16:56           ` Linus Torvalds
2022-06-16 19:14             ` Linus Torvalds
2022-06-16 19:18               ` Linus Torvalds
2022-06-17  9:19                 ` David Laight
2022-06-17  7:58               ` Geert Uytterhoeven
2022-06-17 11:05                 ` Christophe Leroy
2022-06-17 12:51                   ` David Laight
2022-06-16 16:29 ` Kees Cook
2022-06-16 16:36   ` Mark Brown

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.