All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC] regset ->get() API
@ 2020-02-17 18:33 Al Viro
  2020-02-19 20:01 ` Linus Torvalds
  0 siblings, 1 reply; 11+ messages in thread
From: Al Viro @ 2020-02-17 18:33 UTC (permalink / raw)
  To: linux-arch; +Cc: linux-kernel, Linus Torvalds, Arnd Bergmann

	Looking at the regset guts, I really wonder if the interface
is right.  What happens goes along the lines of

ptrace(PTRACE_GETREGS, pid, whatever, buf)
  arch_ptrace(task, PTRACE_GETREGS, whatever, buf)
    copy_regset_to_user(task, task_user_regset_view(current),
		REGSET_GENERAL, 0, sizeof(struct user_regs_struct), buf);
     check access_ok(buf, sizeof(...)
     genregs_get(task, regset, 0, sizeof(...), NULL, buf);
and we hit
static int genregs_get(struct task_struct *target,
                       const struct user_regset *regset,
                       unsigned int pos, unsigned int count,
                       void *kbuf, void __user *ubuf)
{
        if (kbuf) {
                unsigned long *k = kbuf;
                while (count >= sizeof(*k)) {
                        *k++ = getreg(target, pos);
                        count -= sizeof(*k);
                        pos += sizeof(*k);
                }
        } else {
                unsigned long __user *u = ubuf;
                while (count >= sizeof(*u)) {
                        if (__put_user(getreg(target, pos), u++))
                                return -EFAULT;
                        count -= sizeof(*u);
                        pos += sizeof(*u);
                }
        }

        return 0;
}
 
IOW, we call getreg() in a loop, with separate __put_user() for each
value.  For one thing, it's going to be painful on any recent x86 -
the cost of stac/clac on each of those is not going to be cheap.
And we obviously can't extend stac/clac area over the entire loop
there - getreg() is not trivial.

For another, the calling conventions are too generic - the callers
of ->get() always pass zero for pos, for example.  And this "pass
kbuf and ubuf separately" wart does not make it any prettier.

Other instances (e.g. powerpc gpr_get()) do not use __put_user() -
they make a series of user_regset_copyout()/user_regset_copyout_zero()
calls (4 in this case):
static inline int user_regset_copyout(unsigned int *pos, unsigned int *count,
                                      void **kbuf,
                                      void __user **ubuf, const void *data,
                                      const int start_pos, const int end_pos)
{
        if (*count == 0)
                return 0;
        BUG_ON(*pos < start_pos);
        if (end_pos < 0 || *pos < end_pos) {
                unsigned int copy = (end_pos < 0 ? *count
                                     : min(*count, end_pos - *pos));
                data += *pos - start_pos;
                if (*kbuf) {
                        memcpy(*kbuf, data, copy);
                        *kbuf += copy;
                } else if (__copy_to_user(*ubuf, data, copy))
                        return -EFAULT;
                else
                        *ubuf += copy;
                *pos += copy;
                *count -= copy;
        }
        return 0;
}

The caller obviously needs to check if the damn thing has failed, and the
calling conventions are still over-generic (grep and you'll see).  The
actual calls of __copy_to_user()/__clear_user() have destinations back-to-back.
And while in case of ppc their number is not too large, for e.g.
arch/arc/kernel/ptrace.c:genregs_get() we get 39 calls of those things -
basically, back to the __put_user-inna-loop-and-thats-cuttin-me-own-throat
situation we have on x86, only with more overhead per register.

And too convoluted calling conventions come with the usual price - they
are too easy to get wrong.  Example:
arch/x86/math-emu/fpu_entry.c:fpregs_soft_get() is
{
        struct swregs_state *s387 = &target->thread.fpu.state.soft;
        const void *space = s387->st_space;
        int ret;
        int offset = (S387->ftop & 7) * 10, other = 80 - offset;

        RE_ENTRANT_CHECK_OFF;

#ifdef PECULIAR_486
        S387->cwd &= ~0xe080;
        /* An 80486 sets nearly all of the reserved bits to 1. */
        S387->cwd |= 0xffff0040;
        S387->swd = sstatus_word() | 0xffff0000;
        S387->twd |= 0xffff0000;
        S387->fcs &= ~0xf8000000;
        S387->fos |= 0xffff0000;
#endif /* PECULIAR_486 */

        ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf, s387, 0,
                                  offsetof(struct swregs_state, st_space));

OK, here we copy everything in s387 up to the beginning of s387->st_space.
Fair enough.  Now pos is offsetof(struct swregs_state, st_space)), and...
        /* Copy all registers in stack order. */
        if (!ret)
                ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
                                          space + offset, 0, other);
        if (!ret)
                ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
                                          space, 0, offset);
We _intend_ to copy stack entries, top..7, then 0..top-1, out of
s387->st_space.  What we actually do is different - pos is updated
by each call, so the last two arguments in these calls are wrong.
They should've been offsetof(struct swregs_state, st_space)) and
offsetof(struct swregs_state, st_space)) + other in the first
one and offsetof(struct swregs_state, st_space)) + other and
offsetof(struct swregs_state, st_space)) + 80 in the second one.

It's not hard to fix, and nobody really uses !CONFIG_FPU setups
on x86 anyway; the point is that this is really easy to get wrong.

What I really wonder is whether it's actually worth bothering - the
life would be much simpler if we *always* passed a kernel buffer to
->get() instances and did all copyout at once.  Sure, it results in
double copies, but... would that really cost more than all the complexity
we have there?

What are the typical amounts of data copied in such ptrace() calls
and how hot they normally are?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] regset ->get() API
  2020-02-17 18:33 [RFC] regset ->get() API Al Viro
@ 2020-02-19 20:01 ` Linus Torvalds
  2020-02-20 22:47   ` Al Viro
  0 siblings, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2020-02-19 20:01 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-arch, Linux Kernel Mailing List, Arnd Bergmann

On Mon, Feb 17, 2020 at 10:33 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
>         Looking at the regset guts, I really wonder if the interface
> is right.

No, it's not right, but it has a history. I think it comes from the
internal gdb model, and Roland - who was used to that model - then
modelled the kernel internals roughly similarly.

But part of it is that the user interfaces have evolved over time too,
with the original one being the "get one register using magical
PTRACE_{PEEK,POKE}USR", and then because that performs horribly badly,
there's now a PTRACE_GETREGSET.

And yes, because of the gdb history, some of the interfaces make no
sense for the kernel. Like

> For another, the calling conventions are too generic - the callers
> of ->get() always pass zero for pos, for example.

I think that comes from the historical PTRACE_{PEEK,POKE}USR case,
where gdb has a unifying interface for getting one register (using
PEEKUSR) vs getting the whole regset (using GETREGSET).

But in the kernel, we never actually got to the point where this was
generalized. Each architecture keeps its own PEEKUSR implementation,
and then there is that generic GETREGSET interface that is completely
separate.

End result: ->get is never actually used for individual registers.

So yes, you always have a zero position, because you always basically
get the whole thing. But the interface was written on the assumption
that some day the PEEKUSR thing would also be generalized.

Except absolutely nobody wants to touch that code, and it's hard to do
because it's so architecture-specific anyway, and you can't switch
over to a generic model for PEEKUSR until _all_ architectures have
done it, so it's never going to happen.

Anyway, that's the explanation for the bad interface.

That is not an excuse for _keeping_ the bad interface, though. It's
just that effectively the interface was done overr a decade ago, and
absolutely nobody has ever shown any interest in touching it since.
"It works, don't touch it".

> What I really wonder is whether it's actually worth bothering - the
> life would be much simpler if we *always* passed a kernel buffer to
> ->get() instances and did all copyout at once.

I don't mind it, but some of those buffers are big, and the generic
code generally doesn't know how big.

You'd probably have to have a whole page to pass in as the buffer -
certainly not some stack space.

Maybe even more. I'm not sure how big the FPU regset can get on x86...

But yes, I'd love to see somebody (you, apparently) carefully simplify
that interface. And I do not think anybody will object - the biggest
issue is that this touches code across all architectures, and some of
them are not having a lot of maintenance.

If you do it in small steps, it probably won't be a problem. Remove
the "offset" parameter in one patch, do the "only copy to kernel
space" in another, and hey, if it ends up breaking subtly on something
we don't have any testers for, well, they can report that a year from
now and it will get fixed up, but nobody will seriously care.

IOW, "go wild".

But at the same time, I wouldn't worry _too_ much about things like
the performance of copying one register at a time. The reason we do
that regset thing is not because it's truly high-performance, it's
just that it sucked dead baby donkeys through a straw to do a full
"ptrace(PEEKUSR)" for each register.

So "high performance" is relative. Doing the STAC/CLAC dance for each
register on x86 is a complete non-issue. It's not that
performance-critical.

So yes, "go wild", but do it for the right reasons: simplifying the interface.

Because if you only worry about current use of
"__get_user()/__put_user()" optimizations, don't. Just convert to
"get_user()/put_user()" and be done with it, it's not important. This
is another area where people used the double-underscore version not
because it mattered, but because it was available.

It's like Sir Edmund Hillary, except instead of climbing Mount
Everest, people write bad code using the wrong interfaces: "because
they're there".

               Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] regset ->get() API
  2020-02-19 20:01 ` Linus Torvalds
@ 2020-02-20 22:47   ` Al Viro
  2020-02-20 22:56     ` Linus Torvalds
  0 siblings, 1 reply; 11+ messages in thread
From: Al Viro @ 2020-02-20 22:47 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-arch, Linux Kernel Mailing List, Arnd Bergmann

On Wed, Feb 19, 2020 at 12:01:54PM -0800, Linus Torvalds wrote:

> I don't mind it, but some of those buffers are big, and the generic
> code generally doesn't know how big.

That's what regset_size() returns...
 
> You'd probably have to have a whole page to pass in as the buffer -
> certainly not some stack space.
> 
> Maybe even more. I'm not sure how big the FPU regset can get on x86...

amd64:
        REGSET_GENERAL	=>	sizeof(struct user_regs_struct) (216)
        REGSET_FP	=>	sizeof(struct user_i387_struct) (512)
        REGSET_XSTATE	=>	sizeof(struct swregs_state) or 
				sizeof(struct fxregs_state) or
				sizeof(struct fregs_state) or
				XSAVE insn buffer size (max about 2.5Kb, AFAICS)
        REGSET_IOPERM64	=>	IO_BITMAP_BYTES (8Kb, that is)

> But yes, I'd love to see somebody (you, apparently) carefully simplify
> that interface. And I do not think anybody will object - the biggest
> issue is that this touches code across all architectures, and some of
> them are not having a lot of maintenance.
> 
> If you do it in small steps, it probably won't be a problem. Remove
> the "offset" parameter in one patch, do the "only copy to kernel
> space" in another, and hey, if it ends up breaking subtly on something
> we don't have any testers for, well, they can report that a year from
> now and it will get fixed up, but nobody will seriously care.
> 
> IOW, "go wild".
> 
> But at the same time, I wouldn't worry _too_ much about things like
> the performance of copying one register at a time. The reason we do
> that regset thing is not because it's truly high-performance, it's
> just that it sucked dead baby donkeys through a straw to do a full
> "ptrace(PEEKUSR)" for each register.
> 
> So "high performance" is relative. Doing the STAC/CLAC dance for each
> register on x86 is a complete non-issue. It's not that
> performance-critical.
> 
> So yes, "go wild", but do it for the right reasons: simplifying the interface.
> 
> Because if you only worry about current use of
> "__get_user()/__put_user()" optimizations, don't. Just convert to
> "get_user()/put_user()" and be done with it, it's not important. This
> is another area where people used the double-underscore version not
> because it mattered, but because it was available.

FWIW, what I have in mind is to start with making copy_regset_to_user() do
	buf = kmalloc(size, GFP_KERNEL);
	if (!buf)
		return -ENOMEM;
	err = regset->get(target, regset, offset, size, buf, NULL);
	if (!err && copy_to_user(data, buf, size))
		err = -EFAULT;
	kfree(buf);
	return err;

That alone would make sure that ->get() instances always get called
with NULL ubuf.  There are 6 places that can call one of those -
copy_regset_to_user(), fill_thread_core_info() and 4 direct calls
(not via method) - x86 dump_fpu(), sh dump_fpu() and itanit
acces_uarea().  Only the first one went to userland buffer and
with that change it won't do that anymore.

Right after that we can just replace the kbuf == NULL case in
user_regset_copyout()/user_regset_copyout_zero() with BUG();

Next we can start converting instances to more humane helpers -
along the lines of

struct <something> {
	void *p;
	size_t left;
};

static inline void <something>_zero(struct <something> *s, size_t size)
{
	if (s->left) {
		if (size > s->left)
			size = s->left;
		memset(s->p, 0, size);
		s->p += size;
		s->left -= size;
	}
}

static inline void <something>_align(struct <something> *s, int n)
{
	int off = (unsigned long)s->p & (n - 1);
	if (off)
		<something>_zero(s, n - off);
}

static inline void <something>_write(struct <something> *s, void *v, size_t size)
{
	if (s->left) {
		if (unlikely(size < s->left) {
			<something>_zero(s, s->left);
			return;
		}
		memcpy(s->p, v, size);
		s->p += size;
		s->left -= size;
	}
}

#define __<something>_store(s, v, __size)			\
do {								\
	if ((s)->left) {					\
		if (unlikely(__size < (s)->left) {		\
			<something>_zero((s), __size);		\
		} else {					\
			*(typeof(v) *)(s)->p = (v);		\
			(s)->p += __size;			\
			(s)->left -= __size;			\
		}						\
	}							\
} while(0)

/* current s->p must be aligned for v */
#define <something>_store(s, v) __<something>_store((s), (v), sizeof(v))


E.g. genregs_get() becomes
	struct foo to = {.p = kbuf, .left = count};
	int reg;
	for (reg = 0; to.left; reg++)
		foo_store(&to, getreg(target, reg * sizeof(unsigned long)));
	return 0;

while in fpregs_soft_get() we get
	ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf, s387, 0,
				  offsetof(struct swregs_state, st_space));

	/* Copy all registers in stack order. */
	if (!ret)
		ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
					  space + offset, 0, other);
	if (!ret)
		ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
					  space, 0, offset);
replaced with
	foo_write(&to, s387, offsetof(struct swregs_state, st_space));
	foo_write(&to, space + offset, other);
	foo_write(&to, space, offset);
(fixing the mishandled pos in there, while we are at it)

Some functions (e.g. copy_xstate_to_user()) simply go away.

Conversions for all instances are independent from each other; once
they are all done, we can throw the current primitives (user_regset_copyout()
and user_regset_copyout_zero()) and do a tree-wide replacement of ->get()
instances - remove the pos and ubuf arguments.

It's doable, and after the first two steps it's uncoupled from the
uaccess series.  As for the helpers...  I'm not sure what name to use;
it's obviously a lot more generic than regset.  membuf, perhaps?
Any suggestions would be welcome, really...

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] regset ->get() API
  2020-02-20 22:47   ` Al Viro
@ 2020-02-20 22:56     ` Linus Torvalds
  2020-02-20 23:29       ` Al Viro
  0 siblings, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2020-02-20 22:56 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-arch, Linux Kernel Mailing List, Arnd Bergmann

On Thu, Feb 20, 2020 at 2:47 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Wed, Feb 19, 2020 at 12:01:54PM -0800, Linus Torvalds wrote:
>
> > I don't mind it, but some of those buffers are big, and the generic
> > code generally doesn't know how big.
>
> That's what regset_size() returns...

Yes, but the code ends up being disgusting. You first have to call
that indirect function just to get the size, then do a kmalloc, and
then call another indirect function to actually fill it.

Don't do that. Not since we know how retpoline is a bad thing.

And since the size isn't always some trivial constant (ie for x86 PFU
it depends on the register state!), I think the only sane model is to
change the interface even more, and just have the "get()" function not
only get the data, but allocate the backing store too.

So you'd never pass in the result pointer - you'd get a result area
that you can then free.

Hmm?

The alternative is to pick a constant size that is "big enough", and
just assume that one page (or whatever) is sufficient:

> > Maybe even more. I'm not sure how big the FPU regset can get on x86...
>
> amd64:
>         REGSET_GENERAL  =>      sizeof(struct user_regs_struct) (216)
>         REGSET_FP       =>      sizeof(struct user_i387_struct) (512)
>         REGSET_XSTATE   =>      sizeof(struct swregs_state) or
>                                 sizeof(struct fxregs_state) or
>                                 sizeof(struct fregs_state) or
>                                 XSAVE insn buffer size (max about 2.5Kb, AFAICS)
>         REGSET_IOPERM64 =>      IO_BITMAP_BYTES (8Kb, that is)

Yeah, so apparently one page isn't sufficient.


> FWIW, what I have in mind is to start with making copy_regset_to_user() do
>         buf = kmalloc(size, GFP_KERNEL);
>         if (!buf)
>                 return -ENOMEM;
>         err = regset->get(target, regset, offset, size, buf, NULL);

See above. This doesn't work. You don't know the size. And we don't
have a known maximum size either.

>         if (!err && copy_to_user(data, buf, size))
>                 err = -EFAULT;
>         kfree(buf);
>         return err;

But if you change "->get()" to just return a kmalloc'ed buffer, I'd be
ok with that.

IOW, something like

        buf = regset->get(target, regset, &size);
        if (IS_ERR(buf))
                return PTR_ERR(bug);
        err = copy_to_user(data, buf, size);
        kfree(buf);
        return err;

or something like that. Just get rid of the "ubuf" entirely.

Wouldn't that be nicer?

            Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] regset ->get() API
  2020-02-20 22:56     ` Linus Torvalds
@ 2020-02-20 23:29       ` Al Viro
  2020-02-20 23:31         ` Linus Torvalds
  0 siblings, 1 reply; 11+ messages in thread
From: Al Viro @ 2020-02-20 23:29 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-arch, Linux Kernel Mailing List, Arnd Bergmann

On Thu, Feb 20, 2020 at 02:56:28PM -0800, Linus Torvalds wrote:
> On Thu, Feb 20, 2020 at 2:47 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > On Wed, Feb 19, 2020 at 12:01:54PM -0800, Linus Torvalds wrote:
> >
> > > I don't mind it, but some of those buffers are big, and the generic
> > > code generally doesn't know how big.
> >
> > That's what regset_size() returns...
> 
> Yes, but the code ends up being disgusting. You first have to call
> that indirect function just to get the size, then do a kmalloc, and
> then call another indirect function to actually fill it.

Umm...  You do realize that this indirect function is a pathological
case, right?  It has exactly one user - REGSET_SVE on arm64.  Everything
else uses regset->n * regset->size.

> Don't do that. Not since we know how retpoline is a bad thing.
> 
> And since the size isn't always some trivial constant (ie for x86 PFU
> it depends on the register state!), I think the only sane model is to
> change the interface even more, and just have the "get()" function not
> only get the data, but allocate the backing store too.
> 
> So you'd never pass in the result pointer - you'd get a result area
> that you can then free.
> 
> Hmm?

Do you want such allocations done in each ->get() instance?  We have
a plenty of those instances...

> > FWIW, what I have in mind is to start with making copy_regset_to_user() do
> >         buf = kmalloc(size, GFP_KERNEL);
> >         if (!buf)
> >                 return -ENOMEM;
> >         err = regset->get(target, regset, offset, size, buf, NULL);
> 
> See above. This doesn't work. You don't know the size. And we don't
> have a known maximum size either.

We do know that caller does not want more than the value it has passed in
'size' argument, though.  For existing ptrace requests it's either
min(iov->iov_len, regset->n * regset->size) (in ptrace_regset())
or an explicit constant (usually in arch_ptrace()).  Note, BTW, that
regset_size() is used only by coredump - that's how much we allocate
there.  Everybody else either looks like
        case PTRACE_GETFPREGS:  /* Get the child FPU state. */
                return copy_regset_to_user(child,
                                           task_user_regset_view(current),
                                           REGSET_FP,
                                           0, sizeof(struct user_i387_struct),
                                           datap);
or does regset->n * regset->size.

FWIW, the real need to know the size is not in "how much do we allocated" -
it's "how much do we copy"; I _think_ everyone except that arm64 thing
fills exactly regset->n * regset->size (or we have a nasty infoleak in
coredumps) and we can switch coredump to "allocate regset->n * regset->size,
call ->get(), copy all of that into coredump unless ->get_size is there,
copy ->get_size() bytes to coredump if ->get_size exists" as the first
step.

Longer term I would have ->get() tell how much has it filled and killed
->get_size().  Again, there's only one user.  But I'd prefer to do that
in the end of series, when the bodies of ->get() instances are cleaned
up...

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] regset ->get() API
  2020-02-20 23:29       ` Al Viro
@ 2020-02-20 23:31         ` Linus Torvalds
  2020-02-21  3:30           ` Al Viro
  0 siblings, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2020-02-20 23:31 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-arch, Linux Kernel Mailing List, Arnd Bergmann

On Thu, Feb 20, 2020 at 3:29 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> We do know that caller does not want more than the value it has passed in
> 'size' argument, though.

Ok, fair enough. That's probably a good way to handle the "allocate in
the caller".

So then I have no issues with that approach.

          Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] regset ->get() API
  2020-02-20 23:31         ` Linus Torvalds
@ 2020-02-21  3:30           ` Al Viro
  2020-02-21 18:59             ` Al Viro
  0 siblings, 1 reply; 11+ messages in thread
From: Al Viro @ 2020-02-21  3:30 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-arch, Linux Kernel Mailing List, Arnd Bergmann

On Thu, Feb 20, 2020 at 03:31:56PM -0800, Linus Torvalds wrote:
> On Thu, Feb 20, 2020 at 3:29 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > We do know that caller does not want more than the value it has passed in
> > 'size' argument, though.
> 
> Ok, fair enough. That's probably a good way to handle the "allocate in
> the caller".
> 
> So then I have no issues with that approach.

Turns out that "nobody uses those for userland destinations after that" is not
quite right - we have this:
int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
{
        struct task_struct *tsk = current;
        int ia32_fxstate = (buf != buf_fx);
        int ret;

        ia32_fxstate &= (IS_ENABLED(CONFIG_X86_32) ||
                         IS_ENABLED(CONFIG_IA32_EMULATION));

        if (!access_ok(buf, size))
                return -EACCES;

        if (!static_cpu_has(X86_FEATURE_FPU))
                return fpregs_soft_get(current, NULL, 0,
                        sizeof(struct user_i387_ia32_struct), NULL,
                        (struct _fpstate_32 __user *) buf) ? -1 : 1;

... with fpregs_soft_get() behing the shared helper that does, in turn,
call user_regset_copyout().  OTOH, _that_ sure as hell is "fill local
variable, then copy_to_user()" case.

Sigh...  Wish we had a quick way to do something along the lines of
"find all callchains leading to <function> that would not come via
-><method>" - doing that manually stank to high heaven ;-/  And in
cases like that nothing along the lines of "simulate a build" is
practical - call chains are all over arch/*, and config-sensitive
as well (32bit vs. 64bit is only beginning of that fun).  Thankfully,
none of those involved token-pasting...

Anyway, one observation that came out of that is that we might
be better off with signature change done first; less boilerplate
that way, contrary to what I expected.

Alternatively, we could introduce a new method, with one-by-one
conversion to it.  Hmm...
	int (*get2)(struct task_struct *target,
		    const struct user_regset *regset,
		    struct membuf to);
returning -E... on error and amount left unfilled on success, perhaps?
That seems to generate decent code and is pretty easy on the instances,
especially if membuf_write() et.al. are made to return to->left...
Things like

static int evr_get(struct task_struct *target, const struct user_regset *regset,
                   struct membuf to)
{
        flush_spe_to_thread(target);

        membuf_write(&to, &target->thread.evr, sizeof(target->thread.evr));

        BUILD_BUG_ON(offsetof(struct thread_struct, acc) + sizeof(u64) !=
                     offsetof(struct thread_struct, spefscr));

        return membuf_write(&to, &target->thread.acc, sizeof(u64));
}

in place of current

static int evr_get(struct task_struct *target, const struct user_regset *regset,
                   unsigned int pos, unsigned int count,
                   void *kbuf, void __user *ubuf)
{
        int ret;

        flush_spe_to_thread(target);

        ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
                                  &target->thread.evr,
                                  0, sizeof(target->thread.evr));

        BUILD_BUG_ON(offsetof(struct thread_struct, acc) + sizeof(u64) !=
                     offsetof(struct thread_struct, spefscr));

        if (!ret)
                ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
                                          &target->thread.acc,
                                          sizeof(target->thread.evr), -1);

        return ret;
}

and

static int vr_get(struct task_struct *target, const struct user_regset *regset,
                  struct membuf to)
{
	union {
		elf_vrreg_t reg;
		u32 word;
	} vrsave;

        flush_altivec_to_thread(target);

        BUILD_BUG_ON(offsetof(struct thread_vr_state, vscr) !=
                     offsetof(struct thread_vr_state, vr[32]));

        membuf_write(&to, &target->thread.vr_state, 33 * sizeof(vector128));
	/*
	 * Copy out only the low-order word of vrsave.
	 */
	memset(&vrsave, 0, sizeof(vrsave));
	vrsave.word = target->thread.vrsave;

	return membuf_write(&to, &vrsave, sizeof(vrsave);
}

instead of

static int vr_get(struct task_struct *target, const struct user_regset *regset,
                  unsigned int pos, unsigned int count,
                  void *kbuf, void __user *ubuf)
{
        int ret;

        flush_altivec_to_thread(target);

        BUILD_BUG_ON(offsetof(struct thread_vr_state, vscr) !=
                     offsetof(struct thread_vr_state, vr[32]));

        ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
                                  &target->thread.vr_state, 0,
                                  33 * sizeof(vector128));
        if (!ret) {
                /*
                 * Copy out only the low-order word of vrsave.
                 */
                int start, end;
                union {
                        elf_vrreg_t reg;
                        u32 word;
                } vrsave;
                memset(&vrsave, 0, sizeof(vrsave));

                vrsave.word = target->thread.vrsave;

                start = 33 * sizeof(vector128);
                end = start + sizeof(vrsave);
                ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf, &vrsave,
                                          start, end);
        }

        return ret;
}

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] regset ->get() API
  2020-02-21  3:30           ` Al Viro
@ 2020-02-21 18:59             ` Al Viro
  2020-02-21 19:22               ` David Miller
  0 siblings, 1 reply; 11+ messages in thread
From: Al Viro @ 2020-02-21 18:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-arch, Linux Kernel Mailing List, Arnd Bergmann

On Fri, Feb 21, 2020 at 03:30:16AM +0000, Al Viro wrote:

> Alternatively, we could introduce a new method, with one-by-one
> conversion to it.  Hmm...
> 	int (*get2)(struct task_struct *target,
> 		    const struct user_regset *regset,
> 		    struct membuf to);
> returning -E... on error and amount left unfilled on success, perhaps?
> That seems to generate decent code and is pretty easy on the instances,
> especially if membuf_write() et.al. are made to return to->left...

Arrrrghhh...  sparc is interesting.  For one thing, GETREGS64 uses
a format different from coredump (or GETREGSET) - instead of
	G0..G7, O0..O7, L0..L7, I0..I7, TSTATE, TPC, TNPC, (u64)Y
it's
	G1..G7, O0..O7, TSTATE, TPC, TNPC, (u64)Y
with interesting comment about Y being mishandled.  Achieved by
a couple of copy_regset_to_user() with non-zero offset ;-/

GETREGS is also different from coredump/GETREGSET - instead of
	G0..G7, O0..O7, L0..L7, I0..I7, PSR, PC, nPC, Y, 0 (WIM), 0 (TBR)
it's
	PSR, PC, nPC, Y, G1..G7, O0..O7
Again, a couple of copy_regset_to_user(), but there's an additional
twist - GETREGSET of 32bit task on sparc64 will use access_process_vm()
when trying to fetch L0..L7/I0..I7 of other task, using copy_from_user()
only when the target is equal to current.  For sparc32 this is not
true - it's always copy_from_user() there, so the values it reports
for those registers have nothing to do with the target process.  That
part smells like a bug; by the time GETREGSET had been introduced
sparc32 was not getting much attention, GETREGS worked just fine
(not reporting L*/I* anyway) and for coredump it was accessing the
caller's memory.  Not sure if anyone cares at that point...

The situation with floating point is similar.  FWIW, considering how
compact those ->get2() instances become, I wonder if we should just
go for
static int getregs_get(struct task_struct *target,
                         const struct user_regset *regset,
                         struct membuf to)
{
        const struct pt_regs *regs = task_pt_regs(target);
        int i;

        if (target == current)
                flushw_user();

        membuf_store(&to, (u32)tstate_to_psr(regs->tstate));
        membuf_store(&to, (u32)(regs->tpc));
        membuf_store(&to, (u32)(regs->tnpc));
        membuf_store(&to, (u32)(regs->y));
        for (i = 1; i < 16; i++)
                membuf_store(&to, (u32)regs->u_regs[i]);
        return to.left;
}

static int getfpregs_get(struct task_struct *target,
                        const struct user_regset *regset,
                        struct membuf to)
{
	struct thread_info *t = task_thread_info(target);

        if (target == current)
                save_and_clear_fpu();

        membuf_write(&to, t->fpregs, 32 * sizeof(u32));
        if (t->fpsaved[0] & FPRS_FEF)
                membuf_store(&to, (u32)t->xfsr[0]);
        else
                membuf_zero(&to, sizeof(u32));
        return membuf_zero(&to, 35 * sizeof(u32));
}

and slap together a couple of struct user_regset refering to those,
so that PTRACE_GETREGS/PTRACE_GETFPREGS would just use solitary
copy_regset_to_user() calls on those, rather than trying to
paste them out of several calls on the normal regsets...

FWIW, they do shrink nicely - compare e.g.
static int fpregs64_get(struct task_struct *target,
                        const struct user_regset *regset,
                        struct membuf to)
{
        const unsigned long *fpregs = task_thread_info(target)->fpregs;
        unsigned long fprs;

        if (target == current)
                save_and_clear_fpu();

        fprs = task_thread_info(target)->fpsaved[0];

        if (fprs & FPRS_DL)
                membuf_write(&to, fpregs, 16 * sizeof(u64));
        else
                membuf_zero(&to, 16 * sizeof(u64));
        if (fprs & FPRS_DU)
                membuf_write(&to, fpregs + 16, 16 * sizeof(u64));
        else
                membuf_zero(&to, 16 * sizeof(u64));
        if (fprs & FPRS_FEF) {
                membuf_store(&to, task_thread_info(target)->xfsr[0]);
                membuf_store(&to, task_thread_info(target)->gsr[0]);
        } else {
                membuf_zero(&to, 2 * sizeof(u64));
        }
        return membuf_store(&to, fprs);
}
with the same function in mainline arch/sparc/kernel/ptrace_64.c...

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] regset ->get() API
  2020-02-21 18:59             ` Al Viro
@ 2020-02-21 19:22               ` David Miller
  2020-02-22  0:41                 ` Al Viro
  0 siblings, 1 reply; 11+ messages in thread
From: David Miller @ 2020-02-21 19:22 UTC (permalink / raw)
  To: viro; +Cc: torvalds, linux-arch, linux-kernel, arnd

From: Al Viro <viro@zeniv.linux.org.uk>
Date: Fri, 21 Feb 2020 18:59:03 +0000

> Again, a couple of copy_regset_to_user(), but there's an additional
> twist - GETREGSET of 32bit task on sparc64 will use access_process_vm()
> when trying to fetch L0..L7/I0..I7 of other task, using copy_from_user()
> only when the target is equal to current.  For sparc32 this is not
> true - it's always copy_from_user() there, so the values it reports
> for those registers have nothing to do with the target process.  That
> part smells like a bug; by the time GETREGSET had been introduced
> sparc32 was not getting much attention, GETREGS worked just fine
> (not reporting L*/I* anyway) and for coredump it was accessing the
> caller's memory.  Not sure if anyone cares at that point...

That's definitely a bug and sparc64 is doing it correctly.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] regset ->get() API
  2020-02-21 19:22               ` David Miller
@ 2020-02-22  0:41                 ` Al Viro
  2020-04-13  4:32                   ` David Miller
  0 siblings, 1 reply; 11+ messages in thread
From: Al Viro @ 2020-02-22  0:41 UTC (permalink / raw)
  To: David Miller; +Cc: torvalds, linux-arch, linux-kernel, arnd

On Fri, Feb 21, 2020 at 11:22:44AM -0800, David Miller wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> Date: Fri, 21 Feb 2020 18:59:03 +0000
> 
> > Again, a couple of copy_regset_to_user(), but there's an additional
> > twist - GETREGSET of 32bit task on sparc64 will use access_process_vm()
> > when trying to fetch L0..L7/I0..I7 of other task, using copy_from_user()
> > only when the target is equal to current.  For sparc32 this is not
> > true - it's always copy_from_user() there, so the values it reports
> > for those registers have nothing to do with the target process.  That
> > part smells like a bug; by the time GETREGSET had been introduced
> > sparc32 was not getting much attention, GETREGS worked just fine
> > (not reporting L*/I* anyway) and for coredump it was accessing the
> > caller's memory.  Not sure if anyone cares at that point...
> 
> That's definitely a bug and sparc64 is doing it correctly.

OK...  What does the comment in
        case PTRACE_GETREGS64:
                ret = copy_regset_to_user(child, view, REGSET_GENERAL,
                                          1 * sizeof(u64),
                                          15 * sizeof(u64),
                                          &pregs->u_regs[0]);
                if (!ret) {
                        /* XXX doesn't handle 'y' register correctly XXX */
                        ret = copy_regset_to_user(child, view, REGSET_GENERAL,
                                                  32 * sizeof(u64),
                                                  4 * sizeof(u64),
                                                  &pregs->tstate);
                }
                break;   
refer to?  The fact that you end up with 0 in pregs->y and Y in pregs->magic?
In that case it's probably too late to do anything about that...

Or is that something different?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] regset ->get() API
  2020-02-22  0:41                 ` Al Viro
@ 2020-04-13  4:32                   ` David Miller
  0 siblings, 0 replies; 11+ messages in thread
From: David Miller @ 2020-04-13  4:32 UTC (permalink / raw)
  To: viro; +Cc: torvalds, linux-arch, linux-kernel, arnd

From: Al Viro <viro@zeniv.linux.org.uk>
Date: Sat, 22 Feb 2020 00:41:57 +0000

> On Fri, Feb 21, 2020 at 11:22:44AM -0800, David Miller wrote:
>> From: Al Viro <viro@zeniv.linux.org.uk>
>> Date: Fri, 21 Feb 2020 18:59:03 +0000
>> 
>> > Again, a couple of copy_regset_to_user(), but there's an additional
>> > twist - GETREGSET of 32bit task on sparc64 will use access_process_vm()
>> > when trying to fetch L0..L7/I0..I7 of other task, using copy_from_user()
>> > only when the target is equal to current.  For sparc32 this is not
>> > true - it's always copy_from_user() there, so the values it reports
>> > for those registers have nothing to do with the target process.  That
>> > part smells like a bug; by the time GETREGSET had been introduced
>> > sparc32 was not getting much attention, GETREGS worked just fine
>> > (not reporting L*/I* anyway) and for coredump it was accessing the
>> > caller's memory.  Not sure if anyone cares at that point...
>> 
>> That's definitely a bug and sparc64 is doing it correctly.
> 
> OK...  What does the comment in
>         case PTRACE_GETREGS64:
>                 ret = copy_regset_to_user(child, view, REGSET_GENERAL,
>                                           1 * sizeof(u64),
>                                           15 * sizeof(u64),
>                                           &pregs->u_regs[0]);
>                 if (!ret) {
>                         /* XXX doesn't handle 'y' register correctly XXX */
>                         ret = copy_regset_to_user(child, view, REGSET_GENERAL,
>                                                   32 * sizeof(u64),
>                                                   4 * sizeof(u64),
>                                                   &pregs->tstate);
>                 }
>                 break;   
> refer to?  The fact that you end up with 0 in pregs->y and Y in pregs->magic?
> In that case it's probably too late to do anything about that...

Yes, that's exactly what it's talking about since we have:

	unsigned int y;
	unsigned int magic;

and we're doing a 64-bit value copy.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-04-13  4:32 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-17 18:33 [RFC] regset ->get() API Al Viro
2020-02-19 20:01 ` Linus Torvalds
2020-02-20 22:47   ` Al Viro
2020-02-20 22:56     ` Linus Torvalds
2020-02-20 23:29       ` Al Viro
2020-02-20 23:31         ` Linus Torvalds
2020-02-21  3:30           ` Al Viro
2020-02-21 18:59             ` Al Viro
2020-02-21 19:22               ` David Miller
2020-02-22  0:41                 ` Al Viro
2020-04-13  4:32                   ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.