All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] parisc: Fix syscall restarts
@ 2015-12-18 23:30 Helge Deller
  2015-12-20 13:59 ` Mathieu Desnoyers
                   ` (3 more replies)
  0 siblings, 4 replies; 30+ messages in thread
From: Helge Deller @ 2015-12-18 23:30 UTC (permalink / raw)
  To: linux-parisc, James Bottomley, John David Anglin; +Cc: Mathieu Desnoyers

On parisc syscalls which are interrupted by signals sometimes fail to restart
and instead return -ENOSYS which then in the worst case lead to userspace
crashes.
A similiar problem existed on MIPS and was fixed by commit e967ef02 
("MIPS: Fix restart of indirect syscalls").

On parisc the current syscall restart code assumes hat the syscall number is
always loaded in the delay branch of the ble instruction as defined in the
unistd.h header file and as such never restored %r20 before returning to
userspace:
	ble 0x100(%sr2, %r0)
	ldi #syscall_nr, %r20

This assumption is at least not true for code which uses the syscall() glibc
function, which instead uses this syntax:
	ble 0x100(%sr2, %r0)
	copy regX, %r20
where regX depend on how the compiler optimizes the code and register usage.

This patch fixes this problem by adding code to analyze how the syscall number
is loaded in the delay branch and - if needed - copy the syscall number to regX
prior returning to userspace for the syscall restart.

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

diff --git a/arch/parisc/kernel/signal.c b/arch/parisc/kernel/signal.c
index dc1ea79..b0414ad 100644
--- a/arch/parisc/kernel/signal.c
+++ b/arch/parisc/kernel/signal.c
@@ -435,6 +435,48 @@ handle_signal(struct ksignal *ksig, struct pt_regs *regs, int in_syscall)
 		regs->gr[28]);
 }
 
+/*
+ * Check the delay branch in userspace how the syscall number gets loaded into
+ * %r20 and adjust as needed.
+ */
+
+static void check_syscallno_in_delay_branch(struct pt_regs *regs)
+{
+	unsigned int opcode, source_reg;
+	u32 __user *uaddr;
+
+	/* Usually we don't have to restore %r20 (the system call number)
+	 * because it gets loaded in the delay slot of the branch external
+	 * instruction via the ldi instruction.
+	 * In some cases a register-to-register copy instruction might have
+	 * been used instead, in which case we need to copy the syscall
+	 * number into the source register before returning to userspace.
+	 */
+
+	/* A syscall is just a branch, so all
+	 * we have to do is fiddle the return pointer.
+	 */
+	regs->gr[31] -= 8; /* delayed branching */
+
+	/* Get assembler opcode of code in delay branch */
+	uaddr = (unsigned int *) (regs->gr[31] + 1);
+	get_user(opcode, uaddr);
+
+	/* Check if delay branch uses "ldi int,%r20" */
+	if ((opcode & 0xffff0000) == 0x34140000)
+		return;	/* everything ok, just return */
+
+	/* Check if delay branch uses "copy %rX,%r20" */
+	if ((opcode & 0xff00ffff) == 0x08000254) {
+		source_reg = (opcode >> 16) & 31;
+		regs->gr[source_reg] = regs->gr[20];
+		return;
+	}
+
+	pr_warn("syscall restart: %s (pid %d): unexpected opcode 0x%08x\n",
+		current->comm, task_pid_nr(current), opcode);
+}
+
 static inline void
 syscall_restart(struct pt_regs *regs, struct k_sigaction *ka)
 {
@@ -457,10 +499,7 @@ syscall_restart(struct pt_regs *regs, struct k_sigaction *ka)
 		}
 		/* fallthrough */
 	case -ERESTARTNOINTR:
-		/* A syscall is just a branch, so all
-		 * we have to do is fiddle the return pointer.
-		 */
-		regs->gr[31] -= 8; /* delayed branching */
+		check_syscallno_in_delay_branch(regs);
 		break;
 	}
 }
@@ -510,15 +549,9 @@ insert_restart_trampoline(struct pt_regs *regs)
 	}
 	case -ERESTARTNOHAND:
 	case -ERESTARTSYS:
-	case -ERESTARTNOINTR: {
-		/* Hooray for delayed branching.  We don't
-		 * have to restore %r20 (the system call
-		 * number) because it gets loaded in the delay
-		 * slot of the branch external instruction.
-		 */
-		regs->gr[31] -= 8;
+	case -ERESTARTNOINTR:
+		check_syscallno_in_delay_branch(regs);
 		return;
-	}
 	default:
 		break;
 	}

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH] parisc: Fix syscall restarts
  2015-12-18 23:30 [PATCH] parisc: Fix syscall restarts Helge Deller
@ 2015-12-20 13:59 ` Mathieu Desnoyers
  2015-12-20 14:09   ` Mathieu Desnoyers
  2015-12-20 19:39 ` John David Anglin
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 30+ messages in thread
From: Mathieu Desnoyers @ 2015-12-20 13:59 UTC (permalink / raw)
  To: Helge Deller; +Cc: linux-parisc, James Bottomley, John David Anglin

----- On Dec 18, 2015, at 6:30 PM, Helge Deller deller@gmx.de wrote:

> On parisc syscalls which are interrupted by signals sometimes fail to restart
> and instead return -ENOSYS which then in the worst case lead to userspace
> crashes.
> A similiar problem existed on MIPS and was fixed by commit e967ef02
> ("MIPS: Fix restart of indirect syscalls").
> 
> On parisc the current syscall restart code assumes hat the syscall number is
> always loaded in the delay branch of the ble instruction as defined in the
> unistd.h header file and as such never restored %r20 before returning to
> userspace:
>	ble 0x100(%sr2, %r0)
>	ldi #syscall_nr, %r20
> 
> This assumption is at least not true for code which uses the syscall() glibc
> function, which instead uses this syntax:
>	ble 0x100(%sr2, %r0)
>	copy regX, %r20
> where regX depend on how the compiler optimizes the code and register usage.
> 
> This patch fixes this problem by adding code to analyze how the syscall number
> is loaded in the delay branch and - if needed - copy the syscall number to regX
> prior returning to userspace for the syscall restart.
> 
> Signed-off-by: Helge Deller <deller@gmx.de>
> Cc: stable@vger.kernel.org
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> 
> diff --git a/arch/parisc/kernel/signal.c b/arch/parisc/kernel/signal.c
> index dc1ea79..b0414ad 100644
> --- a/arch/parisc/kernel/signal.c
> +++ b/arch/parisc/kernel/signal.c
> @@ -435,6 +435,48 @@ handle_signal(struct ksignal *ksig, struct pt_regs *regs,
> int in_syscall)
> 		regs->gr[28]);
> }
> 
> +/*
> + * Check the delay branch in userspace how the syscall number gets loaded into
> + * %r20 and adjust as needed.

I'm pretty sure "Check the delay branch in userspace how the syscall..."
is not an English construct. ;-) Suggested rewording:

"Check how the syscall number gets loaded into %r20 within
the delay branch in userspace and adjust as needed."

> + */
> +
> +static void check_syscallno_in_delay_branch(struct pt_regs *regs)
> +{
> +	unsigned int opcode, source_reg;

Why "unsigned int" above rather than u32 ? Since we're using
opcode as target variable for a get_user, it would be clearer
if the type of the __user * match the type of the target kernel
variable. (understood that those happen to have the same bitness
and type size on all Linux architectures, but it would be clearer
nevertheless).

> +	u32 __user *uaddr;
> +
> +	/* Usually we don't have to restore %r20 (the system call number)
> +	 * because it gets loaded in the delay slot of the branch external
> +	 * instruction via the ldi instruction.
> +	 * In some cases a register-to-register copy instruction might have
> +	 * been used instead, in which case we need to copy the syscall
> +	 * number into the source register before returning to userspace.
> +	 */
> +
> +	/* A syscall is just a branch, so all
> +	 * we have to do is fiddle the return pointer.
> +	 */
> +	regs->gr[31] -= 8; /* delayed branching */
> +
> +	/* Get assembler opcode of code in delay branch */
> +	uaddr = (unsigned int *) (regs->gr[31] + 1);
> +	get_user(opcode, uaddr);

get_user() can fail due to EFAULT. This error should be
handled here, otherwise this could lead to the following
code using an uninitialized opcode variable, which could
indirectly leak a few bits of kernel stack information
to userspace (security concern). One attack vector I have
in mind for this is ptrace(), which might be able to tweak
those register values.

> +
> +	/* Check if delay branch uses "ldi int,%r20" */
> +	if ((opcode & 0xffff0000) == 0x34140000)
> +		return;	/* everything ok, just return */
> +
> +	/* Check if delay branch uses "copy %rX,%r20" */
> +	if ((opcode & 0xff00ffff) == 0x08000254) {
> +		source_reg = (opcode >> 16) & 31;

Can you explain the reasoning behind the & 31 mask ?
I'm no parisc expert, but this seems rather odd.
Do you really mean "% 31" which translates to "& 5" ?
It would make more sense since there are 32 "gr"
registers. With & 31, a carefully crafted opcode
could overflow the gr[32] array, and cause a kernel
overflow allowing to overwrite kernel memory
(security issue).

If it's the case, then it would also be good to
check that the register selector within the opcode
is not larger than 31 (e.g. applying to fr or sr
register), rather than applying the modulo and
assuming it's a gr and corrupt userspace state.

Thanks,

Mathieu

> +		regs->gr[source_reg] = regs->gr[20];
> +		return;
> +	}
> +
> +	pr_warn("syscall restart: %s (pid %d): unexpected opcode 0x%08x\n",
> +		current->comm, task_pid_nr(current), opcode);
> +}
> +
> static inline void
> syscall_restart(struct pt_regs *regs, struct k_sigaction *ka)
> {
> @@ -457,10 +499,7 @@ syscall_restart(struct pt_regs *regs, struct k_sigaction
> *ka)
> 		}
> 		/* fallthrough */
> 	case -ERESTARTNOINTR:
> -		/* A syscall is just a branch, so all
> -		 * we have to do is fiddle the return pointer.
> -		 */
> -		regs->gr[31] -= 8; /* delayed branching */
> +		check_syscallno_in_delay_branch(regs);
> 		break;
> 	}
> }
> @@ -510,15 +549,9 @@ insert_restart_trampoline(struct pt_regs *regs)
> 	}
> 	case -ERESTARTNOHAND:
> 	case -ERESTARTSYS:
> -	case -ERESTARTNOINTR: {
> -		/* Hooray for delayed branching.  We don't
> -		 * have to restore %r20 (the system call
> -		 * number) because it gets loaded in the delay
> -		 * slot of the branch external instruction.
> -		 */
> -		regs->gr[31] -= 8;
> +	case -ERESTARTNOINTR:
> +		check_syscallno_in_delay_branch(regs);
> 		return;
> -	}
> 	default:
> 		break;
>  	}

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] parisc: Fix syscall restarts
  2015-12-20 13:59 ` Mathieu Desnoyers
@ 2015-12-20 14:09   ` Mathieu Desnoyers
  2015-12-20 15:49     ` Helge Deller
  0 siblings, 1 reply; 30+ messages in thread
From: Mathieu Desnoyers @ 2015-12-20 14:09 UTC (permalink / raw)
  To: Helge Deller; +Cc: linux-parisc, James Bottomley, John David Anglin

----- On Dec 20, 2015, at 8:59 AM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:

> ----- On Dec 18, 2015, at 6:30 PM, Helge Deller deller@gmx.de wrote:
> 
>> On parisc syscalls which are interrupted by signals sometimes fail to restart
>> and instead return -ENOSYS which then in the worst case lead to userspace
>> crashes.
>> A similiar problem existed on MIPS and was fixed by commit e967ef02
>> ("MIPS: Fix restart of indirect syscalls").
>> 
>> On parisc the current syscall restart code assumes hat the syscall number is
>> always loaded in the delay branch of the ble instruction as defined in the
>> unistd.h header file and as such never restored %r20 before returning to
>> userspace:
>>	ble 0x100(%sr2, %r0)
>>	ldi #syscall_nr, %r20
>> 
>> This assumption is at least not true for code which uses the syscall() glibc
>> function, which instead uses this syntax:
>>	ble 0x100(%sr2, %r0)
>>	copy regX, %r20
>> where regX depend on how the compiler optimizes the code and register usage.
>> 
>> This patch fixes this problem by adding code to analyze how the syscall number
>> is loaded in the delay branch and - if needed - copy the syscall number to regX
>> prior returning to userspace for the syscall restart.
>> 
>> Signed-off-by: Helge Deller <deller@gmx.de>
>> Cc: stable@vger.kernel.org
>> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>> 
>> diff --git a/arch/parisc/kernel/signal.c b/arch/parisc/kernel/signal.c
>> index dc1ea79..b0414ad 100644
>> --- a/arch/parisc/kernel/signal.c
>> +++ b/arch/parisc/kernel/signal.c
>> @@ -435,6 +435,48 @@ handle_signal(struct ksignal *ksig, struct pt_regs *regs,
>> int in_syscall)
>> 		regs->gr[28]);
>> }
>> 
>> +/*
>> + * Check the delay branch in userspace how the syscall number gets loaded into
>> + * %r20 and adjust as needed.
> 
> I'm pretty sure "Check the delay branch in userspace how the syscall..."
> is not an English construct. ;-) Suggested rewording:
> 
> "Check how the syscall number gets loaded into %r20 within
> the delay branch in userspace and adjust as needed."
> 
>> + */
>> +
>> +static void check_syscallno_in_delay_branch(struct pt_regs *regs)
>> +{
>> +	unsigned int opcode, source_reg;
> 
> Why "unsigned int" above rather than u32 ? Since we're using
> opcode as target variable for a get_user, it would be clearer
> if the type of the __user * match the type of the target kernel
> variable. (understood that those happen to have the same bitness
> and type size on all Linux architectures, but it would be clearer
> nevertheless).
> 
>> +	u32 __user *uaddr;
>> +
>> +	/* Usually we don't have to restore %r20 (the system call number)
>> +	 * because it gets loaded in the delay slot of the branch external
>> +	 * instruction via the ldi instruction.
>> +	 * In some cases a register-to-register copy instruction might have
>> +	 * been used instead, in which case we need to copy the syscall
>> +	 * number into the source register before returning to userspace.
>> +	 */
>> +
>> +	/* A syscall is just a branch, so all
>> +	 * we have to do is fiddle the return pointer.
>> +	 */
>> +	regs->gr[31] -= 8; /* delayed branching */
>> +
>> +	/* Get assembler opcode of code in delay branch */
>> +	uaddr = (unsigned int *) (regs->gr[31] + 1);
>> +	get_user(opcode, uaddr);
> 
> get_user() can fail due to EFAULT. This error should be
> handled here, otherwise this could lead to the following
> code using an uninitialized opcode variable, which could
> indirectly leak a few bits of kernel stack information
> to userspace (security concern). One attack vector I have
> in mind for this is ptrace(), which might be able to tweak
> those register values.
> 
>> +
>> +	/* Check if delay branch uses "ldi int,%r20" */
>> +	if ((opcode & 0xffff0000) == 0x34140000)
>> +		return;	/* everything ok, just return */
>> +
>> +	/* Check if delay branch uses "copy %rX,%r20" */
>> +	if ((opcode & 0xff00ffff) == 0x08000254) {
>> +		source_reg = (opcode >> 16) & 31;
> 
> Can you explain the reasoning behind the & 31 mask ?
> I'm no parisc expert, but this seems rather odd.
> Do you really mean "% 31" which translates to "& 5" ?
> It would make more sense since there are 32 "gr"
> registers. With & 31, a carefully crafted opcode
> could overflow the gr[32] array, and cause a kernel
> overflow allowing to overwrite kernel memory
> (security issue).

Hrm, I got my masks temporarily mixed up (early morning
here). This is why I always use constructs such as:

#define GR_REGS_BITS  5
#define NR_GR_REGS    (1U << GR_REGS_BITS)
#define GR_REGS_MASK  (NR_GR_REGS - 1)

and then

v & GR_REGS_MASK;

Which makes everything super-obvious. The & 31 mask
seems therefore technically correct. The
paragraph below still holds though:

> 
> If it's the case, then it would also be good to
> check that the register selector within the opcode
> is not larger than 31 (e.g. applying to fr or sr
> register), rather than applying the modulo and
> assuming it's a gr and corrupt userspace state.
> 

Thanks,

Mathieu

> Thanks,
> 
> Mathieu
> 
>> +		regs->gr[source_reg] = regs->gr[20];
>> +		return;
>> +	}
>> +
>> +	pr_warn("syscall restart: %s (pid %d): unexpected opcode 0x%08x\n",
>> +		current->comm, task_pid_nr(current), opcode);
>> +}
>> +
>> static inline void
>> syscall_restart(struct pt_regs *regs, struct k_sigaction *ka)
>> {
>> @@ -457,10 +499,7 @@ syscall_restart(struct pt_regs *regs, struct k_sigaction
>> *ka)
>> 		}
>> 		/* fallthrough */
>> 	case -ERESTARTNOINTR:
>> -		/* A syscall is just a branch, so all
>> -		 * we have to do is fiddle the return pointer.
>> -		 */
>> -		regs->gr[31] -= 8; /* delayed branching */
>> +		check_syscallno_in_delay_branch(regs);
>> 		break;
>> 	}
>> }
>> @@ -510,15 +549,9 @@ insert_restart_trampoline(struct pt_regs *regs)
>> 	}
>> 	case -ERESTARTNOHAND:
>> 	case -ERESTARTSYS:
>> -	case -ERESTARTNOINTR: {
>> -		/* Hooray for delayed branching.  We don't
>> -		 * have to restore %r20 (the system call
>> -		 * number) because it gets loaded in the delay
>> -		 * slot of the branch external instruction.
>> -		 */
>> -		regs->gr[31] -= 8;
>> +	case -ERESTARTNOINTR:
>> +		check_syscallno_in_delay_branch(regs);
>> 		return;
>> -	}
>> 	default:
>> 		break;
>>  	}
> 
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] parisc: Fix syscall restarts
  2015-12-20 14:09   ` Mathieu Desnoyers
@ 2015-12-20 15:49     ` Helge Deller
  2015-12-20 16:50       ` James Bottomley
  2015-12-20 18:31       ` John David Anglin
  0 siblings, 2 replies; 30+ messages in thread
From: Helge Deller @ 2015-12-20 15:49 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: linux-parisc, James Bottomley, John David Anglin

On 20.12.2015 15:09, Mathieu Desnoyers wrote:
> ----- On Dec 20, 2015, at 8:59 AM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:
> 
>> ----- On Dec 18, 2015, at 6:30 PM, Helge Deller deller@gmx.de wrote:
>>
>>> On parisc syscalls which are interrupted by signals sometimes fail to restart
>>> and instead return -ENOSYS which then in the worst case lead to userspace
>>> crashes.
>>> A similiar problem existed on MIPS and was fixed by commit e967ef02
>>> ("MIPS: Fix restart of indirect syscalls").
>>>
>>> On parisc the current syscall restart code assumes hat the syscall number is
>>> always loaded in the delay branch of the ble instruction as defined in the
>>> unistd.h header file and as such never restored %r20 before returning to
>>> userspace:
>>> 	ble 0x100(%sr2, %r0)
>>> 	ldi #syscall_nr, %r20
>>>
>>> This assumption is at least not true for code which uses the syscall() glibc
>>> function, which instead uses this syntax:
>>> 	ble 0x100(%sr2, %r0)
>>> 	copy regX, %r20
>>> where regX depend on how the compiler optimizes the code and register usage.
>>>
>>> This patch fixes this problem by adding code to analyze how the syscall number
>>> is loaded in the delay branch and - if needed - copy the syscall number to regX
>>> prior returning to userspace for the syscall restart.
>>>
>>> Signed-off-by: Helge Deller <deller@gmx.de>
>>> Cc: stable@vger.kernel.org
>>> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>>>
>>> diff --git a/arch/parisc/kernel/signal.c b/arch/parisc/kernel/signal.c
>>> index dc1ea79..b0414ad 100644
>>> --- a/arch/parisc/kernel/signal.c
>>> +++ b/arch/parisc/kernel/signal.c
>>> @@ -435,6 +435,48 @@ handle_signal(struct ksignal *ksig, struct pt_regs *regs,
>>> int in_syscall)
>>> 		regs->gr[28]);
>>> }
>>>
>>> +/*
>>> + * Check the delay branch in userspace how the syscall number gets loaded into
>>> + * %r20 and adjust as needed.
>>
>> I'm pretty sure "Check the delay branch in userspace how the syscall..."
>> is not an English construct. ;-) Suggested rewording:
>>
>> "Check how the syscall number gets loaded into %r20 within
>> the delay branch in userspace and adjust as needed."

Thanks!
I'll change that.


>>> + */
>>> +
>>> +static void check_syscallno_in_delay_branch(struct pt_regs *regs)
>>> +{
>>> +	unsigned int opcode, source_reg;
>>
>> Why "unsigned int" above rather than u32 ? Since we're using
>> opcode as target variable for a get_user, it would be clearer
>> if the type of the __user * match the type of the target kernel
>> variable. (understood that those happen to have the same bitness
>> and type size on all Linux architectures, but it would be clearer
>> nevertheless).

Yes, seems OK.
I'll change that.

>>> +	u32 __user *uaddr;
>>> +
>>> +	/* Usually we don't have to restore %r20 (the system call number)
>>> +	 * because it gets loaded in the delay slot of the branch external
>>> +	 * instruction via the ldi instruction.
>>> +	 * In some cases a register-to-register copy instruction might have
>>> +	 * been used instead, in which case we need to copy the syscall
>>> +	 * number into the source register before returning to userspace.
>>> +	 */
>>> +
>>> +	/* A syscall is just a branch, so all
>>> +	 * we have to do is fiddle the return pointer.
>>> +	 */
>>> +	regs->gr[31] -= 8; /* delayed branching */
>>> +
>>> +	/* Get assembler opcode of code in delay branch */
>>> +	uaddr = (unsigned int *) (regs->gr[31] + 1);
>>> +	get_user(opcode, uaddr);
>>
>> get_user() can fail due to EFAULT. This error should be
>> handled here, otherwise this could lead to the following
>> code using an uninitialized opcode variable, which could
>> indirectly leak a few bits of kernel stack information
>> to userspace (security concern). One attack vector I have
>> in mind for this is ptrace(), which might be able to tweak
>> those register values.

Yes, generally get_user() can fail.
But this would be rather strange in that case, because
the syscall was started by userspace from this address.
So, without the code at that address in userspace, we would
never have reached this get_user().

And, there is no leak (of kernel stack) either. All this
function does is to move the content of userspace register X
to userspace register Y. There is no kernel info involved.    

>>> +
>>> +	/* Check if delay branch uses "ldi int,%r20" */
>>> +	if ((opcode & 0xffff0000) == 0x34140000)
>>> +		return;	/* everything ok, just return */
>>> +
>>> +	/* Check if delay branch uses "copy %rX,%r20" */
>>> +	if ((opcode & 0xff00ffff) == 0x08000254) {
>>> +		source_reg = (opcode >> 16) & 31;
>>
>> Can you explain the reasoning behind the & 31 mask ?
>> I'm no parisc expert, but this seems rather odd.
>> Do you really mean "% 31" which translates to "& 5" ?
>> It would make more sense since there are 32 "gr"
>> registers. With & 31, a carefully crafted opcode
>> could overflow the gr[32] array, and cause a kernel
>> overflow allowing to overwrite kernel memory
>> (security issue).
> 
> Hrm, I got my masks temporarily mixed up (early morning
> here). This is why I always use constructs such as:
> 
> #define GR_REGS_BITS  5
> #define NR_GR_REGS    (1U << GR_REGS_BITS)
> #define GR_REGS_MASK  (NR_GR_REGS - 1)
> 
> and then
> 
> v & GR_REGS_MASK;
> 
> Which makes everything super-obvious. The & 31 mask
> seems therefore technically correct.

Good. I was struggling with your comments as well :-)


> The paragraph below still holds though:
> 
>>
>> If it's the case, then it would also be good to
>> check that the register selector within the opcode
>> is not larger than 31 (e.g. applying to fr or sr
>> register), rather than applying the modulo and
>> assuming it's a gr and corrupt userspace state.

I'll check that.

Thanks!

Helge

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] parisc: Fix syscall restarts
  2015-12-20 15:49     ` Helge Deller
@ 2015-12-20 16:50       ` James Bottomley
  2015-12-20 20:35         ` Helge Deller
  2015-12-20 18:31       ` John David Anglin
  1 sibling, 1 reply; 30+ messages in thread
From: James Bottomley @ 2015-12-20 16:50 UTC (permalink / raw)
  To: Helge Deller, Mathieu Desnoyers; +Cc: linux-parisc, John David Anglin

On Sun, 2015-12-20 at 16:49 +0100, Helge Deller wrote:
> On 20.12.2015 15:09, Mathieu Desnoyers wrote:
> > ----- On Dec 20, 2015, at 8:59 AM, Mathieu Desnoyers 
> > mathieu.desnoyers@efficios.com wrote:
> > 
> > > ----- On Dec 18, 2015, at 6:30 PM, Helge Deller deller@gmx.de wro
> > > te:
> > > 
> > > > On parisc syscalls which are interrupted by signals sometimes
> > > > fail to restart
> > > > and instead return -ENOSYS which then in the worst case lead to
> > > > userspace
> > > > crashes.
> > > > A similiar problem existed on MIPS and was fixed by commit
> > > > e967ef02
> > > > ("MIPS: Fix restart of indirect syscalls").
> > > > 
> > > > On parisc the current syscall restart code assumes hat the
> > > > syscall number is
> > > > always loaded in the delay branch of the ble instruction as
> > > > defined in the
> > > > unistd.h header file and as such never restored %r20 before
> > > > returning to
> > > > userspace:
> > > > 	ble 0x100(%sr2, %r0)
> > > > 	ldi #syscall_nr, %r20
> > > > 
> > > > This assumption is at least not true for code which uses the
> > > > syscall() glibc
> > > > function, which instead uses this syntax:
> > > > 	ble 0x100(%sr2, %r0)
> > > > 	copy regX, %r20
> > > > where regX depend on how the compiler optimizes the code and
> > > > register usage.
> > > > 
> > > > This patch fixes this problem by adding code to analyze how the
> > > > syscall number
> > > > is loaded in the delay branch and - if needed - copy the
> > > > syscall number to regX
> > > > prior returning to userspace for the syscall restart.
> > > > 
> > > > Signed-off-by: Helge Deller <deller@gmx.de>
> > > > Cc: stable@vger.kernel.org
> > > > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > > > 
> > > > diff --git a/arch/parisc/kernel/signal.c
> > > > b/arch/parisc/kernel/signal.c
> > > > index dc1ea79..b0414ad 100644
> > > > --- a/arch/parisc/kernel/signal.c
> > > > +++ b/arch/parisc/kernel/signal.c
> > > > @@ -435,6 +435,48 @@ handle_signal(struct ksignal *ksig, struct
> > > > pt_regs *regs,
> > > > int in_syscall)
> > > > 		regs->gr[28]);
> > > > }
> > > > 
> > > > +/*
> > > > + * Check the delay branch in userspace how the syscall number
> > > > gets loaded into
> > > > + * %r20 and adjust as needed.
> > > 
> > > I'm pretty sure "Check the delay branch in userspace how the
> > > syscall..."
> > > is not an English construct. ;-) Suggested rewording:
> > > 
> > > "Check how the syscall number gets loaded into %r20 within
> > > the delay branch in userspace and adjust as needed."
> 
> Thanks!
> I'll change that.
> 
> 
> > > > + */
> > > > +
> > > > +static void check_syscallno_in_delay_branch(struct pt_regs
> > > > *regs)
> > > > +{
> > > > +	unsigned int opcode, source_reg;
> > > 
> > > Why "unsigned int" above rather than u32 ? Since we're using
> > > opcode as target variable for a get_user, it would be clearer
> > > if the type of the __user * match the type of the target kernel
> > > variable. (understood that those happen to have the same bitness
> > > and type size on all Linux architectures, but it would be clearer
> > > nevertheless).
> 
> Yes, seems OK.
> I'll change that.
> 
> > > > +	u32 __user *uaddr;
> > > > +
> > > > +	/* Usually we don't have to restore %r20 (the system
> > > > call number)
> > > > +	 * because it gets loaded in the delay slot of the
> > > > branch external
> > > > +	 * instruction via the ldi instruction.
> > > > +	 * In some cases a register-to-register copy
> > > > instruction might have
> > > > +	 * been used instead, in which case we need to copy
> > > > the syscall
> > > > +	 * number into the source register before returning to
> > > > userspace.
> > > > +	 */
> > > > +
> > > > +	/* A syscall is just a branch, so all
> > > > +	 * we have to do is fiddle the return pointer.
> > > > +	 */
> > > > +	regs->gr[31] -= 8; /* delayed branching */
> > > > +
> > > > +	/* Get assembler opcode of code in delay branch */
> > > > +	uaddr = (unsigned int *) (regs->gr[31] + 1);
> > > > +	get_user(opcode, uaddr);
> > > 
> > > get_user() can fail due to EFAULT. This error should be
> > > handled here, otherwise this could lead to the following
> > > code using an uninitialized opcode variable, which could
> > > indirectly leak a few bits of kernel stack information
> > > to userspace (security concern). One attack vector I have
> > > in mind for this is ptrace(), which might be able to tweak
> > > those register values.
> 
> Yes, generally get_user() can fail.
> But this would be rather strange in that case, because
> the syscall was started by userspace from this address.
> So, without the code at that address in userspace, we would
> never have reached this get_user().

Actually, that's not necessarily a safe assumption.  Any memory
allocation in a syscall path (except GFP_ATOMIC) can trigger reclaim
and since this is a signal restart path, that's entirely possible. 
 Reclaim could pull the backing page out from under the syscall, so in
a low memory situation it is possible get_user() could fail with EFAULT
unless get_user_page() has been called somewhere to pin the page.

James


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] parisc: Fix syscall restarts
  2015-12-20 15:49     ` Helge Deller
  2015-12-20 16:50       ` James Bottomley
@ 2015-12-20 18:31       ` John David Anglin
  2015-12-20 19:32         ` Helge Deller
  2015-12-21 14:42         ` Mathieu Desnoyers
  1 sibling, 2 replies; 30+ messages in thread
From: John David Anglin @ 2015-12-20 18:31 UTC (permalink / raw)
  To: Helge Deller; +Cc: Mathieu Desnoyers, linux-parisc, James Bottomley

On 2015-12-20, at 10:49 AM, Helge Deller wrote:

>>>> /* Check if delay branch uses "copy %rX,%r20" */
>>>> +	if ((opcode & 0xff00ffff) == 0x08000254) {
>>>> +		source_reg = (opcode >> 16) & 31;
>>> 
>>> Can you explain the reasoning behind the & 31 mask ?
>>> I'm no parisc expert, but this seems rather odd.
>>> Do you really mean "% 31" which translates to "& 5" ?
>>> It would make more sense since there are 32 "gr"
>>> registers. With & 31, a carefully crafted opcode
>>> could overflow the gr[32] array, and cause a kernel
>>> overflow allowing to overwrite kernel memory
>>> (security issue).
>> 
>> Hrm, I got my masks temporarily mixed up (early morning
>> here). This is why I always use constructs such as:
>> 
>> #define GR_REGS_BITS  5
>> #define NR_GR_REGS    (1U << GR_REGS_BITS)
>> #define GR_REGS_MASK  (NR_GR_REGS - 1)
>> 
>> and then
>> 
>> v & GR_REGS_MASK;
>> 
>> Which makes everything super-obvious. The & 31 mask
>> seems therefore technically correct.
> 
> Good. I was struggling with your comments as well :-)
> 
> 
>> The paragraph below still holds though:
>> 
>>> 
>>> If it's the case, then it would also be good to
>>> check that the register selector within the opcode
>>> is not larger than 31 (e.g. applying to fr or sr
>>> register), rather than applying the modulo and
>>> assuming it's a gr and corrupt userspace state.
> 
> I'll check that.

The register field cannot be larger than 31.  A fr or sr
regster can't be used in a LDO "copy" instruction.

Dave
--
John David Anglin	dave.anglin@bell.net




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] parisc: Fix syscall restarts
  2015-12-20 18:31       ` John David Anglin
@ 2015-12-20 19:32         ` Helge Deller
  2015-12-20 19:46           ` John David Anglin
  2015-12-21 14:42         ` Mathieu Desnoyers
  1 sibling, 1 reply; 30+ messages in thread
From: Helge Deller @ 2015-12-20 19:32 UTC (permalink / raw)
  To: John David Anglin; +Cc: Mathieu Desnoyers, linux-parisc, James Bottomley

On 20.12.2015 19:31, John David Anglin wrote:
> On 2015-12-20, at 10:49 AM, Helge Deller wrote:
> 
>>>>> /* Check if delay branch uses "copy %rX,%r20" */
>>>>> +	if ((opcode & 0xff00ffff) == 0x08000254) {
>>>>> +		source_reg = (opcode >> 16) & 31;
>>>>
>>>> Can you explain the reasoning behind the & 31 mask ?
>>>> I'm no parisc expert, but this seems rather odd.
>>>> Do you really mean "% 31" which translates to "& 5" ?
>>>> It would make more sense since there are 32 "gr"
>>>> registers. With & 31, a carefully crafted opcode
>>>> could overflow the gr[32] array, and cause a kernel
>>>> overflow allowing to overwrite kernel memory
>>>> (security issue).
>>>
>>> Hrm, I got my masks temporarily mixed up (early morning
>>> here). This is why I always use constructs such as:
>>>
>>> #define GR_REGS_BITS  5
>>> #define NR_GR_REGS    (1U << GR_REGS_BITS)
>>> #define GR_REGS_MASK  (NR_GR_REGS - 1)
>>>
>>> and then
>>>
>>> v & GR_REGS_MASK;
>>>
>>> Which makes everything super-obvious. The & 31 mask
>>> seems therefore technically correct.
>>
>> Good. I was struggling with your comments as well :-)
>>
>>
>>> The paragraph below still holds though:
>>>
>>>>
>>>> If it's the case, then it would also be good to
>>>> check that the register selector within the opcode
>>>> is not larger than 31 (e.g. applying to fr or sr
>>>> register), rather than applying the modulo and
>>>> assuming it's a gr and corrupt userspace state.
>>
>> I'll check that.
> 
> The register field cannot be larger than 31.  A fr or sr
> regster can't be used 

Both correct.

> in a LDO "copy" instruction.

Actually it's the "OR,cond r1,r2,t" instruction.
https://parisc.wiki.kernel.org/images-parisc/6/68/Pa11_acd.pdf
page 5-105

if ((opcode & 0xff00ffff) == 0x08000254)

The mask should be 0xffe0ffff, so that some bits of r2 (which needs to be 0) are not missed.

Helge 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] parisc: Fix syscall restarts
  2015-12-18 23:30 [PATCH] parisc: Fix syscall restarts Helge Deller
  2015-12-20 13:59 ` Mathieu Desnoyers
@ 2015-12-20 19:39 ` John David Anglin
  2015-12-20 19:48   ` Helge Deller
  2015-12-20 20:14 ` John David Anglin
  2015-12-21  9:19 ` [PATCH] parisc: Fix syscall restarts (v2) Helge Deller
  3 siblings, 1 reply; 30+ messages in thread
From: John David Anglin @ 2015-12-20 19:39 UTC (permalink / raw)
  To: Helge Deller; +Cc: linux-parisc, James Bottomley, Mathieu Desnoyers

On 2015-12-18, at 6:30 PM, Helge Deller wrote:

> +	/* Usually we don't have to restore %r20 (the system call number)
> +	 * because it gets loaded in the delay slot of the branch external
> +	 * instruction via the ldi instruction.
> +	 * In some cases a register-to-register copy instruction might have
> +	 * been used instead, in which case we need to copy the syscall
> +	 * number into the source register before returning to userspace.
> +	 */

I'm thinking it might be better to fix syscall() in glibc.  The copy could be
moved before ble and a nop placed delay slot.

Dave
--
John David Anglin	dave.anglin@bell.net




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] parisc: Fix syscall restarts
  2015-12-20 19:32         ` Helge Deller
@ 2015-12-20 19:46           ` John David Anglin
  2015-12-20 20:06             ` Helge Deller
  2015-12-20 23:57             ` John David Anglin
  0 siblings, 2 replies; 30+ messages in thread
From: John David Anglin @ 2015-12-20 19:46 UTC (permalink / raw)
  To: Helge Deller; +Cc: Mathieu Desnoyers, linux-parisc, James Bottomley

On 2015-12-20, at 2:32 PM, Helge Deller wrote:

>> in a LDO "copy" instruction.
> 
> Actually it's the "OR,cond r1,r2,t" instruction.
> https://parisc.wiki.kernel.org/images-parisc/6/68/Pa11_acd.pdf
> page 5-105
> 
> if ((opcode & 0xff00ffff) == 0x08000254)
> 
> The mask should be 0xffe0ffff, so that some bits of r2 (which needs to be 0) are not missed.

There are multiple instructions that could be used.  The PA 2.0 arch says LDO  on page 7-83.

I think we should fix syscall().  It's not used that much.

Dave
--
John David Anglin	dave.anglin@bell.net




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] parisc: Fix syscall restarts
  2015-12-20 19:39 ` John David Anglin
@ 2015-12-20 19:48   ` Helge Deller
  2015-12-20 20:01     ` John David Anglin
  0 siblings, 1 reply; 30+ messages in thread
From: Helge Deller @ 2015-12-20 19:48 UTC (permalink / raw)
  To: John David Anglin; +Cc: linux-parisc, James Bottomley, Mathieu Desnoyers

On 20.12.2015 20:39, John David Anglin wrote:
> On 2015-12-18, at 6:30 PM, Helge Deller wrote:
> 
>> +	/* Usually we don't have to restore %r20 (the system call number)
>> +	 * because it gets loaded in the delay slot of the branch external
>> +	 * instruction via the ldi instruction.
>> +	 * In some cases a register-to-register copy instruction might have
>> +	 * been used instead, in which case we need to copy the syscall
>> +	 * number into the source register before returning to userspace.
>> +	 */
> 
> I'm thinking it might be better to fix syscall() in glibc.  The copy could be
> moved before ble and a nop placed delay slot.

Yes, I think it should be fixed in glibc which makes it cleaner.
I looked at dietlibc. There a "nop" is being used.

Nevertheless, it may happen anytime if people forget, that we will see a
"copy" there again, so IMHO it's probably safer to include the workaround in 
kernel too.

Helge

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] parisc: Fix syscall restarts
  2015-12-20 19:48   ` Helge Deller
@ 2015-12-20 20:01     ` John David Anglin
  2015-12-20 20:18       ` Helge Deller
  0 siblings, 1 reply; 30+ messages in thread
From: John David Anglin @ 2015-12-20 20:01 UTC (permalink / raw)
  To: Helge Deller; +Cc: linux-parisc, James Bottomley, Mathieu Desnoyers

On 2015-12-20, at 2:48 PM, Helge Deller wrote:

> On 20.12.2015 20:39, John David Anglin wrote:
>> On 2015-12-18, at 6:30 PM, Helge Deller wrote:
>> 
>>> +	/* Usually we don't have to restore %r20 (the system call number)
>>> +	 * because it gets loaded in the delay slot of the branch external
>>> +	 * instruction via the ldi instruction.
>>> +	 * In some cases a register-to-register copy instruction might have
>>> +	 * been used instead, in which case we need to copy the syscall
>>> +	 * number into the source register before returning to userspace.
>>> +	 */
>> 
>> I'm thinking it might be better to fix syscall() in glibc.  The copy could be
>> moved before ble and a nop placed delay slot.
> 
> Yes, I think it should be fixed in glibc which makes it cleaner.
> I looked at dietlibc. There a "nop" is being used.

A "nop" implies %r20 needs to be restored.

> 
> Nevertheless, it may happen anytime if people forget, that we will see a
> "copy" there again, so IMHO it's probably safer to include the workaround in 
> kernel too.


The current patch assumes regs->gr[source_reg] is restored.  That needs to be
checked given previous comment about %r20.

Essentially, all syscall clobbered registers need to be restored.

Can the user space stuff be avoided by jumping to the gateway entry point?

Dave
--
John David Anglin	dave.anglin@bell.net




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] parisc: Fix syscall restarts
  2015-12-20 19:46           ` John David Anglin
@ 2015-12-20 20:06             ` Helge Deller
  2015-12-20 23:57             ` John David Anglin
  1 sibling, 0 replies; 30+ messages in thread
From: Helge Deller @ 2015-12-20 20:06 UTC (permalink / raw)
  To: John David Anglin; +Cc: Mathieu Desnoyers, linux-parisc, James Bottomley

On 20.12.2015 20:46, John David Anglin wrote:
> On 2015-12-20, at 2:32 PM, Helge Deller wrote:
> 
>>> in a LDO "copy" instruction.
>>
>> Actually it's the "OR,cond r1,r2,t" instruction.
>> https://parisc.wiki.kernel.org/images-parisc/6/68/Pa11_acd.pdf
>> page 5-105
>>
>> if ((opcode & 0xff00ffff) == 0x08000254)
>>
>> The mask should be 0xffe0ffff, so that some bits of r2 (which needs to be 0) are not missed.

This is actually not the full truth.
I missed the point, that "r2" is a register and not an immediate value.
My code just checks that "r2" means %r0, while it could be any other register as well,
which either has a null-value, or even worse, if someone decides to "calculate" the syscall number.

> There are multiple instructions that could be used.  The PA 2.0 arch says LDO  on page 7-83.

True.
But luckily it seems gas converts "copy" to the "or" syntax mentioned above.
In any case, keeping my kernel patch will report any cases (which might by mistake get added)
in the future so that we can fix userspace then (or enhance the kernel workaround).

Helge

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] parisc: Fix syscall restarts
  2015-12-18 23:30 [PATCH] parisc: Fix syscall restarts Helge Deller
  2015-12-20 13:59 ` Mathieu Desnoyers
  2015-12-20 19:39 ` John David Anglin
@ 2015-12-20 20:14 ` John David Anglin
  2015-12-20 20:19   ` Helge Deller
  2015-12-21  9:19 ` [PATCH] parisc: Fix syscall restarts (v2) Helge Deller
  3 siblings, 1 reply; 30+ messages in thread
From: John David Anglin @ 2015-12-20 20:14 UTC (permalink / raw)
  To: Helge Deller; +Cc: linux-parisc, James Bottomley, Mathieu Desnoyers

On 2015-12-18, at 6:30 PM, Helge Deller wrote:

> +	/* Get assembler opcode of code in delay branch */
> +	uaddr = (unsigned int *) (regs->gr[31] + 1);

Shouldn't increment be 4?

Dave
--
John David Anglin	dave.anglin@bell.net




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] parisc: Fix syscall restarts
  2015-12-20 20:01     ` John David Anglin
@ 2015-12-20 20:18       ` Helge Deller
  2015-12-20 20:45         ` John David Anglin
  0 siblings, 1 reply; 30+ messages in thread
From: Helge Deller @ 2015-12-20 20:18 UTC (permalink / raw)
  To: John David Anglin; +Cc: linux-parisc, James Bottomley, Mathieu Desnoyers

On 20.12.2015 21:01, John David Anglin wrote:
> On 2015-12-20, at 2:48 PM, Helge Deller wrote:
> 
>> On 20.12.2015 20:39, John David Anglin wrote:
>>> On 2015-12-18, at 6:30 PM, Helge Deller wrote:
>>>
>>>> +	/* Usually we don't have to restore %r20 (the system call number)
>>>> +	 * because it gets loaded in the delay slot of the branch external
>>>> +	 * instruction via the ldi instruction.
>>>> +	 * In some cases a register-to-register copy instruction might have
>>>> +	 * been used instead, in which case we need to copy the syscall
>>>> +	 * number into the source register before returning to userspace.
>>>> +	 */
>>>
>>> I'm thinking it might be better to fix syscall() in glibc.  The copy could be
>>> moved before ble and a nop placed delay slot.
>>
>> Yes, I think it should be fixed in glibc which makes it cleaner.
>> I looked at dietlibc. There a "nop" is being used.
> 
> A "nop" implies %r20 needs to be restored.

Yes, I tested that. It is being restored correctly, although the comments
imply different behaviour.

>> Nevertheless, it may happen anytime if people forget, that we will see a
>> "copy" there again, so IMHO it's probably safer to include the workaround in 
>> kernel too.
> 
> 
> The current patch assumes regs->gr[source_reg] is restored. 

Yes and no.

The real problem we actually faced is, that the glibc() syscall function
uses %r28 (aka the return value of the syscall) as "source_reg".
That's the reason why we failed with "ENOSYS" in the end, because when we returned
(before my patch) from the first syscall we returned -ERESTARTSYS
in %r28 (which is basically correct), but then the "copy %r28,%r20" in
userspace moved "-ERESTARTSYS" back into %r20, jumped into the kernel, and the
kernel (correctly) reported back "ENOSYS" since there is no such syscall
number with the value of "-ERESTARTSYS".
The problem: %r28 is of course not one of the registers which is being restored.
Quite complicated... I hope I could explain it somewhat...?

> That needs to be
> checked given previous comment about %r20.
> 
> Essentially, all syscall clobbered registers need to be restored.

Yes, they are (with the exception of the return value %r28).

> Can the user space stuff be avoided by jumping to the gateway entry point?

Probably.
I didn't looked into that.

Helge

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] parisc: Fix syscall restarts
  2015-12-20 20:14 ` John David Anglin
@ 2015-12-20 20:19   ` Helge Deller
  2015-12-20 20:21     ` Helge Deller
  0 siblings, 1 reply; 30+ messages in thread
From: Helge Deller @ 2015-12-20 20:19 UTC (permalink / raw)
  To: John David Anglin; +Cc: linux-parisc, James Bottomley, Mathieu Desnoyers

On 20.12.2015 21:14, John David Anglin wrote:
> On 2015-12-18, at 6:30 PM, Helge Deller wrote:
> 
>> +	/* Get assembler opcode of code in delay branch */
>> +	uaddr = (unsigned int *) (regs->gr[31] + 1);
> 
> Shouldn't increment be 4?

No.
The first address which is being executed in userspace is regs->gr[31]-3.

Helge



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] parisc: Fix syscall restarts
  2015-12-20 20:19   ` Helge Deller
@ 2015-12-20 20:21     ` Helge Deller
  2015-12-20 20:53       ` John David Anglin
  0 siblings, 1 reply; 30+ messages in thread
From: Helge Deller @ 2015-12-20 20:21 UTC (permalink / raw)
  To: John David Anglin; +Cc: linux-parisc, James Bottomley, Mathieu Desnoyers

On 20.12.2015 21:19, Helge Deller wrote:
> On 20.12.2015 21:14, John David Anglin wrote:
>> On 2015-12-18, at 6:30 PM, Helge Deller wrote:
>>
>>> +	/* Get assembler opcode of code in delay branch */
>>> +	uaddr = (unsigned int *) (regs->gr[31] + 1);
>>
>> Shouldn't increment be 4?
> 
> No.
> The first address which is being executed in userspace is regs->gr[31]-3.

I could (should?) have used
	uaddr = (unsigned int *) ((regs->gr[31] & ~3) + 4);

Helge

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] parisc: Fix syscall restarts
  2015-12-20 16:50       ` James Bottomley
@ 2015-12-20 20:35         ` Helge Deller
  2015-12-21  8:03           ` James Bottomley
  0 siblings, 1 reply; 30+ messages in thread
From: Helge Deller @ 2015-12-20 20:35 UTC (permalink / raw)
  To: James Bottomley, Mathieu Desnoyers; +Cc: linux-parisc, John David Anglin

On 20.12.2015 17:50, James Bottomley wrote:
> On Sun, 2015-12-20 at 16:49 +0100, Helge Deller wrote:
>> On 20.12.2015 15:09, Mathieu Desnoyers wrote:
>>> ----- On Dec 20, 2015, at 8:59 AM, Mathieu Desnoyers 
>>> mathieu.desnoyers@efficios.com wrote:
>>>
>>>> ----- On Dec 18, 2015, at 6:30 PM, Helge Deller deller@gmx.de wro
>>>> te:
>>>>
>>>>> On parisc syscalls which are interrupted by signals sometimes
>>>>> fail to restart
>>>>> and instead return -ENOSYS which then in the worst case lead to
>>>>> userspace
>>>>> crashes.
>>>>> A similiar problem existed on MIPS and was fixed by commit
>>>>> e967ef02
>>>>> ("MIPS: Fix restart of indirect syscalls").
>>>>>
>>>>> On parisc the current syscall restart code assumes hat the
>>>>> syscall number is
>>>>> always loaded in the delay branch of the ble instruction as
>>>>> defined in the
>>>>> unistd.h header file and as such never restored %r20 before
>>>>> returning to
>>>>> userspace:
>>>>> 	ble 0x100(%sr2, %r0)
>>>>> 	ldi #syscall_nr, %r20
>>>>>
>>>>> This assumption is at least not true for code which uses the
>>>>> syscall() glibc
>>>>> function, which instead uses this syntax:
>>>>> 	ble 0x100(%sr2, %r0)
>>>>> 	copy regX, %r20
>>>>> where regX depend on how the compiler optimizes the code and
>>>>> register usage.
>>>>>
>>>>> This patch fixes this problem by adding code to analyze how the
>>>>> syscall number
>>>>> is loaded in the delay branch and - if needed - copy the
>>>>> syscall number to regX
>>>>> prior returning to userspace for the syscall restart.
>>>>>
>>>>> Signed-off-by: Helge Deller <deller@gmx.de>
>>>>> Cc: stable@vger.kernel.org
>>>>> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>>>>>
>>>>> diff --git a/arch/parisc/kernel/signal.c
>>>>> b/arch/parisc/kernel/signal.c
>>>>> index dc1ea79..b0414ad 100644
>>>>> --- a/arch/parisc/kernel/signal.c
>>>>> +++ b/arch/parisc/kernel/signal.c
>>>>> @@ -435,6 +435,48 @@ handle_signal(struct ksignal *ksig, struct
>>>>> pt_regs *regs,
>>>>> int in_syscall)
>>>>> 		regs->gr[28]);
>>>>> }
>>>>>
>>>>> +/*
>>>>> + * Check the delay branch in userspace how the syscall number
>>>>> gets loaded into
>>>>> + * %r20 and adjust as needed.
>>>>
>>>> I'm pretty sure "Check the delay branch in userspace how the
>>>> syscall..."
>>>> is not an English construct. ;-) Suggested rewording:
>>>>
>>>> "Check how the syscall number gets loaded into %r20 within
>>>> the delay branch in userspace and adjust as needed."
>>
>> Thanks!
>> I'll change that.
>>
>>
>>>>> + */
>>>>> +
>>>>> +static void check_syscallno_in_delay_branch(struct pt_regs
>>>>> *regs)
>>>>> +{
>>>>> +	unsigned int opcode, source_reg;
>>>>
>>>> Why "unsigned int" above rather than u32 ? Since we're using
>>>> opcode as target variable for a get_user, it would be clearer
>>>> if the type of the __user * match the type of the target kernel
>>>> variable. (understood that those happen to have the same bitness
>>>> and type size on all Linux architectures, but it would be clearer
>>>> nevertheless).
>>
>> Yes, seems OK.
>> I'll change that.
>>
>>>>> +	u32 __user *uaddr;
>>>>> +
>>>>> +	/* Usually we don't have to restore %r20 (the system
>>>>> call number)
>>>>> +	 * because it gets loaded in the delay slot of the
>>>>> branch external
>>>>> +	 * instruction via the ldi instruction.
>>>>> +	 * In some cases a register-to-register copy
>>>>> instruction might have
>>>>> +	 * been used instead, in which case we need to copy
>>>>> the syscall
>>>>> +	 * number into the source register before returning to
>>>>> userspace.
>>>>> +	 */
>>>>> +
>>>>> +	/* A syscall is just a branch, so all
>>>>> +	 * we have to do is fiddle the return pointer.
>>>>> +	 */
>>>>> +	regs->gr[31] -= 8; /* delayed branching */
>>>>> +
>>>>> +	/* Get assembler opcode of code in delay branch */
>>>>> +	uaddr = (unsigned int *) (regs->gr[31] + 1);
>>>>> +	get_user(opcode, uaddr);
>>>>
>>>> get_user() can fail due to EFAULT. This error should be
>>>> handled here, otherwise this could lead to the following
>>>> code using an uninitialized opcode variable, which could
>>>> indirectly leak a few bits of kernel stack information
>>>> to userspace (security concern). One attack vector I have
>>>> in mind for this is ptrace(), which might be able to tweak
>>>> those register values.
>>
>> Yes, generally get_user() can fail.
>> But this would be rather strange in that case, because
>> the syscall was started by userspace from this address.
>> So, without the code at that address in userspace, we would
>> never have reached this get_user().
> 
> Actually, that's not necessarily a safe assumption.  Any memory
> allocation in a syscall path (except GFP_ATOMIC) can trigger reclaim
> and since this is a signal restart path, that's entirely possible. 
>  Reclaim could pull the backing page out from under the syscall, so in
> a low memory situation it is possible get_user() could fail with EFAULT

Really?
Maybe I misunderstood...?
So, let's say we have low memory and the kernel "swapped" out the userspace.
I assume that when there is no memory pressure get_user() would
pull the page in again, and if it's memory pressure, then with the assumption 
there is no memory left it probably return EFAULT (is that true?).
But what happens then when we return to userspace? 
I expect userspace then to segfault.

I will add the check for EFAULT, but just trying to understand...

> unless get_user_page() has been called somewhere to pin the page.

Helge

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] parisc: Fix syscall restarts
  2015-12-20 20:18       ` Helge Deller
@ 2015-12-20 20:45         ` John David Anglin
  0 siblings, 0 replies; 30+ messages in thread
From: John David Anglin @ 2015-12-20 20:45 UTC (permalink / raw)
  To: Helge Deller; +Cc: linux-parisc, James Bottomley, Mathieu Desnoyers

On 2015-12-20, at 3:18 PM, Helge Deller wrote:

>>> Nevertheless, it may happen anytime if people forget, that we will see a
>>> "copy" there again, so IMHO it's probably safer to include the workaround in 
>>> kernel too.
>> 
>> 
>> The current patch assumes regs->gr[source_reg] is restored. 
> 
> Yes and no.
> 
> The real problem we actually faced is, that the glibc() syscall function
> uses %r28 (aka the return value of the syscall) as "source_reg".
> That's the reason why we failed with "ENOSYS" in the end, because when we returned
> (before my patch) from the first syscall we returned -ERESTARTSYS
> in %r28 (which is basically correct), but then the "copy %r28,%r20" in
> userspace moved "-ERESTARTSYS" back into %r20, jumped into the kernel, and the
> kernel (correctly) reported back "ENOSYS" since there is no such syscall
> number with the value of "-ERESTARTSYS".
> The problem: %r28 is of course not one of the registers which is being restored.
> Quite complicated... I hope I could explain it somewhat...?

So, the patch doesn't work and we need to fix glibc.  The only other solution is to load
%r20 into %r28 and restore %r28 on a syscall restart.

Dave
--
John David Anglin	dave.anglin@bell.net




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] parisc: Fix syscall restarts
  2015-12-20 20:21     ` Helge Deller
@ 2015-12-20 20:53       ` John David Anglin
  0 siblings, 0 replies; 30+ messages in thread
From: John David Anglin @ 2015-12-20 20:53 UTC (permalink / raw)
  To: Helge Deller; +Cc: linux-parisc, James Bottomley, Mathieu Desnoyers

On 2015-12-20, at 3:21 PM, Helge Deller wrote:

> On 20.12.2015 21:19, Helge Deller wrote:
>> On 20.12.2015 21:14, John David Anglin wrote:
>>> On 2015-12-18, at 6:30 PM, Helge Deller wrote:
>>> 
>>>> +	/* Get assembler opcode of code in delay branch */
>>>> +	uaddr = (unsigned int *) (regs->gr[31] + 1);
>>> 
>>> Shouldn't increment be 4?
>> 
>> No.
>> The first address which is being executed in userspace is regs->gr[31]-3.
> 
> I could (should?) have used
> 	uaddr = (unsigned int *) ((regs->gr[31] & ~3) + 4);


Possibly that is better.  The former assumes user space is at level 3.

Dave
--
John David Anglin	dave.anglin@bell.net




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] parisc: Fix syscall restarts
  2015-12-20 19:46           ` John David Anglin
  2015-12-20 20:06             ` Helge Deller
@ 2015-12-20 23:57             ` John David Anglin
  1 sibling, 0 replies; 30+ messages in thread
From: John David Anglin @ 2015-12-20 23:57 UTC (permalink / raw)
  To: John David Anglin
  Cc: Helge Deller, Mathieu Desnoyers, linux-parisc, James Bottomley

[-- Attachment #1: Type: text/plain, Size: 198 bytes --]

On 2015-12-20, at 2:46 PM, John David Anglin wrote:

> I think we should fix syscall().  It's not used that much.

Attached is glibc fix.  Testing.

Dave
--
John David Anglin	dave.anglin@bell.net



[-- Attachment #2: syscall-restart.d.txt --]
[-- Type: text/plain, Size: 1128 bytes --]

diff --git a/sysdeps/unix/sysv/linux/hppa/syscall.c b/sysdeps/unix/sysv/linux/hppa/syscall.c
index 958fa47..9c7ee4f 100644
--- a/sysdeps/unix/sysv/linux/hppa/syscall.c
+++ b/sysdeps/unix/sysv/linux/hppa/syscall.c
@@ -48,8 +48,9 @@ syscall (long int __sysno, ...)
     PIC_REG_DEF
     LOAD_REGS_6
     asm volatile (SAVE_ASM_PIC
-		  "	ble  0x100(%%sr2, %%r0)	\n"
 		  "	copy %1, %%r20		\n"
+		  "	ble  0x100(%%sr2, %%r0)	\n"
+		  "	nop			\n"
 		  LOAD_ASM_PIC
 		  : "=r" (__res)
 		  : "r" (__sysno) PIC_REG_USE ASM_ARGS_6
diff --git a/sysdeps/unix/sysv/linux/hppa/sysdep.h b/sysdeps/unix/sysv/linux/hppa/sysdep.h
index 2cae70f..cc16c1a 100644
--- a/sysdeps/unix/sysv/linux/hppa/sysdep.h
+++ b/sysdeps/unix/sysv/linux/hppa/sysdep.h
@@ -434,8 +434,9 @@ L(pre_end):					ASM_LINE_SEP	\
 		/* FIXME: HACK save/load r19 around syscall */		\
 		asm volatile(						\
 			SAVE_ASM_PIC					\
-			"	ble  0x100(%%sr2, %%r0)\n"		\
 			"	copy %1, %%r20\n"			\
+			"	ble  0x100(%%sr2, %%r0)\n"		\
+			"	nop\n"					\
 			LOAD_ASM_PIC					\
 			: "=r" (__res)					\
 			: "r" (name) PIC_REG_USE ASM_ARGS_##nr		\

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH] parisc: Fix syscall restarts
  2015-12-20 20:35         ` Helge Deller
@ 2015-12-21  8:03           ` James Bottomley
  2015-12-21 14:39             ` Mathieu Desnoyers
  0 siblings, 1 reply; 30+ messages in thread
From: James Bottomley @ 2015-12-21  8:03 UTC (permalink / raw)
  To: Helge Deller, Mathieu Desnoyers; +Cc: linux-parisc, John David Anglin

On Sun, 2015-12-20 at 21:35 +0100, Helge Deller wrote:
> On 20.12.2015 17:50, James Bottomley wrote:
> > On Sun, 2015-12-20 at 16:49 +0100, Helge Deller wrote:
> > > On 20.12.2015 15:09, Mathieu Desnoyers wrote:
> > > > ----- On Dec 20, 2015, at 8:59 AM, Mathieu Desnoyers 
> > > > mathieu.desnoyers@efficios.com wrote:
> > > > 
> > > > > ----- On Dec 18, 2015, at 6:30 PM, Helge Deller deller@gmx.de
> > > > >  wro
> > > > > te:
> > > > > 
> > > > > > On parisc syscalls which are interrupted by signals
> > > > > > sometimes
> > > > > > fail to restart
> > > > > > and instead return -ENOSYS which then in the worst case
> > > > > > lead to
> > > > > > userspace
> > > > > > crashes.
> > > > > > A similiar problem existed on MIPS and was fixed by commit
> > > > > > e967ef02
> > > > > > ("MIPS: Fix restart of indirect syscalls").
> > > > > > 
> > > > > > On parisc the current syscall restart code assumes hat the
> > > > > > syscall number is
> > > > > > always loaded in the delay branch of the ble instruction as
> > > > > > defined in the
> > > > > > unistd.h header file and as such never restored %r20 before
> > > > > > returning to
> > > > > > userspace:
> > > > > > 	ble 0x100(%sr2, %r0)
> > > > > > 	ldi #syscall_nr, %r20
> > > > > > 
> > > > > > This assumption is at least not true for code which uses
> > > > > > the
> > > > > > syscall() glibc
> > > > > > function, which instead uses this syntax:
> > > > > > 	ble 0x100(%sr2, %r0)
> > > > > > 	copy regX, %r20
> > > > > > where regX depend on how the compiler optimizes the code
> > > > > > and
> > > > > > register usage.
> > > > > > 
> > > > > > This patch fixes this problem by adding code to analyze how
> > > > > > the
> > > > > > syscall number
> > > > > > is loaded in the delay branch and - if needed - copy the
> > > > > > syscall number to regX
> > > > > > prior returning to userspace for the syscall restart.
> > > > > > 
> > > > > > Signed-off-by: Helge Deller <deller@gmx.de>
> > > > > > Cc: stable@vger.kernel.org
> > > > > > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > > > > > 
> > > > > > diff --git a/arch/parisc/kernel/signal.c
> > > > > > b/arch/parisc/kernel/signal.c
> > > > > > index dc1ea79..b0414ad 100644
> > > > > > --- a/arch/parisc/kernel/signal.c
> > > > > > +++ b/arch/parisc/kernel/signal.c
> > > > > > @@ -435,6 +435,48 @@ handle_signal(struct ksignal *ksig,
> > > > > > struct
> > > > > > pt_regs *regs,
> > > > > > int in_syscall)
> > > > > > 		regs->gr[28]);
> > > > > > }
> > > > > > 
> > > > > > +/*
> > > > > > + * Check the delay branch in userspace how the syscall
> > > > > > number
> > > > > > gets loaded into
> > > > > > + * %r20 and adjust as needed.
> > > > > 
> > > > > I'm pretty sure "Check the delay branch in userspace how the
> > > > > syscall..."
> > > > > is not an English construct. ;-) Suggested rewording:
> > > > > 
> > > > > "Check how the syscall number gets loaded into %r20 within
> > > > > the delay branch in userspace and adjust as needed."
> > > 
> > > Thanks!
> > > I'll change that.
> > > 
> > > 
> > > > > > + */
> > > > > > +
> > > > > > +static void check_syscallno_in_delay_branch(struct pt_regs
> > > > > > *regs)
> > > > > > +{
> > > > > > +	unsigned int opcode, source_reg;
> > > > > 
> > > > > Why "unsigned int" above rather than u32 ? Since we're using
> > > > > opcode as target variable for a get_user, it would be clearer
> > > > > if the type of the __user * match the type of the target
> > > > > kernel
> > > > > variable. (understood that those happen to have the same
> > > > > bitness
> > > > > and type size on all Linux architectures, but it would be
> > > > > clearer
> > > > > nevertheless).
> > > 
> > > Yes, seems OK.
> > > I'll change that.
> > > 
> > > > > > +	u32 __user *uaddr;
> > > > > > +
> > > > > > +	/* Usually we don't have to restore %r20 (the
> > > > > > system
> > > > > > call number)
> > > > > > +	 * because it gets loaded in the delay slot of the
> > > > > > branch external
> > > > > > +	 * instruction via the ldi instruction.
> > > > > > +	 * In some cases a register-to-register copy
> > > > > > instruction might have
> > > > > > +	 * been used instead, in which case we need to
> > > > > > copy
> > > > > > the syscall
> > > > > > +	 * number into the source register before
> > > > > > returning to
> > > > > > userspace.
> > > > > > +	 */
> > > > > > +
> > > > > > +	/* A syscall is just a branch, so all
> > > > > > +	 * we have to do is fiddle the return pointer.
> > > > > > +	 */
> > > > > > +	regs->gr[31] -= 8; /* delayed branching */
> > > > > > +
> > > > > > +	/* Get assembler opcode of code in delay branch */
> > > > > > +	uaddr = (unsigned int *) (regs->gr[31] + 1);
> > > > > > +	get_user(opcode, uaddr);
> > > > > 
> > > > > get_user() can fail due to EFAULT. This error should be
> > > > > handled here, otherwise this could lead to the following
> > > > > code using an uninitialized opcode variable, which could
> > > > > indirectly leak a few bits of kernel stack information
> > > > > to userspace (security concern). One attack vector I have
> > > > > in mind for this is ptrace(), which might be able to tweak
> > > > > those register values.
> > > 
> > > Yes, generally get_user() can fail.
> > > But this would be rather strange in that case, because
> > > the syscall was started by userspace from this address.
> > > So, without the code at that address in userspace, we would
> > > never have reached this get_user().
> > 
> > Actually, that's not necessarily a safe assumption.  Any memory
> > allocation in a syscall path (except GFP_ATOMIC) can trigger
> > reclaim
> > and since this is a signal restart path, that's entirely possible. 
> >  Reclaim could pull the backing page out from under the syscall, so
> > in
> > a low memory situation it is possible get_user() could fail with
> > EFAULT
> 
> Really?
> Maybe I misunderstood...?
> So, let's say we have low memory and the kernel "swapped" out the
> userspace.
> I assume that when there is no memory pressure get_user() would
> pull the page in again, and if it's memory pressure, then with the
> assumption 
> there is no memory left it probably return EFAULT (is that true?).
> But what happens then when we return to userspace? 
> I expect userspace then to segfault.
> 
> I will add the check for EFAULT, but just trying to understand...

Actually, you're right, I was misremembering why these functions return
EFAULT.  It's if the access is invalid.  They will actually try to page
back in via the fault handler if the backing has gone.

James

> > unless get_user_page() has been called somewhere to pin the page.
> 
> Helge
> --
> To unsubscribe from this list: send the line "unsubscribe linux
> -parisc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] parisc: Fix syscall restarts (v2)
  2015-12-18 23:30 [PATCH] parisc: Fix syscall restarts Helge Deller
                   ` (2 preceding siblings ...)
  2015-12-20 20:14 ` John David Anglin
@ 2015-12-21  9:19 ` Helge Deller
  2015-12-21 13:11   ` John David Anglin
  2015-12-21 20:27   ` Mathieu Desnoyers
  3 siblings, 2 replies; 30+ messages in thread
From: Helge Deller @ 2015-12-21  9:19 UTC (permalink / raw)
  To: linux-parisc, James Bottomley, John David Anglin; +Cc: Mathieu Desnoyers

This is version 2 of the patch:

On parisc syscalls which are interrupted by signals sometimes failed to
restart and instead returned -ENOSYS which in the worst case lead to
userspace crashes.
A similiar problem existed on MIPS and was fixed by commit e967ef02
("MIPS: Fix restart of indirect syscalls").

On parisc the current syscall restart code assumes that all syscall
callers load the syscall number in the delay slot of the ble
instruction. That's how it is e.g. done in the unistd.h header file:
	ble 0x100(%sr2, %r0)
	ldi #syscall_nr, %r20
Because of that assumption the current code never restored %r20 before
returning to userspace.

This assumption is at least not true for code which uses the glibc
syscall() function, which instead uses this syntax:
	ble 0x100(%sr2, %r0)
	copy regX, %r20
where regX depend on how the compiler optimizes the code and register
usage.

This patch fixes this problem by adding code to analyze how the syscall
number is loaded in the delay branch and - if needed - copy the syscall
number to regX prior returning to userspace for the syscall restart.

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

diff --git a/arch/parisc/kernel/signal.c b/arch/parisc/kernel/signal.c
index dc1ea79..2264f68 100644
--- a/arch/parisc/kernel/signal.c
+++ b/arch/parisc/kernel/signal.c
@@ -435,6 +435,55 @@ handle_signal(struct ksignal *ksig, struct pt_regs *regs, int in_syscall)
 		regs->gr[28]);
 }
 
+/*
+ * Check how the syscall number gets loaded into %r20 within
+ * the delay branch in userspace and adjust as needed.
+ */
+
+static void check_syscallno_in_delay_branch(struct pt_regs *regs)
+{
+	u32 opcode, source_reg;
+	u32 __user *uaddr;
+	int err;
+
+	/* Usually we don't have to restore %r20 (the system call number)
+	 * because it gets loaded in the delay slot of the branch external
+	 * instruction via the ldi instruction.
+	 * In some cases a register-to-register copy instruction might have
+	 * been used instead, in which case we need to copy the syscall
+	 * number into the source register before returning to userspace.
+	 */
+
+	/* A syscall is just a branch, so all we have to do is fiddle the
+	 * return pointer so that the ble instruction gets executed again.
+	 */
+	regs->gr[31] -= 8; /* delayed branching */
+
+	/* Get assembler opcode of code in delay branch */
+	uaddr = (unsigned int *) ((regs->gr[31] & ~3) + 4);
+	err = get_user(opcode, uaddr);
+	if (err)
+		return;
+
+	/* Check if delay branch uses "ldi int,%r20" */
+	if ((opcode & 0xffff0000) == 0x34140000)
+		return;	/* everything ok, just return */
+
+	/* Check if delay branch uses "nop" */
+	if (opcode == INSN_NOP)
+		return;
+
+	/* Check if delay branch uses "copy %rX,%r20" */
+	if ((opcode & 0xffe0ffff) == 0x08000254) {
+		source_reg = (opcode >> 16) & 31;
+		regs->gr[source_reg] = regs->gr[20];
+		return;
+	}
+
+	pr_warn("syscall restart: %s (pid %d): unexpected opcode 0x%08x\n",
+		current->comm, task_pid_nr(current), opcode);
+}
+
 static inline void
 syscall_restart(struct pt_regs *regs, struct k_sigaction *ka)
 {
@@ -457,10 +506,7 @@ syscall_restart(struct pt_regs *regs, struct k_sigaction *ka)
 		}
 		/* fallthrough */
 	case -ERESTARTNOINTR:
-		/* A syscall is just a branch, so all
-		 * we have to do is fiddle the return pointer.
-		 */
-		regs->gr[31] -= 8; /* delayed branching */
+		check_syscallno_in_delay_branch(regs);
 		break;
 	}
 }
@@ -510,15 +556,9 @@ insert_restart_trampoline(struct pt_regs *regs)
 	}
 	case -ERESTARTNOHAND:
 	case -ERESTARTSYS:
-	case -ERESTARTNOINTR: {
-		/* Hooray for delayed branching.  We don't
-		 * have to restore %r20 (the system call
-		 * number) because it gets loaded in the delay
-		 * slot of the branch external instruction.
-		 */
-		regs->gr[31] -= 8;
+	case -ERESTARTNOINTR:
+		check_syscallno_in_delay_branch(regs);
 		return;
-	}
 	default:
 		break;
 	}

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH] parisc: Fix syscall restarts (v2)
  2015-12-21  9:19 ` [PATCH] parisc: Fix syscall restarts (v2) Helge Deller
@ 2015-12-21 13:11   ` John David Anglin
  2015-12-21 20:27   ` Mathieu Desnoyers
  1 sibling, 0 replies; 30+ messages in thread
From: John David Anglin @ 2015-12-21 13:11 UTC (permalink / raw)
  To: Helge Deller; +Cc: linux-parisc, James Bottomley, Mathieu Desnoyers

On 2015-12-21, at 4:19 AM, Helge Deller wrote:

> On parisc the current syscall restart code assumes that all syscall
> callers load the syscall number in the delay slot of the ble
> instruction. That's how it is e.g. done in the unistd.h header file:
> 	ble 0x100(%sr2, %r0)
> 	ldi #syscall_nr, %r20
> Because of that assumption the current code never restored %r20 before
> returning to userspace.

Is %r20 restored for nop case?  The comments in the patch suggest that it is
not necessary to restore it.

I'm  thinking we need to review syscall clobbered register list for glibc.

Dave
--
John David Anglin	dave.anglin@bell.net




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] parisc: Fix syscall restarts
  2015-12-21  8:03           ` James Bottomley
@ 2015-12-21 14:39             ` Mathieu Desnoyers
  0 siblings, 0 replies; 30+ messages in thread
From: Mathieu Desnoyers @ 2015-12-21 14:39 UTC (permalink / raw)
  To: James Bottomley; +Cc: Helge Deller, linux-parisc, John David Anglin

----- On Dec 21, 2015, at 3:03 AM, James Bottomley James.Bottomley@HansenPartnership.com wrote:

> On Sun, 2015-12-20 at 21:35 +0100, Helge Deller wrote:
>> On 20.12.2015 17:50, James Bottomley wrote:
>> > On Sun, 2015-12-20 at 16:49 +0100, Helge Deller wrote:
>> > > On 20.12.2015 15:09, Mathieu Desnoyers wrote:
>> > > > ----- On Dec 20, 2015, at 8:59 AM, Mathieu Desnoyers
>> > > > mathieu.desnoyers@efficios.com wrote:
>> > > > 
>> > > > > ----- On Dec 18, 2015, at 6:30 PM, Helge Deller deller@gmx.de
>> > > > >  wro
>> > > > > te:
>> > > > > 
>> > > > > > On parisc syscalls which are interrupted by signals
>> > > > > > sometimes
>> > > > > > fail to restart
>> > > > > > and instead return -ENOSYS which then in the worst case
>> > > > > > lead to
>> > > > > > userspace
>> > > > > > crashes.
>> > > > > > A similiar problem existed on MIPS and was fixed by commit
>> > > > > > e967ef02
>> > > > > > ("MIPS: Fix restart of indirect syscalls").
>> > > > > > 
>> > > > > > On parisc the current syscall restart code assumes hat the
>> > > > > > syscall number is
>> > > > > > always loaded in the delay branch of the ble instruction as
>> > > > > > defined in the
>> > > > > > unistd.h header file and as such never restored %r20 before
>> > > > > > returning to
>> > > > > > userspace:
>> > > > > > 	ble 0x100(%sr2, %r0)
>> > > > > > 	ldi #syscall_nr, %r20
>> > > > > > 
>> > > > > > This assumption is at least not true for code which uses
>> > > > > > the
>> > > > > > syscall() glibc
>> > > > > > function, which instead uses this syntax:
>> > > > > > 	ble 0x100(%sr2, %r0)
>> > > > > > 	copy regX, %r20
>> > > > > > where regX depend on how the compiler optimizes the code
>> > > > > > and
>> > > > > > register usage.
>> > > > > > 
>> > > > > > This patch fixes this problem by adding code to analyze how
>> > > > > > the
>> > > > > > syscall number
>> > > > > > is loaded in the delay branch and - if needed - copy the
>> > > > > > syscall number to regX
>> > > > > > prior returning to userspace for the syscall restart.
>> > > > > > 
>> > > > > > Signed-off-by: Helge Deller <deller@gmx.de>
>> > > > > > Cc: stable@vger.kernel.org
>> > > > > > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>> > > > > > 
>> > > > > > diff --git a/arch/parisc/kernel/signal.c
>> > > > > > b/arch/parisc/kernel/signal.c
>> > > > > > index dc1ea79..b0414ad 100644
>> > > > > > --- a/arch/parisc/kernel/signal.c
>> > > > > > +++ b/arch/parisc/kernel/signal.c
>> > > > > > @@ -435,6 +435,48 @@ handle_signal(struct ksignal *ksig,
>> > > > > > struct
>> > > > > > pt_regs *regs,
>> > > > > > int in_syscall)
>> > > > > > 		regs->gr[28]);
>> > > > > > }
>> > > > > > 
>> > > > > > +/*
>> > > > > > + * Check the delay branch in userspace how the syscall
>> > > > > > number
>> > > > > > gets loaded into
>> > > > > > + * %r20 and adjust as needed.
>> > > > > 
>> > > > > I'm pretty sure "Check the delay branch in userspace how the
>> > > > > syscall..."
>> > > > > is not an English construct. ;-) Suggested rewording:
>> > > > > 
>> > > > > "Check how the syscall number gets loaded into %r20 within
>> > > > > the delay branch in userspace and adjust as needed."
>> > > 
>> > > Thanks!
>> > > I'll change that.
>> > > 
>> > > 
>> > > > > > + */
>> > > > > > +
>> > > > > > +static void check_syscallno_in_delay_branch(struct pt_regs
>> > > > > > *regs)
>> > > > > > +{
>> > > > > > +	unsigned int opcode, source_reg;
>> > > > > 
>> > > > > Why "unsigned int" above rather than u32 ? Since we're using
>> > > > > opcode as target variable for a get_user, it would be clearer
>> > > > > if the type of the __user * match the type of the target
>> > > > > kernel
>> > > > > variable. (understood that those happen to have the same
>> > > > > bitness
>> > > > > and type size on all Linux architectures, but it would be
>> > > > > clearer
>> > > > > nevertheless).
>> > > 
>> > > Yes, seems OK.
>> > > I'll change that.
>> > > 
>> > > > > > +	u32 __user *uaddr;
>> > > > > > +
>> > > > > > +	/* Usually we don't have to restore %r20 (the
>> > > > > > system
>> > > > > > call number)
>> > > > > > +	 * because it gets loaded in the delay slot of the
>> > > > > > branch external
>> > > > > > +	 * instruction via the ldi instruction.
>> > > > > > +	 * In some cases a register-to-register copy
>> > > > > > instruction might have
>> > > > > > +	 * been used instead, in which case we need to
>> > > > > > copy
>> > > > > > the syscall
>> > > > > > +	 * number into the source register before
>> > > > > > returning to
>> > > > > > userspace.
>> > > > > > +	 */
>> > > > > > +
>> > > > > > +	/* A syscall is just a branch, so all
>> > > > > > +	 * we have to do is fiddle the return pointer.
>> > > > > > +	 */
>> > > > > > +	regs->gr[31] -= 8; /* delayed branching */
>> > > > > > +
>> > > > > > +	/* Get assembler opcode of code in delay branch */
>> > > > > > +	uaddr = (unsigned int *) (regs->gr[31] + 1);
>> > > > > > +	get_user(opcode, uaddr);
>> > > > > 
>> > > > > get_user() can fail due to EFAULT. This error should be
>> > > > > handled here, otherwise this could lead to the following
>> > > > > code using an uninitialized opcode variable, which could
>> > > > > indirectly leak a few bits of kernel stack information
>> > > > > to userspace (security concern). One attack vector I have
>> > > > > in mind for this is ptrace(), which might be able to tweak
>> > > > > those register values.
>> > > 
>> > > Yes, generally get_user() can fail.
>> > > But this would be rather strange in that case, because
>> > > the syscall was started by userspace from this address.
>> > > So, without the code at that address in userspace, we would
>> > > never have reached this get_user().
>> > 
>> > Actually, that's not necessarily a safe assumption.  Any memory
>> > allocation in a syscall path (except GFP_ATOMIC) can trigger
>> > reclaim
>> > and since this is a signal restart path, that's entirely possible.
>> >  Reclaim could pull the backing page out from under the syscall, so
>> > in
>> > a low memory situation it is possible get_user() could fail with
>> > EFAULT
>> 
>> Really?
>> Maybe I misunderstood...?
>> So, let's say we have low memory and the kernel "swapped" out the
>> userspace.
>> I assume that when there is no memory pressure get_user() would
>> pull the page in again, and if it's memory pressure, then with the
>> assumption
>> there is no memory left it probably return EFAULT (is that true?).
>> But what happens then when we return to userspace?
>> I expect userspace then to segfault.
>> 
>> I will add the check for EFAULT, but just trying to understand...
> 
> Actually, you're right, I was misremembering why these functions return
> EFAULT.  It's if the access is invalid.  They will actually try to page
> back in via the fault handler if the backing has gone.

I think you could still trigger the EFAULT by having a
concurrent thread running within the same process address
space invoking munmap on the memory range that
includes the get_user() target memory area.

This would clearly be a hostile user-space application,
but it's important to validate all user inputs very
carefully, especially for misbehaving applications.

Thanks,

Mathieu

> 
> James
> 
>> > unless get_user_page() has been called somewhere to pin the page.
>> 
>> Helge
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux
>> -parisc" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] parisc: Fix syscall restarts
  2015-12-20 18:31       ` John David Anglin
  2015-12-20 19:32         ` Helge Deller
@ 2015-12-21 14:42         ` Mathieu Desnoyers
  2015-12-21 15:12           ` John David Anglin
  1 sibling, 1 reply; 30+ messages in thread
From: Mathieu Desnoyers @ 2015-12-21 14:42 UTC (permalink / raw)
  To: John David Anglin; +Cc: Helge Deller, linux-parisc, James Bottomley

----- On Dec 20, 2015, at 1:31 PM, John David Anglin dave.anglin@bell.net wrote:

> On 2015-12-20, at 10:49 AM, Helge Deller wrote:
> 
>>>>> /* Check if delay branch uses "copy %rX,%r20" */
>>>>> +	if ((opcode & 0xff00ffff) == 0x08000254) {
>>>>> +		source_reg = (opcode >> 16) & 31;
>>>> 
>>>> Can you explain the reasoning behind the & 31 mask ?
>>>> I'm no parisc expert, but this seems rather odd.
>>>> Do you really mean "% 31" which translates to "& 5" ?
>>>> It would make more sense since there are 32 "gr"
>>>> registers. With & 31, a carefully crafted opcode
>>>> could overflow the gr[32] array, and cause a kernel
>>>> overflow allowing to overwrite kernel memory
>>>> (security issue).
>>> 
>>> Hrm, I got my masks temporarily mixed up (early morning
>>> here). This is why I always use constructs such as:
>>> 
>>> #define GR_REGS_BITS  5
>>> #define NR_GR_REGS    (1U << GR_REGS_BITS)
>>> #define GR_REGS_MASK  (NR_GR_REGS - 1)
>>> 
>>> and then
>>> 
>>> v & GR_REGS_MASK;
>>> 
>>> Which makes everything super-obvious. The & 31 mask
>>> seems therefore technically correct.
>> 
>> Good. I was struggling with your comments as well :-)
>> 
>> 
>>> The paragraph below still holds though:
>>> 
>>>> 
>>>> If it's the case, then it would also be good to
>>>> check that the register selector within the opcode
>>>> is not larger than 31 (e.g. applying to fr or sr
>>>> register), rather than applying the modulo and
>>>> assuming it's a gr and corrupt userspace state.
>> 
>> I'll check that.
> 
> The register field cannot be larger than 31.  A fr or sr
> regster can't be used in a LDO "copy" instruction.

Independently of the instruction set, if an application
pulls the carpet from under the kernel from a concurrent
thread (or possibly ptrace) by forging an invalid
instruction after the system call has started executing,
it would be good to catch it and report it with a printk,
rather than blindly expecting that some bits should always
be 0.

Thanks,

Mathieu

> 
> Dave
> --
> John David Anglin	dave.anglin@bell.net

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] parisc: Fix syscall restarts
  2015-12-21 14:42         ` Mathieu Desnoyers
@ 2015-12-21 15:12           ` John David Anglin
  0 siblings, 0 replies; 30+ messages in thread
From: John David Anglin @ 2015-12-21 15:12 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: Helge Deller, linux-parisc, James Bottomley

On 2015-12-21 9:42 AM, Mathieu Desnoyers wrote:
> ----- On Dec 20, 2015, at 1:31 PM, John David Anglin dave.anglin@bell.net wrote:
>
>> On 2015-12-20, at 10:49 AM, Helge Deller wrote:
>>
>>>>>> /* Check if delay branch uses "copy %rX,%r20" */
>>>>>> +	if ((opcode & 0xff00ffff) == 0x08000254) {
>>>>>> +		source_reg = (opcode >> 16) & 31;
>>>>> Can you explain the reasoning behind the & 31 mask ?
>>>>> I'm no parisc expert, but this seems rather odd.
>>>>> Do you really mean "% 31" which translates to "& 5" ?
>>>>> It would make more sense since there are 32 "gr"
>>>>> registers. With & 31, a carefully crafted opcode
>>>>> could overflow the gr[32] array, and cause a kernel
>>>>> overflow allowing to overwrite kernel memory
>>>>> (security issue).
>>>> Hrm, I got my masks temporarily mixed up (early morning
>>>> here). This is why I always use constructs such as:
>>>>
>>>> #define GR_REGS_BITS  5
>>>> #define NR_GR_REGS    (1U << GR_REGS_BITS)
>>>> #define GR_REGS_MASK  (NR_GR_REGS - 1)
>>>>
>>>> and then
>>>>
>>>> v & GR_REGS_MASK;
>>>>
>>>> Which makes everything super-obvious. The & 31 mask
>>>> seems therefore technically correct.
>>> Good. I was struggling with your comments as well :-)
>>>
>>>
>>>> The paragraph below still holds though:
>>>>
>>>>> If it's the case, then it would also be good to
>>>>> check that the register selector within the opcode
>>>>> is not larger than 31 (e.g. applying to fr or sr
>>>>> register), rather than applying the modulo and
>>>>> assuming it's a gr and corrupt userspace state.
>>> I'll check that.
>> The register field cannot be larger than 31.  A fr or sr
>> regster can't be used in a LDO "copy" instruction.
> Independently of the instruction set, if an application
> pulls the carpet from under the kernel from a concurrent
> thread (or possibly ptrace) by forging an invalid
> instruction after the system call has started executing,
> it would be good to catch it and report it with a printk,
> rather than blindly expecting that some bits should always
> be 0.

The mask should be 0xffe0ffff.

Dave

-- 
John David Anglin  dave.anglin@bell.net


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] parisc: Fix syscall restarts (v2)
  2015-12-21  9:19 ` [PATCH] parisc: Fix syscall restarts (v2) Helge Deller
  2015-12-21 13:11   ` John David Anglin
@ 2015-12-21 20:27   ` Mathieu Desnoyers
  2015-12-21 20:54     ` Helge Deller
  1 sibling, 1 reply; 30+ messages in thread
From: Mathieu Desnoyers @ 2015-12-21 20:27 UTC (permalink / raw)
  To: Helge Deller; +Cc: linux-parisc, James Bottomley, John David Anglin

----- On Dec 21, 2015, at 4:19 AM, Helge Deller deller@gmx.de wrote:

> This is version 2 of the patch:
> 
> On parisc syscalls which are interrupted by signals sometimes failed to
> restart and instead returned -ENOSYS which in the worst case lead to
> userspace crashes.
> A similiar problem existed on MIPS and was fixed by commit e967ef02
> ("MIPS: Fix restart of indirect syscalls").
> 
> On parisc the current syscall restart code assumes that all syscall
> callers load the syscall number in the delay slot of the ble
> instruction. That's how it is e.g. done in the unistd.h header file:
>	ble 0x100(%sr2, %r0)
>	ldi #syscall_nr, %r20
> Because of that assumption the current code never restored %r20 before
> returning to userspace.
> 
> This assumption is at least not true for code which uses the glibc
> syscall() function, which instead uses this syntax:
>	ble 0x100(%sr2, %r0)
>	copy regX, %r20
> where regX depend on how the compiler optimizes the code and register
> usage.
> 
> This patch fixes this problem by adding code to analyze how the syscall
> number is loaded in the delay branch and - if needed - copy the syscall
> number to regX prior returning to userspace for the syscall restart.
> 
> Signed-off-by: Helge Deller <deller@gmx.de>
> Cc: stable@vger.kernel.org
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> 
> diff --git a/arch/parisc/kernel/signal.c b/arch/parisc/kernel/signal.c
> index dc1ea79..2264f68 100644
> --- a/arch/parisc/kernel/signal.c
> +++ b/arch/parisc/kernel/signal.c
> @@ -435,6 +435,55 @@ handle_signal(struct ksignal *ksig, struct pt_regs *regs,
> int in_syscall)
> 		regs->gr[28]);
> }
> 
> +/*
> + * Check how the syscall number gets loaded into %r20 within
> + * the delay branch in userspace and adjust as needed.
> + */
> +
> +static void check_syscallno_in_delay_branch(struct pt_regs *regs)
> +{
> +	u32 opcode, source_reg;
> +	u32 __user *uaddr;
> +	int err;
> +
> +	/* Usually we don't have to restore %r20 (the system call number)
> +	 * because it gets loaded in the delay slot of the branch external
> +	 * instruction via the ldi instruction.
> +	 * In some cases a register-to-register copy instruction might have
> +	 * been used instead, in which case we need to copy the syscall
> +	 * number into the source register before returning to userspace.
> +	 */
> +
> +	/* A syscall is just a branch, so all we have to do is fiddle the
> +	 * return pointer so that the ble instruction gets executed again.
> +	 */
> +	regs->gr[31] -= 8; /* delayed branching */
> +
> +	/* Get assembler opcode of code in delay branch */
> +	uaddr = (unsigned int *) ((regs->gr[31] & ~3) + 4);

Is it valid to have unaligned instructions ? Does the architecture
allow it, or it's a fumble and we should pr_warn ?

> +	err = get_user(opcode, uaddr);
> +	if (err)

Should we add a pr_warn here ?

> +		return;
> +
> +	/* Check if delay branch uses "ldi int,%r20" */
> +	if ((opcode & 0xffff0000) == 0x34140000)
> +		return;	/* everything ok, just return */
> +
> +	/* Check if delay branch uses "nop" */
> +	if (opcode == INSN_NOP)
> +		return;

When we find a NOP in the delay slot, how can we be sure %r20
still holds the syscall value when we re-play the branch
instruction ? Can it be overwritten during the syscall,
either from start of syscall to here, or from here to
return to userspace ?

> +
> +	/* Check if delay branch uses "copy %rX,%r20" */
> +	if ((opcode & 0xffe0ffff) == 0x08000254) {
> +		source_reg = (opcode >> 16) & 31;
> +		regs->gr[source_reg] = regs->gr[20];

Similar question here, how can we be sure regs->gr[20]
still has the system call number at this point (not
overwritten from start of syscall to here) ?

Thanks,

Mathieu

> +		return;
> +	}
> +
> +	pr_warn("syscall restart: %s (pid %d): unexpected opcode 0x%08x\n",
> +		current->comm, task_pid_nr(current), opcode);
> +}
> +
> static inline void
> syscall_restart(struct pt_regs *regs, struct k_sigaction *ka)
> {
> @@ -457,10 +506,7 @@ syscall_restart(struct pt_regs *regs, struct k_sigaction
> *ka)
> 		}
> 		/* fallthrough */
> 	case -ERESTARTNOINTR:
> -		/* A syscall is just a branch, so all
> -		 * we have to do is fiddle the return pointer.
> -		 */
> -		regs->gr[31] -= 8; /* delayed branching */
> +		check_syscallno_in_delay_branch(regs);
> 		break;
> 	}
> }
> @@ -510,15 +556,9 @@ insert_restart_trampoline(struct pt_regs *regs)
> 	}
> 	case -ERESTARTNOHAND:
> 	case -ERESTARTSYS:
> -	case -ERESTARTNOINTR: {
> -		/* Hooray for delayed branching.  We don't
> -		 * have to restore %r20 (the system call
> -		 * number) because it gets loaded in the delay
> -		 * slot of the branch external instruction.
> -		 */
> -		regs->gr[31] -= 8;
> +	case -ERESTARTNOINTR:
> +		check_syscallno_in_delay_branch(regs);
> 		return;
> -	}
> 	default:
> 		break;
>  	}

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] parisc: Fix syscall restarts (v2)
  2015-12-21 20:27   ` Mathieu Desnoyers
@ 2015-12-21 20:54     ` Helge Deller
  2015-12-24 16:07       ` Mathieu Desnoyers
  0 siblings, 1 reply; 30+ messages in thread
From: Helge Deller @ 2015-12-21 20:54 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: linux-parisc, James Bottomley, John David Anglin

On 21.12.2015 21:27, Mathieu Desnoyers wrote:
> ----- On Dec 21, 2015, at 4:19 AM, Helge Deller deller@gmx.de wrote:
> 
>> This is version 2 of the patch:
>>
>> On parisc syscalls which are interrupted by signals sometimes failed to
>> restart and instead returned -ENOSYS which in the worst case lead to
>> userspace crashes.
>> A similiar problem existed on MIPS and was fixed by commit e967ef02
>> ("MIPS: Fix restart of indirect syscalls").
>>
>> On parisc the current syscall restart code assumes that all syscall
>> callers load the syscall number in the delay slot of the ble
>> instruction. That's how it is e.g. done in the unistd.h header file:
>> 	ble 0x100(%sr2, %r0)
>> 	ldi #syscall_nr, %r20
>> Because of that assumption the current code never restored %r20 before
>> returning to userspace.
>>
>> This assumption is at least not true for code which uses the glibc
>> syscall() function, which instead uses this syntax:
>> 	ble 0x100(%sr2, %r0)
>> 	copy regX, %r20
>> where regX depend on how the compiler optimizes the code and register
>> usage.
>>
>> This patch fixes this problem by adding code to analyze how the syscall
>> number is loaded in the delay branch and - if needed - copy the syscall
>> number to regX prior returning to userspace for the syscall restart.
>>
>> Signed-off-by: Helge Deller <deller@gmx.de>
>> Cc: stable@vger.kernel.org
>> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>>
>> diff --git a/arch/parisc/kernel/signal.c b/arch/parisc/kernel/signal.c
>> index dc1ea79..2264f68 100644
>> --- a/arch/parisc/kernel/signal.c
>> +++ b/arch/parisc/kernel/signal.c
>> @@ -435,6 +435,55 @@ handle_signal(struct ksignal *ksig, struct pt_regs *regs,
>> int in_syscall)
>> 		regs->gr[28]);
>> }
>>
>> +/*
>> + * Check how the syscall number gets loaded into %r20 within
>> + * the delay branch in userspace and adjust as needed.
>> + */
>> +
>> +static void check_syscallno_in_delay_branch(struct pt_regs *regs)
>> +{
>> +	u32 opcode, source_reg;
>> +	u32 __user *uaddr;
>> +	int err;
>> +
>> +	/* Usually we don't have to restore %r20 (the system call number)
>> +	 * because it gets loaded in the delay slot of the branch external
>> +	 * instruction via the ldi instruction.
>> +	 * In some cases a register-to-register copy instruction might have
>> +	 * been used instead, in which case we need to copy the syscall
>> +	 * number into the source register before returning to userspace.
>> +	 */
>> +
>> +	/* A syscall is just a branch, so all we have to do is fiddle the
>> +	 * return pointer so that the ble instruction gets executed again.
>> +	 */
>> +	regs->gr[31] -= 8; /* delayed branching */
>> +
>> +	/* Get assembler opcode of code in delay branch */
>> +	uaddr = (unsigned int *) ((regs->gr[31] & ~3) + 4);
> 
> Is it valid to have unaligned instructions ? Does the architecture
> allow it, or it's a fumble and we should pr_warn ?

How can it be unaligned? It's about u32...
And, no, unaligned instructions are not allowed (I think that at least).
 
>> +	err = get_user(opcode, uaddr);
>> +	if (err)
> 
> Should we add a pr_warn here ?

No. There is no gain to have a warning here.
 
>> +		return;
>> +
>> +	/* Check if delay branch uses "ldi int,%r20" */
>> +	if ((opcode & 0xffff0000) == 0x34140000)
>> +		return;	/* everything ok, just return */
>> +
>> +	/* Check if delay branch uses "nop" */
>> +	if (opcode == INSN_NOP)
>> +		return;
> 
> When we find a NOP in the delay slot, how can we be sure %r20
> still holds the syscall value when we re-play the branch
> instruction ? 

I looked at the code and even tested it (with your testcase actually).

> Can it be overwritten during the syscall,
> either from start of syscall to here, or from here to
> return to userspace ?

No.
 
>> +
>> +	/* Check if delay branch uses "copy %rX,%r20" */
>> +	if ((opcode & 0xffe0ffff) == 0x08000254) {
>> +		source_reg = (opcode >> 16) & 31;
>> +		regs->gr[source_reg] = regs->gr[20];
> 
> Similar question here, how can we be sure regs->gr[20]
> still has the system call number at this point (not
> overwritten from start of syscall to here) ?

Those registers are saved at entry of syscall and
restored at exit (with exception of a few registers
e.g. like r28 which is the return value of the syscall). 

Helge

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] parisc: Fix syscall restarts (v2)
  2015-12-21 20:54     ` Helge Deller
@ 2015-12-24 16:07       ` Mathieu Desnoyers
  2015-12-24 16:51         ` John David Anglin
  0 siblings, 1 reply; 30+ messages in thread
From: Mathieu Desnoyers @ 2015-12-24 16:07 UTC (permalink / raw)
  To: Helge Deller; +Cc: linux-parisc, James Bottomley, John David Anglin

----- On Dec 21, 2015, at 3:54 PM, Helge Deller deller@gmx.de wrote:

> On 21.12.2015 21:27, Mathieu Desnoyers wrote:
>> ----- On Dec 21, 2015, at 4:19 AM, Helge Deller deller@gmx.de wrote:
>> 
>>> This is version 2 of the patch:
>>>
>>> On parisc syscalls which are interrupted by signals sometimes failed to
>>> restart and instead returned -ENOSYS which in the worst case lead to
>>> userspace crashes.
>>> A similiar problem existed on MIPS and was fixed by commit e967ef02
>>> ("MIPS: Fix restart of indirect syscalls").
>>>
>>> On parisc the current syscall restart code assumes that all syscall
>>> callers load the syscall number in the delay slot of the ble
>>> instruction. That's how it is e.g. done in the unistd.h header file:
>>> 	ble 0x100(%sr2, %r0)
>>> 	ldi #syscall_nr, %r20
>>> Because of that assumption the current code never restored %r20 before
>>> returning to userspace.
>>>
>>> This assumption is at least not true for code which uses the glibc
>>> syscall() function, which instead uses this syntax:
>>> 	ble 0x100(%sr2, %r0)
>>> 	copy regX, %r20
>>> where regX depend on how the compiler optimizes the code and register
>>> usage.
>>>
>>> This patch fixes this problem by adding code to analyze how the syscall
>>> number is loaded in the delay branch and - if needed - copy the syscall
>>> number to regX prior returning to userspace for the syscall restart.
>>>
>>> Signed-off-by: Helge Deller <deller@gmx.de>
>>> Cc: stable@vger.kernel.org
>>> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>>>
>>> diff --git a/arch/parisc/kernel/signal.c b/arch/parisc/kernel/signal.c
>>> index dc1ea79..2264f68 100644
>>> --- a/arch/parisc/kernel/signal.c
>>> +++ b/arch/parisc/kernel/signal.c
>>> @@ -435,6 +435,55 @@ handle_signal(struct ksignal *ksig, struct pt_regs *regs,
>>> int in_syscall)
>>> 		regs->gr[28]);
>>> }
>>>
>>> +/*
>>> + * Check how the syscall number gets loaded into %r20 within
>>> + * the delay branch in userspace and adjust as needed.
>>> + */
>>> +
>>> +static void check_syscallno_in_delay_branch(struct pt_regs *regs)
>>> +{
>>> +	u32 opcode, source_reg;
>>> +	u32 __user *uaddr;
>>> +	int err;
>>> +
>>> +	/* Usually we don't have to restore %r20 (the system call number)
>>> +	 * because it gets loaded in the delay slot of the branch external
>>> +	 * instruction via the ldi instruction.
>>> +	 * In some cases a register-to-register copy instruction might have
>>> +	 * been used instead, in which case we need to copy the syscall
>>> +	 * number into the source register before returning to userspace.
>>> +	 */
>>> +
>>> +	/* A syscall is just a branch, so all we have to do is fiddle the
>>> +	 * return pointer so that the ble instruction gets executed again.
>>> +	 */
>>> +	regs->gr[31] -= 8; /* delayed branching */
>>> +
>>> +	/* Get assembler opcode of code in delay branch */
>>> +	uaddr = (unsigned int *) ((regs->gr[31] & ~3) + 4);
>> 
>> Is it valid to have unaligned instructions ? Does the architecture
>> allow it, or it's a fumble and we should pr_warn ?
> 
> How can it be unaligned? It's about u32...

That would be an instruction that is volountarily offset
from 1, 2, 3 bytes from 4-bytes multiples by the application.
The only situation where I have seen this is in cases where
applications are trying to play games with the debugger or
disassembler and hide what they are doing: they can offset
the start of a function like this, and therefore all the
instructions within that function.

> And, no, unaligned instructions are not allowed (I think that at least).

Might be interesting to try it out though. I'm not saying it's
a valid use-case for an application, but it would be at least good
to known whether this is an input we can expect.

> 
>>> +	err = get_user(opcode, uaddr);
>>> +	if (err)
>> 
>> Should we add a pr_warn here ?
> 
> No. There is no gain to have a warning here.

Allright.

> 
>>> +		return;
>>> +
>>> +	/* Check if delay branch uses "ldi int,%r20" */
>>> +	if ((opcode & 0xffff0000) == 0x34140000)
>>> +		return;	/* everything ok, just return */
>>> +
>>> +	/* Check if delay branch uses "nop" */
>>> +	if (opcode == INSN_NOP)
>>> +		return;
>> 
>> When we find a NOP in the delay slot, how can we be sure %r20
>> still holds the syscall value when we re-play the branch
>> instruction ?
> 
> I looked at the code and even tested it (with your testcase actually).
> 
>> Can it be overwritten during the syscall,
>> either from start of syscall to here, or from here to
>> return to userspace ?
> 
> No.
> 
>>> +
>>> +	/* Check if delay branch uses "copy %rX,%r20" */
>>> +	if ((opcode & 0xffe0ffff) == 0x08000254) {
>>> +		source_reg = (opcode >> 16) & 31;
>>> +		regs->gr[source_reg] = regs->gr[20];
>> 
>> Similar question here, how can we be sure regs->gr[20]
>> still has the system call number at this point (not
>> overwritten from start of syscall to here) ?
> 
> Those registers are saved at entry of syscall and
> restored at exit (with exception of a few registers
> e.g. like r28 which is the return value of the syscall).

Can r28 be used as a source_reg ? If so, what happens then ?

Thanks,

Mathieu

> 
> Helge

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] parisc: Fix syscall restarts (v2)
  2015-12-24 16:07       ` Mathieu Desnoyers
@ 2015-12-24 16:51         ` John David Anglin
  0 siblings, 0 replies; 30+ messages in thread
From: John David Anglin @ 2015-12-24 16:51 UTC (permalink / raw)
  To: Mathieu Desnoyers, Helge Deller; +Cc: linux-parisc, James Bottomley

On 2015-12-24 11:07 AM, Mathieu Desnoyers wrote:
>>> Is it valid to have unaligned instructions ? Does the architecture
>>> >>allow it, or it's a fumble and we should pr_warn ?
>> >
>> >How can it be unaligned? It's about u32...
> That would be an instruction that is volountarily offset
> from 1, 2, 3 bytes from 4-bytes multiples by the application.
> The only situation where I have seen this is in cases where
> applications are trying to play games with the debugger or
> disassembler and hide what they are doing: they can offset
> the start of a function like this, and therefore all the
> instructions within that function.
>
This is not possible on PA-RISC.  Indeed, user instruction addresses are 
always
offset by three.  There is no way to branch to an instruction that is 
offset.

The least significant two bits of an instruction address contain a 
priority level.
User code on linux and hpux executes at level 3.  The only way a user 
can change
privilege level is with a "gate" instruction (a special branch). Whether 
this is
permitted or not depends on page table permissions that the user can't 
change.
In practice, a level change is only allowed on the gateway page.

Dave

-- 
John David Anglin  dave.anglin@bell.net


^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2015-12-24 16:51 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-18 23:30 [PATCH] parisc: Fix syscall restarts Helge Deller
2015-12-20 13:59 ` Mathieu Desnoyers
2015-12-20 14:09   ` Mathieu Desnoyers
2015-12-20 15:49     ` Helge Deller
2015-12-20 16:50       ` James Bottomley
2015-12-20 20:35         ` Helge Deller
2015-12-21  8:03           ` James Bottomley
2015-12-21 14:39             ` Mathieu Desnoyers
2015-12-20 18:31       ` John David Anglin
2015-12-20 19:32         ` Helge Deller
2015-12-20 19:46           ` John David Anglin
2015-12-20 20:06             ` Helge Deller
2015-12-20 23:57             ` John David Anglin
2015-12-21 14:42         ` Mathieu Desnoyers
2015-12-21 15:12           ` John David Anglin
2015-12-20 19:39 ` John David Anglin
2015-12-20 19:48   ` Helge Deller
2015-12-20 20:01     ` John David Anglin
2015-12-20 20:18       ` Helge Deller
2015-12-20 20:45         ` John David Anglin
2015-12-20 20:14 ` John David Anglin
2015-12-20 20:19   ` Helge Deller
2015-12-20 20:21     ` Helge Deller
2015-12-20 20:53       ` John David Anglin
2015-12-21  9:19 ` [PATCH] parisc: Fix syscall restarts (v2) Helge Deller
2015-12-21 13:11   ` John David Anglin
2015-12-21 20:27   ` Mathieu Desnoyers
2015-12-21 20:54     ` Helge Deller
2015-12-24 16:07       ` Mathieu Desnoyers
2015-12-24 16:51         ` John David Anglin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.