All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] um: Try to avoid kmalloc in signal handling
@ 2019-01-03 16:09 anton.ivanov
  2019-01-03 17:13 ` Anton Ivanov
  0 siblings, 1 reply; 14+ messages in thread
From: anton.ivanov @ 2019-01-03 16:09 UTC (permalink / raw)
  To: linux-um; +Cc: richard, Anton Ivanov

From: Anton Ivanov <anton.ivanov@cambridgegreys.com>

Signal handling (which maps to interrupt handling in UML) needs
to pass current registers to the relevant handlers and was
allocating a structure for that using kmalloc. It is possible
to avoid this kmalloc by using a small "signal register stack".
A depth of 4 suffices 99%+ of the time. If it is exceeded
further sets of registers are allocated as before via kmalloc.

The end result is >10% performance increase in networking
as measured by iperf.

Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
---
 arch/um/os-Linux/signal.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/arch/um/os-Linux/signal.c b/arch/um/os-Linux/signal.c
index bf0acb8aad8b..21536581ab57 100644
--- a/arch/um/os-Linux/signal.c
+++ b/arch/um/os-Linux/signal.c
@@ -29,14 +29,24 @@ void (*sig_info[NSIG])(int, struct siginfo *, struct uml_pt_regs *) = {
 	[SIGALRM]	= timer_handler
 };
 
+static int reg_depth;
+
+static struct uml_pt_regs reg_file[4];
+#define REG_SWITCH_TO_KMALLOC_DEPTH 3
+
 static void sig_handler_common(int sig, struct siginfo *si, mcontext_t *mc)
 {
 	struct uml_pt_regs *r;
 	int save_errno = errno;
 
-	r = uml_kmalloc(sizeof(struct uml_pt_regs), UM_GFP_ATOMIC);
-	if (!r)
-		panic("out of memory");
+	if (reg_depth >= REG_SWITCH_TO_KMALLOC_DEPTH) {
+		r = uml_kmalloc(sizeof(struct uml_pt_regs), UM_GFP_ATOMIC);
+		if (!r)
+			panic("out of memory");
+	} else {
+		r = &reg_file[reg_depth];
+	}
+	reg_depth++;
 
 	r->is_user = 0;
 	if (sig == SIGSEGV) {
@@ -53,7 +63,9 @@ static void sig_handler_common(int sig, struct siginfo *si, mcontext_t *mc)
 
 	errno = save_errno;
 
-	free(r);
+	reg_depth--;
+	if (reg_depth >= REG_SWITCH_TO_KMALLOC_DEPTH)
+		free(r);
 }
 
 /*
-- 
2.11.0


_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] um: Try to avoid kmalloc in signal handling
  2019-01-03 16:09 [PATCH] um: Try to avoid kmalloc in signal handling anton.ivanov
@ 2019-01-03 17:13 ` Anton Ivanov
  2019-01-03 21:29   ` Anton Ivanov
  0 siblings, 1 reply; 14+ messages in thread
From: Anton Ivanov @ 2019-01-03 17:13 UTC (permalink / raw)
  To: linux-um; +Cc: richard



On 1/3/19 4:09 PM, anton.ivanov@cambridgegreys.com wrote:
> From: Anton Ivanov <anton.ivanov@cambridgegreys.com>
> 
> Signal handling (which maps to interrupt handling in UML) needs
> to pass current registers to the relevant handlers and was
> allocating a structure for that using kmalloc. It is possible
> to avoid this kmalloc by using a small "signal register stack".
> A depth of 4 suffices 99%+ of the time. If it is exceeded
> further sets of registers are allocated as before via kmalloc.
> 
> The end result is >10% performance increase in networking
> as measured by iperf.
> 
> Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
> ---
>   arch/um/os-Linux/signal.c | 20 ++++++++++++++++----
>   1 file changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/um/os-Linux/signal.c b/arch/um/os-Linux/signal.c
> index bf0acb8aad8b..21536581ab57 100644
> --- a/arch/um/os-Linux/signal.c
> +++ b/arch/um/os-Linux/signal.c
> @@ -29,14 +29,24 @@ void (*sig_info[NSIG])(int, struct siginfo *, struct uml_pt_regs *) = {
>   	[SIGALRM]	= timer_handler
>   };
>   
> +static int reg_depth;
> +
> +static struct uml_pt_regs reg_file[4];
> +#define REG_SWITCH_TO_KMALLOC_DEPTH 3
> +
>   static void sig_handler_common(int sig, struct siginfo *si, mcontext_t *mc)
>   {
>   	struct uml_pt_regs *r;
>   	int save_errno = errno;
>   
> -	r = uml_kmalloc(sizeof(struct uml_pt_regs), UM_GFP_ATOMIC);
> -	if (!r)
> -		panic("out of memory");
> +	if (reg_depth >= REG_SWITCH_TO_KMALLOC_DEPTH) {
> +		r = uml_kmalloc(sizeof(struct uml_pt_regs), UM_GFP_ATOMIC);
> +		if (!r)
> +			panic("out of memory");
> +	} else {
> +		r = &reg_file[reg_depth];
> +	}
> +	reg_depth++;
>   
>   	r->is_user = 0;
>   	if (sig == SIGSEGV) {
> @@ -53,7 +63,9 @@ static void sig_handler_common(int sig, struct siginfo *si, mcontext_t *mc)
>   
>   	errno = save_errno;
>   
> -	free(r);
> +	reg_depth--;
> +	if (reg_depth >= REG_SWITCH_TO_KMALLOC_DEPTH)
> +		free(r);
>   }
>   
>   /*
> 

This redresses the performance loss introduced in the IRQ handling path 
by b6024b21fec8367ef961a771cc9dde31f1831965. Mea culpa, I should have 
spotted that one at the time.

In my tests the depth has never managed to exceed 2. So a threshold for 
switching to kmalloc set to 3 is possibly an overkill and can be reduced 
to 2.

-- 
Anton R. Ivanov

Cambridge Greys Limited, England and Wales company No 10273661
http://www.cambridgegreys.com/

_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] um: Try to avoid kmalloc in signal handling
  2019-01-03 17:13 ` Anton Ivanov
@ 2019-01-03 21:29   ` Anton Ivanov
  2019-01-03 21:42     ` Richard Weinberger
  0 siblings, 1 reply; 14+ messages in thread
From: Anton Ivanov @ 2019-01-03 21:29 UTC (permalink / raw)
  To: linux-um; +Cc: richard

On 03/01/2019 17:13, Anton Ivanov wrote:
> 
> 
> On 1/3/19 4:09 PM, anton.ivanov@cambridgegreys.com wrote:
>> From: Anton Ivanov <anton.ivanov@cambridgegreys.com>
>>
>> Signal handling (which maps to interrupt handling in UML) needs
>> to pass current registers to the relevant handlers and was
>> allocating a structure for that using kmalloc. It is possible
>> to avoid this kmalloc by using a small "signal register stack".
>> A depth of 4 suffices 99%+ of the time. If it is exceeded
>> further sets of registers are allocated as before via kmalloc.
>>
>> The end result is >10% performance increase in networking
>> as measured by iperf.
>>
>> Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
>> ---
>>   arch/um/os-Linux/signal.c | 20 ++++++++++++++++----
>>   1 file changed, 16 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/um/os-Linux/signal.c b/arch/um/os-Linux/signal.c
>> index bf0acb8aad8b..21536581ab57 100644
>> --- a/arch/um/os-Linux/signal.c
>> +++ b/arch/um/os-Linux/signal.c
>> @@ -29,14 +29,24 @@ void (*sig_info[NSIG])(int, struct siginfo *, 
>> struct uml_pt_regs *) = {
>>       [SIGALRM]    = timer_handler
>>   };
>> +static int reg_depth;
>> +
>> +static struct uml_pt_regs reg_file[4];
>> +#define REG_SWITCH_TO_KMALLOC_DEPTH 3
>> +
>>   static void sig_handler_common(int sig, struct siginfo *si, 
>> mcontext_t *mc)
>>   {
>>       struct uml_pt_regs *r;
>>       int save_errno = errno;
>> -    r = uml_kmalloc(sizeof(struct uml_pt_regs), UM_GFP_ATOMIC);
>> -    if (!r)
>> -        panic("out of memory");
>> +    if (reg_depth >= REG_SWITCH_TO_KMALLOC_DEPTH) {
>> +        r = uml_kmalloc(sizeof(struct uml_pt_regs), UM_GFP_ATOMIC);
>> +        if (!r)
>> +            panic("out of memory");
>> +    } else {
>> +        r = &reg_file[reg_depth];
>> +    }
>> +    reg_depth++;
>>       r->is_user = 0;
>>       if (sig == SIGSEGV) {
>> @@ -53,7 +63,9 @@ static void sig_handler_common(int sig, struct 
>> siginfo *si, mcontext_t *mc)
>>       errno = save_errno;
>> -    free(r);
>> +    reg_depth--;
>> +    if (reg_depth >= REG_SWITCH_TO_KMALLOC_DEPTH)
>> +        free(r);
>>   }
>>   /*
>>
> 
> This redresses the performance loss introduced in the IRQ handling path 
> by b6024b21fec8367ef961a771cc9dde31f1831965. Mea culpa, I should have 
> spotted that one at the time.
> 
> In my tests the depth has never managed to exceed 2. So a threshold for 
> switching to kmalloc set to 3 is possibly an overkill and can be reduced 
> to 2.
> 

Second thought - this should not use kmalloc at all if it can. It should 
either go on the stack the way it used to do before 
b6024b21fec8367ef961a771cc9dde31f1831965 or it should have a sufficient 
"register stack" to push registers for all use cases.

Kmalloc/ATOMIC can fail just because it feels like it which means that 
this can throw a panic() under very heavy IO.

I am surprised this was not triggered before. The vector IO regularly 
has "holes" in the vector from failed atomic allocations in the same path.

-- 
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661

_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] um: Try to avoid kmalloc in signal handling
  2019-01-03 21:29   ` Anton Ivanov
@ 2019-01-03 21:42     ` Richard Weinberger
  2019-01-04  7:07       ` Anton Ivanov
                         ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Richard Weinberger @ 2019-01-03 21:42 UTC (permalink / raw)
  To: Anton Ivanov; +Cc: linux-um

Am Donnerstag, 3. Januar 2019, 22:29:17 CET schrieb Anton Ivanov:
> Second thought - this should not use kmalloc at all if it can. It should 
> either go on the stack the way it used to do before 
> b6024b21fec8367ef961a771cc9dde31f1831965 or it should have a sufficient 
> "register stack" to push registers for all use cases.
> 
> Kmalloc/ATOMIC can fail just because it feels like it which means that 
> this can throw a panic() under very heavy IO.

Not good. :-(
We cannot go back to stack based allocation. The stack is too small for
AVX register sets.
Maybe we can preallocate it.
(And check what arch/x86/ does)

Thanks,
//richard



_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] um: Try to avoid kmalloc in signal handling
  2019-01-03 21:42     ` Richard Weinberger
@ 2019-01-04  7:07       ` Anton Ivanov
  2019-01-04  7:48       ` Anton Ivanov
  2019-01-04 15:50       ` Anton Ivanov
  2 siblings, 0 replies; 14+ messages in thread
From: Anton Ivanov @ 2019-01-04  7:07 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: linux-um

On 03/01/2019 21:42, Richard Weinberger wrote:
> Am Donnerstag, 3. Januar 2019, 22:29:17 CET schrieb Anton Ivanov:
>> Second thought - this should not use kmalloc at all if it can. It should
>> either go on the stack the way it used to do before
>> b6024b21fec8367ef961a771cc9dde31f1831965 or it should have a sufficient
>> "register stack" to push registers for all use cases.
>>
>> Kmalloc/ATOMIC can fail just because it feels like it which means that
>> this can throw a panic() under very heavy IO.
> 
> Not good. :-(
> We cannot go back to stack based allocation. The stack is too small for
> AVX register sets.
> Maybe we can preallocate it.

I can update the patch so it uses a preallocated buffer instead of a 
data segment. IMO for this particular case there is little benefit in 
kmallocing that at init. It is not something you can get away without so 
data segment is probably OK.

Unless I am mistaken, the worst case scenario for depth there is 4 (as 
in the patch at the moment).

IO + timer + fault + FPU.

We can still leave the "overfill kmalloc" and add a WARN() on that and 
see if it gets triggered in the wild.

> (And check what arch/x86/ does)

I will have a look and update the patch.

> 
> Thanks,
> //richard
> 
> 
> 


-- 
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661

_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] um: Try to avoid kmalloc in signal handling
  2019-01-03 21:42     ` Richard Weinberger
  2019-01-04  7:07       ` Anton Ivanov
@ 2019-01-04  7:48       ` Anton Ivanov
  2019-01-04 15:50       ` Anton Ivanov
  2 siblings, 0 replies; 14+ messages in thread
From: Anton Ivanov @ 2019-01-04  7:48 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: linux-um

On 03/01/2019 21:42, Richard Weinberger wrote:
> Am Donnerstag, 3. Januar 2019, 22:29:17 CET schrieb Anton Ivanov:
>> Second thought - this should not use kmalloc at all if it can. It should
>> either go on the stack the way it used to do before
>> b6024b21fec8367ef961a771cc9dde31f1831965 or it should have a sufficient
>> "register stack" to push registers for all use cases.
>>
>> Kmalloc/ATOMIC can fail just because it feels like it which means that
>> this can throw a panic() under very heavy IO.
> 
> Not good. :-(
> We cannot go back to stack based allocation. The stack is too small for
> AVX register sets.

If I read it correctly it still uses the stack even for the new register 
sets.

> Maybe we can preallocate it.
> (And check what arch/x86/ does)
> 
> Thanks,
> //richard
> 
> 
> 


-- 
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661

_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] um: Try to avoid kmalloc in signal handling
  2019-01-03 21:42     ` Richard Weinberger
  2019-01-04  7:07       ` Anton Ivanov
  2019-01-04  7:48       ` Anton Ivanov
@ 2019-01-04 15:50       ` Anton Ivanov
  2019-01-07 10:07         ` Anton Ivanov
  2 siblings, 1 reply; 14+ messages in thread
From: Anton Ivanov @ 2019-01-04 15:50 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: linux-um



On 1/3/19 9:42 PM, Richard Weinberger wrote:
> Am Donnerstag, 3. Januar 2019, 22:29:17 CET schrieb Anton Ivanov:
>> Second thought - this should not use kmalloc at all if it can. It should
>> either go on the stack the way it used to do before
>> b6024b21fec8367ef961a771cc9dde31f1831965 or it should have a sufficient
>> "register stack" to push registers for all use cases.
>>
>> Kmalloc/ATOMIC can fail just because it feels like it which means that
>> this can throw a panic() under very heavy IO.
> 
> Not good. :-(
> We cannot go back to stack based allocation. The stack is too small for
> AVX register sets.
> Maybe we can preallocate it.

I will write a pre-allocation alternative, but that may take some time 
because it needs to do additional allocations at some "safe" point in 
time like system idle, not in the IO/signal path where they can fail.

In the meantime we have to make sure that we are not sitting on a latent 
crash in the IO/Fault path so I have sent out a patch which removes 
kmalloc/ATOMIC and puts things back on the stack.

There are a few other options which are worth investigating like sizing 
the storage dynamically based on CPU and/or storing fp regs only if they 
have been touched, not every time (afaik MIPS was doing something like 
that). Once again, this may take a while to figure out if it is possible 
and some time to make it happen after that. That may be worth doing 
anyway for performance reasons. If memory serves me right, storing FPU 
state is a high cost operation. Not doing it for cases where it is not 
needed should yield some performance gains.

> (And check what arch/x86/ does)
> 
> Thanks,
> //richard
> 
> 
> 

-- 
Anton R. Ivanov

Cambridge Greys Limited, England and Wales company No 10273661
http://www.cambridgegreys.com/

_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] um: Try to avoid kmalloc in signal handling
  2019-01-04 15:50       ` Anton Ivanov
@ 2019-01-07 10:07         ` Anton Ivanov
  0 siblings, 0 replies; 14+ messages in thread
From: Anton Ivanov @ 2019-01-07 10:07 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: linux-um



On 1/4/19 3:50 PM, Anton Ivanov wrote:
> 
> 
> On 1/3/19 9:42 PM, Richard Weinberger wrote:
>> Am Donnerstag, 3. Januar 2019, 22:29:17 CET schrieb Anton Ivanov:
>>> Second thought - this should not use kmalloc at all if it can. It should
>>> either go on the stack the way it used to do before
>>> b6024b21fec8367ef961a771cc9dde31f1831965 or it should have a sufficient
>>> "register stack" to push registers for all use cases.
>>>
>>> Kmalloc/ATOMIC can fail just because it feels like it which means that
>>> this can throw a panic() under very heavy IO.
>>
>> Not good. :-(
>> We cannot go back to stack based allocation. The stack is too small for
>> AVX register sets.
>> Maybe we can preallocate it.
> 
> I will write a pre-allocation alternative, but that may take some time 
> because it needs to do additional allocations at some "safe" point in 
> time like system idle, not in the IO/signal path where they can fail.
> 
> In the meantime we have to make sure that we are not sitting on a latent 
> crash in the IO/Fault path so I have sent out a patch which removes 
> kmalloc/ATOMIC and puts things back on the stack.
> 
> There are a few other options which are worth investigating like sizing 
> the storage dynamically based on CPU and/or storing fp regs only if they 
> have been touched, not every time (afaik MIPS was doing something like 
> that). Once again, this may take a while to figure out if it is possible 
> and some time to make it happen after that. That may be worth doing 
> anyway for performance reasons. If memory serves me right, storing FPU 
> state is a high cost operation. Not doing it for cases where it is not 
> needed should yield some performance gains.
> 
>> (And check what arch/x86/ does)
>>
>> Thanks,
>> //richard
>>
>>
>>
> 

After digging both x86 and uml sources, I think that the long term goal 
should be to get rid of regs->fp altogether.

x86 does not have it and keeps fp state separate under most 
circumstances. As a result it has no issues pushing pt_reqs on the stack 
- it is small.

It is not easy though - references to it are all over the place and I am 
having quite an interesting time figuring out where we can replace them 
with the fpu field in the thread_info structure and where we have to do 
something else :)

-- 
Anton R. Ivanov

Cambridge Greys Limited, England and Wales company No 10273661
http://www.cambridgegreys.com/

_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] um: Try to avoid kmalloc in signal handling
  2019-01-04 10:33     ` Richard Weinberger
  2019-01-04 10:49       ` Anton Ivanov
@ 2019-01-04 11:31       ` Anton Ivanov
  1 sibling, 0 replies; 14+ messages in thread
From: Anton Ivanov @ 2019-01-04 11:31 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: linux-um



On 1/4/19 10:33 AM, Richard Weinberger wrote:
> Am Freitag, 4. Januar 2019, 11:19:46 CET schrieb Anton Ivanov:
>>> I fear this is not correct.
>>> Think of task switching, if you push something to reg_file[],
>>> it is not guaranteed that the same task will run next and pop the item.
>>> The order can change.
>>
>> It should not occur for the signals that map onto IRQ (timer and IO) -
>> there it is all kernel.
>>
>> I cannot trigger that for the other ones, but it may be possible.
> 
> We need to be super sure about this.
>   
>> IMO we should just bite the bullet and go back to using the stack
>> increasing the minimum stack order needed to 1 for 32 bit and to 2 for
>> 64 bit.
> 
> Increasing the kernel stack just for this is also a huge overhead.
> But I never benchmarked it myself.

I just did. I wrote a quick alternative to put it back on the stack.

Stack is marginally ~ 1% or thereabouts faster than the data segment 
patch on network tests. That makes it 14% faster than kmalloc-ing the 
pt_regs in the signal handling path.

The difference for other stuff (disk, etc) is within statistical error.

It is indeed using more stack - the minimum stack order needed for my 
setup on 64 bit is now 2 (On AMD A12).

> 
>> x86 if I read it correctly is still using the stack even for the new
>> register sets.
> 
> Are you sure? I don't think so. The kernel stack is small and the xregset is
> huge.
> Where did you see this?
> 
> Thanks,
> //richard
> 
> 
> 

-- 
Anton R. Ivanov

Cambridge Greys Limited, England and Wales company No 10273661
http://www.cambridgegreys.com/

_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] um: Try to avoid kmalloc in signal handling
  2019-01-04 10:33     ` Richard Weinberger
@ 2019-01-04 10:49       ` Anton Ivanov
  2019-01-04 11:31       ` Anton Ivanov
  1 sibling, 0 replies; 14+ messages in thread
From: Anton Ivanov @ 2019-01-04 10:49 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: linux-um



On 1/4/19 10:33 AM, Richard Weinberger wrote:
> Am Freitag, 4. Januar 2019, 11:19:46 CET schrieb Anton Ivanov:
>>> I fear this is not correct.
>>> Think of task switching, if you push something to reg_file[],
>>> it is not guaranteed that the same task will run next and pop the item.
>>> The order can change.
>>
>> It should not occur for the signals that map onto IRQ (timer and IO) -
>> there it is all kernel.
>>
>> I cannot trigger that for the other ones, but it may be possible.
> 
> We need to be super sure about this.

I am pretty sure about the IRQ ones. That call chain goes cleanly down 
and up resulting in whatever was pushed to be popped.

I am trying to get my head around what can possibly happen in the 
various fault ones that will result in a task switch to a task which is 
half-done with handling signal and is about to return (and pop the wrong 
one).

>   
>> IMO we should just bite the bullet and go back to using the stack
>> increasing the minimum stack order needed to 1 for 32 bit and to 2 for
>> 64 bit.
> 
> Increasing the kernel stack just for this is also a huge overhead.
> But I never benchmarked it myself.
> 
>> x86 if I read it correctly is still using the stack even for the new
>> register sets.
> 
> Are you sure? I don't think so. The kernel stack is small and the xregset is
> huge.
> Where did you see this?

It looked like that when I skimmed across get_sigframe and 
fpu_alloc_mathframe in x86 signal.c in first read, but I may be wrong.

> 
> Thanks,
> //richard
> 
> 
> 

-- 
Anton R. Ivanov

Cambridge Greys Limited, England and Wales company No 10273661
http://www.cambridgegreys.com/

_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] um: Try to avoid kmalloc in signal handling
  2019-01-04 10:19   ` Anton Ivanov
@ 2019-01-04 10:33     ` Richard Weinberger
  2019-01-04 10:49       ` Anton Ivanov
  2019-01-04 11:31       ` Anton Ivanov
  0 siblings, 2 replies; 14+ messages in thread
From: Richard Weinberger @ 2019-01-04 10:33 UTC (permalink / raw)
  To: Anton Ivanov; +Cc: linux-um

Am Freitag, 4. Januar 2019, 11:19:46 CET schrieb Anton Ivanov:
> > I fear this is not correct.
> > Think of task switching, if you push something to reg_file[],
> > it is not guaranteed that the same task will run next and pop the item.
> > The order can change.
> 
> It should not occur for the signals that map onto IRQ (timer and IO) - 
> there it is all kernel.
> 
> I cannot trigger that for the other ones, but it may be possible.

We need to be super sure about this.
 
> IMO we should just bite the bullet and go back to using the stack 
> increasing the minimum stack order needed to 1 for 32 bit and to 2 for 
> 64 bit.

Increasing the kernel stack just for this is also a huge overhead.
But I never benchmarked it myself.

> x86 if I read it correctly is still using the stack even for the new 
> register sets.

Are you sure? I don't think so. The kernel stack is small and the xregset is
huge.
Where did you see this?

Thanks,
//richard



_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] um: Try to avoid kmalloc in signal handling
  2019-01-04 10:13 ` Richard Weinberger
@ 2019-01-04 10:19   ` Anton Ivanov
  2019-01-04 10:33     ` Richard Weinberger
  0 siblings, 1 reply; 14+ messages in thread
From: Anton Ivanov @ 2019-01-04 10:19 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: linux-um



On 1/4/19 10:13 AM, Richard Weinberger wrote:
> Am Freitag, 4. Januar 2019, 11:05:17 CET schrieb anton.ivanov@cambridgegreys.com:
>> From: Anton Ivanov <anton.ivanov@cambridgegreys.com>
>>
>> Signal handling (which maps to interrupt handling in UML) needs
>> to pass current registers to the relevant handlers and was
>> allocating a structure for that using kmalloc. It is possible
>> to avoid this kmalloc by using a small "signal register stack".
>> A depth of 4 suffices for normal use. If it is exceeded
>> further sets of registers are allocated as before via kmalloc.
>>
>> The end result is >10% performance increase in networking
>> as measured by iperf and >5% across the board.
>>
>> Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
>> ---
>>   arch/um/include/shared/os.h |  4 ++++
>>   arch/um/kernel/signal.c     | 31 +++++++++++++++++++++++++++++++
>>   arch/um/os-Linux/signal.c   | 16 ++++------------
>>   3 files changed, 39 insertions(+), 12 deletions(-)
>>
>> diff --git a/arch/um/include/shared/os.h b/arch/um/include/shared/os.h
>> index ebf23012a59b..4ec1bc63213b 100644
>> --- a/arch/um/include/shared/os.h
>> +++ b/arch/um/include/shared/os.h
>> @@ -319,4 +319,8 @@ extern unsigned long os_get_top_address(void);
>>   
>>   long syscall(long number, ...);
>>   
>> +/* signal.c */
>> +extern struct uml_pt_regs *get_save_register_state(void);
>> +extern void release_save_register_state(struct uml_pt_regs *r);
>> +
>>   #endif
>> diff --git a/arch/um/kernel/signal.c b/arch/um/kernel/signal.c
>> index 57acbd67d85d..558ac7e4df97 100644
>> --- a/arch/um/kernel/signal.c
>> +++ b/arch/um/kernel/signal.c
>> @@ -6,11 +6,18 @@
>>   #include <linux/module.h>
>>   #include <linux/ptrace.h>
>>   #include <linux/sched.h>
>> +#include <linux/slab.h>
>>   #include <asm/siginfo.h>
>>   #include <asm/signal.h>
>>   #include <asm/unistd.h>
>>   #include <frame_kern.h>
>>   #include <kern_util.h>
>> +#include <os.h>
>> +
>> +static atomic_t reg_depth;
>> +
>> +static struct uml_pt_regs reg_file[4];
>> +#define REG_SWITCH_TO_KMALLOC_DEPTH 3
> 
> I fear this is not correct.
> Think of task switching, if you push something to reg_file[],
> it is not guaranteed that the same task will run next and pop the item.
> The order can change.

It should not occur for the signals that map onto IRQ (timer and IO) - 
there it is all kernel.

I cannot trigger that for the other ones, but it may be possible.

IMO we should just bite the bullet and go back to using the stack 
increasing the minimum stack order needed to 1 for 32 bit and to 2 for 
64 bit.

x86 if I read it correctly is still using the stack even for the new 
register sets.


> 
> Thanks,
> //richard
> 
> 
> 
> 
> 

-- 
Anton R. Ivanov

Cambridge Greys Limited, England and Wales company No 10273661
http://www.cambridgegreys.com/

_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] um: Try to avoid kmalloc in signal handling
  2019-01-04 10:05 anton.ivanov
@ 2019-01-04 10:13 ` Richard Weinberger
  2019-01-04 10:19   ` Anton Ivanov
  0 siblings, 1 reply; 14+ messages in thread
From: Richard Weinberger @ 2019-01-04 10:13 UTC (permalink / raw)
  To: anton.ivanov; +Cc: linux-um

Am Freitag, 4. Januar 2019, 11:05:17 CET schrieb anton.ivanov@cambridgegreys.com:
> From: Anton Ivanov <anton.ivanov@cambridgegreys.com>
> 
> Signal handling (which maps to interrupt handling in UML) needs
> to pass current registers to the relevant handlers and was
> allocating a structure for that using kmalloc. It is possible
> to avoid this kmalloc by using a small "signal register stack".
> A depth of 4 suffices for normal use. If it is exceeded
> further sets of registers are allocated as before via kmalloc.
> 
> The end result is >10% performance increase in networking
> as measured by iperf and >5% across the board.
> 
> Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
> ---
>  arch/um/include/shared/os.h |  4 ++++
>  arch/um/kernel/signal.c     | 31 +++++++++++++++++++++++++++++++
>  arch/um/os-Linux/signal.c   | 16 ++++------------
>  3 files changed, 39 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/um/include/shared/os.h b/arch/um/include/shared/os.h
> index ebf23012a59b..4ec1bc63213b 100644
> --- a/arch/um/include/shared/os.h
> +++ b/arch/um/include/shared/os.h
> @@ -319,4 +319,8 @@ extern unsigned long os_get_top_address(void);
>  
>  long syscall(long number, ...);
>  
> +/* signal.c */
> +extern struct uml_pt_regs *get_save_register_state(void);
> +extern void release_save_register_state(struct uml_pt_regs *r);
> +
>  #endif
> diff --git a/arch/um/kernel/signal.c b/arch/um/kernel/signal.c
> index 57acbd67d85d..558ac7e4df97 100644
> --- a/arch/um/kernel/signal.c
> +++ b/arch/um/kernel/signal.c
> @@ -6,11 +6,18 @@
>  #include <linux/module.h>
>  #include <linux/ptrace.h>
>  #include <linux/sched.h>
> +#include <linux/slab.h>
>  #include <asm/siginfo.h>
>  #include <asm/signal.h>
>  #include <asm/unistd.h>
>  #include <frame_kern.h>
>  #include <kern_util.h>
> +#include <os.h>
> +
> +static atomic_t reg_depth;
> +
> +static struct uml_pt_regs reg_file[4];
> +#define REG_SWITCH_TO_KMALLOC_DEPTH 3

I fear this is not correct.
Think of task switching, if you push something to reg_file[],
it is not guaranteed that the same task will run next and pop the item.
The order can change.

Thanks,
//richard





_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] um: Try to avoid kmalloc in signal handling
@ 2019-01-04 10:05 anton.ivanov
  2019-01-04 10:13 ` Richard Weinberger
  0 siblings, 1 reply; 14+ messages in thread
From: anton.ivanov @ 2019-01-04 10:05 UTC (permalink / raw)
  To: linux-um; +Cc: richard, Anton Ivanov

From: Anton Ivanov <anton.ivanov@cambridgegreys.com>

Signal handling (which maps to interrupt handling in UML) needs
to pass current registers to the relevant handlers and was
allocating a structure for that using kmalloc. It is possible
to avoid this kmalloc by using a small "signal register stack".
A depth of 4 suffices for normal use. If it is exceeded
further sets of registers are allocated as before via kmalloc.

The end result is >10% performance increase in networking
as measured by iperf and >5% across the board.

Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
---
 arch/um/include/shared/os.h |  4 ++++
 arch/um/kernel/signal.c     | 31 +++++++++++++++++++++++++++++++
 arch/um/os-Linux/signal.c   | 16 ++++------------
 3 files changed, 39 insertions(+), 12 deletions(-)

diff --git a/arch/um/include/shared/os.h b/arch/um/include/shared/os.h
index ebf23012a59b..4ec1bc63213b 100644
--- a/arch/um/include/shared/os.h
+++ b/arch/um/include/shared/os.h
@@ -319,4 +319,8 @@ extern unsigned long os_get_top_address(void);
 
 long syscall(long number, ...);
 
+/* signal.c */
+extern struct uml_pt_regs *get_save_register_state(void);
+extern void release_save_register_state(struct uml_pt_regs *r);
+
 #endif
diff --git a/arch/um/kernel/signal.c b/arch/um/kernel/signal.c
index 57acbd67d85d..558ac7e4df97 100644
--- a/arch/um/kernel/signal.c
+++ b/arch/um/kernel/signal.c
@@ -6,11 +6,18 @@
 #include <linux/module.h>
 #include <linux/ptrace.h>
 #include <linux/sched.h>
+#include <linux/slab.h>
 #include <asm/siginfo.h>
 #include <asm/signal.h>
 #include <asm/unistd.h>
 #include <frame_kern.h>
 #include <kern_util.h>
+#include <os.h>
+
+static atomic_t reg_depth;
+
+static struct uml_pt_regs reg_file[4];
+#define REG_SWITCH_TO_KMALLOC_DEPTH 3
 
 EXPORT_SYMBOL(block_signals);
 EXPORT_SYMBOL(unblock_signals);
@@ -111,3 +118,27 @@ void do_signal(struct pt_regs *regs)
 	if (!handled_sig)
 		restore_saved_sigmask();
 }
+
+struct uml_pt_regs *get_save_register_state(void)
+{
+	struct uml_pt_regs *r;
+	int depth = atomic_inc_return(&reg_depth);
+
+	if (depth > REG_SWITCH_TO_KMALLOC_DEPTH) {
+		WARN(1, "Exceeded uml save register space, should not occur");
+		r = kmalloc(sizeof(struct uml_pt_regs), GFP_ATOMIC);
+		if (!r)
+			panic("Cannot allocate extra save register space");
+	} else {
+		r = &reg_file[depth - 1];
+	}
+	return r;
+}
+
+void release_save_register_state(struct uml_pt_regs *r)
+{
+	if (atomic_dec_return(&reg_depth) > REG_SWITCH_TO_KMALLOC_DEPTH)
+		kfree(r);
+}
+
+
diff --git a/arch/um/os-Linux/signal.c b/arch/um/os-Linux/signal.c
index bf0acb8aad8b..81765abe1c15 100644
--- a/arch/um/os-Linux/signal.c
+++ b/arch/um/os-Linux/signal.c
@@ -31,13 +31,9 @@ void (*sig_info[NSIG])(int, struct siginfo *, struct uml_pt_regs *) = {
 
 static void sig_handler_common(int sig, struct siginfo *si, mcontext_t *mc)
 {
-	struct uml_pt_regs *r;
+	struct uml_pt_regs *r = get_save_register_state();
 	int save_errno = errno;
 
-	r = uml_kmalloc(sizeof(struct uml_pt_regs), UM_GFP_ATOMIC);
-	if (!r)
-		panic("out of memory");
-
 	r->is_user = 0;
 	if (sig == SIGSEGV) {
 		/* For segfaults, we want the data from the sigcontext. */
@@ -53,7 +49,7 @@ static void sig_handler_common(int sig, struct siginfo *si, mcontext_t *mc)
 
 	errno = save_errno;
 
-	free(r);
+	release_save_register_state(r);
 }
 
 /*
@@ -91,17 +87,13 @@ void sig_handler(int sig, struct siginfo *si, mcontext_t *mc)
 
 static void timer_real_alarm_handler(mcontext_t *mc)
 {
-	struct uml_pt_regs *regs;
-
-	regs = uml_kmalloc(sizeof(struct uml_pt_regs), UM_GFP_ATOMIC);
-	if (!regs)
-		panic("out of memory");
+	struct uml_pt_regs *regs = get_save_register_state();
 
 	if (mc != NULL)
 		get_regs_from_mc(regs, mc);
 	timer_handler(SIGALRM, NULL, regs);
 
-	free(regs);
+	release_save_register_state(regs);
 }
 
 void timer_alarm_handler(int sig, struct siginfo *unused_si, mcontext_t *mc)
-- 
2.11.0


_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um


^ permalink raw reply related	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2019-01-07 10:07 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-03 16:09 [PATCH] um: Try to avoid kmalloc in signal handling anton.ivanov
2019-01-03 17:13 ` Anton Ivanov
2019-01-03 21:29   ` Anton Ivanov
2019-01-03 21:42     ` Richard Weinberger
2019-01-04  7:07       ` Anton Ivanov
2019-01-04  7:48       ` Anton Ivanov
2019-01-04 15:50       ` Anton Ivanov
2019-01-07 10:07         ` Anton Ivanov
2019-01-04 10:05 anton.ivanov
2019-01-04 10:13 ` Richard Weinberger
2019-01-04 10:19   ` Anton Ivanov
2019-01-04 10:33     ` Richard Weinberger
2019-01-04 10:49       ` Anton Ivanov
2019-01-04 11:31       ` Anton Ivanov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.