* [Question] vmalloc latency in RT-Linux
From: Zhipeng Shi @ 2022-06-21 12:15 UTC
To: linux-mm, linux-rt-users; +Cc: tglx, shengjian.xu, schspa

I noticed that in RT-Linux, vmalloc has a large latency. This is because
free_vmap_area_lock is held for a long time in
__purge_vmap_area_lazy().

In non-RT Linux, spin_is_contended() is properly implemented, so this
problem does not occur.

But in RT-Linux, spin_is_contended() simply returns 0. I don't understand
why it was implemented this way, but to solve this problem I thought of
two approaches.

The first is to modify the spin_is_contended() definition in
spinlock_rt.h as shown below, but I'm not sure whether the change has
side effects:

-#define spin_is_contended(lock) (((void)(lock), 0))
+static inline int spin_is_contended(spinlock_t *lock)
+{
+ unsigned long *p = (unsigned long *) &lock->lock.owner;
+
+ return (READ_ONCE(*p) & RT_MUTEX_HAS_WAITERS);
+}
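
The proposed check relies on the rtmutex keeping a "has waiters" flag in bit 0 of its owner word (RT_MUTEX_HAS_WAITERS in the kernel), so the lock is contended exactly when that bit is set. A minimal userspace sketch of the encoding, assuming a single thread: struct model_rt_lock, MODEL_HAS_WAITERS and the model_* helpers are hypothetical names for illustration, and plain loads stand in for READ_ONCE() because nothing here runs concurrently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for RT_MUTEX_HAS_WAITERS: bit 0 of the owner word. */
#define MODEL_HAS_WAITERS ((uintptr_t)1)

struct model_task {
	int pid;
};

/* Hypothetical model of an rtmutex: owner task pointer | waiters bit. */
struct model_rt_lock {
	uintptr_t owner;
};

/* Record a new owner while preserving the waiters bit. */
static void model_set_owner(struct model_rt_lock *l, struct model_task *t)
{
	l->owner = (uintptr_t)t | (l->owner & MODEL_HAS_WAITERS);
}

/* Another task blocks on the lock: set the waiters bit. */
static void model_add_waiter(struct model_rt_lock *l)
{
	l->owner |= MODEL_HAS_WAITERS;
}

/* Same idea as the patch: contended iff the waiters bit is set. */
static bool model_is_contended(const struct model_rt_lock *l)
{
	return (l->owner & MODEL_HAS_WAITERS) != 0;
}

/* Mask off the flag bit to recover the owning task pointer. */
static struct model_task *model_owner(const struct model_rt_lock *l)
{
	return (struct model_task *)(l->owner & ~MODEL_HAS_WAITERS);
}
```

The flag can share the word with the pointer only because task structures are aligned, so bit 0 of a valid owner pointer is always zero.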

The second is to reduce lazy_max_pages, but that would lower vmalloc
performance.
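
The trade-off behind the second approach can be sketched with a hypothetical model: lazily freed pages accumulate until a threshold (standing in for lazy_max_pages()) is reached, and the whole batch is then purged under a single lock hold. A smaller threshold bounds the longest hold but multiplies the number of purge passes, and in mm/vmalloc.c each pass also flushes the kernel TLB for the purged range, which is where the performance cost comes from. struct lazy_model and its helpers are illustrative only, not the kernel's code.

```c
#include <stddef.h>

/* Hypothetical model of lazy purging; not the mm/vmalloc.c implementation. */
struct lazy_model {
	size_t threshold; /* pages allowed to pile up (like lazy_max_pages()) */
	size_t pending;   /* pages freed lazily but not yet purged */
	size_t purges;    /* purge passes, i.e. lock acquisitions */
	size_t max_batch; /* largest batch purged in one lock hold */
};

/* Purge everything pending in one lock hold; hold time grows with batch. */
static void lazy_purge(struct lazy_model *m)
{
	if (m->pending > m->max_batch)
		m->max_batch = m->pending;
	m->purges++;
	m->pending = 0;
}

/* Lazily free some pages; purge once the threshold is reached. */
static void lazy_free_pages(struct lazy_model *m, size_t pages)
{
	m->pending += pages;
	if (m->pending >= m->threshold)
		lazy_purge(m);
}
```

For example, freeing 1000 single-page areas with a threshold of 256 gives 3 purge passes of at most 256 pages each, while a threshold of 32 gives 31 passes of at most 32 pages.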

Guys, do you have any good ideas?

Best regards,
Zhipeng
* Re: [Question] vmalloc latency in RT-Linux
From: Baoquan He @ 2022-06-23 10:51 UTC
To: Zhipeng Shi; +Cc: linux-mm, linux-rt-users, tglx, shengjian.xu, schspa, longman
On 06/21/22 at 08:15pm, Zhipeng Shi wrote:
> I noticed that in RT-Linux, vmalloc has a large latency. This is because
> free_vmap_area_lock is held for a long time in
> __purge_vmap_area_lazy().
>
> [...]
>
> But in RT-Linux, spin_is_contended() simply returns 0.
>
> [...]
>
> The second is to reduce lazy_max_pages, but that would lower vmalloc
> performance.

__purge_vmap_area_lazy() calls cond_resched_lock(), which drops the lock
and reschedules when needed. From what you describe, it is
spin_is_contended() that fails to trigger rescheduling during
__purge_vmap_area_lazy(). So the fix should be done on the lock side.
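
As far as I can tell, cond_resched_lock() decides whether to break out via spin_needbreak(), which on preemptible kernels calls spin_is_contended(); with the PREEMPT_RT stub returning 0, the contention path never fires. The single-threaded sketch below models this under stated assumptions: waiters are a simulated counter rather than real tasks, model_cond_resched_lock() only covers the contention case (the real helper also drops the lock when a reschedule is pending), and all model_* and contended_* names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock model; waiters are simulated, not real tasks. */
struct model_lock {
	bool held;
	int waiters;         /* simulated tasks blocked on the lock */
	size_t drops;        /* times the holder released the lock early */
	size_t longest_hold; /* most items processed in one lock hold */
};

typedef bool (*contended_fn)(const struct model_lock *);

/* Models the PREEMPT_RT stub discussed here: never reports contention. */
static bool contended_stub(const struct model_lock *l)
{
	(void)l;
	return false;
}

/* Models a working check: contended if anyone is waiting. */
static bool contended_real(const struct model_lock *l)
{
	return l->waiters > 0;
}

/*
 * Simplified cond_resched_lock(): if contended, drop the lock, let the
 * waiter run (modelled by clearing the waiter count), then retake it.
 * Returns true if the lock was dropped.
 */
static bool model_cond_resched_lock(struct model_lock *l, contended_fn check)
{
	if (!check(l))
		return false;
	l->held = false;
	l->waiters = 0; /* the waiter takes and releases the lock */
	l->held = true;
	l->drops++;
	return true;
}

/* Process nitems under the lock; a waiter arrives every 'every' items. */
static void model_purge(struct model_lock *l, size_t nitems, size_t every,
			contended_fn check)
{
	size_t run = 0; /* items processed since the lock was last taken */

	l->held = true;
	for (size_t i = 1; i <= nitems; i++) {
		run++; /* one unit of work while holding the lock */
		if (run > l->longest_hold)
			l->longest_hold = run;
		if (every && i % every == 0)
			l->waiters++; /* simulated contender shows up */
		if (model_cond_resched_lock(l, check))
			run = 0;
	}
	l->held = false;
}
```

With 1000 items and a simulated waiter arriving every 100 items, the stub never drops the lock (a single 1000-item hold), while the working check drops it 10 times and bounds each hold to 100 items.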
* Re: [Question] vmalloc latency in RT-Linux
From: Waiman Long @ 2022-06-23 18:04 UTC
To: Baoquan He, Zhipeng Shi
Cc: linux-mm, linux-rt-users, tglx, shengjian.xu, schspa,
Sebastian Andrzej Siewior
On 6/23/22 06:51, Baoquan He wrote:
> On 06/21/22 at 08:15pm, Zhipeng Shi wrote:
>> [...]
> __purge_vmap_area_lazy() calls cond_resched_lock(), which drops the lock
> and reschedules when needed. From what you describe, it is
> spin_is_contended() that fails to trigger rescheduling during
> __purge_vmap_area_lazy(). So the fix should be done on the lock side.

Sebastian sent out a patch last year to fix spin_is_contended():

https://lore.kernel.org/lkml/20210906143004.2259141-3-bigeasy@linutronix.de/

However, there was no follow-up after some discussion, and the patch
wasn't merged.
Cheers,
Longman
* Re: [Question] vmalloc latency in RT-Linux
From: Baoquan He @ 2022-06-24 2:39 UTC
To: Waiman Long, Zhipeng Shi
Cc: linux-mm, linux-rt-users, tglx, shengjian.xu, schspa,
Sebastian Andrzej Siewior, peterz
On 06/23/22 at 02:04pm, Waiman Long wrote:
> On 6/23/22 06:51, Baoquan He wrote:
> > [...]
>
> Sebastian sent out a patch last year to fix spin_is_contended():
>
> https://lore.kernel.org/lkml/20210906143004.2259141-3-bigeasy@linutronix.de/
>
> However, there was no follow-up after some discussion, and the patch
> wasn't merged.

That's great. Thanks, Longman.
Then this is a good chance to reconsider it, maybe with a test from Zhipeng.
* Re: [Question] vmalloc latency in RT-Linux
From: Zhipeng Shi @ 2022-06-24 5:56 UTC
To: Baoquan He, Waiman Long
Cc: linux-mm, linux-rt-users, tglx, shengjian.xu, schspa,
Sebastian Andrzej Siewior, peterz
On Fri, Jun 24, 2022 at 10:39:43AM +0800, Baoquan He wrote:
> On 06/23/22 at 02:04pm, Waiman Long wrote:
> > [...]
> >
> > Sebastian sent out a patch last year to fix spin_is_contended():
> >
> > https://lore.kernel.org/lkml/20210906143004.2259141-3-bigeasy@linutronix.de/
> >
> > However, there was no follow-up after some discussion, and the patch
> > wasn't merged.
>
> That's great. Thanks, Longman.
>
> Then this is a good chance to reconsider it, maybe with a test from Zhipeng.

Before that, since I hadn't found the patch Sebastian sent earlier, I
sent my own patch for this problem, along with test scripts (it now
seems that Sebastian's changes are better than mine). Please refer to
the following link:

https://lore.kernel.org/lkml/20220608142457.GA2400218@ubuntu20/

With this patch, the maximum vmalloc latency drops from 10+ msec to
200+ usec, because the spinlock is released partway through
__purge_vmap_area_lazy().

Best regards,
Zhipeng
* Re: [Question] vmalloc latency in RT-Linux
From: Sebastian Andrzej Siewior @ 2022-06-24 6:46 UTC
To: Baoquan He
Cc: Waiman Long, Zhipeng Shi, linux-mm, linux-rt-users, tglx,
shengjian.xu, schspa, peterz
On 2022-06-24 10:39:43 [+0800], Baoquan He wrote:
> Then this is a good chance to reconsider it, maybe with a test from Zhipeng.

I reconsidered it, and it was dropped on purpose; see
https://lore.kernel.org/lkml/YT80AB8%2FG59QBSVq@hirez.programming.kicks-ass.net/
Sebastian
* Re: [Question] vmalloc latency in RT-Linux
From: Waiman Long @ 2022-06-25 2:27 UTC
To: Sebastian Andrzej Siewior, Baoquan He
Cc: Zhipeng Shi, linux-mm, linux-rt-users, tglx, shengjian.xu,
schspa, peterz
On 6/24/22 02:46, Sebastian Andrzej Siewior wrote:
> On 2022-06-24 10:39:43 [+0800], Baoquan He wrote:
>> Then this is a good chance to reconsider it, maybe with a test from Zhipeng.
> I reconsidered it, and it was dropped on purpose; see
> https://lore.kernel.org/lkml/YT80AB8%2FG59QBSVq@hirez.programming.kicks-ass.net/

I do agree that is_contended may not be that useful for rwlock, but it
can be useful for spinlock. Would you consider a version just for
rt_spinlock?

Cheers,
Longman

Thread overview: 7+ messages
2022-06-21 12:15 [Question] vmalloc latency in RT-Linux Zhipeng Shi
2022-06-23 10:51 ` Baoquan He
2022-06-23 18:04 ` Waiman Long
2022-06-24 2:39 ` Baoquan He
2022-06-24 5:56 ` Zhipeng Shi
2022-06-24 6:46 ` Sebastian Andrzej Siewior
2022-06-25 2:27 ` Waiman Long