On 2021-12-02 3:33 p.m., Helge Deller wrote:
> On 12/2/21 18:47, John David Anglin wrote:
>> On 2021-12-02 12:15 p.m., John David Anglin wrote:
>>> On 2021-12-01 7:32 p.m., John David Anglin wrote:
>>>> On 2021-12-01 4:05 p.m., Helge Deller wrote:
>>>>> On 12/1/21 20:53, John David Anglin wrote:
>>>>>> On 2021-11-26 2:05 p.m., John David Anglin wrote:
>>>>>>> diff --git a/gcc/config/pa/pa.md b/gcc/config/pa/pa.md
>>>>>>> index f124c301b7a..e8cc81511aa 100644
>>>>>>> --- a/gcc/config/pa/pa.md
>>>>>>> +++ b/gcc/config/pa/pa.md
>>>>>>> @@ -10366,10 +10366,11 @@ add,l %2,%3,%3\;bv,n %%r0(%3)"
>>>>>>>   {
>>>>>>>     if (TARGET_SYNC_LIBCALL)
>>>>>>>       {
>>>>>>> -      rtx mem = operands[0];
>>>>>>> -      rtx val = operands[1];
>>>>>>> -      if (pa_maybe_emit_compare_and_swap_exchange_loop (NULL_RTX, mem, val))
>>>>>>> -       DONE;
>>>>>>> +      rtx libfunc = optab_libfunc (sync_lock_test_and_set_optab, QImode);
>>>>>>> +      emit_library_call (libfunc, LCT_NORMAL, VOIDmode,
>>>>>>> +                        XEXP (operands[0], 0), Pmode,
>>>>>>> +                        operands[1], QImode);
>>>>>>> +      DONE;
>>>>>>>       }
>>>>>>>     FAIL;
>>>>>>>   })
>>>>>>>
>>>>>>> However, doing this causes soft lockups in the glibc testsuite:
>>>>>>>
>>>>>>> Message from syslogd@atlas at Nov 25 23:03:01 ...
>>>>>>>   kernel:watchdog: BUG: soft lockup - CPU#0 stuck for 354s! [ld.so.1:22095]
>>>>>>>
>>>>>>> Message from syslogd@atlas at Nov 25 23:03:01 ...
>>>>>>>   kernel:watchdog: BUG: soft lockup - CPU#1 stuck for 361s! [ld.so.1:22093]
>>>>>>>
>>>>>>> Message from syslogd@atlas at Nov 25 23:08:30 ...
>>>>>>>
>>>>>>>   kernel:watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [ld.so.1:16025]
>>>>>>>
>>>>>>> Message from syslogd@atlas at Nov 25 23:10:28 ...
>>>>>>>   kernel:watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [ld.so.1:22086]
>>>>>>>
>>>>>>> Message from syslogd@atlas at Nov 25 23:10:30 ...
>>>>>>>   kernel:watchdog: BUG: soft lockup - CPU#0 stuck for 21s! [ld.so.1:16025]
>>>>>>>
>>>>>>> This happens both with and without lws_atomic_xchg.  The lockups aren't permanent, but they
>>>>>>> clearly impact performance.  Maybe we need to call sched_yield() if we spin too many times?
>>>>>>> I think scheduling is blocked when we spend too much time on the gateway page.
>>>>>> The above soft lockups are not caused by the above change to pa.md.
>>>>>>
>>>>>> They all occur on the gateway page in thread-related tests.  They are not real lockups, but I
>>>>>> would guess scheduling is not optimal when we spend a lot of time on the gateway page.
>>>>> Or maybe calling cond_resched() from inside the kernel (in the locking functions):
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2a8bc5316adc998951e8f726c31e231a6021eae2
>>>> I think the problem is related to COW breaks and the lock hashing, which is shared between
>>>> multiple processes/threads.  This can leave an ldcw lock in the held state for an extended
>>>> period.  There's probably a hole in the logic preventing processes from being scheduled on
>>>> the gateway page.
>>>>
>>>> The problem was probably aggravated by the patch to leave interrupts enabled while we try to
>>>> take the lock.
>>>>
>>>> A COW break can occur on the store instruction in the CAS operation.
>>> I wonder if we should deprecate the LWS implementation and use a full syscall?  See
>>> sys_atomic_cmpxchg_32() in arch/m68k/kernel/sys_m68k.c for the m68k implementation.  I believe
>>> arm has one too.
> interesting.
>
>> The big concern about the current implementation is whether or not an IRQ or page fault can cause
>> another thread/process to be scheduled in the middle of the critical sequences.  So far, I haven't
>> seen this, but it would take a lot of testing to be sure.
> True.
>
>> Can a process be killed if it sleeps in a critical region?
> Don't know.

The attached patch against v5.14.21 fixes the LWS CAS behavior.  COW breaks no longer occur in the
critical region.  The COW break now occurs on the stbys,e instruction.  It magically does a store
without writing anything 😁 (a rough sketch of the idea is appended below).  I don't know whether
something similar is needed in the futex code.

Now I need to extract the new changes so they apply against mainline.  Maybe add a "depi_safe" macro
to clean up the code a bit.

With the whole patch, v5.14.21 seems good and I haven't seen any random faults in some time.  I'm
doing some userspace testing.

Let me know if you have any suggestions.

Dave

--
John David Anglin  dave.anglin@bell.net
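For illustration only, here is a rough sketch of the idea, not the exact code in the attached patch
(the register holding the user address is just a placeholder):

	/* Sketch: probe the target word for write access before entering the
	   LWS CAS critical sequence.  Assuming the user address (%r26 here,
	   purely illustrative) is word aligned, an ending-case stbys,e stores
	   zero bytes, so any page fault/COW break is taken on this probe
	   rather than on the real store inside the critical region.  */
	stbys,e	%r0, 0(%r26)	/* zero-byte store: may fault and break COW, writes nothing */
	/* ... existing ldcw lock / compare-and-swap sequence follows ... */

If the probe faults, the fault is handled (and the COW mapping broken) before we take the ldcw lock,
so the store in the CAS path itself shouldn't fault any more.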