On 26.01.22 14:56, Corey Minyard wrote:
> On Wed, Jan 26, 2022 at 07:08:22AM +0100, Juergen Gross wrote:
>> On 25.01.22 19:27, Corey Minyard wrote:
>>> We have a customer that had been seeing CSD lock issues on a Centos 7
>>> kernel (unfortunately). I couldn't find anything or any kernel changes
>>> that might fix it, so I was considering it was the CSD locking issue
>>> you have been chasing for a while.
>>
>> Is this on bare metal or in a virtualized environment?
>
> This is bare metal.
>
> I do think I know what happened. Here's my analysis...
>
> csd: Detected non-responsive CSD lock (#1) on CPU#3, waiting 5000000042 ns for CPU#55 flush_tlb_func+0x0/0xb0(0xffff8e0b3e2afbe8).
> csd: CSD lock (#1) unresponsive.
> csd: cnt(0000000): 0000->0000 queue
> csd: cnt(0000001): ffff->0037 idle
>
> The above means that these events weren't seen, I think. We can
> ignore them in any case.
>
> csd: cnt(63d8dd8): 0003->0037 ipi
> csd: cnt(63d8dd9): 0003->0037 ping
> csd: cnt(63d8dda): 0003->ffff pinged
>
> This is a little confusing. The first two lines have to be from
> __smp_call_single_queue_debug. The last line has to be from
> smp_call_function_many. But you never see the pinged from
> __smp_call_single_queue_debug.

Be careful here. For each event-type/cpu combination there is only one
entry saved. It is still possible to see some events more than once, as
each entry holds information about two events (the location _where_ the
entry is stored determines what just happened, and the _data_ stored at
that position tells us what happened one event before that). So any
"missing" entry whose type/cpu combination shows up further down in the
trace _might_ simply have been overwritten by the later entry.

I've used this scheme instead of simple trace buffers because on a large
system any buffer would be overwritten multiple times before the CSD
timeout expires. This way at least the trace entries of the cpus having
problems (the sender and/or the receiver) are still available. A rough
sketch of the scheme is at the end of this mail.

After having found a local reproducer for my problem I could enhance the
patches by using small arrays instead of single entries for each trace
entry location, producing better traces with fewer "missing" entries. I
didn't post those patches, as they were heavily intertwined with
hypervisor patches for producing traces covering both kernel and
hypervisor events.

>
> csd: cnt(63d8dea): 0035->0037 pinged
>
> The tail of CPU 53 sending an IPI to CPU 55 in
> __smp_call_single_queue_debug.
>
> csd: cnt(63d8deb): ffff->0037 gotipi
> csd: cnt(63d8dec): ffff->0037 handle
> csd: cnt(63d8ded): ffff->0037 dequeue (src CPU 0 == empty)
> csd: cnt(63d8dee): ffff->0037 hdlend (src CPU 0 == early)
>
> CPU 55 is handling the IPI(s) it was sent earlier.

Right. Everything up to here is completely fine and can be ignored.

>
> csd: cnt(63d8e1f): 0003->0037 queue
> csd: cnt(63d8e20): 0003->0037 ipi
> csd: cnt(63d8e21): 0003->0037 ping
>
> In __smp_call_single_queue_debug CPU 3 sends another message to
> CPU 55 and sends an IPI. But there should be a pinged entry
> after this.
>
> csd: cnt(63d8e22): 0003->0037 queue
> csd: cnt(63d8e23): 0003->0037 noipi

This is interesting. Those are 5 consecutive entries with nothing
missing in between (see the counter values). Could it be that after the
ping there was an interrupt and the code was re-entered for sending
another IPI? This would clearly result in a hang as seen (see the
sketch below).


Juergen
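
To make it clearer what I mean by "re-entered": the queueing path is
basically the following (simplified from kernel/smp.c, without the debug
instrumentation; the comments mark where, as I read the trace, the
entries are written in the debug variant):

void __smp_call_single_queue(int cpu, struct llist_node *node)
{
	/* "queue" */
	if (llist_add(node, &per_cpu(call_single_queue, cpu))) {
		/* list was empty, this caller must send the IPI: "ipi" */
		/* "ping" just before the send */
		send_call_function_single_ipi(cpu);
		/* "pinged" just after the send */
	} else {
		/* list was not empty, no IPI needed: "noipi" */
	}
}

An interrupt hitting CPU 3 between "ping" and "pinged" and re-entering
this path for the same target cpu would find the list non-empty and skip
the IPI, which would produce exactly the queue/noipi pair seen at
cnt 63d8e22/63d8e23.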
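
And for reference, this is roughly how the trace storage mentioned above
works. This is a sketch from memory only; names, types and the exact
per-entry data are made up here, the real debug patch differs in detail.
The point is that there is one slot per event type and target cpu, and
no ring buffer:

#include <linux/atomic.h>
#include <linux/threads.h>

enum csd_trace_type {
	CSD_TR_QUEUE, CSD_TR_NOIPI, CSD_TR_IPI, CSD_TR_PING, CSD_TR_PINGED,
	CSD_TR_GOTIPI, CSD_TR_HANDLE, CSD_TR_DEQUEUE, CSD_TR_HDLEND,
	CSD_TR_IDLE, CSD_TR_TYPES
};

struct csd_trace_entry {
	unsigned long cnt;	/* global event counter, the cnt(...) value */
	unsigned int src;	/* sending cpu, 0xffff if not applicable */
};

static struct csd_trace_entry csd_trace[CSD_TR_TYPES][NR_CPUS];
static atomic_long_t csd_trace_cnt;

static void csd_trace_event(enum csd_trace_type type, unsigned int src,
			    unsigned int dst)
{
	struct csd_trace_entry *e = &csd_trace[type][dst];

	/*
	 * A newer event of the same type for the same target cpu simply
	 * overwrites the older one.  That is why single entries can be
	 * "missing" in the dump, while the cpus of interest are still
	 * covered when the CSD timeout finally fires.  The enhanced
	 * variant mentioned above uses a small array here instead of a
	 * single entry.
	 */
	e->src = src;
	e->cnt = atomic_long_inc_return(&csd_trace_cnt);
}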