All of lore.kernel.org
 help / color / mirror / Atom feed
* Possible regression in 8250_dw driver
@ 2023-05-23  8:59 Richard Tresidder
  2023-05-23 11:13 ` Ilpo Järvinen
  2023-05-24 11:20 ` Linux regression tracking #adding (Thorsten Leemhuis)
  0 siblings, 2 replies; 12+ messages in thread
From: Richard Tresidder @ 2023-05-23  8:59 UTC (permalink / raw)
  To: linux-serial

Hi
    We seem to be getting corruption of received data from a ublox GPS unit
To me it looks like a fifo overrun of some sort??

background:
I'm attempting to use 6.3.3 as a new base for one of our systems.
Previously it was using 5.1.7 as a base.
The uart in question is one of the two in the Cyclone V SOC HPS.
And to muddy the waters the linux console TTYS0 is the other Uart from 
the same HPS core
Yet the console appears to be working ok.
Note all other libs and apps are at the same revision and build, it is 
only the kernel that is different.
Both versions of the kernel are also built using the same bitbake bsdk..

Seeing the following with 6.3.3:

  00000000: 45 58 54 20 43 4F 52 45 20 33 2E 30 31 20 28 31  | EXT CORE 
3.01 (1
  00000010: 31 31 31 34 31 29 00 00 00 00 00 00 00 00 30 30  | 
11141)........00
  00000020: 30 38 30 30 30 30 00 00 52 4F 4D 20 42 41 53 45  | 
080000..ROM BASE
  00000030: 20 32 2E 30 31 20 28 37 35 33 33 31 53 00 00 00  | 2.01 
(75331S...
  00000040: 00 00 00 00 00 00 00 00 00 00 00 00 53 42 41 53  | 
............SBAS
  00000050: 3B 49 4D 45 53 3B 51 5A 53 53 00 00 00 00 00 00  | 
;IMES;QZSS......
  00000060: 00 00 00 00 00 00 00 00 00 00 01 3D 29 00 00 00  | 
...........=)...
  00000070: 00 00 00 00 00 00 46 57 56 45 52 3D 54 49 4D 20  | 
......FWVER=TIM
  00000080: 31 2E 31 30 00 00 00 00 00 00 00 00 00 00 00 00  | 
1.10............
  00000090: 00 00 00 00 50 52 4F 54 56 45 52 3D 32 32 2E 30  | 
....PROTVER=22.0
  000000a0: 30 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  | 
0...............
  000000b0: 00 00 4D 4F 44 3D 4C 45 41 2D 4D 38 54 2D 30 00  | 
..MOD=LEA-M8T-0.
  000000c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  | 
................
  000000d0: 46 49 53 3D 30 78 45 46 34 30 31 35 20 28 31 30  | 
FIS=0xEF4015 (10
  000000e0: 30 31 31 31 29 00 00 00 00 00 00 00 00 00 47 50  | 
0111).........GP
  000000f0: 53 3B 47 4C 4F 3B 47 41 4C 3B 42 44 00 00 00 00  | 
S;GLO;GAL;BD....
  00000100: 00 00                                            | ..

But should be seeing this as shown on 5.1.7:
Excuse the offset (due to this frame also showing the packet id's and 
lengths)
But the body of the frame is what we should be seeing.

00000000:  B5 62 0A 04 FA 00 45 58 54 20 43 4F 52 45 20 33  | µb..ú.EXT 
CORE 3
00000010:  2E 30 31 20 28 31 31 31 31 34 31 29 00 00 00 00  | .01 
(111141)....
00000020:  00 00 00 00 30 30 30 38 30 30 30 30 00 00 52 4F  | 
....00080000..RO
00000030:  4D 20 42 41 53 45 20 32 2E 30 31 20 28 37 35 33  | M BASE 
2.01 (753
00000040:  33 31 29 00 00 00 00 00 00 00 00 00 46 57 56 45  | 
31).........FWVE
00000050:  52 3D 54 49 4D 20 31 2E 31 30 00 00 00 00 00 00  | R=TIM 
1.10......
00000060:  00 00 00 00 00 00 00 00 00 00 50 52 4F 54 56 45  | 
..........PROTVE
00000070:  52 3D 32 32 2E 30 30 00 00 00 00 00 00 00 00 00  | 
R=22.00.........
00000080:  00 00 00 00 00 00 00 00 4D 4F 44 3D 4C 45 41 2D  | 
........MOD=LEA-
00000090:  4D 38 54 2D 30 00 00 00 00 00 00 00 00 00 00 00  | 
M8T-0...........
000000A0:  00 00 00 00 00 00 46 49 53 3D 30 78 45 46 34 30  | 
......FIS=0xEF40
000000B0:  31 35 20 28 31 30 30 31 31 31 29 00 00 00 00 00  | 15 
(100111).....
000000C0:  00 00 00 00 47 50 53 3B 47 4C 4F 3B 47 41 4C 3B  | 
....GPS;GLO;GAL;
000000D0:  42 44 53 00 00 00 00 00 00 00 00 00 00 00 00 00  | 
BDS.............
000000E0:  00 00 53 42 41 53 3B 49 4D 45 53 3B 51 5A 53 53  | 
..SBAS;IMES;QZSS
000000F0:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  | 
................
00000100:  01 3D                                            | .=.

As you can see it looks like the frame thats received on the 6.3.3 
kernel is mangled?
This same message is just being requested over and over again from the 
GPS unit.

The offset where the tears occur looks to be pretty similar between each 
poll request.
Usually the 1 at the end of the (75331 is where the first tear occurs.

I'd appreciate some quidance in how to track this down as there appears 
to have been a reasonable amount of work done to this driver and the 
serial core between these two versions.

Cheers
    Richard Tresidder

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Possible regression in 8250_dw driver
  2023-05-23  8:59 Possible regression in 8250_dw driver Richard Tresidder
@ 2023-05-23 11:13 ` Ilpo Järvinen
  2023-05-24  7:52   ` Richard Tresidder
  2023-05-24 11:20 ` Linux regression tracking #adding (Thorsten Leemhuis)
  1 sibling, 1 reply; 12+ messages in thread
From: Ilpo Järvinen @ 2023-05-23 11:13 UTC (permalink / raw)
  To: Richard Tresidder; +Cc: linux-serial

[-- Attachment #1: Type: text/plain, Size: 4798 bytes --]

On Tue, 23 May 2023, Richard Tresidder wrote:

> Hi
>    We seem to be getting corruption of received data from a ublox GPS unit
> To me it looks like a fifo overrun of some sort??

Overruns should be logged (in dmesg or /proc/tty/driver/serial).

> background:
> I'm attempting to use 6.3.3 as a new base for one of our systems.
> Previously it was using 5.1.7 as a base.
> The uart in question is one of the two in the Cyclone V SOC HPS.
> And to muddy the waters the linux console TTYS0 is the other Uart from the
> same HPS core
> Yet the console appears to be working ok.

Maybe some of the DMA related changes triggering some unexpected behavior. 

Console doesn't use DMA so that could explain the difference.

> Note all other libs and apps are at the same revision and build, it is only
> the kernel that is different.
> Both versions of the kernel are also built using the same bitbake bsdk..
> 
> Seeing the following with 6.3.3:
> 
>  00000000: 45 58 54 20 43 4F 52 45 20 33 2E 30 31 20 28 31  | EXT CORE 3.01 (1
>  00000010: 31 31 31 34 31 29 00 00 00 00 00 00 00 00 30 30  | 11141)........00
>  00000020: 30 38 30 30 30 30 00 00 52 4F 4D 20 42 41 53 45  | 080000..ROM BASE
>  00000030: 20 32 2E 30 31 20 28 37 35 33 33 31 53 00 00 00  | 2.01 (75331S...
>  00000040: 00 00 00 00 00 00 00 00 00 00 00 00 53 42 41 53  | ............SBAS
>  00000050: 3B 49 4D 45 53 3B 51 5A 53 53 00 00 00 00 00 00  | ;IMES;QZSS......
>  00000060: 00 00 00 00 00 00 00 00 00 00 01 3D 29 00 00 00  | ...........=)...
>  00000070: 00 00 00 00 00 00 46 57 56 45 52 3D 54 49 4D 20  | ......FWVER=TIM
>  00000080: 31 2E 31 30 00 00 00 00 00 00 00 00 00 00 00 00  | 1.10............
>  00000090: 00 00 00 00 50 52 4F 54 56 45 52 3D 32 32 2E 30  | ....PROTVER=22.0
>  000000a0: 30 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  | 0...............
>  000000b0: 00 00 4D 4F 44 3D 4C 45 41 2D 4D 38 54 2D 30 00  | ..MOD=LEA-M8T-0.
>  000000c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  | ................
>  000000d0: 46 49 53 3D 30 78 45 46 34 30 31 35 20 28 31 30  | FIS=0xEF4015 (10
>  000000e0: 30 31 31 31 29 00 00 00 00 00 00 00 00 00 47 50  | 0111).........GP
>  000000f0: 53 3B 47 4C 4F 3B 47 41 4C 3B 42 44 00 00 00 00  | S;GLO;GAL;BD....
>  00000100: 00 00                                            | ..
> 
> But should be seeing this as shown on 5.1.7:
> Excuse the offset (due to this frame also showing the packet id's and lengths)
> But the body of the frame is what we should be seeing.
> 
> 00000000:  B5 62 0A 04 FA 00 45 58 54 20 43 4F 52 45 20 33  | µb..ú.EXT CORE 3
> 00000010:  2E 30 31 20 28 31 31 31 31 34 31 29 00 00 00 00  | .01 (111141)....
> 00000020:  00 00 00 00 30 30 30 38 30 30 30 30 00 00 52 4F  | ....00080000..RO
> 00000030:  4D 20 42 41 53 45 20 32 2E 30 31 20 28 37 35 33  | M BASE 2.01 (753
> 00000040:  33 31 29 00 00 00 00 00 00 00 00 00 46 57 56 45  | 31).........FWVE
> 00000050:  52 3D 54 49 4D 20 31 2E 31 30 00 00 00 00 00 00  | R=TIM 1.10......
> 00000060:  00 00 00 00 00 00 00 00 00 00 50 52 4F 54 56 45  | ..........PROTVE
> 00000070:  52 3D 32 32 2E 30 30 00 00 00 00 00 00 00 00 00  | R=22.00.........
> 00000080:  00 00 00 00 00 00 00 00 4D 4F 44 3D 4C 45 41 2D  | ........MOD=LEA-
> 00000090:  4D 38 54 2D 30 00 00 00 00 00 00 00 00 00 00 00  | M8T-0...........
> 000000A0:  00 00 00 00 00 00 46 49 53 3D 30 78 45 46 34 30  | ......FIS=0xEF40
> 000000B0:  31 35 20 28 31 30 30 31 31 31 29 00 00 00 00 00  | 15 (100111).....
> 000000C0:  00 00 00 00 47 50 53 3B 47 4C 4F 3B 47 41 4C 3B  | ....GPS;GLO;GAL;
> 000000D0:  42 44 53 00 00 00 00 00 00 00 00 00 00 00 00 00  | BDS.............
> 000000E0:  00 00 53 42 41 53 3B 49 4D 45 53 3B 51 5A 53 53  | ..SBAS;IMES;QZSS
> 000000F0:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  | ................
> 00000100:  01 3D                                            | .=.
> 
> As you can see it looks like the frame thats received on the 6.3.3 kernel is
> mangled?
> This same message is just being requested over and over again from the GPS
> unit.
> 
> The offset where the tears occur looks to be pretty similar between each poll
> request.
> Usually the 1 at the end of the (75331 is where the first tear occurs.
> 
> I'd appreciate some quidance in how to track this down as there appears to
> have been a reasonable amount of work done to this driver and the serial core
> between these two versions.

A few ideas:
- try without dma_rx_complete() calling p->dma->rx_dma(p)
- revert 90b8596ac46043e4a782d9111f5b285251b13756
- Try the revert in https://lore.kernel.org/all/316ab583-d217-a332-d161-8225b0cee227@redhat.com/2-0001-Revert-serial-8250-use-THRE-__stop_tx-also-with-DMA.patch
  (for e8ffbb71f783 and f8d6e9d3ca5c)

But finding the culprit with git bisect would be the most helpful here.


-- 
 i.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Possible regression in 8250_dw driver
  2023-05-23 11:13 ` Ilpo Järvinen
@ 2023-05-24  7:52   ` Richard Tresidder
  2023-05-24 10:06     ` Ilpo Järvinen
  0 siblings, 1 reply; 12+ messages in thread
From: Richard Tresidder @ 2023-05-24  7:52 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: linux-serial

On 23/05/2023 7:13 pm, Ilpo Järvinen wrote:

> On Tue, 23 May 2023, Richard Tresidder wrote:
>
>> Hi
>>     We seem to be getting corruption of received data from a ublox GPS unit
>> To me it looks like a fifo overrun of some sort??
> Overruns should be logged (in dmesg or /proc/tty/driver/serial).

Hmm wasn't seeing anything in either.

>
>> background:
>> I'm attempting to use 6.3.3 as a new base for one of our systems.
>> Previously it was using 5.1.7 as a base.
>> The uart in question is one of the two in the Cyclone V SOC HPS.
>> And to muddy the waters the linux console TTYS0 is the other Uart from the
>> same HPS core
>> Yet the console appears to be working ok.
> Maybe some of the DMA related changes triggering some unexpected behavior.
>
> Console doesn't use DMA so that could explain the difference.
>
>> Note all other libs and apps are at the same revision and build, it is only
>> the kernel that is different.
>> Both versions of the kernel are also built using the same bitbake bsdk..
>>
>> Seeing the following with 6.3.3:
>>
>>   00000000: 45 58 54 20 43 4F 52 45 20 33 2E 30 31 20 28 31  | EXT CORE 3.01 (1
>>   00000010: 31 31 31 34 31 29 00 00 00 00 00 00 00 00 30 30  | 11141)........00
>>   00000020: 30 38 30 30 30 30 00 00 52 4F 4D 20 42 41 53 45  | 080000..ROM BASE
>>   00000030: 20 32 2E 30 31 20 28 37 35 33 33 31 53 00 00 00  | 2.01 (75331S...
>>   00000040: 00 00 00 00 00 00 00 00 00 00 00 00 53 42 41 53  | ............SBAS
>>   00000050: 3B 49 4D 45 53 3B 51 5A 53 53 00 00 00 00 00 00  | ;IMES;QZSS......
>>   00000060: 00 00 00 00 00 00 00 00 00 00 01 3D 29 00 00 00  | ...........=)...
>>   00000070: 00 00 00 00 00 00 46 57 56 45 52 3D 54 49 4D 20  | ......FWVER=TIM
>>   00000080: 31 2E 31 30 00 00 00 00 00 00 00 00 00 00 00 00  | 1.10............
>>   00000090: 00 00 00 00 50 52 4F 54 56 45 52 3D 32 32 2E 30  | ....PROTVER=22.0
>>   000000a0: 30 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  | 0...............
>>   000000b0: 00 00 4D 4F 44 3D 4C 45 41 2D 4D 38 54 2D 30 00  | ..MOD=LEA-M8T-0.
>>   000000c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  | ................
>>   000000d0: 46 49 53 3D 30 78 45 46 34 30 31 35 20 28 31 30  | FIS=0xEF4015 (10
>>   000000e0: 30 31 31 31 29 00 00 00 00 00 00 00 00 00 47 50  | 0111).........GP
>>   000000f0: 53 3B 47 4C 4F 3B 47 41 4C 3B 42 44 00 00 00 00  | S;GLO;GAL;BD....
>>   00000100: 00 00                                            | ..
>>
>> But should be seeing this as shown on 5.1.7:
>> Excuse the offset (due to this frame also showing the packet id's and lengths)
>> But the body of the frame is what we should be seeing.
>>
>> 00000000:  B5 62 0A 04 FA 00 45 58 54 20 43 4F 52 45 20 33  | µb..ú.EXT CORE 3
>> 00000010:  2E 30 31 20 28 31 31 31 31 34 31 29 00 00 00 00  | .01 (111141)....
>> 00000020:  00 00 00 00 30 30 30 38 30 30 30 30 00 00 52 4F  | ....00080000..RO
>> 00000030:  4D 20 42 41 53 45 20 32 2E 30 31 20 28 37 35 33  | M BASE 2.01 (753
>> 00000040:  33 31 29 00 00 00 00 00 00 00 00 00 46 57 56 45  | 31).........FWVE
>> 00000050:  52 3D 54 49 4D 20 31 2E 31 30 00 00 00 00 00 00  | R=TIM 1.10......
>> 00000060:  00 00 00 00 00 00 00 00 00 00 50 52 4F 54 56 45  | ..........PROTVE
>> 00000070:  52 3D 32 32 2E 30 30 00 00 00 00 00 00 00 00 00  | R=22.00.........
>> 00000080:  00 00 00 00 00 00 00 00 4D 4F 44 3D 4C 45 41 2D  | ........MOD=LEA-
>> 00000090:  4D 38 54 2D 30 00 00 00 00 00 00 00 00 00 00 00  | M8T-0...........
>> 000000A0:  00 00 00 00 00 00 46 49 53 3D 30 78 45 46 34 30  | ......FIS=0xEF40
>> 000000B0:  31 35 20 28 31 30 30 31 31 31 29 00 00 00 00 00  | 15 (100111).....
>> 000000C0:  00 00 00 00 47 50 53 3B 47 4C 4F 3B 47 41 4C 3B  | ....GPS;GLO;GAL;
>> 000000D0:  42 44 53 00 00 00 00 00 00 00 00 00 00 00 00 00  | BDS.............
>> 000000E0:  00 00 53 42 41 53 3B 49 4D 45 53 3B 51 5A 53 53  | ..SBAS;IMES;QZSS
>> 000000F0:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  | ................
>> 00000100:  01 3D                                            | .=.
>>
>> As you can see it looks like the frame thats received on the 6.3.3 kernel is
>> mangled?
>> This same message is just being requested over and over again from the GPS
>> unit.
>>
>> The offset where the tears occur looks to be pretty similar between each poll
>> request.
>> Usually the 1 at the end of the (75331 is where the first tear occurs.
>>
>> I'd appreciate some quidance in how to track this down as there appears to
>> have been a reasonable amount of work done to this driver and the serial core
>> between these two versions.
> A few ideas:
> - try without dma_rx_complete() calling p->dma->rx_dma(p)
> - revert 90b8596ac46043e4a782d9111f5b285251b13756
> - Try the revert in https://lore.kernel.org/all/316ab583-d217-a332-d161-8225b0cee227@redhat.com/2-0001-Revert-serial-8250-use-THRE-__stop_tx-also-with-DMA.patch
>    (for e8ffbb71f783 and f8d6e9d3ca5c)
>
> But finding the culprit with git bisect would be the most helpful here.
>
Bisect wasn't an easy option as I'd applied a pile of patches after the interesting commits for our system to run.
I'm not a git master :)

So I just started reverting the patches that had been applied to the 8250 folder.
Worked backwards from head.

After reverting 57e9af7831dcf211c5c689c2a6f209f4abdf0bce
serial: 8250_dma: Fix DMA Rx rearm race

Things started to work again.

I reset everything and then just reverted that individual patch and things work.
So that looks like the culprit..


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Possible regression in 8250_dw driver
  2023-05-24  7:52   ` Richard Tresidder
@ 2023-05-24 10:06     ` Ilpo Järvinen
  2023-05-25  3:44       ` Richard Tresidder
  0 siblings, 1 reply; 12+ messages in thread
From: Ilpo Järvinen @ 2023-05-24 10:06 UTC (permalink / raw)
  To: Richard Tresidder; +Cc: linux-serial

[-- Attachment #1: Type: text/plain, Size: 4681 bytes --]

On Wed, 24 May 2023, Richard Tresidder wrote:
> On 23/05/2023 7:13 pm, Ilpo Järvinen wrote:
> > On Tue, 23 May 2023, Richard Tresidder wrote:
> > 
> > > background:
> > > I'm attempting to use 6.3.3 as a new base for one of our systems.
> > > Previously it was using 5.1.7 as a base.
> > > The uart in question is one of the two in the Cyclone V SOC HPS.
> > > And to muddy the waters the linux console TTYS0 is the other Uart from the
> > > same HPS core
> > > Yet the console appears to be working ok.
> > Maybe some of the DMA related changes triggering some unexpected behavior.
> > 
> > Console doesn't use DMA so that could explain the difference.
> > 
> > > Note all other libs and apps are at the same revision and build, it is
> > > only
> > > the kernel that is different.
> > > Both versions of the kernel are also built using the same bitbake bsdk..
> > > 
> > > 
> > > As you can see it looks like the frame thats received on the 6.3.3 kernel
> > > is
> > > mangled?
> > > This same message is just being requested over and over again from the GPS
> > > unit.
> > > 
> > > The offset where the tears occur looks to be pretty similar between each
> > > poll
> > > request.
> > > Usually the 1 at the end of the (75331 is where the first tear occurs.
> > > 
> > > I'd appreciate some quidance in how to track this down as there appears to
> > > have been a reasonable amount of work done to this driver and the serial
> > > core
> > > between these two versions.
> > A few ideas:
> > - try without dma_rx_complete() calling p->dma->rx_dma(p)
> > - revert 90b8596ac46043e4a782d9111f5b285251b13756
> > - Try the revert in
> > https://lore.kernel.org/all/316ab583-d217-a332-d161-8225b0cee227@redhat.com/2-0001-Revert-serial-8250-use-THRE-__stop_tx-also-with-DMA.patch
> >    (for e8ffbb71f783 and f8d6e9d3ca5c)
> > 
> > But finding the culprit with git bisect would be the most helpful here.
> > 
> Bisect wasn't an easy option as I'd applied a pile of patches after the
> interesting commits for our system to run.
> I'm not a git master :)
> 
> So I just started reverting the patches that had been applied to the 8250
> folder.
> Worked backwards from head.
> 
> After reverting 57e9af7831dcf211c5c689c2a6f209f4abdf0bce
> serial: 8250_dma: Fix DMA Rx rearm race
> 
> Things started to work again.
> 
> I reset everything and then just reverted that individual patch and things
> work.
> So that looks like the culprit..

Okay, that also worked great, thanks a lot. It gives something concrete to 
work with to find a fix.

The commit itself looks very much correct, however, my guess is that when 
serial8250_rx_dma_flush() does dmaengine_pause() dma_status somehow 
does not become DMA_PAUSED which leads to not properly flushing DMA Rx 
transaction. Can you try the following debug patch (if my guess is 
correct, besides triggering the WARN_ON_ONCE, it also works around the 
issue):

[PATCH] DEBUG: 8250_dma: Detect DMA_PAUSED vs DMA_IN_PROGRESS inconsistency

Detect DMA state not returning DMA_PAUSED after dmaengine_pause() was
issued.

Not intended for upstream.

---
 drivers/tty/serial/8250/8250_dma.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/tty/serial/8250/8250_dma.c b/drivers/tty/serial/8250/8250_dma.c
index 7fa66501792d..3dedacd9f104 100644
--- a/drivers/tty/serial/8250/8250_dma.c
+++ b/drivers/tty/serial/8250/8250_dma.c
@@ -38,7 +38,7 @@ static void __dma_tx_complete(void *param)
 	spin_unlock_irqrestore(&p->port.lock, flags);
 }
 
-static void __dma_rx_complete(struct uart_8250_port *p)
+static void __dma_rx_complete(struct uart_8250_port *p, bool paused)
 {
 	struct uart_8250_dma	*dma = p->dma;
 	struct tty_port		*tty_port = &p->port.state->port;
@@ -52,8 +52,12 @@ static void __dma_rx_complete(struct uart_8250_port *p)
 	 * anything in such case.
 	 */
 	dma_status = dmaengine_tx_status(dma->rxchan, dma->rx_cookie, &state);
-	if (dma_status == DMA_IN_PROGRESS)
-		return;
+	if (dma_status == DMA_IN_PROGRESS) {
+		if (!paused)
+			return;
+
+		WARN_ON_ONCE(paused);
+	}
 
 	count = dma->rx_size - state.residue;
 
@@ -72,7 +76,7 @@ static void dma_rx_complete(void *param)
 
 	spin_lock_irqsave(&p->port.lock, flags);
 	if (dma->rx_running)
-		__dma_rx_complete(p);
+		__dma_rx_complete(p, false);
 
 	/*
 	 * Cannot be combined with the previous check because __dma_rx_complete()
@@ -172,7 +176,7 @@ void serial8250_rx_dma_flush(struct uart_8250_port *p)
 
 	if (dma->rx_running) {
 		dmaengine_pause(dma->rxchan);
-		__dma_rx_complete(p);
+		__dma_rx_complete(p, true);
 		dmaengine_terminate_async(dma->rxchan);
 	}
 }

-- 
tg: (ac9a78681b92..) debug/dma-paused (depends on: tty-next)

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: Possible regression in 8250_dw driver
  2023-05-23  8:59 Possible regression in 8250_dw driver Richard Tresidder
  2023-05-23 11:13 ` Ilpo Järvinen
@ 2023-05-24 11:20 ` Linux regression tracking #adding (Thorsten Leemhuis)
  1 sibling, 0 replies; 12+ messages in thread
From: Linux regression tracking #adding (Thorsten Leemhuis) @ 2023-05-24 11:20 UTC (permalink / raw)
  To: Richard Tresidder, linux-serial; +Cc: Linux kernel regressions list

[CCing the regression list, as it should be in the loop for regressions:
https://docs.kernel.org/admin-guide/reporting-regressions.html]

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few templates
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 23.05.23 10:59, Richard Tresidder wrote:
>    We seem to be getting corruption of received data from a ublox GPS unit
> To me it looks like a fifo overrun of some sort??
> 
> background:
> I'm attempting to use 6.3.3 as a new base for one of our systems.
> Previously it was using 5.1.7 as a base.
> The uart in question is one of the two in the Cyclone V SOC HPS.
> And to muddy the waters the linux console TTYS0 is the other Uart from
> the same HPS core
> Yet the console appears to be working ok.
> Note all other libs and apps are at the same revision and build, it is
> only the kernel that is different.
> Both versions of the kernel are also built using the same bitbake bsdk..
> 
> Seeing the following with 6.3.3:
> [...]

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced 57e9af7831dcf211c5c689c2a
#regzbot title serial: 8250_dma: data corruption with ublox GPS unit
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Possible regression in 8250_dw driver
  2023-05-24 10:06     ` Ilpo Järvinen
@ 2023-05-25  3:44       ` Richard Tresidder
  2023-05-25  8:43         ` Ilpo Järvinen
  0 siblings, 1 reply; 12+ messages in thread
From: Richard Tresidder @ 2023-05-25  3:44 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: linux-serial





Richard Tresidder








**






On 24/05/2023 6:06 pm, Ilpo Järvinen wrote:

> On Wed, 24 May 2023, Richard Tresidder wrote:
>> On 23/05/2023 7:13 pm, Ilpo Järvinen wrote:
>>> On Tue, 23 May 2023, Richard Tresidder wrote:
>>>
>>>> background:
>>>> I'm attempting to use 6.3.3 as a new base for one of our systems.
>>>> Previously it was using 5.1.7 as a base.
>>>> The uart in question is one of the two in the Cyclone V SOC HPS.
>>>> And to muddy the waters the linux console TTYS0 is the other Uart from the
>>>> same HPS core
>>>> Yet the console appears to be working ok.
>>> Maybe some of the DMA related changes triggering some unexpected behavior.
>>>
>>> Console doesn't use DMA so that could explain the difference.
>>>
>>>> Note all other libs and apps are at the same revision and build, it is
>>>> only
>>>> the kernel that is different.
>>>> Both versions of the kernel are also built using the same bitbake bsdk..
>>>>
>>>>
>>>> As you can see it looks like the frame thats received on the 6.3.3 kernel
>>>> is
>>>> mangled?
>>>> This same message is just being requested over and over again from the GPS
>>>> unit.
>>>>
>>>> The offset where the tears occur looks to be pretty similar between each
>>>> poll
>>>> request.
>>>> Usually the 1 at the end of the (75331 is where the first tear occurs.
>>>>
>>>> I'd appreciate some quidance in how to track this down as there appears to
>>>> have been a reasonable amount of work done to this driver and the serial
>>>> core
>>>> between these two versions.
>>> A few ideas:
>>> - try without dma_rx_complete() calling p->dma->rx_dma(p)
>>> - revert 90b8596ac46043e4a782d9111f5b285251b13756
>>> - Try the revert in
>>> https://lore.kernel.org/all/316ab583-d217-a332-d161-8225b0cee227@redhat.com/2-0001-Revert-serial-8250-use-THRE-__stop_tx-also-with-DMA.patch
>>>     (for e8ffbb71f783 and f8d6e9d3ca5c)
>>>
>>> But finding the culprit with git bisect would be the most helpful here.
>>>
>> Bisect wasn't an easy option as I'd applied a pile of patches after the
>> interesting commits for our system to run.
>> I'm not a git master :)
>>
>> So I just started reverting the patches that had been applied to the 8250
>> folder.
>> Worked backwards from head.
>>
>> After reverting 57e9af7831dcf211c5c689c2a6f209f4abdf0bce
>> serial: 8250_dma: Fix DMA Rx rearm race
>>
>> Things started to work again.
>>
>> I reset everything and then just reverted that individual patch and things
>> work.
>> So that looks like the culprit..
> Okay, that also worked great, thanks a lot. It gives something concrete to
> work with to find a fix.
>
> The commit itself looks very much correct, however, my guess is that when
> serial8250_rx_dma_flush() does dmaengine_pause() dma_status somehow
> does not become DMA_PAUSED which leads to not properly flushing DMA Rx
> transaction. Can you try the following debug patch (if my guess is
> correct, besides triggering the WARN_ON_ONCE, it also works around the
> issue):
>
> [PATCH] DEBUG: 8250_dma: Detect DMA_PAUSED vs DMA_IN_PROGRESS inconsistency
>
> Detect DMA state not returning DMA_PAUSED after dmaengine_pause() was
> issued.
>
> Not intended for upstream.

Thanks Ilpo
    I can confirm that the patch below works. Got the WARN_ON_ONCE dump so it's taking that path.
I think the problem here is that the pl330 DMA controller thats in these SOC's doesn't "really" support pause according to the driver.
It doesn't look like it can ever set "DMA_PAUSED"
I'm not sure of the flow through of that though.
There is some commenting in the pl330 driver about the pause routine.

Cheers
    Richard Tresidder

> ---
>   drivers/tty/serial/8250/8250_dma.c | 14 +++++++++-----
>   1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/tty/serial/8250/8250_dma.c b/drivers/tty/serial/8250/8250_dma.c
> index 7fa66501792d..3dedacd9f104 100644
> --- a/drivers/tty/serial/8250/8250_dma.c
> +++ b/drivers/tty/serial/8250/8250_dma.c
> @@ -38,7 +38,7 @@ static void __dma_tx_complete(void *param)
>   	spin_unlock_irqrestore(&p->port.lock, flags);
>   }
>   
> -static void __dma_rx_complete(struct uart_8250_port *p)
> +static void __dma_rx_complete(struct uart_8250_port *p, bool paused)
>   {
>   	struct uart_8250_dma	*dma = p->dma;
>   	struct tty_port		*tty_port = &p->port.state->port;
> @@ -52,8 +52,12 @@ static void __dma_rx_complete(struct uart_8250_port *p)
>   	 * anything in such case.
>   	 */
>   	dma_status = dmaengine_tx_status(dma->rxchan, dma->rx_cookie, &state);
> -	if (dma_status == DMA_IN_PROGRESS)
> -		return;
> +	if (dma_status == DMA_IN_PROGRESS) {
> +		if (!paused)
> +			return;
> +
> +		WARN_ON_ONCE(paused);
> +	}
>   
>   	count = dma->rx_size - state.residue;
>   
> @@ -72,7 +76,7 @@ static void dma_rx_complete(void *param)
>   
>   	spin_lock_irqsave(&p->port.lock, flags);
>   	if (dma->rx_running)
> -		__dma_rx_complete(p);
> +		__dma_rx_complete(p, false);
>   
>   	/*
>   	 * Cannot be combined with the previous check because __dma_rx_complete()
> @@ -172,7 +176,7 @@ void serial8250_rx_dma_flush(struct uart_8250_port *p)
>   
>   	if (dma->rx_running) {
>   		dmaengine_pause(dma->rxchan);
> -		__dma_rx_complete(p);
> +		__dma_rx_complete(p, true);
>   		dmaengine_terminate_async(dma->rxchan);
>   	}
>   }
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Possible regression in 8250_dw driver
  2023-05-25  3:44       ` Richard Tresidder
@ 2023-05-25  8:43         ` Ilpo Järvinen
  2023-05-25  9:27           ` Richard Tresidder
  0 siblings, 1 reply; 12+ messages in thread
From: Ilpo Järvinen @ 2023-05-25  8:43 UTC (permalink / raw)
  To: Richard Tresidder; +Cc: linux-serial

[-- Attachment #1: Type: text/plain, Size: 7687 bytes --]

On Thu, 25 May 2023, Richard Tresidder wrote:

> 
> 
> 
> 
> Richard Tresidder
> 
> 
> 
> 
> 
> 
> 
> 
> **
> 
> 
> 
> 
> 
> 
> On 24/05/2023 6:06 pm, Ilpo Järvinen wrote:
> 
> > On Wed, 24 May 2023, Richard Tresidder wrote:
> > > On 23/05/2023 7:13 pm, Ilpo Järvinen wrote:
> > > > On Tue, 23 May 2023, Richard Tresidder wrote:
> > > > 
> > > > > background:
> > > > > I'm attempting to use 6.3.3 as a new base for one of our systems.
> > > > > Previously it was using 5.1.7 as a base.
> > > > > The uart in question is one of the two in the Cyclone V SOC HPS.
> > > > > And to muddy the waters the linux console TTYS0 is the other Uart from
> > > > > the
> > > > > same HPS core
> > > > > Yet the console appears to be working ok.
> > > > Maybe some of the DMA related changes triggering some unexpected
> > > > behavior.
> > > > 
> > > > Console doesn't use DMA so that could explain the difference.
> > > > 
> > > > > Note all other libs and apps are at the same revision and build, it is
> > > > > only
> > > > > the kernel that is different.
> > > > > Both versions of the kernel are also built using the same bitbake
> > > > > bsdk..
> > > > > 
> > > > > 
> > > > > As you can see it looks like the frame thats received on the 6.3.3
> > > > > kernel
> > > > > is
> > > > > mangled?
> > > > > This same message is just being requested over and over again from the
> > > > > GPS
> > > > > unit.
> > > > > 
> > > > > The offset where the tears occur looks to be pretty similar between
> > > > > each
> > > > > poll
> > > > > request.
> > > > > Usually the 1 at the end of the (75331 is where the first tear occurs.
> > > > > 
> > > > > I'd appreciate some quidance in how to track this down as there
> > > > > appears to
> > > > > have been a reasonable amount of work done to this driver and the
> > > > > serial
> > > > > core
> > > > > between these two versions.
> > > > A few ideas:
> > > > - try without dma_rx_complete() calling p->dma->rx_dma(p)
> > > > - revert 90b8596ac46043e4a782d9111f5b285251b13756
> > > > - Try the revert in
> > > > https://lore.kernel.org/all/316ab583-d217-a332-d161-8225b0cee227@redhat.com/2-0001-Revert-serial-8250-use-THRE-__stop_tx-also-with-DMA.patch
> > > >     (for e8ffbb71f783 and f8d6e9d3ca5c)
> > > > 
> > > > But finding the culprit with git bisect would be the most helpful here.
> > > > 
> > > Bisect wasn't an easy option as I'd applied a pile of patches after the
> > > interesting commits for our system to run.
> > > I'm not a git master :)
> > > 
> > > So I just started reverting the patches that had been applied to the 8250
> > > folder.
> > > Worked backwards from head.
> > > 
> > > After reverting 57e9af7831dcf211c5c689c2a6f209f4abdf0bce
> > > serial: 8250_dma: Fix DMA Rx rearm race
> > > 
> > > Things started to work again.
> > > 
> > > I reset everything and then just reverted that individual patch and things
> > > work.
> > > So that looks like the culprit..
> > Okay, that also worked great, thanks a lot. It gives something concrete to
> > work with to find a fix.
> > 
> > The commit itself looks very much correct, however, my guess is that when
> > serial8250_rx_dma_flush() does dmaengine_pause() dma_status somehow
> > does not become DMA_PAUSED which leads to not properly flushing DMA Rx
> > transaction. Can you try the following debug patch (if my guess is
> > correct, besides triggering the WARN_ON_ONCE, it also works around the
> > issue):
> > 
> > [PATCH] DEBUG: 8250_dma: Detect DMA_PAUSED vs DMA_IN_PROGRESS inconsistency
> > 
> > Detect DMA state not returning DMA_PAUSED after dmaengine_pause() was
> > issued.
> > 
> > Not intended for upstream.
> 
> Thanks Ilpo
>    I can confirm that the patch below works. Got the WARN_ON_ONCE dump so it's
> taking that path.
> I think the problem here is that the pl330 DMA controller thats in these SOC's
> doesn't "really" support pause according to the driver.

For this flush usecase, it is enough to support pause + read residue +
terminate which is supported according to the function comment for
pl330_pause().

> It doesn't look like it can ever set "DMA_PAUSED"

Why not? It pauses the transfer and is even able to allow reading the 
transferred byte count.

What it is claimed to not be able to do is resume. Note that w/o resume
serial8250_tx_dma() XON/XOFF processing will not work but that's 
unrelated to this issue (I'll probably need to do another patch to fix 
that on 8250 DMA side).

> I'm not sure of the flow through of that though.
> There is some commenting in the pl330 driver about the pause routine.

Maybe the patch below will help with pl330 DMA status (based on somewhat 
limited understanding of all the moving parts):


[PATCH] dmaengine: pl330: Set DMA_PAUSED when pausing

pl330_pause() does not set anything to indicate paused condition which
causes pl330_tx_status() to return DMA_IN_PROGRESS. This breaks 8250
DMA code after the fix in commit 57e9af7831dc ("serial: 8250_dma: Fix
DMA Rx rearm race"). The function comment for pl330_pause() claims
pause is supported but resume is not which is enough for 8250 DMA flush
to work as long as DMA status is reported correctly when appropriate.

Add PAUSED state for descriptor and mark BUSY descriptors with PAUSED
in pl330_pause(). Return DMA_PAUSED from pl330_tx_status() when the
descriptor is PAUSED.

Fixes: 88987d2c7534 ("dmaengine: pl330: add DMA_PAUSE feature")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

---
 drivers/dma/pl330.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c
index 0d9257fbdfb0..daad25f2c498 100644
--- a/drivers/dma/pl330.c
+++ b/drivers/dma/pl330.c
@@ -403,6 +403,12 @@ enum desc_status {
 	 * of a channel can be BUSY at any time.
 	 */
 	BUSY,
+	/*
+	 * Pause was called while descriptor was BUSY. Due to hardware
+	 * limitations, only termination is possible for descriptors
+	 * that have been paused.
+	 */
+	PAUSED,
 	/*
 	 * Sitting on the channel work_list but xfer done
 	 * by PL330 core
@@ -2041,7 +2047,7 @@ static inline void fill_queue(struct dma_pl330_chan *pch)
 	list_for_each_entry(desc, &pch->work_list, node) {
 
 		/* If already submitted */
-		if (desc->status == BUSY)
+		if (desc->status == BUSY || desc->status == PAUSED)
 			continue;
 
 		ret = pl330_submit_req(pch->thread, desc);
@@ -2326,6 +2332,7 @@ static int pl330_pause(struct dma_chan *chan)
 {
 	struct dma_pl330_chan *pch = to_pchan(chan);
 	struct pl330_dmac *pl330 = pch->dmac;
+	struct dma_pl330_desc *desc;
 	unsigned long flags;
 
 	pm_runtime_get_sync(pl330->ddma.dev);
@@ -2335,6 +2342,10 @@ static int pl330_pause(struct dma_chan *chan)
 	_stop(pch->thread);
 	spin_unlock(&pl330->lock);
 
+	list_for_each_entry(desc, &pch->work_list, node) {
+		if (desc->status == BUSY)
+			desc->status = PAUSED;
+	}
 	spin_unlock_irqrestore(&pch->lock, flags);
 	pm_runtime_mark_last_busy(pl330->ddma.dev);
 	pm_runtime_put_autosuspend(pl330->ddma.dev);
@@ -2425,7 +2436,7 @@ pl330_tx_status(struct dma_chan *chan, dma_cookie_t cookie,
 		else if (running && desc == running)
 			transferred =
 				pl330_get_current_xferred_count(pch, desc);
-		else if (desc->status == BUSY)
+		else if (desc->status == BUSY || desc->status == PAUSED)
 			/*
 			 * Busy but not running means either just enqueued,
 			 * or finished and not yet marked done
@@ -2442,6 +2453,9 @@ pl330_tx_status(struct dma_chan *chan, dma_cookie_t cookie,
 			case DONE:
 				ret = DMA_COMPLETE;
 				break;
+			case PAUSED:
+				ret = DMA_PAUSED;
+				break;
 			case PREP:
 			case BUSY:
 				ret = DMA_IN_PROGRESS;

-- 
tg: (5af9d9508d7d..) fix/pl330-paused (depends on: debug/dma-paused)

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: Possible regression in 8250_dw driver
  2023-05-25  8:43         ` Ilpo Järvinen
@ 2023-05-25  9:27           ` Richard Tresidder
  2023-05-25  9:38             ` Ilpo Järvinen
  0 siblings, 1 reply; 12+ messages in thread
From: Richard Tresidder @ 2023-05-25  9:27 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: linux-serial

On 25/05/2023 4:43 pm, Ilpo Järvinen wrote:

> On Thu, 25 May 2023, Richard Tresidder wrote:
>
>> Richard Tresidder
>>
>> **
>>
>> On 24/05/2023 6:06 pm, Ilpo Järvinen wrote:
>>
>>> On Wed, 24 May 2023, Richard Tresidder wrote:
>>>> On 23/05/2023 7:13 pm, Ilpo Järvinen wrote:
>>>>> On Tue, 23 May 2023, Richard Tresidder wrote:
>>>>>
>>>>>> background:
>>>>>> I'm attempting to use 6.3.3 as a new base for one of our systems.
>>>>>> Previously it was using 5.1.7 as a base.
>>>>>> The uart in question is one of the two in the Cyclone V SOC HPS.
>>>>>> And to muddy the waters the linux console TTYS0 is the other Uart from
>>>>>> the
>>>>>> same HPS core
>>>>>> Yet the console appears to be working ok.
>>>>> Maybe some of the DMA related changes triggering some unexpected
>>>>> behavior.
>>>>>
>>>>> Console doesn't use DMA so that could explain the difference.
>>>>>
>>>>>> Note all other libs and apps are at the same revision and build, it is
>>>>>> only
>>>>>> the kernel that is different.
>>>>>> Both versions of the kernel are also built using the same bitbake
>>>>>> bsdk..
>>>>>>
>>>>>>
>>>>>> As you can see it looks like the frame thats received on the 6.3.3
>>>>>> kernel
>>>>>> is
>>>>>> mangled?
>>>>>> This same message is just being requested over and over again from the
>>>>>> GPS
>>>>>> unit.
>>>>>>
>>>>>> The offset where the tears occur looks to be pretty similar between
>>>>>> each
>>>>>> poll
>>>>>> request.
>>>>>> Usually the 1 at the end of the (75331 is where the first tear occurs.
>>>>>>
>>>>>> I'd appreciate some quidance in how to track this down as there
>>>>>> appears to
>>>>>> have been a reasonable amount of work done to this driver and the
>>>>>> serial
>>>>>> core
>>>>>> between these two versions.
>>>>> A few ideas:
>>>>> - try without dma_rx_complete() calling p->dma->rx_dma(p)
>>>>> - revert 90b8596ac46043e4a782d9111f5b285251b13756
>>>>> - Try the revert in
>>>>> https://lore.kernel.org/all/316ab583-d217-a332-d161-8225b0cee227@redhat.com/2-0001-Revert-serial-8250-use-THRE-__stop_tx-also-with-DMA.patch
>>>>>      (for e8ffbb71f783 and f8d6e9d3ca5c)
>>>>>
>>>>> But finding the culprit with git bisect would be the most helpful here.
>>>>>
>>>> Bisect wasn't an easy option as I'd applied a pile of patches after the
>>>> interesting commits for our system to run.
>>>> I'm not a git master :)
>>>>
>>>> So I just started reverting the patches that had been applied to the 8250
>>>> folder.
>>>> Worked backwards from head.
>>>>
>>>> After reverting 57e9af7831dcf211c5c689c2a6f209f4abdf0bce
>>>> serial: 8250_dma: Fix DMA Rx rearm race
>>>>
>>>> Things started to work again.
>>>>
>>>> I reset everything and then just reverted that individual patch and things
>>>> work.
>>>> So that looks like the culprit..
>>> Okay, that also worked great, thanks a lot. It gives something concrete to
>>> work with to find a fix.
>>>
>>> The commit itself looks very much correct, however, my guess is that when
>>> serial8250_rx_dma_flush() does dmaengine_pause() dma_status somehow
>>> does not become DMA_PAUSED which leads to not properly flushing DMA Rx
>>> transaction. Can you try the following debug patch (if my guess is
>>> correct, besides triggering the WARN_ON_ONCE, it also works around the
>>> issue):
>>>
>>> [PATCH] DEBUG: 8250_dma: Detect DMA_PAUSED vs DMA_IN_PROGRESS inconsistency
>>>
>>> Detect DMA state not returning DMA_PAUSED after dmaengine_pause() was
>>> issued.
>>>
>>> Not intended for upstream.
>> Thanks Ilpo
>>     I can confirm that the patch below works. Got the WARN_ON_ONCE dump so it's
>> taking that path.
>> I think the problem here is that the pl330 DMA controller thats in these SOC's
>> doesn't "really" support pause according to the driver.
> For this flush usecase, it is enough to support pause + read residue +
> terminate which is supported according to the function comment for
> pl330_pause().

Yep agree given the 8250 serial code immediately terminates the dma after the pause during the flush anyway..

>> It doesn't look like it can ever set "DMA_PAUSED"
> Why not? It pauses the transfer and is even able to allow reading the
> transferred byte count.
>
> What it is claimed to not be able to do is resume. Note that w/o resume
> serial8250_tx_dma() XON/XOFF processing will not work but that's
> unrelated to this issue (I'll probably need to do another patch to fix
> that on 8250 DMA side).

Yep I just meant it doesn't look capable of reporting DMA_PAUSED
Which your patch below probably fixes.
and it kind of does a stop without a resume capability so must be terminated after this.
Though I'm not sure of the implications of reporting paused without resume capability.
Kind of an odd one..
I'll check your pl330 patch out tomorrow, and see what else might break.

RT

>> I'm not sure of the flow through of that though.
>> There is some commenting in the pl330 driver about the pause routine.
> Maybe the patch below will help with pl330 DMA status (based on somewhat
> limited understanding of all the moving parts):
>
>
> [PATCH] dmaengine: pl330: Set DMA_PAUSED when pausing
>
> pl330_pause() does not set anything to indicate paused condition which
> causes pl330_tx_status() to return DMA_IN_PROGRESS. This breaks 8250
> DMA code after the fix in commit 57e9af7831dc ("serial: 8250_dma: Fix
> DMA Rx rearm race"). The function comment for pl330_pause() claims
> pause is supported but resume is not which is enough for 8250 DMA flush
> to work as long as DMA status is reported correctly when appropriate.
>
> Add PAUSED state for descriptor and mark BUSY descriptors with PAUSED
> in pl330_pause(). Return DMA_PAUSED from pl330_tx_status() when the
> descriptor is PAUSED.
>
> Fixes: 88987d2c7534 ("dmaengine: pl330: add DMA_PAUSE feature")
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
>
> ---
>   drivers/dma/pl330.c | 18 ++++++++++++++++--
>   1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c
> index 0d9257fbdfb0..daad25f2c498 100644
> --- a/drivers/dma/pl330.c
> +++ b/drivers/dma/pl330.c
> @@ -403,6 +403,12 @@ enum desc_status {
>   	 * of a channel can be BUSY at any time.
>   	 */
>   	BUSY,
> +	/*
> +	 * Pause was called while descriptor was BUSY. Due to hardware
> +	 * limitations, only termination is possible for descriptors
> +	 * that have been paused.
> +	 */
> +	PAUSED,
>   	/*
>   	 * Sitting on the channel work_list but xfer done
>   	 * by PL330 core
> @@ -2041,7 +2047,7 @@ static inline void fill_queue(struct dma_pl330_chan *pch)
>   	list_for_each_entry(desc, &pch->work_list, node) {
>   
>   		/* If already submitted */
> -		if (desc->status == BUSY)
> +		if (desc->status == BUSY || desc->status == PAUSED)
>   			continue;
>   
>   		ret = pl330_submit_req(pch->thread, desc);
> @@ -2326,6 +2332,7 @@ static int pl330_pause(struct dma_chan *chan)
>   {
>   	struct dma_pl330_chan *pch = to_pchan(chan);
>   	struct pl330_dmac *pl330 = pch->dmac;
> +	struct dma_pl330_desc *desc;
>   	unsigned long flags;
>   
>   	pm_runtime_get_sync(pl330->ddma.dev);
> @@ -2335,6 +2342,10 @@ static int pl330_pause(struct dma_chan *chan)
>   	_stop(pch->thread);
>   	spin_unlock(&pl330->lock);
>   
> +	list_for_each_entry(desc, &pch->work_list, node) {
> +		if (desc->status == BUSY)
> +			desc->status = PAUSED;
> +	}
>   	spin_unlock_irqrestore(&pch->lock, flags);
>   	pm_runtime_mark_last_busy(pl330->ddma.dev);
>   	pm_runtime_put_autosuspend(pl330->ddma.dev);
> @@ -2425,7 +2436,7 @@ pl330_tx_status(struct dma_chan *chan, dma_cookie_t cookie,
>   		else if (running && desc == running)
>   			transferred =
>   				pl330_get_current_xferred_count(pch, desc);
> -		else if (desc->status == BUSY)
> +		else if (desc->status == BUSY || desc->status == PAUSED)
>   			/*
>   			 * Busy but not running means either just enqueued,
>   			 * or finished and not yet marked done
> @@ -2442,6 +2453,9 @@ pl330_tx_status(struct dma_chan *chan, dma_cookie_t cookie,
>   			case DONE:
>   				ret = DMA_COMPLETE;
>   				break;
> +			case PAUSED:
> +				ret = DMA_PAUSED;
> +				break;
>   			case PREP:
>   			case BUSY:
>   				ret = DMA_IN_PROGRESS;
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Possible regression in 8250_dw driver
  2023-05-25  9:27           ` Richard Tresidder
@ 2023-05-25  9:38             ` Ilpo Järvinen
  2023-05-26  7:07               ` Richard Tresidder
  0 siblings, 1 reply; 12+ messages in thread
From: Ilpo Järvinen @ 2023-05-25  9:38 UTC (permalink / raw)
  To: Richard Tresidder; +Cc: linux-serial

[-- Attachment #1: Type: text/plain, Size: 6649 bytes --]

On Thu, 25 May 2023, Richard Tresidder wrote:
> On 25/05/2023 4:43 pm, Ilpo Järvinen wrote:
> > On Thu, 25 May 2023, Richard Tresidder wrote:
> > > On 24/05/2023 6:06 pm, Ilpo Järvinen wrote:
> > > > On Wed, 24 May 2023, Richard Tresidder wrote:
> > > > > So I just started reverting the patches that had been applied to the
> > > > > 8250
> > > > > folder.
> > > > > Worked backwards from head.
> > > > > 
> > > > > After reverting 57e9af7831dcf211c5c689c2a6f209f4abdf0bce
> > > > > serial: 8250_dma: Fix DMA Rx rearm race
> > > > > 
> > > > > Things started to work again.
> > > > > 
> > > > > I reset everything and then just reverted that individual patch and
> > > > > things
> > > > > work.
> > > > > So that looks like the culprit..
> > > > Okay, that also worked great, thanks a lot. It gives something concrete
> > > > to
> > > > work with to find a fix.
> > > > 
> > > > The commit itself looks very much correct, however, my guess is that
> > > > when
> > > > serial8250_rx_dma_flush() does dmaengine_pause() dma_status somehow
> > > > does not become DMA_PAUSED which leads to not properly flushing DMA Rx
> > > > transaction. Can you try the following debug patch (if my guess is
> > > > correct, besides triggering the WARN_ON_ONCE, it also works around the
> > > > issue):
> > > > 
> > > > [PATCH] DEBUG: 8250_dma: Detect DMA_PAUSED vs DMA_IN_PROGRESS
> > > > inconsistency
> > > > 
> > > > Detect DMA state not returning DMA_PAUSED after dmaengine_pause() was
> > > > issued.
> > > > 
> > > > Not intended for upstream.
> > > Thanks Ilpo
> > >     I can confirm that the patch below works. Got the WARN_ON_ONCE dump so
> > > it's
> > > taking that path.
> > > I think the problem here is that the pl330 DMA controller thats in these
> > > SOC's
> > > doesn't "really" support pause according to the driver.
> > For this flush usecase, it is enough to support pause + read residue +
> > terminate which is supported according to the function comment for
> > pl330_pause().
> 
> Yep agree given the 8250 serial code immediately terminates the dma after the
> pause during the flush anyway..
> 
> > > It doesn't look like it can ever set "DMA_PAUSED"
> > Why not? It pauses the transfer and is even able to allow reading the
> > transferred byte count.
> > 
> > What it is claimed to not be able to do is resume. Note that w/o resume
> > serial8250_tx_dma() XON/XOFF processing will not work but that's
> > unrelated to this issue (I'll probably need to do another patch to fix
> > that on 8250 DMA side).
> 
> Yep I just meant it doesn't look capable of reporting DMA_PAUSED
> Which your patch below probably fixes.
> and it kind of does a stop without a resume capability so must be terminated
> after this.
> Though I'm not sure of the implications of reporting paused without resume
> capability.
> Kind of an odd one..

dma_slave_caps has a bool for both pause and resume support (I only found 
out about the second flag for resume today) so not unheard of it seems.

-- 
 i.


> I'll check your pl330 patch out tomorrow, and see what else might break.
>
> > > I'm not sure of the flow through of that though.
> > > There is some commenting in the pl330 driver about the pause routine.
> > Maybe the patch below will help with pl330 DMA status (based on somewhat
> > limited understanding of all the moving parts):
> > 
> > 
> > [PATCH] dmaengine: pl330: Set DMA_PAUSED when pausing
> > 
> > pl330_pause() does not set anything to indicate paused condition which
> > causes pl330_tx_status() to return DMA_IN_PROGRESS. This breaks 8250
> > DMA code after the fix in commit 57e9af7831dc ("serial: 8250_dma: Fix
> > DMA Rx rearm race"). The function comment for pl330_pause() claims
> > pause is supported but resume is not which is enough for 8250 DMA flush
> > to work as long as DMA status is reported correctly when appropriate.
> > 
> > Add PAUSED state for descriptor and mark BUSY descriptors with PAUSED
> > in pl330_pause(). Return DMA_PAUSED from pl330_tx_status() when the
> > descriptor is PAUSED.
> > 
> > Fixes: 88987d2c7534 ("dmaengine: pl330: add DMA_PAUSE feature")
> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> > 
> > ---
> >   drivers/dma/pl330.c | 18 ++++++++++++++++--
> >   1 file changed, 16 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c
> > index 0d9257fbdfb0..daad25f2c498 100644
> > --- a/drivers/dma/pl330.c
> > +++ b/drivers/dma/pl330.c
> > @@ -403,6 +403,12 @@ enum desc_status {
> >   	 * of a channel can be BUSY at any time.
> >   	 */
> >   	BUSY,
> > +	/*
> > +	 * Pause was called while descriptor was BUSY. Due to hardware
> > +	 * limitations, only termination is possible for descriptors
> > +	 * that have been paused.
> > +	 */
> > +	PAUSED,
> >   	/*
> >   	 * Sitting on the channel work_list but xfer done
> >   	 * by PL330 core
> > @@ -2041,7 +2047,7 @@ static inline void fill_queue(struct dma_pl330_chan
> > *pch)
> >   	list_for_each_entry(desc, &pch->work_list, node) {
> >     		/* If already submitted */
> > -		if (desc->status == BUSY)
> > +		if (desc->status == BUSY || desc->status == PAUSED)
> >   			continue;
> >     		ret = pl330_submit_req(pch->thread, desc);
> > @@ -2326,6 +2332,7 @@ static int pl330_pause(struct dma_chan *chan)
> >   {
> >   	struct dma_pl330_chan *pch = to_pchan(chan);
> >   	struct pl330_dmac *pl330 = pch->dmac;
> > +	struct dma_pl330_desc *desc;
> >   	unsigned long flags;
> >     	pm_runtime_get_sync(pl330->ddma.dev);
> > @@ -2335,6 +2342,10 @@ static int pl330_pause(struct dma_chan *chan)
> >   	_stop(pch->thread);
> >   	spin_unlock(&pl330->lock);
> >   +	list_for_each_entry(desc, &pch->work_list, node) {
> > +		if (desc->status == BUSY)
> > +			desc->status = PAUSED;
> > +	}
> >   	spin_unlock_irqrestore(&pch->lock, flags);
> >   	pm_runtime_mark_last_busy(pl330->ddma.dev);
> >   	pm_runtime_put_autosuspend(pl330->ddma.dev);
> > @@ -2425,7 +2436,7 @@ pl330_tx_status(struct dma_chan *chan, dma_cookie_t
> > cookie,
> >   		else if (running && desc == running)
> >   			transferred =
> >   				pl330_get_current_xferred_count(pch, desc);
> > -		else if (desc->status == BUSY)
> > +		else if (desc->status == BUSY || desc->status == PAUSED)
> >   			/*
> >   			 * Busy but not running means either just enqueued,
> >   			 * or finished and not yet marked done
> > @@ -2442,6 +2453,9 @@ pl330_tx_status(struct dma_chan *chan, dma_cookie_t
> > cookie,
> >   			case DONE:
> >   				ret = DMA_COMPLETE;
> >   				break;
> > +			case PAUSED:
> > +				ret = DMA_PAUSED;
> > +				break;
> >   			case PREP:
> >   			case BUSY:
> >   				ret = DMA_IN_PROGRESS;
> > 
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Possible regression in 8250_dw driver
  2023-05-25  9:38             ` Ilpo Järvinen
@ 2023-05-26  7:07               ` Richard Tresidder
  2023-05-26 10:16                 ` Ilpo Järvinen
  0 siblings, 1 reply; 12+ messages in thread
From: Richard Tresidder @ 2023-05-26  7:07 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: linux-serial

On 25/05/2023 5:38 pm, Ilpo Järvinen wrote:

> On Thu, 25 May 2023, Richard Tresidder wrote:
>> On 25/05/2023 4:43 pm, Ilpo Järvinen wrote:
>>> On Thu, 25 May 2023, Richard Tresidder wrote:
>>>> On 24/05/2023 6:06 pm, Ilpo Järvinen wrote:
>>>>> On Wed, 24 May 2023, Richard Tresidder wrote:
>>>>>> So I just started reverting the patches that had been applied to the
>>>>>> 8250
>>>>>> folder.
>>>>>> Worked backwards from head.
>>>>>>
>>>>>> After reverting 57e9af7831dcf211c5c689c2a6f209f4abdf0bce
>>>>>> serial: 8250_dma: Fix DMA Rx rearm race
>>>>>>
>>>>>> Things started to work again.
>>>>>>
>>>>>> I reset everything and then just reverted that individual patch and
>>>>>> things
>>>>>> work.
>>>>>> So that looks like the culprit..
>>>>> Okay, that also worked great, thanks a lot. It gives something concrete
>>>>> to
>>>>> work with to find a fix.
>>>>>
>>>>> The commit itself looks very much correct, however, my guess is that
>>>>> when
>>>>> serial8250_rx_dma_flush() does dmaengine_pause() dma_status somehow
>>>>> does not become DMA_PAUSED which leads to not properly flushing DMA Rx
>>>>> transaction. Can you try the following debug patch (if my guess is
>>>>> correct, besides triggering the WARN_ON_ONCE, it also works around the
>>>>> issue):
>>>>>
>>>>> [PATCH] DEBUG: 8250_dma: Detect DMA_PAUSED vs DMA_IN_PROGRESS
>>>>> inconsistency
>>>>>
>>>>> Detect DMA state not returning DMA_PAUSED after dmaengine_pause() was
>>>>> issued.
>>>>>
>>>>> Not intended for upstream.
>>>> Thanks Ilpo
>>>>      I can confirm that the patch below works. Got the WARN_ON_ONCE dump so
>>>> it's
>>>> taking that path.
>>>> I think the problem here is that the pl330 DMA controller thats in these
>>>> SOC's
>>>> doesn't "really" support pause according to the driver.
>>> For this flush usecase, it is enough to support pause + read residue +
>>> terminate which is supported according to the function comment for
>>> pl330_pause().
>> Yep agree given the 8250 serial code immediately terminates the dma after the
>> pause during the flush anyway..
>>
>>>> It doesn't look like it can ever set "DMA_PAUSED"
>>> Why not? It pauses the transfer and is even able to allow reading the
>>> transferred byte count.
>>>
>>> What it is claimed to not be able to do is resume. Note that w/o resume
>>> serial8250_tx_dma() XON/XOFF processing will not work but that's
>>> unrelated to this issue (I'll probably need to do another patch to fix
>>> that on 8250 DMA side).
>> Yep I just meant it doesn't look capable of reporting DMA_PAUSED
>> Which your patch below probably fixes.
>> and it kind of does a stop without a resume capability so must be terminated
>> after this.
>> Though I'm not sure of the implications of reporting paused without resume
>> capability.
>> Kind of an odd one..
> dma_slave_caps has a bool for both pause and resume support (I only found
> out about the second flag for resume today) so not unheard of it seems.

I've just tested the pl330 patch and it looks good.
No longer get the WARN_ON_ONCE triggered in the 8250_dma driver so it looks like
Paused is correctly being set now and we're avoiding that path.
Wonder how many others dma drivers are afflicted in the same manner?
A quick grep seems to infer a lot more drivers set the pause function then have
something in them setting the status to DMA_PAUSED so probably good to keep the
8250 fix in also for the time being.

Cheers
    Richard Tresidder


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Possible regression in 8250_dw driver
  2023-05-26  7:07               ` Richard Tresidder
@ 2023-05-26 10:16                 ` Ilpo Järvinen
  2023-05-26 10:44                   ` Richard Tresidder
  0 siblings, 1 reply; 12+ messages in thread
From: Ilpo Järvinen @ 2023-05-26 10:16 UTC (permalink / raw)
  To: Richard Tresidder; +Cc: linux-serial

[-- Attachment #1: Type: text/plain, Size: 4431 bytes --]

On Fri, 26 May 2023, Richard Tresidder wrote:

> On 25/05/2023 5:38 pm, Ilpo Järvinen wrote:
> 
> > On Thu, 25 May 2023, Richard Tresidder wrote:
> > > On 25/05/2023 4:43 pm, Ilpo Järvinen wrote:
> > > > On Thu, 25 May 2023, Richard Tresidder wrote:
> > > > > On 24/05/2023 6:06 pm, Ilpo Järvinen wrote:
> > > > > > On Wed, 24 May 2023, Richard Tresidder wrote:
> > > > > > > So I just started reverting the patches that had been applied to
> > > > > > > the
> > > > > > > 8250
> > > > > > > folder.
> > > > > > > Worked backwards from head.
> > > > > > > 
> > > > > > > After reverting 57e9af7831dcf211c5c689c2a6f209f4abdf0bce
> > > > > > > serial: 8250_dma: Fix DMA Rx rearm race
> > > > > > > 
> > > > > > > Things started to work again.
> > > > > > > 
> > > > > > > I reset everything and then just reverted that individual patch
> > > > > > > and
> > > > > > > things
> > > > > > > work.
> > > > > > > So that looks like the culprit..
> > > > > > Okay, that also worked great, thanks a lot. It gives something
> > > > > > concrete
> > > > > > to
> > > > > > work with to find a fix.
> > > > > > 
> > > > > > The commit itself looks very much correct, however, my guess is that
> > > > > > when
> > > > > > serial8250_rx_dma_flush() does dmaengine_pause() dma_status somehow
> > > > > > does not become DMA_PAUSED which leads to not properly flushing DMA
> > > > > > Rx
> > > > > > transaction. Can you try the following debug patch (if my guess is
> > > > > > correct, besides triggering the WARN_ON_ONCE, it also works around
> > > > > > the
> > > > > > issue):
> > > > > > 
> > > > > > [PATCH] DEBUG: 8250_dma: Detect DMA_PAUSED vs DMA_IN_PROGRESS
> > > > > > inconsistency
> > > > > > 
> > > > > > Detect DMA state not returning DMA_PAUSED after dmaengine_pause()
> > > > > > was
> > > > > > issued.
> > > > > > 
> > > > > > Not intended for upstream.
> > > > > Thanks Ilpo
> > > > >      I can confirm that the patch below works. Got the WARN_ON_ONCE
> > > > > dump so
> > > > > it's
> > > > > taking that path.
> > > > > I think the problem here is that the pl330 DMA controller thats in
> > > > > these
> > > > > SOC's
> > > > > doesn't "really" support pause according to the driver.
> > > > For this flush usecase, it is enough to support pause + read residue +
> > > > terminate which is supported according to the function comment for
> > > > pl330_pause().
> > > Yep agree given the 8250 serial code immediately terminates the dma after
> > > the
> > > pause during the flush anyway..
> > > 
> > > > > It doesn't look like it can ever set "DMA_PAUSED"
> > > > Why not? It pauses the transfer and is even able to allow reading the
> > > > transferred byte count.
> > > > 
> > > > What it is claimed to not be able to do is resume. Note that w/o resume
> > > > serial8250_tx_dma() XON/XOFF processing will not work but that's
> > > > unrelated to this issue (I'll probably need to do another patch to fix
> > > > that on 8250 DMA side).
> > > Yep I just meant it doesn't look capable of reporting DMA_PAUSED
> > > Which your patch below probably fixes.
> > > and it kind of does a stop without a resume capability so must be
> > > terminated
> > > after this.
> > > Though I'm not sure of the implications of reporting paused without resume
> > > capability.
> > > Kind of an odd one..
> > dma_slave_caps has a bool for both pause and resume support (I only found
> > out about the second flag for resume today) so not unheard of it seems.
> 
> I've just tested the pl330 patch and it looks good.
> No longer get the WARN_ON_ONCE triggered in the 8250_dma driver so it looks
> like
> Paused is correctly being set now and we're avoiding that path.

Thanks a lot for testing the fix and also for your invaluable help in 
narrowing down the problem. It would have been very hard to find the 
culprit otherwise.

Can I add your Tested-by tag before sending the official version of the 
fix out?

> Wonder how many others dma drivers are afflicted in the same manner?
> A quick grep seems to infer a lot more drivers set the pause function then
> have
> something in them setting the status to DMA_PAUSED so probably good to keep
> the
> 8250 fix in also for the time being.

There might be others, but then the patch which covered the issue has gone
to stables already months ago and you're the first to report it AFAIK so 
perhaps not that wide-spread.

I'll think how to deal with this on 8250_dma side.


-- 
 i.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Possible regression in 8250_dw driver
  2023-05-26 10:16                 ` Ilpo Järvinen
@ 2023-05-26 10:44                   ` Richard Tresidder
  0 siblings, 0 replies; 12+ messages in thread
From: Richard Tresidder @ 2023-05-26 10:44 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: linux-serial





Richard Tresidder












On 26/05/2023 6:16 pm, Ilpo Järvinen wrote:

> On Fri, 26 May 2023, Richard Tresidder wrote:
>
>> On 25/05/2023 5:38 pm, Ilpo Järvinen wrote:
>>
>>> On Thu, 25 May 2023, Richard Tresidder wrote:
>>>> On 25/05/2023 4:43 pm, Ilpo Järvinen wrote:
>>>>> On Thu, 25 May 2023, Richard Tresidder wrote:
>>>>>> On 24/05/2023 6:06 pm, Ilpo Järvinen wrote:
>>>>>>> On Wed, 24 May 2023, Richard Tresidder wrote:
>>>>>>>> So I just started reverting the patches that had been applied to
>>>>>>>> the
>>>>>>>> 8250
>>>>>>>> folder.
>>>>>>>> Worked backwards from head.
>>>>>>>>
>>>>>>>> After reverting 57e9af7831dcf211c5c689c2a6f209f4abdf0bce
>>>>>>>> serial: 8250_dma: Fix DMA Rx rearm race
>>>>>>>>
>>>>>>>> Things started to work again.
>>>>>>>>
>>>>>>>> I reset everything and then just reverted that individual patch
>>>>>>>> and
>>>>>>>> things
>>>>>>>> work.
>>>>>>>> So that looks like the culprit..
>>>>>>> Okay, that also worked great, thanks a lot. It gives something
>>>>>>> concrete
>>>>>>> to
>>>>>>> work with to find a fix.
>>>>>>>
>>>>>>> The commit itself looks very much correct, however, my guess is that
>>>>>>> when
>>>>>>> serial8250_rx_dma_flush() does dmaengine_pause() dma_status somehow
>>>>>>> does not become DMA_PAUSED which leads to not properly flushing DMA
>>>>>>> Rx
>>>>>>> transaction. Can you try the following debug patch (if my guess is
>>>>>>> correct, besides triggering the WARN_ON_ONCE, it also works around
>>>>>>> the
>>>>>>> issue):
>>>>>>>
>>>>>>> [PATCH] DEBUG: 8250_dma: Detect DMA_PAUSED vs DMA_IN_PROGRESS
>>>>>>> inconsistency
>>>>>>>
>>>>>>> Detect DMA state not returning DMA_PAUSED after dmaengine_pause()
>>>>>>> was
>>>>>>> issued.
>>>>>>>
>>>>>>> Not intended for upstream.
>>>>>> Thanks Ilpo
>>>>>>       I can confirm that the patch below works. Got the WARN_ON_ONCE
>>>>>> dump so
>>>>>> it's
>>>>>> taking that path.
>>>>>> I think the problem here is that the pl330 DMA controller thats in
>>>>>> these
>>>>>> SOC's
>>>>>> doesn't "really" support pause according to the driver.
>>>>> For this flush usecase, it is enough to support pause + read residue +
>>>>> terminate which is supported according to the function comment for
>>>>> pl330_pause().
>>>> Yep agree given the 8250 serial code immediately terminates the dma after
>>>> the
>>>> pause during the flush anyway..
>>>>
>>>>>> It doesn't look like it can ever set "DMA_PAUSED"
>>>>> Why not? It pauses the transfer and is even able to allow reading the
>>>>> transferred byte count.
>>>>>
>>>>> What it is claimed to not be able to do is resume. Note that w/o resume
>>>>> serial8250_tx_dma() XON/XOFF processing will not work but that's
>>>>> unrelated to this issue (I'll probably need to do another patch to fix
>>>>> that on 8250 DMA side).
>>>> Yep I just meant it doesn't look capable of reporting DMA_PAUSED
>>>> Which your patch below probably fixes.
>>>> and it kind of does a stop without a resume capability so must be
>>>> terminated
>>>> after this.
>>>> Though I'm not sure of the implications of reporting paused without resume
>>>> capability.
>>>> Kind of an odd one..
>>> dma_slave_caps has a bool for both pause and resume support (I only found
>>> out about the second flag for resume today) so not unheard of it seems.
>> I've just tested the pl330 patch and it looks good.
>> No longer get the WARN_ON_ONCE triggered in the 8250_dma driver so it looks
>> like
>> Paused is correctly being set now and we're avoiding that path.
> Thanks a lot for testing the fix and also for your invaluable help in
> narrowing down the problem. It would have been very hard to find the
> culprit otherwise.
>
> Can I add your Tested-by tag before sending the official version of the
> fix out?

No problems thanks for turning the patches around so quickly.
Yep you can add a Tested-by Tag
Have a great weekend!
Cheers
    Richard Tresidder

>
>> Wonder how many others dma drivers are afflicted in the same manner?
>> A quick grep seems to infer a lot more drivers set the pause function then
>> have
>> something in them setting the status to DMA_PAUSED so probably good to keep
>> the
>> 8250 fix in also for the time being.
> There might be others, but then the patch which covered the issue has gone
> to stables already months ago and you're the first to report it AFAIK so
> perhaps not that wide-spread.
>
> I'll think how to deal with this on 8250_dma side.
>
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-05-26 10:44 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-23  8:59 Possible regression in 8250_dw driver Richard Tresidder
2023-05-23 11:13 ` Ilpo Järvinen
2023-05-24  7:52   ` Richard Tresidder
2023-05-24 10:06     ` Ilpo Järvinen
2023-05-25  3:44       ` Richard Tresidder
2023-05-25  8:43         ` Ilpo Järvinen
2023-05-25  9:27           ` Richard Tresidder
2023-05-25  9:38             ` Ilpo Järvinen
2023-05-26  7:07               ` Richard Tresidder
2023-05-26 10:16                 ` Ilpo Järvinen
2023-05-26 10:44                   ` Richard Tresidder
2023-05-24 11:20 ` Linux regression tracking #adding (Thorsten Leemhuis)

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.