Re: [PATCH] hw/char/pl011: Output characters using best-effort mode

From: Gavin Shan <gshan@redhat.com>
To: Marc Zyngier <maz@kernel.org>
Cc: Peter Maydell <peter.maydell@linaro.org>,
	qemu-arm <qemu-arm@nongnu.org>,
	QEMU Developers <qemu-devel@nongnu.org>,
	Shan Gavin <shan.gavin@gmail.com>
Subject: Re: [PATCH] hw/char/pl011: Output characters using best-effort mode
Date: Mon, 24 Feb 2020 10:57:16 +1100	[thread overview]
Message-ID: <4eec5935-1114-2c8e-05ce-21f80f71d041@redhat.com> (raw)
In-Reply-To: <69bdfa5b09791a9d148b791076f2441f@kernel.org>

Hi Marc,

On 2/21/20 8:09 PM, Marc Zyngier wrote:
> On 2020-02-21 04:24, Gavin Shan wrote:
>> On 2/20/20 9:10 PM, Peter Maydell wrote:
>>> On Thu, 20 Feb 2020 at 09:10, Marc Zyngier <maz@kernel.org> wrote:
>>>> On 2020-02-20 06:01, Gavin Shan wrote:
>>>>> This fixes the issue by using newly added API
>>>>> qemu_chr_fe_try_write_all(),
>>>>> which provides another type of service (best-effort). It's different
>>>>> from
>>>>> qemu_chr_fe_write_all() as the data will be dropped if the backend has
>>>>> been running into so-called broken state or 50 attempts of
>>>>> transmissions.
>>>>> The broken state is cleared if the data is transmitted at once.
>>>>
>>>> I don't think dropping the serial port output is an acceptable outcome.
>>>
>>> Agreed. The correct fix for this is the one cryptically described
>>> in the XXX comment this patch deletes:
>>>
>>> -        /* XXX this blocks entire thread. Rewrite to use
>>> -         * qemu_chr_fe_write and background I/O callbacks */
>>>
>>> The idea is that essentially we end up emulating the real
>>> hardware's transmit FIFO:
>>>   * as data arrives from the guest we put it in the FIFO
>>>   * we try to send the data with qemu_chr_fe_write(), which does
>>>     not block
>>>   * if qemu_chr_fe_write() tells us it did not send all the data,
>>>     we use qemu_chr_fe_add_watch() to set up an I/O callback
>>>     which will get called when the output chardev has drained
>>>     enough that we can try again
>>>   * we make sure all the guest visible registers and mechanisms
>>>     for tracking tx fifo level (status bits, interrupts, etc) are
>>>     correctly wired up
>>>
>>> Then we don't lose data or block QEMU if the guest sends
>>> faster than the chardev backend can handle, assuming the
>>> guest is well-behaved -- just as with a real hardware slow
>>> serial port, the guest will fill the tx fifo and then either poll
>>> or wait for an interrupt telling it that the fifo has drained
>>> before it tries to send more data.
>>>
>>> There is an example of this in hw/char/cadence_uart.c
>>> (and an example of how it works for a UART with no tx
>>> fifo in hw/char-cmsdk-apb-uart.c, which is basically the
>>> same except the 'fifo' is just one byte.)
>>>
>>> You will also find an awful lot of XXX comments like the
>>> above one in various UART models in hw/char, because
>>> converting an old-style simple blocking UART implementation
>>> to a non-blocking one is a bit fiddly and needs knowledge
>>> of the specifics of the UART behaviour.
>>>
>>> The other approach here would be that we could add
>>> options to relevant chardev backends so the user
>>> could say "if you couldn't connect to the tcp server I
>>> specified, throw away data rather than waiting", where
>>> we don't have suitable options already. If the user specifically
>>> tells us they're ok to throw away the serial data, then it's
>>> fine to throw away the serial data :-)
>>>
>>
>> I was intended to convince Marc that it's fine to lose data if the
>> serial connection is broken with an example. Now, I'm taking the
>> example trying to convince both of you: Lets assume we have a ARM
>> board and the UART (RS232) cable is unplugged and plugged in the middle of
>> system booting. I think we would get some output lost. We're emulating
>> pl011 and I think it would have same behavior. However, I'm not sure
>> if it makes sense :)
> 
> But the case you describe in the commit message is not that one.
> The analogy is that of a serial port *plugged* and asserting flow control.
> 

Thanks for your time on the discussion.

Well, I would say we saw two side of a coin. TCP connection isn't bidirectional
until accept() is called on server side. The connection isn't fully functional
until two directions are finalized. It would be unplug if the connection is treated
as the cable :)

> Another thing is that the "system" as been constructed this way by the
> user. QEMU is not in a position to choose and output what is convenient,
> when it is convenient. In my world, the serial output is absolutely
> crucial. This is where I look for clues about failures and odd behaviours,
> and I rely on the serial port emulation to be 100% reliable (and for what
> it's worth, the Linux kernel can output to the serial port asynchronously,
> to some extent).
> 
> [...]
> 

Yep, totally agreed :)

>> If above analysis is correct and the first approach doesn't work out. We have to
>> consider the 2nd approach - adding option to backend to allow losing data. I'm
>> going to add "allow-data-lost" option for TYPE_CHARDEV_SOCKET. With the option,
>> a back-off algorithm in tcp_chr_write(): The channel is consider as broken if
>> it fails to transmit data in last continuous 5 times. The transmission is still
>> issued when the channel is in broken state and recovered to normal state if
>> transmission succeeds for once.
> 
> That'd be an option if you could configure the UART with something that says
> "no flow control". In that case, dropping data on the floor becomes perfectly
> acceptable, as it requires buy-in from the user.
> 

Yep, the point is to has user's buy-in and it seems an explicit option like
"allow-data-lost" fills the gap, but it seems Peter isn't reaching conclusion
or decision yet. Lets see what's that finally :)

Thanks,
Gavin