From: Jason Wang <jasowang@redhat.com>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: wangyunjian <wangyunjian@huawei.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
"virtualization@lists.linux-foundation.org"
<virtualization@lists.linux-foundation.org>,
Network Development <netdev@vger.kernel.org>,
"Lilijun (Jerry)" <jerry.lilijun@huawei.com>,
chenchanghu <chenchanghu@huawei.com>,
xudingke <xudingke@huawei.com>,
"huangbin (J)" <brian.huangbin@huawei.com>,
Willem de Bruijn <willemb@google.com>
Subject: Re: [PATCH net v2] tun: fix ubuf refcount incorrectly on error path
Date: Mon, 14 Dec 2020 12:06:48 +0800 [thread overview]
Message-ID: <3cfbcd25-f9ae-ea9b-fc10-80a44a614276@redhat.com> (raw)
In-Reply-To: <CAF=yD-JHO3SaxaHAZJ8nZ1jy8Zp4hMt1EhP3abutA5zczgTv5g@mail.gmail.com>
On 2020/12/14 11:56 AM, Willem de Bruijn wrote:
> On Sun, Dec 13, 2020 at 10:54 PM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
>> On Sun, Dec 13, 2020 at 10:30 PM Jason Wang <jasowang@redhat.com> wrote:
>>>
>>>> On 2020/12/14 9:32 AM, Willem de Bruijn wrote:
>>>> On Sat, Dec 12, 2020 at 7:18 PM Willem de Bruijn
>>>> <willemdebruijn.kernel@gmail.com> wrote:
>>>>>>>> afterwards, the error handling in vhost handle_tx() will try to
>>>>>>>> decrease the same refcount again. This is wrong; fix it by delaying
>>>>>>>> the copy of ubuf_info until we're sure there are no errors.
>>>>>>> I think the right approach is to address this in the error paths, rather than
>>>>>>> complicate the normal datapath.
>>>>>>>
>>>>>>> Is it sufficient to suppress the call to vhost_net_ubuf_put in the handle_tx
>>>>>>> sendmsg error path, given that vhost_zerocopy_callback will be called on
>>>>>>> kfree_skb?
>>>>>> We cannot call kfree_skb() before the skb has been created.
>>>>>>
>>>>>>> Or alternatively clear the destructor in drop:
>>>>>> uarg->callback() is called immediately after we decide to do a data
>>>>>> copy, even if the caller wanted zerocopy. If another error occurs
>>>>>> later, vhost handle_tx() will try to decrease the refcount again.
>>>>> Oh right, I missed the else branch in this path:
>>>>>
>>>>>         /* copy skb_ubuf_info for callback when skb has no error */
>>>>>         if (zerocopy) {
>>>>>                 skb_shinfo(skb)->destructor_arg = msg_control;
>>>>>                 skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
>>>>>                 skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
>>>>>         } else if (msg_control) {
>>>>>                 struct ubuf_info *uarg = msg_control;
>>>>>
>>>>>                 uarg->callback(uarg, false);
>>>>>         }
>>>>>
>>>>> So if handle_tx_zerocopy calls tun_sendmsg with ubuf_info (and thus a
>>>>> reference to release), there are these five options:
>>>>>
>>>>> 1. tun_sendmsg succeeds, ubuf_info is associated with skb.
>>>>> reference released from kfree_skb calling vhost_zerocopy_callback later
>>>>>
>>>>> 2. tun_sendmsg succeeds, ubuf_info is released immediately, as skb is
>>>>> not zerocopy.
>>>>>
>>>>> 3. tun_sendmsg fails before creating skb, handle_tx_zerocopy correctly
>>>>> cleans up on receiving error from tun_sendmsg.
>>>>>
>>>>> 4. tun_sendmsg fails after creating skb, but with copying: decremented
>>>>> at branch shown above + again in handle_tx_zerocopy
>>>>>
>>>>> 5. tun_sendmsg fails after creating skb, with zerocopy: decremented at
>>>>> kfree_skb in drop: + again in handle_tx_zerocopy
>>>>>
>>>>> Since handle_tx_zerocopy has no idea whether on error 3, 4 or 5
>>>>> occurred,
>>>> Actually, it does. If sendmsg returns an error, it can test whether
>>>> vq->heads[nvq->upend_idx].len != VHOST_DMA_IN_PROGRESS.
>>>
>>> Just to make sure I understand: is there any reason it can't be
>>> VHOST_DMA_IN_PROGRESS here?
>> It can be, and it will be if tun_sendmsg returns EINVAL before
>> assigning the skb destructor.
> I meant returns an error, not necessarily only EINVAL.
>
>> Only if tun_sendmsg released the zerocopy state through
>> kfree_skb->vhost_zerocopy_callback will it have been updated to
>> VHOST_DMA_DONE_LEN. And only then must the caller not try to release
>> the state again.
>
I see. So I'd prefer to fix this in vhost instead of tun, to be
consistent with the current error handling in handle_tx_zerocopy().
Thanks