From: Jason Wang <jasowang@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
ppandit@redhat.com
Subject: Re: [PATCH net] vhost_net: fix possible infinite loop
Date: Sun, 5 May 2019 12:20:24 +0800 [thread overview]
Message-ID: <999ef863-2994-e0c0-fbb1-a6e92de3fd24@redhat.com> (raw)
In-Reply-To: <f4b4ff70-d64f-c3fb-fe2e-97ef6c55bda0@redhat.com>
On 2019/4/26 3:35 PM, Jason Wang wrote:
>
> On 2019/4/26 1:52 AM, Michael S. Tsirkin wrote:
>> On Thu, Apr 25, 2019 at 03:33:19AM -0400, Jason Wang wrote:
>>> When the rx buffer is too small for a packet, we will discard the vq
>>> descriptor and retry it for the next packet:
>>>
>>> while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk,
>>> &busyloop_intr))) {
>>> ...
>>> /* On overrun, truncate and discard */
>>> if (unlikely(headcount > UIO_MAXIOV)) {
>>> iov_iter_init(&msg.msg_iter, READ, vq->iov, 1, 1);
>>> err = sock->ops->recvmsg(sock, &msg,
>>> 1, MSG_DONTWAIT | MSG_TRUNC);
>>> pr_debug("Discarded rx packet: len %zd\n", sock_len);
>>> continue;
>>> }
>>> ...
>>> }
>>>
>>> This makes it possible to trigger an infinite while..continue loop
>>> through the cooperation of two VMs like:
>>>
>>> 1) Malicious VM1 allocates a 1 byte rx buffer and tries to slow down
>>> the vhost process as much as possible, e.g. by using indirect
>>> descriptors.
>>> 2) Malicious VM2 generates packets to VM1 as fast as possible
>>>
>>> Fix this by checking against the weight at the end of the RX and TX
>>> loops. This also eliminates other similar cases where:
>>>
>>> - userspace is consuming the packets in the meanwhile
>>> - a theoretical TOCTOU attack where the guest moves the avail index
>>> back and forth to hit the continue after vhost finds the guest just
>>> added new buffers
>>>
>>> This addresses CVE-2019-3900.
>>>
>>> Fixes: d8316f3991d20 ("vhost: fix total length when packets are too short")
>> I agree this is the real issue.
>>
>>> Fixes: 3a4d5c94e9593 ("vhost_net: a kernel-level virtio server")
>> This is just a red herring imho. We can stick this on any vhost patch :)
>>
>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>> ---
>>> drivers/vhost/net.c | 41 +++++++++++++++++++++--------------------
>>> 1 file changed, 21 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>>> index df51a35..fb46e6b 100644
>>> --- a/drivers/vhost/net.c
>>> +++ b/drivers/vhost/net.c
>>> @@ -778,8 +778,9 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
>>> int err;
>>> int sent_pkts = 0;
>>> bool sock_can_batch = (sock->sk->sk_sndbuf == INT_MAX);
>>> + bool next_round = false;
>>> - for (;;) {
>>> + do {
>>> bool busyloop_intr = false;
>>> if (nvq->done_idx == VHOST_NET_BATCH)
>>> @@ -845,11 +846,10 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
>>> vq->heads[nvq->done_idx].id = cpu_to_vhost32(vq, head);
>>> vq->heads[nvq->done_idx].len = 0;
>>> ++nvq->done_idx;
>>> - if (vhost_exceeds_weight(++sent_pkts, total_len)) {
>>> - vhost_poll_queue(&vq->poll);
>>> - break;
>>> - }
>>> - }
>>> + } while (!(next_round = vhost_exceeds_weight(++sent_pkts, total_len)));
>>> +
>>> + if (next_round)
>>> + vhost_poll_queue(&vq->poll);
>>> vhost_tx_batch(net, nvq, sock, &msg);
>>> }
>>> @@ -873,8 +873,9 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
>>> struct vhost_net_ubuf_ref *uninitialized_var(ubufs);
>>> bool zcopy_used;
>>> int sent_pkts = 0;
>>> + bool next_round = false;
>>> - for (;;) {
>>> + do {
>>> bool busyloop_intr;
>>> /* Release DMAs done buffers first */
>>> @@ -951,11 +952,10 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
>>> else
>>> vhost_zerocopy_signal_used(net, vq);
>>> vhost_net_tx_packet(net);
>>> - if (unlikely(vhost_exceeds_weight(++sent_pkts, total_len))) {
>>> - vhost_poll_queue(&vq->poll);
>>> - break;
>>> - }
>>> - }
>>> + } while (!(next_round = vhost_exceeds_weight(++sent_pkts, total_len)));
>>> +
>>> + if (next_round)
>>> + vhost_poll_queue(&vq->poll);
>>> }
>>> /* Expects to be always run from workqueue - which acts as
>>> @@ -1134,6 +1134,7 @@ static void handle_rx(struct vhost_net *net)
>>> struct iov_iter fixup;
>>> __virtio16 num_buffers;
>>> int recv_pkts = 0;
>>> + bool next_round = false;
>>> mutex_lock_nested(&vq->mutex, VHOST_NET_VQ_RX);
>>> sock = vq->private_data;
>>> @@ -1153,8 +1154,11 @@ static void handle_rx(struct vhost_net *net)
>>> vq->log : NULL;
>>> mergeable = vhost_has_feature(vq, VIRTIO_NET_F_MRG_RXBUF);
>>> - while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk,
>>> - &busyloop_intr))) {
>>> + do {
>>> + sock_len = vhost_net_rx_peek_head_len(net, sock->sk,
>>> + &busyloop_intr);
>>> + if (!sock_len)
>>> + break;
>>> sock_len += sock_hlen;
>>> vhost_len = sock_len + vhost_hlen;
>>> headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
>>> @@ -1239,12 +1243,9 @@ static void handle_rx(struct vhost_net *net)
>>> vhost_log_write(vq, vq_log, log, vhost_len,
>>> vq->iov, in);
>>> total_len += vhost_len;
>>> - if (unlikely(vhost_exceeds_weight(++recv_pkts, total_len))) {
>>> - vhost_poll_queue(&vq->poll);
>>> - goto out;
>>> - }
>>> - }
>>> - if (unlikely(busyloop_intr))
>>> + } while (!(next_round = vhost_exceeds_weight(++recv_pkts, total_len)));
>>> +
>>> + if (unlikely(busyloop_intr || next_round))
>>> vhost_poll_queue(&vq->poll);
>>> else
>>> vhost_net_enable_vq(net, vq);
>>
>> I'm afraid with this addition the code is too much like spaghetti.
>> What does next_round mean? Just that we are breaking out of the loop?
>
>
> Yes, we can have a better name of course.
>
>
>> That is what goto is for... Either let's have for(;;) with goto/break
>> to get outside, or a while loop with a condition. Mixing both is just
>> unreadable.
>>
>> All these checks in 3 places are exactly the same on all paths and they
>> are slow path. Why don't we put this in a function?
>
>
> The point I think is, we want the weight to be checked in both the
> fast path and the slow path.
>
>
>> E.g. like the below.
>> Warning: completely untested.
>>
>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>>
>> ---
>>
>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> index df51a35cf537..a0f89a504cd9 100644
>> --- a/drivers/vhost/net.c
>> +++ b/drivers/vhost/net.c
>> @@ -761,6 +761,23 @@ static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq,
>> return 0;
>> }
>> +/* Returns true if caller needs to go back and re-read the ring. */
>> +static bool empty_ring(struct vhost_net *net, struct vhost_virtqueue
>> *vq,
>> + int pkts, size_t total_len, bool busyloop_intr)
>> +{
>> + if (unlikely(busyloop_intr)) {
>> + vhost_poll_queue(&vq->poll);
>> + } else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
>> + /* They have slipped one in meanwhile: check again. */
>> + vhost_disable_notify(&net->dev, vq);
>> + if (!vhost_exceeds_weight(pkts, total_len))
>> + return true;
>> + vhost_poll_queue(&vq->poll);
>> + }
>> + /* Nothing new? Wait for eventfd to tell us they refilled. */
>> + return false;
>> +}
>
>
> An empty ring is not the only case that needs care. E.g. for RX, we
> need to care about overrun and about userspace consuming packets at
> the same time. So there's no need to toggle vq notification in those
> two cases.
>
>
>> +
>> static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
>> {
>> struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
>> @@ -790,15 +807,10 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
>> /* On error, stop handling until the next kick. */
>> if (unlikely(head < 0))
>> break;
>> - /* Nothing new? Wait for eventfd to tell us they refilled. */
>> if (head == vq->num) {
>> - if (unlikely(busyloop_intr)) {
>> - vhost_poll_queue(&vq->poll);
>> - } else if (unlikely(vhost_enable_notify(&net->dev,
>> - vq))) {
>> - vhost_disable_notify(&net->dev, vq);
>> + if (unlikely(empty_ring(net, vq, ++sent_pkts,
>> + total_len, busyloop_intr)))
>> continue;
>> - }
>> break;
>> }
>> @@ -886,14 +898,10 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
>> /* On error, stop handling until the next kick. */
>> if (unlikely(head < 0))
>> break;
>> - /* Nothing new? Wait for eventfd to tell us they refilled. */
>> if (head == vq->num) {
>> - if (unlikely(busyloop_intr)) {
>> - vhost_poll_queue(&vq->poll);
>> - } else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
>> - vhost_disable_notify(&net->dev, vq);
>> + if (unlikely(empty_ring(net, vq, ++sent_pkts,
>> + total_len, busyloop_intr)))
>> continue;
>> - }
>> break;
>> }
>> @@ -1163,18 +1171,10 @@ static void handle_rx(struct vhost_net *net)
>> /* On error, stop handling until the next kick. */
>> if (unlikely(headcount < 0))
>> goto out;
>> - /* OK, now we need to know about added descriptors. */
>> if (!headcount) {
>> - if (unlikely(busyloop_intr)) {
>> - vhost_poll_queue(&vq->poll);
>> - } else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
>> - /* They have slipped one in as we were
>> - * doing that: check again. */
>> - vhost_disable_notify(&net->dev, vq);
>> - continue;
>> - }
>> - /* Nothing new? Wait for eventfd to tell us
>> - * they refilled. */
>> + if (unlikely(empty_ring(net, vq, ++recv_pkts,
>> + total_len, busyloop_intr)))
>> + continue;
>> goto out;
>> }
>> busyloop_intr = false;
>
> The patch misses several other continue statements that need care,
> and there's another call of vhost_exceeds_weight() at the end of the
> loop.
>
> So instead of duplicating the check everywhere like:
>
> for (;;) {
> if (condition_x) {
> if (empty_ring())
> continue;
> break;
> }
> if (condition_y) {
> if (empty_ring())
> continue;
> break;
> }
> if (condition_z) {
> if (empty_ring())
> continue;
> break;
> }
>
> }
>
> What this patch did:
>
> do {
> if (condition_x)
> continue;
> if (condition_y)
> continue;
> if (condition_z)
> continue;
> } while(!need_break())
>
> is much more compact and easier to read?
>
> Thanks
Hi Michael:
Any more comments?
Thanks
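To make the shape of the fix concrete, here is a minimal self-contained sketch of the do/while weight bound (the weight constants, lengths and helper names here are hypothetical stand-ins for illustration, not the actual vhost code):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the packet/byte weight limits. */
#define PKT_WEIGHT  8
#define BYTE_WEIGHT 64

/* The threshold test: stop after enough packets or enough bytes. */
static bool exceeds_weight(int pkts, int total_len)
{
    return pkts >= PKT_WEIGHT || total_len >= BYTE_WEIGHT;
}

/*
 * Bounded processing loop in the do/while shape of the patch: every
 * iteration, including ones that would previously "continue" past the
 * check (e.g. discarding an oversized packet), now falls through the
 * weight test, so one invocation can never loop forever.
 */
static int process(int available, int pkt_len)
{
    int pkts = 0, total_len = 0;

    do {
        if (pkts >= available)
            break;              /* ring empty: nothing left to read */
        total_len += pkt_len;   /* "handle" one packet */
    } while (!exceeds_weight(++pkts, total_len));

    return pkts;                /* packets handled in this invocation */
}
```

Even if a peer keeps the ring full (available is effectively unbounded), process() returns after at most PKT_WEIGHT packets or BYTE_WEIGHT bytes of work, which is exactly the bound the patch enforces on the RX and TX loops.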
Thread overview: 8+ messages
2019-04-25 7:33 [PATCH net] vhost_net: fix possible infinite loop Jason Wang
2019-04-25 17:52 ` Michael S. Tsirkin
2019-04-26 7:35 ` Jason Wang
2019-05-05 4:20 ` Jason Wang [this message]
2019-05-12 17:10 ` Michael S. Tsirkin
2019-05-13 5:42 ` Jason Wang
2019-05-14 21:39 ` Michael S. Tsirkin
2019-05-15 2:57 ` Jason Wang