linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: He Zhe <zhe.he@windriver.com>
To: xieyongji@bytedance.com, mst@redhat.com, jasowang@redhat.com,
	stefanha@redhat.com, sgarzare@redhat.com, parav@nvidia.com,
	hch@infradead.org, christian.brauner@canonical.com,
	rdunlap@infradead.org, willy@infradead.org,
	viro@zeniv.linux.org.uk, axboe@kernel.dk, bcrl@kvack.org,
	corbet@lwn.net, mika.penttila@nextfour.com,
	dan.carpenter@oracle.com, gregkh@linuxfoundation.org,
	songmuchun@bytedance.com,
	virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, iommu@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org, qiang.zhang@windriver.com,
	zhe.he@windriver.com
Subject: [PATCH] eventfd: Enlarge recursion limit to allow vhost to work
Date: Fri, 18 Jun 2021 16:44:12 +0800	[thread overview]
Message-ID: <20210618084412.18257-1-zhe.he@windriver.com> (raw)
In-Reply-To: <CACycT3t1Dgrzsr7LbBrDhRLDa3qZ85ZOgj9H7r1fqPi-kf7r6Q@mail.gmail.com>

commit b5e683d5cab8 ("eventfd: track eventfd_signal() recursion depth")
introduces a percpu counter that tracks the percpu recursion depth and
warn if it greater than zero, to avoid potential deadlock and stack
overflow.

However sometimes different eventfds may be used in parallel. Specifically,
when heavy network load goes through kvm and vhost, working as below, it
would trigger the following call trace.

-  100.00%
   - 66.51%
        ret_from_fork
        kthread
      - vhost_worker
         - 33.47% handle_tx_kick
              handle_tx
              handle_tx_copy
              vhost_tx_batch.isra.0
              vhost_add_used_and_signal_n
              eventfd_signal
         - 33.05% handle_rx_net
              handle_rx
              vhost_add_used_and_signal_n
              eventfd_signal
   - 33.49%
        ioctl
        entry_SYSCALL_64_after_hwframe
        do_syscall_64
        __x64_sys_ioctl
        ksys_ioctl
        do_vfs_ioctl
        kvm_vcpu_ioctl
        kvm_arch_vcpu_ioctl_run
        vmx_handle_exit
        handle_ept_misconfig
        kvm_io_bus_write
        __kvm_io_bus_write
        eventfd_signal

001: WARNING: CPU: 1 PID: 1503 at fs/eventfd.c:73 eventfd_signal+0x85/0xa0
---- snip ----
001: Call Trace:
001:  vhost_signal+0x15e/0x1b0 [vhost]
001:  vhost_add_used_and_signal_n+0x2b/0x40 [vhost]
001:  handle_rx+0xb9/0x900 [vhost_net]
001:  handle_rx_net+0x15/0x20 [vhost_net]
001:  vhost_worker+0xbe/0x120 [vhost]
001:  kthread+0x106/0x140
001:  ? log_used.part.0+0x20/0x20 [vhost]
001:  ? kthread_park+0x90/0x90
001:  ret_from_fork+0x35/0x40
001: ---[ end trace 0000000000000003 ]---

This patch enlarges the limit to 1 which is the maximum recursion depth we
have found so far.

The credit of modification for eventfd_signal_count goes to
Xie Yongji <xieyongji@bytedance.com>

Signed-off-by: He Zhe <zhe.he@windriver.com>
---
 fs/eventfd.c            | 3 ++-
 include/linux/eventfd.h | 5 ++++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/eventfd.c b/fs/eventfd.c
index e265b6dd4f34..add6af91cacf 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -71,7 +71,8 @@ __u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n)
 	 * it returns true, the eventfd_signal() call should be deferred to a
 	 * safe context.
 	 */
-	if (WARN_ON_ONCE(this_cpu_read(eventfd_wake_count)))
+	if (WARN_ON_ONCE(this_cpu_read(eventfd_wake_count) >
+	    EFD_WAKE_COUNT_MAX))
 		return 0;
 
 	spin_lock_irqsave(&ctx->wqh.lock, flags);
diff --git a/include/linux/eventfd.h b/include/linux/eventfd.h
index fa0a524baed0..74be152ebe87 100644
--- a/include/linux/eventfd.h
+++ b/include/linux/eventfd.h
@@ -29,6 +29,9 @@
 #define EFD_SHARED_FCNTL_FLAGS (O_CLOEXEC | O_NONBLOCK)
 #define EFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS | EFD_SEMAPHORE)
 
+/* This is the maximum recursion depth we find so far */
+#define EFD_WAKE_COUNT_MAX 1
+
 struct eventfd_ctx;
 struct file;
 
@@ -47,7 +50,7 @@ DECLARE_PER_CPU(int, eventfd_wake_count);
 
 static inline bool eventfd_signal_count(void)
 {
-	return this_cpu_read(eventfd_wake_count);
+	return this_cpu_read(eventfd_wake_count) > EFD_WAKE_COUNT_MAX;
 }
 
 #else /* CONFIG_EVENTFD */
-- 
2.17.1


  parent reply	other threads:[~2021-06-18  8:47 UTC|newest]

Thread overview: 86+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-15 14:13 [PATCH v8 00/10] Introduce VDUSE - vDPA Device in Userspace Xie Yongji
2021-06-15 14:13 ` [PATCH v8 01/10] iova: Export alloc_iova_fast() and free_iova_fast(); Xie Yongji
2021-06-15 14:13 ` [PATCH v8 02/10] file: Export receive_fd() to modules Xie Yongji
2021-06-15 14:13 ` [PATCH v8 03/10] eventfd: Increase the recursion depth of eventfd_signal() Xie Yongji
2021-06-17  8:33   ` He Zhe
2021-06-18  3:29     ` Yongji Xie
2021-06-18  8:41       ` He Zhe
2021-06-18  8:44       ` He Zhe [this message]
2021-07-03  8:31         ` [PATCH] eventfd: Enlarge recursion limit to allow vhost to work Michael S. Tsirkin
2021-08-25  7:57         ` Yongji Xie
2021-06-15 14:13 ` [PATCH v8 04/10] vhost-iotlb: Add an opaque pointer for vhost IOTLB Xie Yongji
2021-06-15 14:13 ` [PATCH v8 05/10] vdpa: Add an opaque pointer for vdpa_config_ops.dma_map() Xie Yongji
2021-06-15 14:13 ` [PATCH v8 06/10] vdpa: factor out vhost_vdpa_pa_map() and vhost_vdpa_pa_unmap() Xie Yongji
2021-06-15 14:13 ` [PATCH v8 07/10] vdpa: Support transferring virtual addressing during DMA mapping Xie Yongji
2021-06-15 14:13 ` [PATCH v8 08/10] vduse: Implement an MMU-based IOMMU driver Xie Yongji
2021-06-15 14:13 ` [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace Xie Yongji
2021-06-21  9:13   ` Jason Wang
2021-06-21 10:41     ` Yongji Xie
2021-06-22  5:06       ` Jason Wang
2021-06-22  7:22         ` Yongji Xie
2021-06-22  7:49           ` Jason Wang
2021-06-22  8:14             ` Yongji Xie
2021-06-23  3:30               ` Jason Wang
2021-06-23  5:50                 ` Yongji Xie
2021-06-24  3:34                   ` Jason Wang
2021-06-24  4:46                     ` Yongji Xie
2021-06-24  8:13                       ` Jason Wang
2021-06-24  9:16                         ` Yongji Xie
2021-06-25  3:08                           ` Jason Wang
2021-06-25  4:19                             ` Yongji Xie
2021-06-28  4:40                               ` Jason Wang
2021-06-29  2:26                                 ` Yongji Xie
2021-06-29  3:29                                   ` Jason Wang
2021-06-29  3:56                                     ` Yongji Xie
2021-06-29  4:03                                       ` Jason Wang
2021-06-24 14:46   ` Stefan Hajnoczi
2021-06-29  2:59     ` Yongji Xie
2021-06-30  9:51       ` Stefan Hajnoczi
2021-07-01  6:50         ` Yongji Xie
2021-07-01  7:55           ` Jason Wang
2021-07-01 10:26             ` Yongji Xie
2021-07-02  3:25               ` Jason Wang
2021-07-07  8:52   ` Stefan Hajnoczi
2021-07-07  9:19     ` Yongji Xie
2021-06-15 14:13 ` [PATCH v8 10/10] Documentation: Add documentation for VDUSE Xie Yongji
2021-06-24 13:01   ` Stefan Hajnoczi
2021-06-29  5:43     ` Yongji Xie
2021-06-30 10:06       ` Stefan Hajnoczi
2021-07-01 10:00         ` Yongji Xie
2021-07-01 13:15           ` Stefan Hajnoczi
2021-07-04  9:49             ` Yongji Xie
2021-07-05  3:36               ` Jason Wang
2021-07-05 12:49                 ` Stefan Hajnoczi
2021-07-06  2:34                   ` Jason Wang
2021-07-06 10:14                     ` Stefan Hajnoczi
     [not found]                       ` <CACGkMEs2HHbUfarum8uQ6wuXoDwLQUSXTsa-huJFiqr__4cwRg@mail.gmail.com>
     [not found]                         ` <YOSOsrQWySr0andk@stefanha-x1.localdomain>
     [not found]                           ` <100e6788-7fdf-1505-d69c-bc28a8bc7a78@redhat.com>
     [not found]                             ` <YOVr801d01YOPzLL@stefanha-x1.localdomain>
2021-07-07  9:24                               ` Jason Wang
2021-07-07 15:54                                 ` Stefan Hajnoczi
2021-07-08  4:17                                   ` Jason Wang
2021-07-08  9:06                                     ` Stefan Hajnoczi
2021-07-08 12:35                                       ` Yongji Xie
2021-07-06  3:04                   ` Yongji Xie
2021-07-06 10:22                     ` Stefan Hajnoczi
2021-07-07  9:09                       ` Yongji Xie
2021-07-08  9:07                         ` Stefan Hajnoczi
2021-06-24 15:12 ` [PATCH v8 00/10] Introduce VDUSE - vDPA Device in Userspace Stefan Hajnoczi
2021-06-29  3:15   ` Yongji Xie
2021-06-28 10:33 ` Liu Xiaodong
2021-06-28  4:35   ` Jason Wang
2021-06-28  5:54     ` Liu, Xiaodong
2021-06-29  4:10       ` Jason Wang
2021-06-29  7:56         ` Liu, Xiaodong
2021-06-29  8:14           ` Yongji Xie
2021-06-28 10:32   ` Yongji Xie
2021-06-29  4:12     ` Jason Wang
2021-06-29  6:40       ` Yongji Xie
2021-06-29  7:33         ` Jason Wang
  -- strict thread matches above, loose matches on Subject: below --
2020-04-10 11:47 [PATCH] eventfd: Enlarge recursion limit to allow vhost to work zhe.he
2020-05-12  7:00 ` He Zhe
2020-06-22  9:09 ` He Zhe
2020-07-03  8:12 ` Juri Lelli
2020-07-03 11:11   ` He Zhe
2020-07-06  6:45     ` Juri Lelli
2020-07-13 13:22       ` Juri Lelli
2020-07-22  9:01         ` Juri Lelli
2020-08-20 10:41           ` He Zhe
2021-05-27 15:52             ` Nitesh Narayan Lal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210618084412.18257-1-zhe.he@windriver.com \
    --to=zhe.he@windriver.com \
    --cc=axboe@kernel.dk \
    --cc=bcrl@kvack.org \
    --cc=christian.brauner@canonical.com \
    --cc=corbet@lwn.net \
    --cc=dan.carpenter@oracle.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hch@infradead.org \
    --cc=iommu@lists.linux-foundation.org \
    --cc=jasowang@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mika.penttila@nextfour.com \
    --cc=mst@redhat.com \
    --cc=parav@nvidia.com \
    --cc=qiang.zhang@windriver.com \
    --cc=rdunlap@infradead.org \
    --cc=sgarzare@redhat.com \
    --cc=songmuchun@bytedance.com \
    --cc=stefanha@redhat.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=virtualization@lists.linux-foundation.org \
    --cc=willy@infradead.org \
    --cc=xieyongji@bytedance.com \
    --subject='Re: [PATCH] eventfd: Enlarge recursion limit to allow vhost to work' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).