* [Cluster-devel] [PATCH dlm-tool 1/2] Revert "dlm_controld: improve netlink ENOBUFS error handling"
@ 2021-03-01 15:16 Alexander Aring
  2021-03-01 15:16 ` [Cluster-devel] [PATCH dlm-tool 2/2] dlm_controld: log receive buffer fail Alexander Aring
  0 siblings, 1 reply; 2+ messages in thread
From: Alexander Aring @ 2021-03-01 15:16 UTC
  To: cluster-devel.redhat.com

This reverts commit a9b6e5beb8c20e2f76a637327a55683efc582e1c.

The reason for reverting this patch is that the NETLINK_NO_ENOBUFS
option only disables ENOBUFS errors; it does not put netlink into any
kind of reliable mode. It merely suppresses the error and does no
additional handling to keep the underlying condition from occurring.
There are ways to make netlink reliable on top of the kernel's netlink
interface implementation, but this has not been done yet. Another
solution would be to filter inside the kernel or to switch to polling.
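
For illustration only, here is a minimal sketch of how a receiver can
notice the condition that NETLINK_NO_ENOBUFS would hide. The helper name
recv_event() and its "lost" out-parameter are hypothetical and are not
part of dlm_controld; the sketch only assumes that recv(2) reports
ENOBUFS when the kernel had to drop messages for this socket.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not taken from dlm_controld.
 *
 * Reads one datagram from fd. If the kernel reports ENOBUFS, at least one
 * message was dropped: set *lost to 1 and return 0 so the caller can
 * resynchronize its state instead of treating it as a fatal error.
 */
static ssize_t recv_event(int fd, char *buf, size_t len, int *lost)
{
	ssize_t n;

	*lost = 0;

	/* Retry if a signal interrupted the call. */
	do {
		n = recv(fd, buf, len, 0);
	} while (n < 0 && errno == EINTR);

	if (n < 0 && errno == ENOBUFS) {
		*lost = 1;
		return 0;
	}

	/* Message length on success, -1 with errno set on other errors. */
	return n;
}
```

With NETLINK_NO_ENOBUFS set, the ENOBUFS branch would simply never be
taken, and the caller would have no signal that it needs to resync.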

I need to look into this problem again. What I can say is that, after
receiving this error, increasing the receive buffer size to two
megabytes (via /proc/sys/net/core/rmem_default) as a temporary solution
solved the issue for that specific case. For this reason we are reverting
this patch and switching back to increasing the socket receive buffer.
However, this only helps when there is a "burst" of messages; if we
constantly receive a huge number of messages, it will end in congestion
and ENOBUFS will occur again after some time. Therefore a reliable mode
(which is what I thought NETLINK_NO_ENOBUFS magically provided) would
not be the right solution either.

Final thoughts: I don't believe many people are running into this issue.
The machine that hit the ENOBUFS error had a lot of udev messages going
on because devices were being registered and deregistered, but it is
still a problem.
---
 dlm_controld/main.c | 34 +++++++++-------------------------
 1 file changed, 9 insertions(+), 25 deletions(-)

diff --git a/dlm_controld/main.c b/dlm_controld/main.c
index c35756d4..bcccec4c 100644
--- a/dlm_controld/main.c
+++ b/dlm_controld/main.c
@@ -765,7 +765,8 @@ static void process_uevent(int ci)
 static int setup_uevent(void)
 {
 	struct sockaddr_nl snl;
-	int s, rv, val;
+	int rcvbuf;
+	int s, rv;
 
 	s = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_KOBJECT_UEVENT);
 	if (s < 0) {
@@ -782,31 +783,14 @@ static int setup_uevent(void)
 	 * the  application  to  detect  when  this happens (via the ENOBUFS error
 	 * returned by recvmsg(2)) and resynchronize.
 	 *
-	 * To avoid ENOBUFS errors we set the netlink socket to realiable
-	 * transmission mode which can be turned on by NETLINK_NO_ENOBUFS
-	 * option. This option is available since kernel 2.6.30. If this setting
-	 * fails we fallback to increase the netlink socket receive buffer.
+	 * To prevent ENOBUFS errors we just set the receive buffer to two
+	 * megabyte as other applications do it. This will not ensure that we never
+	 * receive ENOBUFS but it's more unlikely. May it's worth to handle ENOBUFS
+	 * errors on a different way in future.
 	 */
-	val = 1;
-	rv = setsockopt(s, SOL_NETLINK, NETLINK_NO_ENOBUFS, &val, sizeof(val));
-	if (rv == -1) {
-		/* Fallback handling if NETLINK_NO_ENOBUFS fails to set.
-		 *
-		 * To prevent ENOBUFS errors we just set the receive buffer to
-		 * two megabyte as other applications do it. This will not
-		 * ensure that we never receive ENOBUFS but it's more unlikely.
-		 */
-		val = DEFAULT_NETLINK_RCVBUF;
-		log_error("uevent netlink NETLINK_NO_ENOBUFS errno %d, will set rcvbuf to %d bytes", errno, val);
-
-		rv = setsockopt(s, SOL_SOCKET, SO_RCVBUF, &val, sizeof(val));
-		if (rv == -1)
-			log_error("uevent netlink SO_RCVBUF errno %d", errno);
-
-		rv = setsockopt(s, SOL_SOCKET, SO_RCVBUFFORCE, &val, sizeof(val));
-		if (rv == -1)
-			log_error("uevent netlink SO_RCVBUFFORCE errno %d", errno);
-	}
+	rcvbuf = DEFAULT_NETLINK_RCVBUF;
+	setsockopt(s, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf));
+	setsockopt(s, SOL_SOCKET, SO_RCVBUFFORCE, &rcvbuf, sizeof(rcvbuf));
 
 	memset(&snl, 0, sizeof(snl));
 	snl.nl_family = AF_NETLINK;
-- 
2.26.2




* [Cluster-devel] [PATCH dlm-tool 2/2] dlm_controld: log receive buffer fail
  2021-03-01 15:16 [Cluster-devel] [PATCH dlm-tool 1/2] Revert "dlm_controld: improve netlink ENOBUFS error handling" Alexander Aring
@ 2021-03-01 15:16 ` Alexander Aring
  0 siblings, 0 replies; 2+ messages in thread
From: Alexander Aring @ 2021-03-01 15:16 UTC
  To: cluster-devel.redhat.com

This patch adds handling to log failures when setting the receive
buffer size, so that during debugging it is clear whether dlm_controld
actually increased the netlink receive buffer size.
---
 dlm_controld/main.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/dlm_controld/main.c b/dlm_controld/main.c
index bcccec4c..2df305bd 100644
--- a/dlm_controld/main.c
+++ b/dlm_controld/main.c
@@ -765,8 +765,7 @@ static void process_uevent(int ci)
 static int setup_uevent(void)
 {
 	struct sockaddr_nl snl;
-	int rcvbuf;
-	int s, rv;
+	int s, rv, val;
 
 	s = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_KOBJECT_UEVENT);
 	if (s < 0) {
@@ -788,9 +787,14 @@ static int setup_uevent(void)
 	 * receive ENOBUFS but it's more unlikely. May it's worth to handle ENOBUFS
 	 * errors on a different way in future.
 	 */
-	rcvbuf = DEFAULT_NETLINK_RCVBUF;
-	setsockopt(s, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf));
-	setsockopt(s, SOL_SOCKET, SO_RCVBUFFORCE, &rcvbuf, sizeof(rcvbuf));
+	val = DEFAULT_NETLINK_RCVBUF;
+	rv = setsockopt(s, SOL_SOCKET, SO_RCVBUF, &val, sizeof(val));
+	if (rv == -1)
+		log_error("uevent netlink SO_RCVBUF errno %d", errno);
+
+	rv = setsockopt(s, SOL_SOCKET, SO_RCVBUFFORCE, &val, sizeof(val));
+	if (rv == -1)
+		log_error("uevent netlink SO_RCVBUFFORCE errno %d", errno);
 
 	memset(&snl, 0, sizeof(snl));
 	snl.nl_family = AF_NETLINK;
-- 
2.26.2



