Near-simultaneous automount of multiple directories fails

From: Marcel De Boer @ 2016-04-08  7:55 UTC
  To: autofs

Hi!

I've already reported this on the CentOS bug tracker a while ago, but I 
thought I'd report it here too.

https://bugs.centos.org/view.php?id=9835

Summarized (there's more information in the bug report): on one of our
servers we initially saw that every few days one home directory became
inaccessible. This happened to two different home directories (but only
one at a time) out of the couple hundred we have. We traced this to
simultaneously scheduled cron scripts running out of the affected home
directories, which caused both directories to be mounted nearly
simultaneously.

A test setup on a different machine (the primary description from the bug
report, as the server was not stock CentOS) also showed that if we had
cron simultaneously mount four directories every 10 minutes, only half of
them would get mounted each time. On this machine an RPM rebuild of
autofs made the issue disappear, but it was much more persistent on the
server.
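
For anyone wanting to trigger the near-simultaneous accesses without cron,
here is a minimal sketch in C. It is only an illustration, not what we
actually ran: probe_paths(), MAX_PATHS and the go flag are made-up names,
and it assumes that opening a directory below the automount point is
enough to trigger the mount. Each thread waits for a common start signal
and then opens its own path.

```c
#include <assert.h>
#include <dirent.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_PATHS 16	/* arbitrary limit for this sketch */

static pthread_mutex_t go_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t go_cond = PTHREAD_COND_INITIALIZER;
static int go;		/* set to 1 to release all waiting threads */

struct probe {
	const char *path;
	int ok;		/* 1 if the directory could be opened */
};

static void *probe_thread(void *arg)
{
	struct probe *p = arg;
	DIR *d;

	/* Block until the start signal so all accesses land together. */
	pthread_mutex_lock(&go_lock);
	while (!go)
		pthread_cond_wait(&go_cond, &go_lock);
	pthread_mutex_unlock(&go_lock);

	d = opendir(p->path);
	if (d) {
		p->ok = 1;
		closedir(d);
	}
	return NULL;
}

/* Returns the number of paths that could be opened, or -1 on error. */
int probe_paths(const char *const *paths, size_t n)
{
	pthread_t tids[MAX_PATHS];
	struct probe probes[MAX_PATHS];
	size_t i, started = 0;
	int opened = 0;

	if (n == 0 || n > MAX_PATHS)
		return -1;

	pthread_mutex_lock(&go_lock);
	go = 0;
	pthread_mutex_unlock(&go_lock);

	for (i = 0; i < n; i++) {
		probes[i].path = paths[i];
		probes[i].ok = 0;
		if (pthread_create(&tids[i], NULL, probe_thread, &probes[i]))
			break;
		started++;
	}

	/* Release every thread that was started, even after a failure. */
	pthread_mutex_lock(&go_lock);
	go = 1;
	pthread_cond_broadcast(&go_cond);
	pthread_mutex_unlock(&go_lock);

	for (i = 0; i < started; i++) {
		pthread_join(tids[i], NULL);
		opened += probes[i].ok;
	}
	return started == n ? opened : -1;
}
```

Calling probe_paths() with four automounted home directories (the paths
are whatever applies on the test machine) and comparing the return value
with 4 would show whether all of them got mounted.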

In the end, the issue seems to be in mount_mount() from mount_nfs.c; to
my untrained eye, it looks like it can get called simultaneously from
different threads, which then modify shared data, probably the 'hosts'
or 'tmp' lists.

I made a patch that seems to work reliably for our situation, but it's
very crude: it just makes sure everything touching the 'hosts' list (and
everything else during that time) does not run in parallel. It might be a
starting point for someone who knows the code better, though. (Patch was
made against the code used in the 5.0.5_115 CentOS 6 RPM.)
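
The locking idea, reduced to a standalone sketch: one static mutex held
for the whole body of the function, released on a single exit path
instead of before every return as the patch does. Apart from
host_list_mutex, every name here (struct host_entry, add_host(),
free_hosts(), guarded_mount()) is made up for illustration and is not
autofs code; the shared list is an assumption about where the race is.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a shared host list; not the autofs struct. */
struct host_entry {
	char *name;
	struct host_entry *next;
};

static struct host_entry *shared_hosts;
static pthread_mutex_t host_list_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Push a copy of name onto the shared list. Caller must hold the mutex. */
static int add_host(const char *name)
{
	struct host_entry *e = malloc(sizeof(*e));

	if (!e)
		return 1;
	e->name = malloc(strlen(name) + 1);
	if (!e->name) {
		free(e);
		return 1;
	}
	strcpy(e->name, name);
	e->next = shared_hosts;
	shared_hosts = e;
	return 0;
}

/* Free the whole shared list. Caller must hold the mutex. */
static void free_hosts(void)
{
	while (shared_hosts) {
		struct host_entry *next = shared_hosts->next;

		free(shared_hosts->name);
		free(shared_hosts);
		shared_hosts = next;
	}
}

/* Hypothetical mount-like routine: returns 0 on success, 1 on failure. */
int guarded_mount(const char *name)
{
	int ret = 1;

	pthread_mutex_lock(&host_list_mutex);

	if (!name || !*name)
		goto out;	/* nothing to mount */
	if (add_host(name))
		goto out;	/* allocation failed */

	/* ... the real work on the list would happen here ... */
	ret = 0;
out:
	/* Single exit: clean up and unlock exactly once on every path. */
	free_hosts();
	assert(shared_hosts == NULL);
	pthread_mutex_unlock(&host_list_mutex);
	return ret;
}
```

With a single exit label a newly added early return cannot forget the
unlock, which is the main risk of repeating pthread_mutex_unlock() before
each return statement.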

The server has received some more upgrades in the meantime, so we may no
longer be able to reproduce it on that system.

Kind regards,
 	Marcel de Boer

--- autofs-5.0.5-orig/modules/mount_nfs.c	2016-01-05 15:26:55.993014650 +0100
+++ autofs-5.0.5/modules/mount_nfs.c	2016-01-05 15:25:51.434011526 +0100
@@ -40,6 +40,9 @@
  static struct mount_mod *mount_bind = NULL;
  static int init_ctr = 0;

+/* Multiple access to hosts workaround */
+static pthread_mutex_t host_list_mutex = PTHREAD_MUTEX_INITIALIZER;
+
  int mount_init(void **context)
  {
  	/* Make sure we have the local mount method available */
@@ -190,7 +193,9 @@
  		      nfsoptions, nobind, nosymlink, ro);
  	}

+	pthread_mutex_lock(&host_list_mutex);
  	if (!parse_location(ap->logopt, &hosts, what, flags)) {
+        	pthread_mutex_unlock(&host_list_mutex);
  		info(ap->logopt, MODPREFIX "no hosts available");
  		return 1;
  	}
@@ -235,6 +240,7 @@

  dont_probe:
  	if (!hosts) {
+        	pthread_mutex_unlock(&host_list_mutex);
  		info(ap->logopt, MODPREFIX "no hosts available");
  		return 1;
  	}
@@ -264,6 +270,7 @@
  		char *estr = strerror_r(errno, buf, MAX_ERR_BUF);
  		error(ap->logopt,
  		      MODPREFIX "mkdir_path %s failed: %s", fullpath, estr);
+        	pthread_mutex_unlock(&host_list_mutex);
  		return 1;
  	}

@@ -300,6 +307,7 @@
  			/* Success - we're done */
  			if (!err) {
  				free_host_list(&hosts);
+                        	pthread_mutex_unlock(&host_list_mutex);
  				return 0;
  			}

@@ -325,6 +333,7 @@
  			if (!loc) {
  				char *estr = strerror_r(errno, buf, MAX_ERR_BUF);
  				error(ap->logopt, "malloc: %s", estr);
+                        	pthread_mutex_unlock(&host_list_mutex);
  				return 1;
  			}
  			if (this->addr->sa_family == AF_INET6) {
@@ -338,6 +347,7 @@
  			if (!loc) {
  				char *estr = strerror_r(errno, buf, MAX_ERR_BUF);
  				error(ap->logopt, "malloc: %s", estr);
+                        	pthread_mutex_unlock(&host_list_mutex);
  				return 1;
  			}
  			strcpy(loc, this->name);
@@ -365,6 +375,7 @@
  			info(ap->logopt, MODPREFIX "mounted %s on %s", loc, fullpath);
  			free(loc);
  			free_host_list(&hosts);
+                       	pthread_mutex_unlock(&host_list_mutex);
  			return 0;
  		}

@@ -374,6 +385,7 @@

  forced_fail:
  	free_host_list(&hosts);
+	pthread_mutex_unlock(&host_list_mutex);

  	/* If we get here we've failed to complete the mount */



-- 
Marcel de Boer
Test engineer, Service Routing R&D, IP/Optical Networks
Nokia, Antwerp, Belgium