linux-kernel.vger.kernel.org archive mirror
* process creation time increases linearly with shmem
@ 2005-08-24 18:43 Ray Fucillo
  2005-08-25  0:14 ` Nick Piggin
  0 siblings, 1 reply; 36+ messages in thread
From: Ray Fucillo @ 2005-08-24 18:43 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 334 bytes --]

I am seeing process creation time increase linearly with the size of the 
shared memory segment that the parent touches.  The attached forktest.c 
is a very simple user program that illustrates this behavior, which I 
have tested on various kernel versions from 2.4 through 2.6.  Is this a 
known issue, and is it solvable?

TIA,
Ray

[-- Attachment #2: forktest.c --]
[-- Type: text/plain, Size: 3803 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
#include <sys/time.h>
#include <errno.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <signal.h>

#define MAXJOBS 50
#define MAXMALLOC 1024

#define USESIGCHLDHND
/* USESIGCHLDHND feature code changes how the parent waits
   for the children.  When this feature code is on we define
   a signal handler for SIGCHLD and call waitpid to clean up
   the child process.  If this feature code is off, we wait
   until all children are forked and then loop through the 
   array of child pids and call waitpid() on each.  The
   purpose of this feature code was to see if there is any
   difference in timing based on cleaning up zombies faster.
   Tests have shown no appreciable difference.  */

/* Return a floating point number of seconds since the start
   time in the timeval structure pointed to by starttv */
float elapsedtime(struct timeval *starttv) {
	struct timeval currenttime;
	gettimeofday(&currenttime,NULL);
	return ((currenttime.tv_sec - starttv->tv_sec) +
	       ((float)(currenttime.tv_usec - starttv->tv_usec)/1000000));
}

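#ifdef USESIGCHLDHND
/* written from the signal handler, so it must be volatile sig_atomic_t */
volatile sig_atomic_t childexitcnt = 0;
void sigchldhnd(int signum) {
	int saved_errno = errno;
	(void)signum;
	/* SIGCHLD is not queued: reap every child that has exited, and
	   count only successful reaps (waitpid returns -1 on error) */
	while (waitpid(-1,NULL,WNOHANG) > 0) ++childexitcnt;
	errno = saved_errno;
}
#endif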

int main(void) {
	pid_t childpid[MAXJOBS];
	int x,i;
	int childcnt = 0;
        float endfork, endwait;
	struct shmid_ds myshmid_ds;
	unsigned int mb;
	int myshmid;
	key_t mykey = 0xf00df00d;
	char *mymem = 0;
	struct timeval starttime;
#ifdef USESIGCHLDHND
	struct sigaction sa;
	sa.sa_handler = sigchldhnd;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = SA_RESTART;
	if (sigaction(SIGCHLD, &sa, NULL) == -1) {
	   printf("sigaction() failed, errno %d - exiting\n",errno);
	   exit(1);
	}
#endif
        printf("\nNumber of jobs to fork (max %d):  ",MAXJOBS);
        scanf("%d",&x);
        if ((x < 1) || (x > MAXJOBS)) {
           printf("\ninvalid input - exiting\n");
           exit(1);
        }
        printf("\nNumber of MB to allocate (0-%d):  ",MAXMALLOC);
        scanf("%u",&mb);
        if (mb > MAXMALLOC) {
           printf("\ninvalid input - exiting\n");
           exit(1);
        }
	/* allocate and initialize shared memory if number
	   of MB is not zero */
	if (mb) {
	   myshmid = shmget(mykey,mb*1024*1024,IPC_CREAT|0777);
	   if (myshmid == -1) {
	      printf("\nshmget() failed, errno %d. - exiting\n",errno);
	      exit(1);
	   }
	   mymem = (char *) shmat(myshmid,0,0);
	   if (mymem == (char *) -1) {
	      printf("\nshmat() failed, errno %d. - exiting\n",errno);
	      exit(1);
	   }
	   if (shmctl(myshmid,IPC_STAT,&myshmid_ds)) {
	      printf("\nshmctl() failed, errno %d. - exiting\n",errno);
	      exit(1);
	   }
	   /* write a pattern in the new shmem segment*/
	   for (i=0; i < (mb*1024*1024); i+=32) mymem[i]='R';
	}	
	printf("\nStarting %d jobs.  time:0.0", x);
	fflush(stdout);
	gettimeofday(&starttime,NULL); 
	for (i=0; i<x; i++) {
	   childpid[i] = fork();
	   if (!childpid[i]) {
	      /* child process */
	      printf("\n - Child %d         time:%f",i,elapsedtime(&starttime));
	      exit(1);
	   } else if (childpid[i] == -1) {
	      /* failure */
	      printf("\nfork failed, errno = %d",errno);
	   } else childcnt++;
	}
	endfork = elapsedtime(&starttime);
#ifndef USESIGCHLDHND
	for (i=0; i<x; i++) waitpid(childpid[i],0,0);
#else
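	while (childexitcnt < childcnt) {
	   /* count only real reaps; -1 with ECHILD means the
	      signal handler has already collected the rest */
	   if (waitpid(-1,NULL,0) > 0) ++childexitcnt;
	   else if (errno == ECHILD) break;
	}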
#endif	   
	endwait = elapsedtime(&starttime);
	printf("\nTime to fork all processes in seconds: %f", endfork);
        printf("\nTime for all processes to complete: %f\n", endwait);

	/* kill shmem segment */
	if ((mb) && (shmctl(myshmid,IPC_RMID,&myshmid_ds))) {
	   printf("\nshmctl() failed, errno %d. - exiting\n",errno);
	   exit(1);
	}
}



* Re: process creation time increases linearly with shmem
@ 2005-08-25 14:05 Parag Warudkar
  2005-08-25 14:22 ` Andi Kleen
  0 siblings, 1 reply; 36+ messages in thread
From: Parag Warudkar @ 2005-08-25 14:05 UTC (permalink / raw)
  To: Andi Kleen, Ray Fucillo; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1612 bytes --]

> Ray Fucillo <fucillo@intersystems.com> writes:
> > 
> > The application is a database system called Caché.  We allocate a
> > large shared memory segment for database cache, which in a large
> > production environment may realistically be 1+GB on 32-bit platforms
> > and much larger on 64-bit.  At these sizes fork() is taking hundreds
> > of milliseconds, which can become a noticeable bottleneck for us.  This
> > performance characteristic seems to be unique to Linux vs other Unix
> > implementations.
> 
> You could set up hugetlbfs and use large pages for the SHM (with SHM_HUGETLB);
> then the overhead of walking the pages of it at fork would be much lower.
> 
> -Andi
> -
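
The SHM_HUGETLB suggestion quoted above could look roughly like the sketch below. This is only an illustration under assumptions, not code from the thread: the helper names attach_shm_prefer_huge() and release_shm() are hypothetical, the segment is created with IPC_PRIVATE rather than a fixed key, and it assumes huge pages have already been reserved by the administrator (for example through /proc/sys/vm/nr_hugepages) and that the caller is permitted to use them. When the huge page request fails, it falls back to ordinary pages.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000  /* Linux value, used only if the headers hide it */
#endif

/* Hypothetical helper: create and attach a private SysV segment,
   preferring huge pages.  Returns the attached address or NULL.
   *used_huge is set to 1 if the huge page request succeeded.
   bytes should be a multiple of the huge page size. */
static void *attach_shm_prefer_huge(size_t bytes, int *shmid_out, int *used_huge)
{
	void *addr;
	int id = shmget(IPC_PRIVATE, bytes, IPC_CREAT | SHM_HUGETLB | 0600);

	*used_huge = (id != -1);
	if (id == -1) {
		/* no huge pages reserved, or no permission to use them:
		   fall back to ordinary pages */
		id = shmget(IPC_PRIVATE, bytes, IPC_CREAT | 0600);
		if (id == -1)
			return NULL;
	}
	addr = shmat(id, NULL, 0);
	if (addr == (void *) -1) {
		shmctl(id, IPC_RMID, NULL);  /* do not leak the segment */
		return NULL;
	}
	*shmid_out = id;
	return addr;
}

/* Hypothetical helper: detach the segment and mark it for removal.
   Returns 0 on success, -1 on failure. */
static int release_shm(void *addr, int shmid)
{
	if (shmdt(addr) == -1)
		return -1;
	return shmctl(shmid, IPC_RMID, NULL);
}
```

A caller would attach the database cache with attach_shm_prefer_huge() before forking workers, and check used_huge to see whether the cheaper huge page mapping was actually obtained.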

Why isn't the page walk for the shared memory done lazily, though? That would be better, since applications most likely do not want to page in all of the shared memory at once. Program logic and requirements should dictate this, rather than fork() making it compulsory. I think this happens because we don't distinguish between shared libraries, program text, and explicitly shared memory such as the above application uses - everything is MAP_SHARED.

As someone mentioned, this causes unavoidable faults for reading in shared libraries and program text. But if there were a MAP_SHARED|MAP_LAZY, could fork() then be set up to skip setting up page tables for such mappings, while still continuing to map the plain MAP_SHARED ones so that program text and libraries don't cause faults? Applications could then specify MAP_SHARED|MAP_LAZY and not incur the overhead of the page table walk for the shared memory all at once.

Would it be worth trying to do something like this?

Parag
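
To put a rough number on the page table walk discussed in this message, the sketch below counts how many pages (and so page table entries) are needed to map a segment of a given size. It is plain arithmetic, not kernel code: the helper name pages_needed() is hypothetical, and the 4 KB and 2 MB page sizes used in the note afterwards are assumptions about a typical x86 system, not figures from the thread.

```c
/* Hypothetical helper: number of pages needed to map `bytes` bytes
   with pages of `page_size` bytes, rounding up to a whole page. */
static unsigned long long pages_needed(unsigned long long bytes,
				       unsigned long long page_size)
{
	return (bytes + page_size - 1) / page_size;
}
```

For a fully touched 1 GB segment, pages_needed(1ULL << 30, 4096) is 262144, while pages_needed(1ULL << 30, 2ULL << 20) is 512. If fork cost scales with the number of entries walked, that is in line with the linear growth reported at the top of the thread and with the expectation that large pages would make the walk much cheaper.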




* Re: process creation time increases linearly with shmem
@ 2005-12-14 14:07 Brice Oliver
  2005-12-14 16:21 ` Hugh Dickins
  0 siblings, 1 reply; 36+ messages in thread
From: Brice Oliver @ 2005-12-14 14:07 UTC (permalink / raw)
  To: linux-kernel

Sorry if this is a bit of an unusual request, but I am just trying to
get more information.

I was working with Ray on this issue, and in order to get the
appropriate patch, I need the bugzilla number for this particular
patch so that I can turn it in to RedHat, so they will include this
fix in their release (they will not allow the kernel patch to be
applied unless they apply it to their source and then distribute it
to their customers).

Is that something that can be provided to me here?

Thanks,
Brice Oliver


end of thread, other threads:[~2005-12-14 16:22 UTC | newest]

Thread overview: 36+ messages
2005-08-24 18:43 process creation time increases linearly with shmem Ray Fucillo
2005-08-25  0:14 ` Nick Piggin
2005-08-25 13:07   ` Ray Fucillo
2005-08-25 13:13     ` Andi Kleen
2005-08-25 14:28     ` Nick Piggin
2005-08-25 17:31   ` Rik van Riel
2005-08-26  1:26     ` Nick Piggin
2005-08-26  1:50       ` Rik van Riel
2005-08-26  3:56       ` Linus Torvalds
2005-08-26 11:49         ` Hugh Dickins
2005-08-26 14:26           ` Nick Piggin
2005-08-26 17:00             ` Ray Fucillo
2005-08-26 17:53               ` Rik van Riel
2005-08-26 18:20                 ` Ross Biro
2005-08-26 18:56                   ` Hugh Dickins
     [not found]           ` <8783be660508260915524e2b1e@mail.gmail.com>
2005-08-26 16:38             ` Hugh Dickins
2005-08-26 16:43               ` Ross Biro
2005-08-26 18:07           ` Linus Torvalds
2005-08-26 18:41             ` Hugh Dickins
2005-08-26 22:55               ` Linus Torvalds
2005-08-26 23:10               ` Rik van Riel
2005-08-26 23:23                 ` Linus Torvalds
2005-08-27 15:05                   ` Nick Piggin
2005-08-28  4:26                     ` Hugh Dickins
2005-08-28  6:49                       ` Nick Piggin
2005-08-29 23:33                         ` Ray Fucillo
2005-08-30  0:29                           ` Nick Piggin
2005-08-30  1:03                             ` Linus Torvalds
2005-08-30  0:34                           ` Linus Torvalds
2005-08-25 14:05 Parag Warudkar
2005-08-25 14:22 ` Andi Kleen
2005-08-25 14:35   ` Nick Piggin
2005-08-25 14:47   ` Parag Warudkar
2005-08-25 15:56     ` Andi Kleen
2005-12-14 14:07 Brice Oliver
2005-12-14 16:21 ` Hugh Dickins
