To Chris Wright:

> > +#define FETAL      1       /* being born, not ready for attachments yet */
>
> Heh, odd named state.  Could be:
> #define INIT  1       /* job initializing, not ready for attachments */
>
> Wait, it's never used...

> Lot of this stuff looks like prime header file material.
>

> > +   struct list_head    entry;      /* List of other jobs - same hash */
>
> These could be hlist_node.
>

> > +static struct list_head  job_table[HASH_SIZE];
>
> Make this hlist_head.

Above are unchanged for now.

>
> > +char hidname[16];
>
> static

> > +static char           *hid = NULL;
> > +MODULE_PARM(hid, "s");
>
> use module_param(hid, charp, 0)

> > +int jobfs_delete(struct dentry *delete_dentry);
>
> static

> > +int jobfs_chown(struct dentry *dentry, struct iattr *attr);
>
> static

> > +int jobfs_nosetattr(struct dentry *dentry, struct iattr *attr);
>
> static

> > +   .init   = NULL,
>
> = NULL initialization is unecessary

> > +#ifdef DEBUG
>
> I know this is a work in progress (and I think we went over this bit
> once before).  So just a reminder...

> > +#define DBG_PRINTINIT(s)   \
>
> ...there is already pr_debug().

> > +#define DBG_PRINTEXIT(c)                                           \
>
> ...and please don't use macro like this for simple lock (at least for
> anything final, when locking is debugged ;-)  you can always add pr_debugs
> in the code if needed, and avoid this whole #ifdef DEBUG #else bit.
>

All those are changed as advised, thanks!

> > + * job_getjob
> > + *
>
> Please use proper kernel-doc style comments.

changed.

> > +struct job_entry *
> > +job_getjob(u64 jid)
> > +{
> > +   struct list_head *entry = NULL;
> > +   struct job_entry *tjob = NULL;
> > +   struct job_entry *job = NULL;
> > +
>
> Why not simply lock the table in here?  Instead of always locking around
> this call?  If you really need to control lock in some spot add an
> __job_getjob() w/ no locking and have job_getjob() be like:
> lock()
> __job_getjob()
> unlock()

I see your point, but sometime we need read lock and sometime write lock,
and we don't want to immediately unlock it because we want to modify the
job table (see job_create). It is just matter of where we code the lock.


> > +   if (job->state == ZOMBIE) {
>
> What about == FETAL (or INIT, whatever)?

Yep, it is not used for now.

> Do they need to be created?  Can't userspace/lookup handle the dentry
> creation?
>
> > +   sprintf(buf, "%d", attached->task->pid);
> > +   sprintf(attached->jid_buf, "0x%llx", attached->job->jid);
> > +        attached->pid_dentry = jobfs_create(jobsb, job->jid_dentry, buf,
> > +           S_IFREG|0444, 0, 0, attached->jid_buf, JOB_FILE_READ);
> > +   attached->pids_dentry = jobfs_create(jobsb, piddirs, buf,
> > +           S_IFREG|0444, 0, 0, attached->jid_buf, JOB_FILE_READ);
>
> Mixed tabs and spaces.  And, jobfs_create can fail.

job_create gets called when user request to create a job, the dentries
are created at that time. Yes, any of the jobfs_create could fail, then
userspace operations on that job would suffer, but should job_create
fail because jobfs_create fails?

I do add some error checking there.

>
> > +error_return:
> > +   DBG_PRINTEXIT(errcode);
> > +   if (attached) kfree(attached);
>
> kfree(NULL) is ok.

changed, thanks!

>
> > +           if (waitcnt > 0) {
> > +                   wake_up_interruptible(&job->wait);
>
> wake_up_interruptible_all?

unsure about this one, so not changed.

>
> > +#if 0      /* XXX - Use if capable is not present */
> > +   if (current->euid != 0)
> > +           return -EPERM;
> > +#else
> > +   if (!capable(CAP_SYS_RESOURCE)) {
> > +           errcode = -EPERM;
> > +           goto error_return;
> > +   }
> > +#endif
>
> Make this unconditionally capable()

changed. The code was old.

> > +   /* create files in jobfs for the new job */
> > +   sprintf(buf1, "0x%llx", job->jid);
> > +   sprintf(attached->jid_buf, "0x%llx", job->jid);
>
> Why is this a buffer, and not just the jid?  In fact, what is it used
> for?  Looks superfluous.

buf1 is indeed unnecessary. check out the revised code.

> > +
> > +   job = job_getjob(jid);
>
> This is and odd interface.  AFAICT, every caller already knows job, and
> passes job->jid only to lookup job again from jid.

Good point. I changed to directly pass the job point than jid.

> > +   if (!job)
> > +           return -ENODATA;
> > +
> > +   if (list_empty(&job->attached))
> > +           return -ESRCH;
> > +   else {
> > +           attached = list_entry(job->attached.next, struct job_attach, entry);
>
> Is primary task always at this list location?

Yes, it is the first one in the attached list.

>
> > +    * Set job_table_sem, so no jobs can be deleted while doing
> > +    * this operation.
> > +    */
> > +   JOB_WLOCK(&job_table_sem);
>
> Why?  This doesn't update any jobs or job list does it?

Just don't want job deletion or creation before the hid change is
finished.

> > +        strcpy(name, dentry->d_name.name);
> > +        if (sscanf(name, "%llx", &jid) != 1)
> > +                return -EINVAL;
>
> looks like you don't need tmp buffer, just scan directly from d_name.

Right, thanks!

>
> > +
> > +   job = job_getjob(jid);
>
> Why no locking on this call to job_getjob()?

my bad, should have the lock.

>
> > +        if (!job)
> > +                return -EINVAL;
> > +
> > +        if (capable(CAP_SYS_RESOURCE) || (current->uid == job->user)) {
> > +           job->user = attr->ia_uid;
>
> This is unconditional, regardless of the type of attr change?  Why
> CAP_SYS_RESOURCE?
>
> > +           if (ia_valid & ATTR_UID) {
> > +                   inode->i_uid = attr->ia_uid;
> > +                   if (ia_valid & ATTR_GID)
>
> Why is this nested under ATTR_UID?
>
> > +                           inode->i_gid = attr->ia_gid;
> > +                   return 0;
> > +           }
>
> Perhaps you could filter the ia_valid types you care about, do local
> changes, then just call at least inode_setattr(), and perhaps also check
> inode_change_ok() first.


> > +   jobfs_create_files(sb, root);
> > +        return 0;
>
> Mixed tabs and spaces.  And make sure to check error here and do proper
> cleanup.
>
> > +   if (!delete_dentry->d_inode)
> > +           return 0;
> > +   down (&delete_dentry->d_inode->i_sem);
>
> This should be the parent i_sem.

Thanks!

> > +static int jobfs_open(struct inode *inode, struct file *filp)
> > +{
> > +   filp->private_data = inode->u.generic_ip;
>
> Doesn't seem like this is necessary.  It's not per file data, but per
> inode, so it'll be there in jobfs_read, for example, via
> filp->f_dentry->d_inode, right?
>
> > +   return 0;
> > +}
> > +
> > +static ssize_t jobfs_read(struct file *filp, char *buf,
> > +              size_t count, loff_t *offset)
> > +{
> > +   char *hbuf = (char *) filp->private_data;
> > +   int len;
> > +   char tmp[JOBFS_BUF];
> > +
> > +   len = snprintf(tmp, JOBFS_BUF, hbuf);
>
> A) Is there anyway that hbuf could get printf format characters in it
> (like %n)?  Safer to use a format %s here.
>
> B) Why copy it over to tmp buf?
>
> C) Just use simple_read_from_buffer()
>
> > +   if (*offset > len)
> > +           return 0;
> > +   if (count > len - *offset)
> > +           count = len - *offset;
> > +
> > +   if (copy_to_user(buf, tmp + *offset, count))
> > +           return -EFAULT;
> > +   *offset += count;
> > +   return count;
> > +}

I stole the open and read codes. The content of those readable files
are just numbers, either pid or jid (u64).

I'll try to see if there are better ways to do this.

> > +   tmp1 = strchr(tmp, ' ');
> > +                if (!tmp1)
> > +                        return -EINVAL;
> > +                *tmp1 = '\0';
> > +                tmp1++;
>
> This is looking like an odd interface.  Could these things get their own
> entries?

They share one writable file because these operations are neither frequent
nor concurrent. But different entries may be more appropriate.

> > +   if (copy_from_user(tmp, buf, count))
> > +           return -EFAULT;
> > +   if (sscanf(tmp, "%u", &h) != 1)
>
> Should h be unsigned int (esp for 64-bit machines where unsigned long
> is 64-bit)?

> > +   piddirs = jobfs_create(sb, root, "pids", S_IFDIR|0555, 0, 0,
> > +           data, JOB_RDIR);
>
> jobfs_create can fail, so jobsf_create_files should also be able to fail.

Modified. Thanks!


> > +   /* Initialize the list for accounting subscribers */
> > +   for (i = 0; i < JOB_ACCT_COUNT; i++) {
> > +           acct_list[i] = NULL;
> > +   }
>
> This is already initialized to NULL since it's in bss.

removed.

>
> > +   /* Get hostID string and fill in jid_template hostID segment */
> > +   if (hid) {
> > +           jid_hid = (int)simple_strtoul(hid, &hid, 16);
> > +   } else {
> > +           jid_hid = 0;
> > +   }
>
> This seems to make the static initialization to DISABLED unused.

Good catch, jid_hid does not need to be initialized as DISABLED.

Chris, thanks again for your review, I hope the revised version 
will be better. :)


To Andreas Schwab:
>
> Calling it "job" whould be quite confusing since it is already used to
> refer to all processes in a process group in the shell.

Unfortunatly that's the name SGI has been using for quite a long time.
Job is the middle layer of PAGG(process aggregates)/job/CSA(comprehensive
system accounting) stack.

To Rusty Russell:
> > +MODULE_PARM(hid, "s");
>
> Please use module_param from linux/moduleparam.h.  MODULE_PARM is
> obsolescent.

Thanks, this is changed.

To Jan Engelhardt:

> >But adding /proc/<pid>/job needs to patch fs/proc/base.c, we can not
> >do that in a module. Of course if job gets accepted, this won't be a problem.
>
> That depends on whether the community (read: the standard user) needs it. Maybe
> we could make some procfs functions exported and let jobfs be dependent on
> procfs (let jobfs using proc_create_dir() for example).

What can I do to make the community needs it? :)

I also replied Robin Holt's and Jeffrey Hundstad's comments that job may
belong to /proc in previous email, I would really like to hear more
discussions on this.

I have attached the revised version (mostly based on Chris Wright's comments)
of job patch, any suggestions are highly appreciated.

Thank you!

Signed-off-by: Limin Gu <limin@sgi.com>

--
Limin Gu - Linux System Software
Silicon Graphics Inc., Mountain View, CA