linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@mellanox.co.il>
To: discuss@x86-64.org, linux-kernel@vger.kernel.org
Subject: [patch] Re: [discuss] speed up compatible ioctls in linux kernel
Date: Wed, 15 Sep 2004 16:19:55 +0300	[thread overview]
Message-ID: <20040915131955.GA9274@mellanox.co.il> (raw)
In-Reply-To: <20040908143852.GA27411@wotan.suse.de>

Hello!
Quoting r. Andi Kleen (ak@suse.de) "Re: [patch]   Re: [discuss] f_ops flag to speed up compatible ioctls in linux kernel":
> On Wed, Sep 08, 2004 at 05:28:08PM +0300, Michael S. Tsirkin wrote:
> > --- linux-2.6.8.1/include/linux/fs.h	2004-09-07 19:33:43.000000000 +0300
> > +++ linux-2.6.8.1-new/include/linux/fs.h	2004-09-08 07:18:20.000000000 +0300
> > @@ -879,6 +879,8 @@ struct file_operations {
> >  	int (*readdir) (struct file *, void *, filldir_t);
> >  	unsigned int (*poll) (struct file *, struct poll_table_struct *);
> >  	int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
> > +	int (*ioctl_native) (struct inode *, struct file *, unsigned int, unsigned long);
> > +	int (*ioctl_compat) (struct inode *, struct file *, unsigned int, unsigned long);
> 
> Define these as long, not int.  No need to waste 32 perfectly good bits on 
> 64bit platforms.
> 
> The main thing missing is documentation. You need clear comments what
> the locking rules are and what compat is good for.
> 
> And you should change the code style to follow Documentation/CodingStyle
> 
> Other than that it looks ok to me.
> 
> -Andi

OK, here (below) is an updated version. I added a bit of documentation
and fixed two coding style issues that I found, if there are more issues
please let me know.

There are two new calls done without taking the BKL at any point -
for native and compat ioctls. A separate call for compat ioctl
makes it possible to avoid the (IMO ugly) hash lookup altogether.

I made the calls return long although this means a driver can not just
reuse hos old "ioctl" function - the return type has to be changed.
Otherwise ioctl_native/ioctl_compat are a drop-in replacement.


Toy benchmark:   ioctl does a switch and takes a semaphore (note
dual processor results may be more drastic of there is no semaphore to
serialise on).

single process test:

before:
time /tmp/ioctltest32 /dev/mst/mt23108_pci_cr0
0.592u 8.313s 0:08.91 99.8%     0+0k 0+0io 0pf+0w
time /tmp/ioctltest64 /dev/mst/mt23108_pci_cr0
0.669u 5.684s 0:06.36 99.6%     0+0k 0+0io 0pf+0w

after:
time /tmp/ioctltest32 /dev/mst/mt23108_pci_cr0
0.749u 5.133s 0:05.91 99.3%     0+0k 0+0io 1pf+0w
time /tmp/ioctltest64 /dev/mst/mt23108_pci_cr0
0.583u 5.424s 0:06.01 99.8%     0+0k 0+0io 0pf+0w

System time reduced by 20% for 32 bit test.

Dual process (on dual CPU):

before:

time /tmp/ioctltest32 /dev/mst/mt23108_pci_cr0 & ; time /tmp/ioctltest32 /dev/mst/mt23108_pci_cr0 &
wait
[2]  + Done                          /tmp/ioctltest32 /dev/mst/mt23108_pci_cr0
0.777u 28.376s 0:29.15 99.9%    0+0k 0+0io 0pf+0w
[1]  + Done                          /tmp/ioctltest32 /dev/mst/mt23108_pci_cr0
1.113u 28.179s 0:29.32 99.8%    0+0k 0+0io 0pf+0w


time /tmp/ioctltest64 /dev/mst/mt23108_pci_cr0 & ; time /tmp/ioctltest64 /dev/mst/mt23108_pci_cr0 &
wait
[2]  + Done                          /tmp/ioctltest64 /dev/mst/mt23108_pci_cr0
0.551u 12.367s 0:12.92 99.9%    0+0k 0+0io 0pf+0w
[1]  + Done                          /tmp/ioctltest64 /dev/mst/mt23108_pci_cr0
0.554u 12.447s 0:13.01 99.8%    0+0k 0+0io 0pf+0w

after:

time /tmp/ioctltest32 /dev/mst/mt23108_pci_cr0
0.749u 5.133s 0:05.91 99.3%     0+0k 0+0io 1pf+0w
time /tmp/ioctltest64 /dev/mst/mt23108_pci_cr0
0.583u 5.424s 0:06.01 99.8%     0+0k 0+0io 0pf+0w
time /tmp/ioctltest32 /dev/mst/mt23108_pci_cr0 & ; time /tmp/ioctltest32
/dev/mst/mt23108_pci_cr0 &
wait
[2]  + Done                          /tmp/ioctltest32 /dev/mst/mt23108_pci_cr0
0.626u 9.947s 0:10.60 99.6%     0+0k 0+0io 0pf+0w
[1]  + Done                          /tmp/ioctltest32 /dev/mst/mt23108_pci_cr0
0.660u 10.039s 0:10.70 99.9%    0+0k 0+0io 0pf+0w


time /tmp/ioctltest64 /dev/mst/mt23108_pci_cr0 & ; time /tmp/ioctltest64 /dev/mst/mt23108_pci_cr0 &
wait
[2]  + Done                          /tmp/ioctltest64 /dev/mst/mt23108_pci_cr0
0.473u 9.857s 0:10.37 99.5%     0+0k 0+0io 0pf+0w
[1]  + Done                          /tmp/ioctltest64 /dev/mst/mt23108_pci_cr0
0.632u 9.813s 0:10.44 100.0%    0+0k 0+0io 0pf+0w

Almost 50% improvement for 32 bit code.

MST



diff -ru linux-2.6.8.1/include/linux/fs.h linux-2.6.8.1-new/include/linux/fs.h
--- linux-2.6.8.1/include/linux/fs.h	2004-09-14 21:20:43.000000000 +0300
+++ linux-2.6.8.1-new/include/linux/fs.h	2004-09-15 15:12:38.474626592 +0300
@@ -879,6 +879,22 @@
 	int (*readdir) (struct file *, void *, filldir_t);
 	unsigned int (*poll) (struct file *, struct poll_table_struct *);
 	int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
+	/* The following two calls are a replacement for the ioctl call above.
+	 * They take precedence over ioctl - if set, ioctl will not be used.
+	 * Unlike ioctl, BKL is not taken: drivers shall manage their own
+	 * locking. */
+
+	/* If ioctl_native is set, it is used instead 
+	 * of ioctl for native ioctl syscalls.
+	 * Note that standard glibc ioctl trims the return code to int,
+	 * so dont try to put 64 bit there, unless you know what you are doing.
+	 */
+	long (*ioctl_native) (struct inode *, struct file *, unsigned int, unsigned long);
+	/* If ioctl_compat is set, it is used for compatibility ioctl
+	 * (i.e. a 32 bit binary running on a 64 bit OS).
+	 * Note that only the low 32 bit of the return code are passed to the
+	 * user-space application. */
+	long (*ioctl_compat) (struct inode *, struct file *, unsigned int, unsigned long);
 	int (*mmap) (struct file *, struct vm_area_struct *);
 	int (*open) (struct inode *, struct file *);
 	int (*flush) (struct file *);
diff -ru linux-2.6.8.1/Documentation/filesystems/Locking linux-2.6.8.1-new/Documentation/filesystems/Locking
--- linux-2.6.8.1/Documentation/filesystems/Locking	2004-09-14 21:20:57.000000000 +0300
+++ linux-2.6.8.1-new/Documentation/filesystems/Locking	2004-09-15 15:23:35.149796792 +0300
@@ -331,6 +331,10 @@
 	unsigned int (*poll) (struct file *, struct poll_table_struct *);
 	int (*ioctl) (struct inode *, struct file *, unsigned int,
 			unsigned long);
+	long (*ioctl_native) (struct inode *, struct file *, unsigned int,
+			unsigned long);
+	long (*ioctl_compat) (struct inode *, struct file *, unsigned int,
+			unsigned long);
 	int (*mmap) (struct file *, struct vm_area_struct *);
 	int (*open) (struct inode *, struct file *);
 	int (*flush) (struct file *);
@@ -364,6 +368,8 @@
 readdir: 		no
 poll:			no
 ioctl:			yes	(see below)
+ioctl_native:		no	(see below)
+ioctl_compat:		no	(see below)
 mmap:			no
 open:			maybe	(see below)
 flush:			no
@@ -409,6 +415,9 @@
 anything that resembles union-mount we won't have a struct file for all
 components. And there are other reasons why the current interface is a mess...
 
+->ioctl() on regular files is superceded by the ->ioctl_native() and
+->ioctl_compat() pair. The lock is not taken for these new calls.
+
 ->read on directories probably must go away - we should just enforce -EISDIR
 in sys_read() and friends.
 
diff -ru linux-2.6.8.1/fs/compat.c linux-2.6.8.1-new/fs/compat.c
--- linux-2.6.8.1/fs/compat.c	2004-09-14 21:20:56.000000000 +0300
+++ linux-2.6.8.1-new/fs/compat.c	2004-09-15 15:15:04.384444928 +0300
@@ -385,15 +385,19 @@
 				unsigned long arg)
 {
 	struct file * filp;
-	int error = -EBADF;
+	long error = -EBADF;
 	struct ioctl_trans *t;
+	int fput_needed;
 
-	filp = fget(fd);
+	filp = fget_light(fd, &fput_needed);
 	if(!filp)
 		goto out2;
 
-	if (!filp->f_op || !filp->f_op->ioctl) {
-		error = sys_ioctl (fd, cmd, arg);
+	if (!std_sys_ioctl(fd,cmd,arg,filp,&error))
+		goto out;
+	else if (filp->f_op && filp->f_op->ioctl_compat) {
+		error = filp->f_op->ioctl_compat( filp->f_dentry->d_inode,
+				filp, cmd, arg);
 		goto out;
 	}
 
@@ -406,11 +410,12 @@
 	if (t) {
 		if (t->handler) { 
 			error = t->handler(fd, cmd, arg, filp);
-			unlock_kernel();
-		} else {
-			unlock_kernel();
-			error = sys_ioctl(fd, cmd, arg);
+		} else if (filp->f_op && filp->f_op->ioctl) {
+			error = filp->f_op->ioctl(
+					filp->f_dentry->d_inode,
+					filp, cmd, arg);
 		}
+		unlock_kernel();
 	} else {
 		unlock_kernel();
 		if (cmd >= SIOCDEVPRIVATE && cmd <= (SIOCDEVPRIVATE + 15)) {
@@ -446,7 +451,7 @@
 		}
 	}
 out:
-	fput(filp);
+	fput_light(filp, fput_needed);
 out2:
 	return error;
 }
diff -ru linux-2.6.8.1/fs/ioctl.c linux-2.6.8.1-new/fs/ioctl.c
--- linux-2.6.8.1/fs/ioctl.c	2004-09-14 21:20:56.000000000 +0300
+++ linux-2.6.8.1-new/fs/ioctl.c	2004-09-15 15:13:16.065911848 +0300
@@ -35,7 +35,9 @@
 			if ((error = get_user(block, p)) != 0)
 				return error;
 
+			lock_kernel();
 			res = mapping->a_ops->bmap(mapping, block);
+			unlock_kernel();
 			return put_user(res, p);
 		}
 		case FIGETBSZ:
@@ -45,29 +47,17 @@
 		case FIONREAD:
 			return put_user(i_size_read(inode) - filp->f_pos, p);
 	}
-	if (filp->f_op && filp->f_op->ioctl)
-		return filp->f_op->ioctl(inode, filp, cmd, arg);
 	return -ENOTTY;
 }
 
 
-asmlinkage long sys_ioctl(unsigned int fd, unsigned int cmd, unsigned long arg)
-{	
-	struct file * filp;
+EXPORT_SYMBOL(std_sys_ioctl);
+int std_sys_ioctl(unsigned int fd, unsigned int cmd, unsigned long arg,
+	struct file * filp, int* error)
+{
+	int on;
 	unsigned int flag;
-	int on, error = -EBADF;
-
-	filp = fget(fd);
-	if (!filp)
-		goto out;
-
-	error = security_file_ioctl(filp, cmd, arg);
-	if (error) {
-                fput(filp);
-                goto out;
-        }
-
-	lock_kernel();
+	*error = security_file_ioctl(filp, cmd, arg);
 	switch (cmd) {
 		case FIOCLEX:
 			set_close_on_exec(fd, 1);
@@ -78,7 +68,7 @@
 			break;
 
 		case FIONBIO:
-			if ((error = get_user(on, (int __user *)arg)) != 0)
+			if ((*error = get_user(on, (int __user *)arg)) != 0)
 				break;
 			flag = O_NONBLOCK;
 #ifdef __sparc__
@@ -93,17 +83,20 @@
 			break;
 
 		case FIOASYNC:
-			if ((error = get_user(on, (int __user *)arg)) != 0)
+			if ((*error = get_user(on, (int __user *)arg)) != 0)
 				break;
 			flag = on ? FASYNC : 0;
 
 			/* Did FASYNC state change ? */
 			if ((flag ^ filp->f_flags) & FASYNC) {
-				if (filp->f_op && filp->f_op->fasync)
-					error = filp->f_op->fasync(fd, filp, on);
-				else error = -ENOTTY;
+				if (filp->f_op && filp->f_op->fasync) {
+					lock_kernel();
+					*error = filp->f_op->fasync(fd, filp, on);
+					unlock_kernel();
+				}
+				else *error = -ENOTTY;
 			}
-			if (error != 0)
+			if (*error != 0)
 				break;
 
 			if (on)
@@ -117,20 +110,50 @@
 			    S_ISREG(filp->f_dentry->d_inode->i_mode) ||
 			    S_ISLNK(filp->f_dentry->d_inode->i_mode)) {
 				loff_t res = inode_get_bytes(filp->f_dentry->d_inode);
-				error = copy_to_user((loff_t __user *)arg, &res, sizeof(res)) ? -EFAULT : 0;
+				*error = copy_to_user((loff_t __user *)arg, &res, sizeof(res)) ? -EFAULT : 0;
 			}
 			else
-				error = -ENOTTY;
+				*error = -ENOTTY;
 			break;
 		default:
-			error = -ENOTTY;
-			if (S_ISREG(filp->f_dentry->d_inode->i_mode))
-				error = file_ioctl(filp, cmd, arg);
-			else if (filp->f_op && filp->f_op->ioctl)
-				error = filp->f_op->ioctl(filp->f_dentry->d_inode, filp, cmd, arg);
+			*error = -ENOTTY;
+			if (S_ISREG(filp->f_dentry->d_inode->i_mode)) {
+				*error = file_ioctl(filp, cmd, arg);
+			}
+			if (*error == -ENOTTY) return 1;
 	}
-	unlock_kernel();
-	fput(filp);
+	return 0;
+}
+
+
+
+asmlinkage long sys_ioctl(unsigned int fd, unsigned int cmd, unsigned long arg)
+{	
+	struct file * filp;
+	long error = -EBADF;
+	int fput_needed;
+
+	filp = fget_light(fd,&fput_needed);
+	if (!filp)
+		goto out;
+
+	if (!std_sys_ioctl(fd,cmd,arg,filp,&error)) {
+		goto out;
+	}
+
+	if (filp->f_op && filp->f_op->ioctl_native)
+		error = filp->f_op->ioctl_native(
+				filp->f_dentry->d_inode,
+				filp, cmd, arg);
+	else if (filp->f_op && filp->f_op->ioctl) {
+		lock_kernel();
+		error = filp->f_op->ioctl(
+				filp->f_dentry->d_inode,
+				filp, cmd, arg);
+		unlock_kernel();
+	}
+
+	fput_light(filp,fput_needed);
 
 out:
 	return error;
diff -ru linux-2.6.8.1/include/linux/ioctl.h linux-2.6.8.1-new/include/linux/ioctl.h
--- linux-2.6.8.1/include/linux/ioctl.h	2004-09-14 21:20:43.000000000 +0300
+++ linux-2.6.8.1-new/include/linux/ioctl.h	2004-09-13 18:04:23.000000000 +0300
@@ -3,5 +3,10 @@
 
 #include <asm/ioctl.h>
 
+/* Handles standard ioctl calls */
+struct file;
+int std_sys_ioctl(unsigned int fd, unsigned int cmd, unsigned long arg,
+	struct file * filp, int* error);
+
 #endif /* _LINUX_IOCTL_H */
 
diff -ru linux-2.6.8.1/include/linux/compat_ioctl.h linux-2.6.8.1-new/include/linux/compat_ioctl.h
--- linux-2.6.8.1/include/linux/compat_ioctl.h	2004-09-14 21:20:43.000000000 +0300
+++ linux-2.6.8.1-new/include/linux/compat_ioctl.h	2004-09-07 20:19:24.000000000 +0300
@@ -54,15 +54,6 @@
 COMPATIBLE_IOCTL(FBIOPAN_DISPLAY)
 COMPATIBLE_IOCTL(FBIOGET_CON2FBMAP)
 COMPATIBLE_IOCTL(FBIOPUT_CON2FBMAP)
-/* Little f */
-COMPATIBLE_IOCTL(FIOCLEX)
-COMPATIBLE_IOCTL(FIONCLEX)
-COMPATIBLE_IOCTL(FIOASYNC)
-COMPATIBLE_IOCTL(FIONBIO)
-COMPATIBLE_IOCTL(FIONREAD)  /* This is also TIOCINQ */
-/* 0x00 */
-COMPATIBLE_IOCTL(FIBMAP)
-COMPATIBLE_IOCTL(FIGETBSZ)
 /* 0x03 -- HD/IDE ioctl's used by hdparm and friends.
  *         Some need translations, these do not.
  */
diff -ru linux-2.6.8.1/arch/x86_64/ia32/ia32_ioctl.c linux-2.6.8.1-new/arch/x86_64/ia32/ia32_ioctl.c
--- linux-2.6.8.1/arch/x86_64/ia32/ia32_ioctl.c	2004-09-14 21:20:50.000000000 +0300
+++ linux-2.6.8.1-new/arch/x86_64/ia32/ia32_ioctl.c	2004-09-07 20:19:05.000000000 +0300
@@ -188,7 +188,6 @@
 COMPATIBLE_IOCTL(RTC_SET_TIME)
 COMPATIBLE_IOCTL(RTC_WKALM_SET)
 COMPATIBLE_IOCTL(RTC_WKALM_RD)
-COMPATIBLE_IOCTL(FIOQSIZE)
 
 /* And these ioctls need translation */
 HANDLE_IOCTL(TIOCGDEV, tiocgdev)

  parent reply	other threads:[~2004-09-15 13:35 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-09-01  7:22 f_ops flag to speed up compatible ioctls in linux kernel Michael S. Tsirkin
2004-09-01  7:32 ` viro
2004-09-01  7:44   ` Michael S. Tsirkin
2004-09-01  7:47   ` Lee Revell
2004-09-01  8:19     ` Michael S. Tsirkin
2004-09-01 15:55   ` Roland Dreier
2004-09-01 18:02     ` Chris Wright
2004-09-01 18:12       ` Roland Dreier
2004-09-01 18:31         ` viro
2004-09-01 20:54       ` Roland Dreier
     [not found]         ` <20040901170800.K1924@build.pdx.osdl.net>
     [not found]           ` <20040901190122.L1924@build.pdx.osdl.net>
2004-09-02  3:46             ` Roland Dreier
2004-09-01 18:06   ` Bill Davidsen
2004-09-01  8:30 ` Arjan van de Ven
2004-09-01 15:40 ` [PATCH] fs/compat.c: rwsem instead of BKL around ioctl32_hash_table Roland Dreier
2004-09-01 23:27   ` Andrew Morton
2004-09-02 21:14   ` Andi Kleen
2004-09-02 22:26     ` Roland Dreier
2004-09-03 14:37       ` [discuss] " Andi Kleen
2004-09-03 14:55         ` Roland Dreier
2004-09-03 15:02           ` Andi Kleen
2004-09-03  8:00 ` [discuss] f_ops flag to speed up compatible ioctls in linux kernel Andi Kleen
2004-09-07 10:40   ` Michael S. Tsirkin
2004-09-07 12:14     ` Andi Kleen
2004-09-07 13:45       ` Michael S. Tsirkin
2004-09-07 14:15         ` Andi Kleen
2004-09-07 14:25           ` Michael S. Tsirkin
2004-09-07 14:29             ` Andi Kleen
2004-09-07 14:37               ` Michael S. Tsirkin
2004-09-07 14:44                 ` Andi Kleen
2004-09-07 14:45                   ` Michael S. Tsirkin
2004-09-07 15:10                     ` Andi Kleen
2004-09-07 18:16                       ` Michael S. Tsirkin
2004-09-08  6:55                         ` Andi Kleen
2004-09-08 14:28                           ` [patch] " Michael S. Tsirkin
2004-09-08 14:38                             ` Andi Kleen
2004-09-08 14:54                               ` Michael S. Tsirkin
2004-09-08 14:58                                 ` Andi Kleen
2004-09-12 20:05                                   ` Michael S. Tsirkin
2004-09-15 13:19                               ` Michael S. Tsirkin [this message]
2004-09-07 21:36                       ` Is FIOQSIZE compatible? ( was Re: f_ops flag to speed up compatible ioctls in linux kernel) Michael S. Tsirkin
2004-09-08  6:54                         ` Andi Kleen
2004-09-07 15:03               ` [discuss] f_ops flag to speed up compatible ioctls in linux kernel Herbert Poetzl
2004-09-07 18:07                 ` Michael S. Tsirkin
2004-09-09 13:54                   ` Herbert Poetzl
2004-12-12 21:51       ` how to detect a 32 bit process on 64 bit kernel Michael S. Tsirkin
2004-12-12 22:01         ` Jan Engelhardt
2004-12-12 22:23         ` Christoph Hellwig
2004-12-13 19:50           ` Michael S. Tsirkin
2004-12-13 21:01             ` Jan Engelhardt
2004-12-13 21:32             ` Brian Gerst
2004-12-13 21:37               ` Michael S. Tsirkin
2004-12-12 22:37         ` Willy Tarreau
2004-12-12 23:30           ` Bongani Hlope
2004-12-14  7:28         ` Andi Kleen
2004-09-20 14:49   ` [patch] speed up ioctls in linux kernel Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20040915131955.GA9274@mellanox.co.il \
    --to=mst@mellanox.co.il \
    --cc=discuss@x86-64.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).