linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-17  9:27 AUDIT: copy_from_user is a deathtrap Rusty Russell
@ 2002-05-17  9:21 ` David S. Miller
  2002-05-17  9:49   ` Rusty Russell
  2002-05-17 10:20 ` Christoph Hellwig
  1 sibling, 1 reply; 83+ messages in thread
From: David S. Miller @ 2002-05-17  9:21 UTC (permalink / raw)
  To: rusty; +Cc: torvalds, linux-kernel

   From: Rusty Russell <rusty@rustcorp.com.au>
   Date: Fri, 17 May 2002 19:27:54 +1000

   There are 415 uses of copy_to/from_user which are wrong, despite an
   audit 12 months ago by the Stanford checker.
   
I would much rather fix these instances than add yet another
interface.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* AUDIT: copy_from_user is a deathtrap.
@ 2002-05-17  9:27 Rusty Russell
  2002-05-17  9:21 ` David S. Miller
  2002-05-17 10:20 ` Christoph Hellwig
  0 siblings, 2 replies; 83+ messages in thread
From: Rusty Russell @ 2002-05-17  9:27 UTC (permalink / raw)
  To: torvalds; +Cc: linux-kernel

Linus,

	Should I change copy_to/from_user to return -EFAULT, or
introduce a new copy_to/from_uspace which does and start moving
everything across?

There are 5,500 uses of copy_to/from_user in 2.5.15.  52 of them use
the return value in a way which would be broken by it returning
-EFAULT.  51 of those don't need to (mainly cut & paste between serial
drivers).

	/* Returns amount which wasn't copied before EFAULT.  Used by mount. */
	static inline unsigned long
	gradual_copy_from_user(void *to, const void *from, unsigned long n)
	{
		unsigned long i;

		for (i = 0; i < n; i++, to++, from++) {
			if (copy_from_user(from, to, 1) != 0)
				break;
		}

		return n - i;
	}

There are 415 uses of copy_to/from_user which are wrong, despite an
audit 12 months ago by the Stanford checker.

Tired of auditing the same bugs,
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-17  9:49   ` Rusty Russell
@ 2002-05-17  9:35     ` David S. Miller
  2002-05-17 12:26       ` Rusty Russell
  2002-05-17 12:17     ` Alan Cox
  2002-05-18  2:37     ` Linus Torvalds
  2 siblings, 1 reply; 83+ messages in thread
From: David S. Miller @ 2002-05-17  9:35 UTC (permalink / raw)
  To: rusty; +Cc: torvalds, linux-kernel

   From: Rusty Russell <rusty@rustcorp.com.au>
   Date: Fri, 17 May 2002 19:49:40 +1000
   
   Sorry I wasn't clear: I'm saying *replace*, not add,

I don't understand what you are proposing then.  There are some
instances that do want to know how many bytes did make it before
the -EFAULT event.

You have to add a new version of the interface to handle both
the case that cares about the length copied successfully and
the case that only cares about -EFAULT vs. !-EFAULT

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-17  9:21 ` David S. Miller
@ 2002-05-17  9:49   ` Rusty Russell
  2002-05-17  9:35     ` David S. Miller
                       ` (2 more replies)
  0 siblings, 3 replies; 83+ messages in thread
From: Rusty Russell @ 2002-05-17  9:49 UTC (permalink / raw)
  To: David S. Miller; +Cc: torvalds, linux-kernel

In message <20020517.022148.48851839.davem@redhat.com> you write:
>    From: Rusty Russell <rusty@rustcorp.com.au>
>    Date: Fri, 17 May 2002 19:27:54 +1000
> 
>    There are 415 uses of copy_to/from_user which are wrong, despite an
>    audit 12 months ago by the Stanford checker.
>    
> I would much rather fix these instances than add yet another
> interface.

I'll accept that if someone's volunteering to audit the kernel for
them every six months.

Sorry I wasn't clear: I'm saying *replace*, not add,
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-17  9:27 AUDIT: copy_from_user is a deathtrap Rusty Russell
  2002-05-17  9:21 ` David S. Miller
@ 2002-05-17 10:20 ` Christoph Hellwig
  1 sibling, 0 replies; 83+ messages in thread
From: Christoph Hellwig @ 2002-05-17 10:20 UTC (permalink / raw)
  To: Rusty Russell; +Cc: torvalds, linux-kernel

On Fri, May 17, 2002 at 07:27:54PM +1000, Rusty Russell wrote:
> Linus,
> 
> 	Should I change copy_to/from_user to return -EFAULT, or
> introduce a new copy_to/from_uspace which does and start moving
> everything across?

copyin/copyout! :)


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-17  9:49   ` Rusty Russell
  2002-05-17  9:35     ` David S. Miller
@ 2002-05-17 12:17     ` Alan Cox
  2002-05-17 12:21       ` Rusty Russell
  2002-05-18  2:37     ` Linus Torvalds
  2 siblings, 1 reply; 83+ messages in thread
From: Alan Cox @ 2002-05-17 12:17 UTC (permalink / raw)
  To: Rusty Russell; +Cc: David S. Miller, torvalds, linux-kernel

> > I would much rather fix these instances than add yet another
> > interface.
> 
> I'll accept that if someone's volunteering to audit the kernel for
> them every six months.
> 
> Sorry I wasn't clear: I'm saying *replace*, not add,

Replace requires you audit every single use, and then work out how to
handle those that do care about the length and the point it faulted. From
what I've seen of the stuff that has been fixed we have a mix of the
following

1.	Misports of ancient verify_* code - eg the serial ones
2.	Not checking the return code - 100% legal and standards compliant

I've seen very few that have other screwups. In fact I've seen far more
incorrect uses of kmalloc with a user passed input field, kmalloc with
maths overflows, copy*user with maths overflows and the like

Alan

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-17 12:17     ` Alan Cox
@ 2002-05-17 12:21       ` Rusty Russell
  2002-05-17 12:58         ` Alan Cox
  0 siblings, 1 reply; 83+ messages in thread
From: Rusty Russell @ 2002-05-17 12:21 UTC (permalink / raw)
  To: Alan Cox; +Cc: David S. Miller, torvalds, linux-kernel

In message <E178gfl-0006Ip-00@the-village.bc.nu> you write:
> > > I would much rather fix these instances than add yet another
> > > interface.
> > 
> > I'll accept that if someone's volunteering to audit the kernel for
> > them every six months.
> > 
> > Sorry I wasn't clear: I'm saying *replace*, not add,
> 
> Replace requires you audit every single use, and then work out how to
> handle those that do care about the length and the point it faulted.

Read my original post.  I have done this.

> From what I've seen of the stuff that has been fixed we have a mix
> of the following
> 
> 1.	Misports of ancient verify_* code - eg the serial ones
> 2.	Not checking the return code - 100% legal and standards compliant

No, the 400+ are all of form:

	/* of course this returns 0 or -EFAULT! */
	return copy_from_user(xxx);

Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-17  9:35     ` David S. Miller
@ 2002-05-17 12:26       ` Rusty Russell
  2002-05-17 17:42         ` Denis Vlasenko
  0 siblings, 1 reply; 83+ messages in thread
From: Rusty Russell @ 2002-05-17 12:26 UTC (permalink / raw)
  To: David S. Miller; +Cc: torvalds, linux-kernel

In message <20020517.023506.105129697.davem@redhat.com> you write:
>    From: Rusty Russell <rusty@rustcorp.com.au>
>    Date: Fri, 17 May 2002 19:49:40 +1000
>    
>    Sorry I wasn't clear: I'm saying *replace*, not add,
> 
> I don't understand what you are proposing then.  There are some
> instances that do want to know how many bytes did make it before
> the -EFAULT event.

Yes.  There are 52 places which care.  Most of these are unneccessary
attempts to return eg. number of bytes written in read call before we
hit the fault, instead of -EFAULT.

The one case I found which obviously needed it was the mount options
code, and I proposed a simple (slow) gradually_copy_from_user for this
case:

	static inline unsigned long
	gradual_copy_from_user(void *to, const void *from, unsigned long n)
	{
		unsigned long i;

		for (i = 0; i < n; i++, to++, from++) {
			if (copy_from_user(from, to, 1) != 0)
				break;
		}
		return n - i;
	}

Here is the list of places in 2.5.15 which actually use the return
values other than zero:

./fs/proc/generic.c:108: 		n -= copy_to_user(buf, start < page ? page : start, n);
./fs/hfs/file.c:263:	i = copy_from_user(data, buf, count);
./fs/hfs/file.c:390:				chars -= copy_to_user(buf, p, chars);
./fs/hfs/file.c:472:			copy_from_user(p, buf, c);
./fs/hfs/file_cap.c:162:		memcount -= copy_to_user(buf, ((char *)&meta) + pos, memcount);
./fs/hfs/file_cap.c:234:		mem_count -= copy_from_user(((char *)&meta) + pos, buf, mem_count);
./fs/hfs/file_hdr.c:422:		left -= copy_to_user(buf, ((char *)&meta) + pos, left);
./fs/hfs/file_hdr.c:592:			left -= copy_to_user(buf, p + offset, left);
./fs/hfs/file_hdr.c:671:		left -= copy_from_user(((char *)&meta) + pos, buf, left);
./fs/hfs/file_hdr.c:703:		left -= copy_from_user(((char *)&meta) + pos, buf, left);
./fs/hfs/file_hdr.c:868:			left -= copy_from_user(p + offset, buf, left);
./fs/namespace.c:670:	i = size - copy_from_user((void *)page, data, size);
./mm/filemap.c:1189:	left = __copy_to_user(desc->buf, kaddr + offset, size);
./drivers/char/pty.c:158:			n -= copy_from_user(temp_buffer, buf, n);
./drivers/char/esp.c:1332:			c -= copy_from_user(tmp_buf, buf, c);
./drivers/char/serial.c:1876:			c -= copy_from_user(tmp_buf, buf, c);
./drivers/char/n_tty.c:922:		retval = copy_to_user(*b, &tty->read_buf[tty->read_tail], n);
./drivers/char/vc_screen.c:251:		ret = copy_to_user(buf, con_buf_start, orig_count);
./drivers/char/vc_screen.c:325:		ret = copy_from_user(con_buf, buf, this_round);
./drivers/char/rocket.c:1606:			c -= copy_from_user(tmp_buf, buf, c);
./drivers/char/rocket.c:1643:			c -= copy_from_user(tmp_buf, buf, c);
./drivers/char/random.c:1362:			i -= copy_to_user(buf, (__u8 const *)tmp, i);
./drivers/char/random.c:1589:		bytes -= copy_from_user(&buf, p, bytes);
./drivers/char/riscom8.c:1247:			c -= copy_from_user(tmp_buf, buf, c);
./drivers/char/specialix.c:1626:			c -= copy_from_user(tmp_buf, buf, c);
./drivers/char/serial_amba.c:888:			c -= copy_from_user(tmp_buf, buf, c);
./drivers/char/dz.c:706:			c -= copy_from_user (tmp_buf, buf, c);
./drivers/char/mxser.c:931:			c -= copy_from_user(mxvar_tmp_buf, buf, c);
./drivers/char/generic_serial.c:237:			c -= copy_from_user(tmp_buf, buf, c);
./drivers/char/serial167.c:1257:		    c -= copy_from_user(tmp_buf, buf, c);
./drivers/char/amiserial.c:958:			c -= copy_from_user(tmp_buf, buf, c);
./drivers/sbus/char/zs.c:1112:			c -= copy_from_user(tmp_buf, buf, c);
./drivers/sbus/char/sab82532.c:1127:			c -= copy_from_user(tmp_buf, buf, c);
./drivers/sbus/char/su.c:1234:			c -= copy_from_user(tmp_buf, buf, c);
./drivers/sbus/char/aurora.c:1627:			c -= copy_from_user(tmp_buf, buf, c);
./drivers/video/fbmem.c:386:	    count -= copy_to_user(buf, base_addr+p, count);
./drivers/video/fbmem.c:422:	    count -= copy_from_user(base_addr+p, buf, count);
./drivers/macintosh/macserial.c:1551:			c -= copy_from_user(tmp_buf, buf, c);
./drivers/s390/char/con3215.c:596:				c -= copy_from_user(raw->buffer + raw->head,
./drivers/s390/char/tuball.c:329:				len2 -= copy_from_user(ob->bc_buf + ob->bc_wr,
./arch/i386/kernel/vm86.c:79:	tmp = copy_to_user(&current->thread.vm86_info->regs,regs, VM86_REGS_SIZE1);
./arch/i386/kernel/vm86.c:80:	tmp += copy_to_user(&current->thread.vm86_info->regs.VM86_REGS_PART2,
./arch/i386/kernel/vm86.c:147:	tmp  = copy_from_user(&info, v86, VM86_REGS_SIZE1);
./arch/i386/kernel/vm86.c:148:	tmp += copy_from_user(&info.regs.VM86_REGS_PART2, &v86->regs.VM86_REGS_PART2,
./arch/i386/kernel/vm86.c:148:	tmp += copy_from_user(&info.regs.VM86_REGS_PART2, &v86->regs.VM86_REGS_PART2,
./arch/i386/kernel/vm86.c:195:	tmp  = copy_from_user(&info, v86, VM86_REGS_SIZE1);
./arch/i386/kernel/vm86.c:196:	tmp += copy_from_user(&info.regs.VM86_REGS_PART2, &v86->regs.VM86_REGS_PART2,
./arch/mips/baget/vacserial.c:1085:			c -= copy_from_user(tmp_buf, buf, c);
./arch/mips/au1000/common/serial.c:1212:			c -= copy_from_user(tmp_buf, buf, c);
./arch/ppc/4xx_io/serial_sicc.c:988:            c -= copy_from_user(tmp_buf, buf, c);
./arch/ia64/hp/sim/simserial.c:328:			c -= copy_from_user(tmp_buf, buf, c);
./arch/cris/drivers/serial.c:2355:			c -= copy_from_user(tmp_buf, buf, c);
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-17 12:58         ` Alan Cox
@ 2002-05-17 12:58           ` Rusty Russell
  2002-05-17 13:13             ` John Levon
                               ` (2 more replies)
  0 siblings, 3 replies; 83+ messages in thread
From: Rusty Russell @ 2002-05-17 12:58 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel, torvalds

In message <E178hJt-0006Rb-00@the-village.bc.nu> you write:
> > No, the 400+ are all of form:
> > 
> > 	/* of course this returns 0 or -EFAULT! */
> > 	return copy_from_user(xxx);
> 
> So lets verify and fix them. Post the list to the kenrel janitors

Again, like we did 12 months ago you mean?

We could do that, or, we could fix the actual problem, which is the
HUGE FUCKING BEARTRAP WHICH CATCHES EVERY SINGLE NEW PROGRAMMER ON THE
WAY THROUGH.

Not fixing earlier was criminal, not fixing it today is insane.
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-17 12:21       ` Rusty Russell
@ 2002-05-17 12:58         ` Alan Cox
  2002-05-17 12:58           ` Rusty Russell
  0 siblings, 1 reply; 83+ messages in thread
From: Alan Cox @ 2002-05-17 12:58 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Alan Cox, David S. Miller, torvalds, linux-kernel

> No, the 400+ are all of form:
> 
> 	/* of course this returns 0 or -EFAULT! */
> 	return copy_from_user(xxx);

So lets verify and fix them. Post the list to the kenrel janitors

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-17 12:58           ` Rusty Russell
@ 2002-05-17 13:13             ` John Levon
  2002-05-17 14:52             ` Alan Cox
  2002-05-17 17:58             ` Denis Vlasenko
  2 siblings, 0 replies; 83+ messages in thread
From: John Levon @ 2002-05-17 13:13 UTC (permalink / raw)
  To: linux-kernel

On Fri, May 17, 2002 at 10:58:08PM +1000, Rusty Russell wrote:

> Again, like we did 12 months ago you mean?

Adding some comments on the interface to uaccess.h might help. It's way
more convenient than looking it up in your guide

regards
john
-- 
"It is very difficult to prophesy, especially when it pertains to the
 future."
 	- Patrick Kurzawe

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-17 12:58           ` Rusty Russell
  2002-05-17 13:13             ` John Levon
@ 2002-05-17 14:52             ` Alan Cox
  2002-05-18  1:26               ` Rusty Russell
  2002-05-17 17:58             ` Denis Vlasenko
  2 siblings, 1 reply; 83+ messages in thread
From: Alan Cox @ 2002-05-17 14:52 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Alan Cox, linux-kernel, torvalds

> Again, like we did 12 months ago you mean?

We didnt fix them 12 months ago

> We could do that, or, we could fix the actual problem, which is the
> HUGE FUCKING BEARTRAP WHICH CATCHES EVERY SINGLE NEW PROGRAMMER ON THE
> WAY THROUGH.

Capital letters versus content. I'd prefer content

All the cases I looked at where replications of existing bugs copied from
old drivers. That doesn't say copy_*_user is wrong, it says lots of examples
people keep using are wrong


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-17 12:26       ` Rusty Russell
@ 2002-05-17 17:42         ` Denis Vlasenko
  0 siblings, 0 replies; 83+ messages in thread
From: Denis Vlasenko @ 2002-05-17 17:42 UTC (permalink / raw)
  To: Rusty Russell, David S. Miller; +Cc: linux-kernel

On 17 May 2002 10:26, Rusty Russell wrote:
> >
> >    Sorry I wasn't clear: I'm saying *replace*, not add,
> >
> > I don't understand what you are proposing then.  There are some
> > instances that do want to know how many bytes did make it before
> > the -EFAULT event.
>
> Yes.  There are 52 places which care.  Most of these are unneccessary
> attempts to return eg. number of bytes written in read call before we
> hit the fault, instead of -EFAULT.
>
> The one case I found which obviously needed it was the mount options
> code, and I proposed a simple (slow) gradually_copy_from_user for this
> case:
>
> 	static inline unsigned long
> 	gradual_copy_from_user(void *to, const void *from, unsigned long n)
> 	{
> 		unsigned long i;
>
> 		for (i = 0; i < n; i++, to++, from++) {
> 			if (copy_from_user(from, to, 1) != 0)
> 				break;
> 		}
> 		return n - i;
> 	}

It looks simple to my uncomplicated mind to make
copy_{to,from}_user() return number of bytes copied.
If they != bytes requested, there was -EFAULT.

This can be easily wrapped by inline which returns only
0 on success or -EFAULT on failure.

Then use appropriate version for each case.
--
vda

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-17 12:58           ` Rusty Russell
  2002-05-17 13:13             ` John Levon
  2002-05-17 14:52             ` Alan Cox
@ 2002-05-17 17:58             ` Denis Vlasenko
  2 siblings, 0 replies; 83+ messages in thread
From: Denis Vlasenko @ 2002-05-17 17:58 UTC (permalink / raw)
  To: Rusty Russell, Alan Cox; +Cc: linux-kernel

On 17 May 2002 10:58, Rusty Russell wrote:
> > > 	/* of course this returns 0 or -EFAULT! */
> > > 	return copy_from_user(xxx);
> >
> > So lets verify and fix them. Post the list to the kenrel janitors
>
> Again, like we did 12 months ago you mean?
>
> We could do that, or, we could fix the actual problem, which is the
> HUGE FUCKING BEARTRAP WHICH CATCHES EVERY SINGLE NEW PROGRAMMER ON THE
> WAY THROUGH.

Looks like it is waiting for me yet (if I'll ever do something useful
for lk).

> Not fixing earlier was criminal, not fixing it today is insane.

What's the problem? People don't understand what copy_x_user() returns
and how to check return value properly?

Maybe good function name(s) will help?

copy_{from,to}_user_and_count()

> There are 415 uses of copy_to/from_user which are wrong, despite an
> audit 12 months ago by the Stanford checker.

What are typical errors?
--
vda

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-17 14:52             ` Alan Cox
@ 2002-05-18  1:26               ` Rusty Russell
  0 siblings, 0 replies; 83+ messages in thread
From: Rusty Russell @ 2002-05-18  1:26 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel, torvalds

In message <E178j5g-0006en-00@the-village.bc.nu> you write:
> > We could do that, or, we could fix the actual problem, which is the
> > HUGE FUCKING BEARTRAP WHICH CATCHES EVERY SINGLE NEW PROGRAMMER ON THE
> > WAY THROUGH.
> 
> Capital letters versus content. I'd prefer content

1) Returning 0 on success, and -errno on error is a common kernel
   convention.

2) Following kernel conventions makes it easier for other programmers
   to use your code.

3) You should only violate kernel conventions when there is a
   compelling reason.

	1a) If you're going to break a convention, do it in a way that
	    breaks compile, or 
	1b) If you can't do that, make it reliably break at runtime.

4) The single case which requires this information can be fixed by a
   simple 10-line wrapper function.

I do not believe this is a compelling reason to violate kernel
convention in a way which is almost impossible to notice.  I furthur
believe that it speaks very poorly about the thought put into kernel
interface design.

> All the cases I looked at where replications of existing bugs copied from
> old drivers.

Try looking at intermezzo, or the s390 and s390x ports.  New code, new
coders, same trap.

Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-17  9:49   ` Rusty Russell
  2002-05-17  9:35     ` David S. Miller
  2002-05-17 12:17     ` Alan Cox
@ 2002-05-18  2:37     ` Linus Torvalds
  2002-05-18 15:06       ` John Alvord
  2 siblings, 1 reply; 83+ messages in thread
From: Linus Torvalds @ 2002-05-18  2:37 UTC (permalink / raw)
  To: Rusty Russell; +Cc: David S. Miller, linux-kernel



On Fri, 17 May 2002, Rusty Russell wrote:
>
> Sorry I wasn't clear: I'm saying *replace*, not add,

Ok, let _me_ be clear: replacing them with an inferior product that cannot
tell you partial copies is not going to happen. Not now, not ever. You
would break all the users who _require_ knowing about a read() that only
gave you 5 out of 50 bytes.

		Linus


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-18  2:37     ` Linus Torvalds
@ 2002-05-18 15:06       ` John Alvord
  0 siblings, 0 replies; 83+ messages in thread
From: John Alvord @ 2002-05-18 15:06 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Rusty Russell, David S. Miller, linux-kernel

Maybe what is needed is a copy_from_user debug mode, like slab poisoning.

john alvord

On Fri, 17 May 2002, Linus Torvalds wrote:

> 
> 
> On Fri, 17 May 2002, Rusty Russell wrote:
> >
> > Sorry I wasn't clear: I'm saying *replace*, not add,
> 
> Ok, let _me_ be clear: replacing them with an inferior product that cannot
> tell you partial copies is not going to happen. Not now, not ever. You
> would break all the users who _require_ knowing about a read() that only
> gave you 5 out of 50 bytes.
> 
> 		Linus
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-23  3:54                                   ` Rusty Russell
@ 2002-05-23 11:15                                     ` Edgar Toernig
  0 siblings, 0 replies; 83+ messages in thread
From: Edgar Toernig @ 2002-05-23 11:15 UTC (permalink / raw)
  To: Rusty Russell; +Cc: linux-kernel

Rusty Russell wrote:
> 
> OS      Empty file      6 byte file             Pipe
>         Return  Size    Return  Size    Valid   Return  Size
> 
> AIX     EFAULT  99P     EFAULT  99P+6   99P+6   EFAULT  97P
> 
> Linux   99P     100P    99P-6   100P    99P-2   99P     99P
> (x86)
> 
> Solaris 98P     98P     99P-6   99P     99P     EFAULT  98.75P
> 

How could you miss this one - the only one that gets it right? ;-)

Linux2.0  EFAULT  0       EFAULT  6       0       EFAULT  0

SCNR, ET.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-22 14:54                                 ` Alan Cox
  2002-05-22 14:42                                   ` Pavel Machek
  2002-05-22 18:58                                   ` Kasper Dupont
@ 2002-05-23  3:54                                   ` Rusty Russell
  2002-05-23 11:15                                     ` Edgar Toernig
  2 siblings, 1 reply; 83+ messages in thread
From: Rusty Russell @ 2002-05-23  3:54 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linus Torvalds, Andrew Morton, Rusty Russell, linux-kernel

In message <E17AXVZ-0001up-00@the-village.bc.nu> you write:
> What it seems to say is that it if an error
> is reported then no data got written down the actual pipe itself. Putting
> 4K into the pipe then reporting Esomething is not allowed. Copying 4K into
> a buffer faulting and erroring with Efoo then throwing away the buffer is
> allowed

Hmmm... then noone is compliant AFAICT.  Test program attached, which
mprotects 100th page and tries to write 100 pages (interestingly, most
OS's optimize writes to /dev/null, and always "succeed"):

OS	Empty file	6 byte file		Pipe
	Return	Size	Return	Size	Valid	Return	Size

AIX	EFAULT	99P	EFAULT	99P+6	99P+6	EFAULT	97P

Linux	99P	100P	99P-6	100P	99P-2	99P	99P
(x86)

Solaris 98P	98P	99P-6	99P	99P	EFAULT	98.75P

Key:	Return = return value or error code if -1
	Size = resulting file size
	Valid = bytes written which were actually those requested
	P = * PAGE_SIZE

Summary: this is undefined behaviour, so I believe that we should do
the simplest thing possible inside the kernel.  I believe the simplest
thing we can do is have the trap handler deliver SIGSEGV to the
process, zero fill the region, and always return "success" to the
caller.  None of the callers need then care.

Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

/* Test for write from partially unmapped area.
   Aligned output:	./test-write > /tmp/out
   Unaligned output:	echo hello > /tmp/out && ./test-write >> /tmp/out
			(Note: check output with od -x /tmp/out)
   Pipe:		./test-write | cat > /tmp/out
*/
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <string.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <stdio.h>

#define ALIGN(x,a) (((x)+(a)-1)&~((a)-1))

int main()
{
	int writeret;
	char *pages;
	long pagesize;

	pagesize = sysconf(_SC_PAGESIZE);
	fprintf(stderr, "Pagesize is %li\n", pagesize);

	pages = malloc(pagesize * 101);
	pages = (char *)ALIGN((unsigned long)pages, pagesize);
	memset(pages, 'A', pagesize*100);

	if (mprotect(pages + pagesize*99, pagesize, PROT_NONE) != 0) {
		perror("mprotect");
		exit(1);
	}

	writeret = write(STDOUT_FILENO, pages, pagesize*100);
	fprintf(stderr, "Write returned %i (%s)\n",
		writeret, writeret < 0 ? strerror(errno) : "no error");
	return 0;
}

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-22 18:58                                   ` Kasper Dupont
@ 2002-05-22 22:02                                     ` Alan Cox
  0 siblings, 0 replies; 83+ messages in thread
From: Alan Cox @ 2002-05-22 22:02 UTC (permalink / raw)
  To: Kasper Dupont; +Cc: linux-kernel

> write might be the easy case. But what about read?
> Is a failing read allowed to change the userspace
> memory?

Dave actually tried this with the UDP receive code. The answer appears to be
yes but not everything likes it when it occurs

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-22 16:09                                 ` Linus Torvalds
@ 2002-05-22 20:28                                   ` Pavel Machek
  0 siblings, 0 replies; 83+ messages in thread
From: Pavel Machek @ 2002-05-22 20:28 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Alan Cox, Andrew Morton, Rusty Russell, linux-kernel

Hi!

> > > Its festering quietly in the glibc source tree where all large and
> > > dubiously justifiable hacks seem to end up
> >
> > In such case, linus, here is your "reasonable" example. For PPro, it
> > is faster to copy out-of-order, and if we wanted to use that for
> > copy_to_user, you'd have your example.
> 
> Hey, you didn't read Alan's email.
> 
> The fact that glibc may use it is a strike _against_ your example.
> 
> Have you looked at glibc performance lately?

:-).
								Pavel
-- 
Casualities in World Trade Center: ~3k dead inside the building,
cryptography in U.S.A. and free speech in Czech Republic.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-22 13:40 Petr Vandrovec
@ 2002-05-22 18:58 ` Denis Vlasenko
  2002-05-22 14:13   ` Ruth Ivimey-Cook
  0 siblings, 1 reply; 83+ messages in thread
From: Denis Vlasenko @ 2002-05-22 18:58 UTC (permalink / raw)
  To: Petr Vandrovec; +Cc: linux-kernel, acme

On 22 May 2002 11:40, Petr Vandrovec wrote:
> Function name is not documentation. Documentation documents function
> API, or, in case documentation is not available, source does it.
> copy_to_user_dude_I_return_uncopied_length_on_error_not_EFAULT_parameters_
>are_same_as_for_memcpy() is unacceptable name, and anything shorter does not
> document things you must know.

Anything can be taken to the ridiculous extreme: "let's use ci() and co()!
We will document 'em and everything will be fine!"

Let's abstain from this type of discussion.

> I'm not sure that I want to use kernel which contains code written
> by people who do not read API documentation. I expect that everyone
> here tests every branch in code he writes at least once - and such test
> would (f.e.) quickly reveal that read(fd, NULL, 1000) does not return -1 &
> set errno to EFAULT, but instead returns complete success (1000) on
> affected sound drivers.

In the ideal world. We're in the real world and there *is* broken code,
as demonstrated by Rusty. If we can reduce chances of programming errors by 
choosing good names, we have to do it.

Which name is 'good'/too short/too long is up to "big boys" to decide.
I presume they know better, I don't hack on kernel day and night. They do.
--
vda

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-22 14:54                                 ` Alan Cox
  2002-05-22 14:42                                   ` Pavel Machek
@ 2002-05-22 18:58                                   ` Kasper Dupont
  2002-05-22 22:02                                     ` Alan Cox
  2002-05-23  3:54                                   ` Rusty Russell
  2 siblings, 1 reply; 83+ messages in thread
From: Kasper Dupont @ 2002-05-22 18:58 UTC (permalink / raw)
  To: linux-kernel

Alan Cox wrote:
> 
> > In such case, linus, here is your "reasonable" example. For PPro, it
> > is faster to copy out-of-order, and if we wanted to use that for
> > copy_to_user, you'd have your example.
> 
> I think there is a misunderstanding here.
> 
> Nothing in the standards says that
> 
>         write(pipe_fd, halfmappedbuffer, 2*PAGE_SIZE)
> 
> must return PAGE_SIZE on an error. What it seems to say is that it if an error
> is reported then no data got written down the actual pipe itself. Putting
> 4K into the pipe then reporting Esomething is not allowed. Copying 4K into
> a buffer faulting and erroring with Efoo then throwing away the buffer is
> allowed

write might be the easy case. But what about read?
Is a failing read allowed to change the userspace
memory?

-- 
Kasper Dupont -- der bruger for meget tid på usenet.
For sending spam use mailto:razor-report@daimi.au.dk

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-21 21:17                   ` Pavel Machek
  2002-05-21 21:25                     ` Linus Torvalds
  2002-05-21 21:44                     ` Alan Cox
@ 2002-05-22 18:43                     ` Marco Colombo
  2 siblings, 0 replies; 83+ messages in thread
From: Marco Colombo @ 2002-05-22 18:43 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-kernel

[Cc: cleared]

On Tue, 21 May 2002, Pavel Machek wrote:

> Hi!
> 
> > > I thought POSIX made it explicit that you may SIGSEGV or EFAULT at your
> > > option. If that SUS rule is stupid, we should just drop it.
> > > 
> > > Performance advantage from MMX-copy-to-user is probably well worth it.
> > 
> > Stop this STUPID "it speeds things up" argument.
> > 
> > It's not TRUE.
> > 
> > We still have to do exactly the same things we do right now, because even
> > if we SIGSEGV we still have to return the right number of bytes
> > read/written. 
> > 
> > SIGSEGV doesn't mean that the system call wouldn't complete. It removes 
> > _none_ of the kernel fixup handling, because the SIGSEGV won't be 
> > delivered until we return to user mode anyway. So please stop mixing these 
> > two issues up.
> 
> If you pass bad pointer to memcpy(), you don't expect any reasonable
> return value, right?
> 
> So if you pass bad pointer to read(), why would you expect "number of
> bytes read" return? Its true that kernel can't simply not return
^^^^^^^^^^^^^^^^^^^^

Because read() may *consume* its input, while memcpy() it's only a copy and
leaves the source intact (if you care not to overlap src and dst).
If you're read()ing from a serial line with 1-byte fifo you want to
know how many chars you read successfully in the first place, since
once read, they're gone. SIGSEGVing won't help in re-reading the chars.

> anything, but giving SIGSEGV and returning -EFAULT seems pretty
> reasonable to me. If they really want to, they might extract number of
> bytes read from address SIGSEGV occured at [but that's dirty hack, and
> people will hopefully realise that and not rely on it].
> 								Pavel

.TM.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-22 10:08 Petr Vandrovec
@ 2002-05-22 16:23 ` Denis Vlasenko
  0 siblings, 0 replies; 83+ messages in thread
From: Denis Vlasenko @ 2002-05-22 16:23 UTC (permalink / raw)
  To: Petr Vandrovec; +Cc: Pete Zaitcev, linux-kernel, acme

> On 22 May 02 at 12:27, Denis Vlasenko wrote:
> > > As Linus and others pointed out, copy_{to_from}_user has its uses and
> > > will stay, but something like:
> >
> > I don't say 'kill it', I say 'rename it so that its name tells users what
> > return value to expect'. However, one have to weigh
>
> Why?

Why what? Why rename copy_to_user? Because in its current form people
misunderstand its return value and misuse it.
We can keep unmodified version of copy_to_user for some time for
compatibility.

Or maybe your "why?" is related to something else, I fail
to understand you in that case.

> From copyin/out descriptions sent yesterday if you want same source code
> running on all (BSD,SVR4,OSF/1) platforms, you must do
>
> if (copyin()) return [-]EFAULT;

But if I am new to Linux and just want to write my first piece of kernel
code, copyout() is even worse than copy_to_user(): 
it too lacks info of what it can return (0/1, 0/-EFAULT, # of copied bytes,
# of bytes remaining?) *and* copy direction become unclear:
copy out of *what*? out of kernel memery? out of user memory?
--
vda

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-22 14:13                               ` Pavel Machek
  2002-05-22 14:54                                 ` Alan Cox
@ 2002-05-22 16:09                                 ` Linus Torvalds
  2002-05-22 20:28                                   ` Pavel Machek
  1 sibling, 1 reply; 83+ messages in thread
From: Linus Torvalds @ 2002-05-22 16:09 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Alan Cox, Andrew Morton, Rusty Russell, linux-kernel



On Wed, 22 May 2002, Pavel Machek wrote:
> >
> > Its festering quietly in the glibc source tree where all large and
> > dubiously justifiable hacks seem to end up
>
> In such case, linus, here is your "reasonable" example. For PPro, it
> is faster to copy out-of-order, and if we wanted to use that for
> copy_to_user, you'd have your example.

Hey, you didn't read Alan's email.

The fact that glibc may use it is a strike _against_ your example.

Have you looked at glibc performance lately?

		Linus


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-22 14:42                                   ` Pavel Machek
@ 2002-05-22 15:27                                     ` Alan Cox
  0 siblings, 0 replies; 83+ messages in thread
From: Alan Cox @ 2002-05-22 15:27 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Alan Cox, Linus Torvalds, Andrew Morton, Rusty Russell, linux-kernel

> > must return PAGE_SIZE on an error. What it seems to say is that it if an error
> > is reported then no data got written down the actual pipe itself. Putting
> > 4K into the pipe then reporting Esomething is not allowed. Copying 4K into
> > a buffer faulting and erroring with Efoo then throwing away the buffer is
> > allowed
> 
> So... Is copy_to_user allowed to copy more than it actually reported?
> Because if so, it might return 0/-EFAULT as well ;-).

The specification defines the API not the implementation. It only matters if
the user can't tell. 0/-EFAULT would be wrong as that is EOF. 0/0 or
-1/Efoo.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-22 14:13                               ` Pavel Machek
@ 2002-05-22 14:54                                 ` Alan Cox
  2002-05-22 14:42                                   ` Pavel Machek
                                                     ` (2 more replies)
  2002-05-22 16:09                                 ` Linus Torvalds
  1 sibling, 3 replies; 83+ messages in thread
From: Alan Cox @ 2002-05-22 14:54 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Alan Cox, Linus Torvalds, Andrew Morton, Rusty Russell, linux-kernel

> In such case, linus, here is your "reasonable" example. For PPro, it
> is faster to copy out-of-order, and if we wanted to use that for
> copy_to_user, you'd have your example.

I think there is a misunderstanding here.

Nothing in the standards says that

	write(pipe_fd, halfmappedbuffer, 2*PAGE_SIZE)


must return PAGE_SIZE on an error. What it seems to say is that it if an error
is reported then no data got written down the actual pipe itself. Putting
4K into the pipe then reporting Esomething is not allowed. Copying 4K into
a buffer faulting and erroring with Efoo then throwing away the buffer is
allowed

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-22 14:54                                 ` Alan Cox
@ 2002-05-22 14:42                                   ` Pavel Machek
  2002-05-22 15:27                                     ` Alan Cox
  2002-05-22 18:58                                   ` Kasper Dupont
  2002-05-23  3:54                                   ` Rusty Russell
  2 siblings, 1 reply; 83+ messages in thread
From: Pavel Machek @ 2002-05-22 14:42 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linus Torvalds, Andrew Morton, Rusty Russell, linux-kernel

Hi!

> > In such case, linus, here is your "reasonable" example. For PPro, it
> > is faster to copy out-of-order, and if we wanted to use that for
> > copy_to_user, you'd have your example.
> 
> I think there is a misunderstanding here.
> 
> Nothing in the standards says that
> 
> 	write(pipe_fd, halfmappedbuffer, 2*PAGE_SIZE)
> 
> 
> must return PAGE_SIZE on an error. What it seems to say is that it if an error
> is reported then no data got written down the actual pipe itself. Putting
> 4K into the pipe then reporting Esomething is not allowed. Copying 4K into
> a buffer faulting and erroring with Efoo then throwing away the buffer is
> allowed

So... Is copy_to_user allowed to copy more than it actually reported?
Because if so, it might return 0/-EFAULT as well ;-).

									Pavel
-- 
Casualities in World Trade Center: ~3k dead inside the building,
cryptography in U.S.A. and free speech in Czech Republic.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-21  6:21       ` Arnaldo Carvalho de Melo
  2002-05-21  8:33         ` Christoph Hellwig
@ 2002-05-22 14:27         ` Denis Vlasenko
  1 sibling, 0 replies; 83+ messages in thread
From: Denis Vlasenko @ 2002-05-22 14:27 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Linus Torvalds; +Cc: Pete Zaitcev, linux-kernel

On 21 May 2002 04:21, Arnaldo Carvalho de Melo wrote:
> > Oh. Do you think a pair of
> >
> > copy_to_user_or_EINVAL(...)
> > copy_to_user_return_residue(...)
> >
> > will help avoid such bugs?
> > It is possible to audit kernel once, move it to new functions
> > and deprecate/delete old one.
>
> As Linus and others pointed out, copy_{to_from}_user has its uses and will
> stay, but something like:

I don't say 'kill it', I say 'rename it so that its name tells users what
return value to expect'. However, one have to weigh 
long_but_easy_to_understand_name() versus cryptcnshrt() here. :-)

I usually vote for long_but_easy_to_understand_name(), but it's MHO only.

> #define copyin(...) (copy_from_user(...) ? -EFAULT : 0)
> #define copyout(...) (copy_to_user(...) ? -EFAULT : 0)

This falls in cryptcnshrt() category.
Will "new programmer" grasp form the name alone that it returns EFAULT?
/me in doubt. OTOH BSD folks may be happy.
--
vda

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-22 13:47                             ` Alan Cox
@ 2002-05-22 14:13                               ` Pavel Machek
  2002-05-22 14:54                                 ` Alan Cox
  2002-05-22 16:09                                 ` Linus Torvalds
  0 siblings, 2 replies; 83+ messages in thread
From: Pavel Machek @ 2002-05-22 14:13 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linus Torvalds, Andrew Morton, Rusty Russell, linux-kernel

Hi!

> > I believe I seen some strange memcpy for PPro (or something that
> > obscure) that done out-of-order accesses to trigger prefetch logic. I
> > can't find it any more, so I can't be sure...
> 
> Its festering quietly in the glibc source tree where all large and 
> dubiously justifiable hacks seem to end up

In such case, linus, here is your "reasonable" example. For PPro, it
is faster to copy out-of-order, and if we wanted to use that for
copy_to_user, you'd have your example.
								Pavel
-- 
Casualities in World Trade Center: ~3k dead inside the building,
cryptography in U.S.A. and free speech in Czech Republic.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-22 18:58 ` Denis Vlasenko
@ 2002-05-22 14:13   ` Ruth Ivimey-Cook
  0 siblings, 0 replies; 83+ messages in thread
From: Ruth Ivimey-Cook @ 2002-05-22 14:13 UTC (permalink / raw)
  To: Denis Vlasenko; +Cc: Petr Vandrovec, linux-kernel, acme

On Wed, 22 May 2002, Denis Vlasenko wrote:
>Which name is 'good'/too short/too long is up to "big boys" to decide.
>I presume they know better, I don't hack on kernel day and night. They do.

Actually, in this case I would say it's the occasional hacker who is more 
likely to need good names, and consequently better placed to suggest them; the 
big boys presumably know this stuff so well it is a non-issue to them.

That said, I agree with Linus that copy_from_user et al is better than copyin, 
for reasons expressed earlier.

My work as a technical writer often results in comments like "I didn't 
understand that" or "no, it's not like that", when in fact the words are 
correct. I still take such reports seriously, though, as the point of writing
English is to convey meaning from you to someone's head, not to the paper.

All I can suggest about naming function calls is that consistency is generally 
better than point solutions.

Ruth

P.S. Might it be possible to avoid misuse of copy_ return values by changing 
the return type in some way?


-- 
Ruth Ivimey-Cook
Software engineer and technical writer.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-21 22:21                           ` Pavel Machek
@ 2002-05-22 13:47                             ` Alan Cox
  2002-05-22 14:13                               ` Pavel Machek
  0 siblings, 1 reply; 83+ messages in thread
From: Alan Cox @ 2002-05-22 13:47 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Linus Torvalds, Andrew Morton, Alan Cox, Pavel Machek,
	Rusty Russell, linux-kernel

> I believe I seen some strange memcpy for PPro (or something that
> obscure) that done out-of-order accesses to trigger prefetch logic. I
> can't find it any more, so I can't be sure...

Its festering quietly in the glibc source tree where all large and 
dubiously justifiable hacks seem to end up


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
@ 2002-05-22 13:40 Petr Vandrovec
  2002-05-22 18:58 ` Denis Vlasenko
  0 siblings, 1 reply; 83+ messages in thread
From: Petr Vandrovec @ 2002-05-22 13:40 UTC (permalink / raw)
  To: Denis Vlasenko; +Cc: linux-kernel, acme

[trimmed cc a bit]

On 22 May 02 at 14:23, Denis Vlasenko wrote:
> > On 22 May 02 at 12:27, Denis Vlasenko wrote:
> > > > As Linus and others pointed out, copy_{to_from}_user has its uses and
> > > > will stay, but something like:
> > >
> > > I don't say 'kill it', I say 'rename it so that its name tells users what
> > > return value to expect'. However, one have to weigh
> >
> > Why?
> 
> Why what? Why rename copy_to_user? Because in its current form people

Why rename it. 

> misunderstand its return value and misuse it.
> We can keep unmodified version of copy_to_user for some time for
> compatibility.

Function name is not documentation. Documentation documents function
API, or, in case documentation is not available, source does it.
copy_to_user_dude_I_return_uncopied_length_on_error_not_EFAULT_parameters_are_same_as_for_memcpy() 
is unacceptable name, and anything shorter does not document things 
you must know.

> > From copyin/out descriptions sent yesterday if you want same source code
> > running on all (BSD,SVR4,OSF/1) platforms, you must do
> >
> > if (copyin()) return [-]EFAULT;
> 
> But if I am new to Linux and just want to write my first piece of kernel
> code, copyout() is even worse than copy_to_user(): 
> it too lacks info of what it can return (0/1, 0/-EFAULT, # of copied bytes,

Hey, please change sys_read(). I'm used that read() returns -1 on error,
but now in kernel it returns some strange negative value. Please fix it.

I'm not sure that I want to use kernel which contains code written
by people who do not read API documentation. I expect that everyone 
here tests every branch in code he writes at least once - and such test
would (f.e.) quickly reveal that read(fd, NULL, 1000) does not return -1 &
set errno to EFAULT, but instead returns complete success (1000) on
affected sound drivers.

No. copy_*_user interface is documented since 2.1.0, only bad old
verify_area() in older kernels returned -EFAULT on error, and I do
not believe that coders who coded for 2.0.x did not notice completely
different interface (copy_from_user instead of verify_area + memcpy_fromfs)
during last almost 6 years.
                                            Best regards,
                                                Petr Vandrovec
                                                vandrove@vc.cvut.cz
                                                

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-22  4:57                       ` Rusty Russell
@ 2002-05-22 13:30                         ` Alan Cox
  0 siblings, 0 replies; 83+ messages in thread
From: Alan Cox @ 2002-05-22 13:30 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Alan Cox, pavel, linux-kernel

> > Because the standard says either you return the errorcode and no data
> > is transferred or for a partial I/O you return how much was done.
> 
> Hmm... I can't find anything like that in SuSv2: can you give a reference?
> And we're already violating that for the write() case.

The we should fix the bug. Its explicitly covered in SuSv3 because that
incorporates the posix text. Its also discussed in things like 1003.1g
drafts in great detail (sockets have very explicit ordering for error
reporting)

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
@ 2002-05-22 10:08 Petr Vandrovec
  2002-05-22 16:23 ` Denis Vlasenko
  0 siblings, 1 reply; 83+ messages in thread
From: Petr Vandrovec @ 2002-05-22 10:08 UTC (permalink / raw)
  To: Denis Vlasenko; +Cc: Pete Zaitcev, linux-kernel, acme

On 22 May 02 at 12:27, Denis Vlasenko wrote:

> > As Linus and others pointed out, copy_{to_from}_user has its uses and will
> > stay, but something like:
> 
> I don't say 'kill it', I say 'rename it so that its name tells users what
> return value to expect'. However, one have to weigh 

Why? OSF/1's copyin/copyout returns exactly same value which
our current copy_{to,from}_user does. You should not penalize
developers who read documentation.

> I usually vote for long_but_easy_to_understand_name(), but it's MHO only.
> 
> > #define copyin(...) (copy_from_user(...) ? -EFAULT : 0)
> > #define copyout(...) (copy_to_user(...) ? -EFAULT : 0)
> 
> This falls in cryptcnshrt() category.
> Will "new programmer" grasp form the name alone that it returns EFAULT?
> /me in doubt. OTOH BSD folks may be happy.

>From copyin/out descriptions sent yesterday if you want same source code 
running on all (BSD,SVR4,OSF/1) platforms, you must do

if (copyin()) return [-]EFAULT;

anyway, otherwise OSF/1 and SVR4 variants are wrong.
                                                    Petr Vandrovec
                                                    vandrove@vc.cvut.cz


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-21 21:46                       ` Andrew Morton
                                           ` (2 preceding siblings ...)
  2002-05-22  5:01                         ` Rusty Russell
@ 2002-05-22  6:28                         ` Rusty Russell
  3 siblings, 0 replies; 83+ messages in thread
From: Rusty Russell @ 2002-05-22  6:28 UTC (permalink / raw)
  To: Andrew Morton; +Cc: pavel, linux-kernel

On Tue, 21 May 2002 14:46:08 -0700
Andrew Morton <akpm@zip.com.au> wrote:
> Is it safe to stick a nose in here with some irrelevancies?

Well, if you can do it, I guess I can too...

Someone who knows x86 asm better than I can change everything in uaccess.h
to always return 0, like so:

#define copy_from_user(to, from, n) 				\
({								\
	/* Returns non-zero only if hit fault handler */	\
	if (general_copy_user((to), (from), (n)))		\
		memset(to, 0, n);				\
	0;							\
})

So the fault handler sends a SIGSEGV, and makes generic_copy_user return 1
immediately (if you were ambitious, it could do the memset too).

If someone wants to write the asm for that, I'll fix the SEGV delivery code
and mount option reading (two places where we must not deliver a signal),
and we can see what it looks like.

Cheers,
Rusty.
-- 
   there are those who do and those who hang on and you don't see too
   many doers quoting their contemporaries.  -- Larry McVoy

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-21 21:46                       ` Andrew Morton
  2002-05-21 22:04                         ` Linus Torvalds
  2002-05-22  0:47                         ` Andrea Arcangeli
@ 2002-05-22  5:01                         ` Rusty Russell
  2002-05-22  6:28                         ` Rusty Russell
  3 siblings, 0 replies; 83+ messages in thread
From: Rusty Russell @ 2002-05-22  5:01 UTC (permalink / raw)
  To: Andrew Morton; +Cc: alan, pavel, linux-kernel, paulus

On Tue, 21 May 2002 14:46:08 -0700
Andrew Morton <akpm@zip.com.au> wrote:
> Pavel makes a reasonable point that copy_*_user may elect
> to copy the data in something other than strictly ascending
> user virtual addresses.  In which case it's not possible to return
> a sane "how much was copied" number.

If I understand Paulus correctly, PPC64 could share their optimized memcpy
routine (ie. icache win), from which it is really hard to tell how far we got
before we faulted.

You then do the fixup search on the link register (ie. to find the caller).

Rusty.
-- 
   there are those who do and those who hang on and you don't see too
   many doers quoting their contemporaries.  -- Larry McVoy

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-21 21:44                     ` Alan Cox
  2002-05-21 21:46                       ` Andrew Morton
@ 2002-05-22  4:57                       ` Rusty Russell
  2002-05-22 13:30                         ` Alan Cox
  1 sibling, 1 reply; 83+ messages in thread
From: Rusty Russell @ 2002-05-22  4:57 UTC (permalink / raw)
  To: Alan Cox; +Cc: pavel, linux-kernel

On Tue, 21 May 2002 22:44:42 +0100 (BST)
Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:

> > So if you pass bad pointer to read(), why would you expect "number of
> > bytes read" return? Its true that kernel can't simply not return
> 
> Because the standard says either you return the errorcode and no data
> is transferred or for a partial I/O you return how much was done.

Hmm... I can't find anything like that in SuSv2: can you give a reference?

And we're already violating that for the write() case.

Rusty.
-- 
   there are those who do and those who hang on and you don't see too
   many doers quoting their contemporaries.  -- Larry McVoy

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-21 21:46                       ` Andrew Morton
  2002-05-21 22:04                         ` Linus Torvalds
@ 2002-05-22  0:47                         ` Andrea Arcangeli
  2002-05-22  5:01                         ` Rusty Russell
  2002-05-22  6:28                         ` Rusty Russell
  3 siblings, 0 replies; 83+ messages in thread
From: Andrea Arcangeli @ 2002-05-22  0:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alan Cox, Pavel Machek, Linus Torvalds, Rusty Russell, linux-kernel

On Tue, May 21, 2002 at 02:46:08PM -0700, Andrew Morton wrote:
> Alan Cox wrote:
> > 
> > > So if you pass bad pointer to read(), why would you expect "number of
> > > bytes read" return? Its true that kernel can't simply not return
> > 
> > Because the standard says either you return the errorcode and no data
> > is transferred or for a partial I/O you return how much was done. Its
> > not neccessarily about accuracy either. If you do a 4k copy_from_user and
> > error after for some reason returning -Esomething thats fine providing you
> > didnt do anything that consumed data or shifted the file position etc
> 
> Is it safe to stick a nose in here with some irrelevancies?
> 
> Pavel makes a reasonable point that copy_*_user may elect
> to copy the data in something other than strictly ascending
> user virtual addresses.  In which case it's not possible to return
> a sane "how much was copied" number.
> 
> And copy_*_user is buggy at present: it doesn't correctly handle
> the case where the source and destination of the copy are overlapping
> in the same physical page.  Example code below.  One fix is to
> do the copy with descending addresses if src<dest or whatever.
> But then how to return the number of bytes??
> 
> Also, I see all these noises from x86 gurus about how copy_*_user()
> could be sped up heaps with fancy CPU-specific features.  Could
> someone actually alight from butt and code that up?
> 
> 
> akpm-1:/usr/src/ext3/tools> ./copy-user-test foo
> aabcddfghhjkllnopprsttvwxx
> 
> #include <stdio.h>
> #include <unistd.h>
> #include <fcntl.h>
> #include <stdlib.h>
> #include <errno.h>
> #include <string.h>
> #include <sys/mman.h>
> 
> int main(int argc, char *argv[])
> {
> 	int fd;
> 	char *filename;
> 	char *mapped_mem;
> 	char buf[26];
> 	int i;
> 
> 	if (argc != 2) {
> 		fprintf(stderr, "Usage; %s filename\n", argv[0]);
> 		exit(1);
> 	}
> 
> 	filename = argv[1];
> 	fd = open(filename, O_RDWR|O_TRUNC|O_CREAT, 0666);
> 	if (fd < 0) {
> 		fprintf(stderr, "%s: Cannot open `%s': %s\n",
> 			argv[0], filename, strerror(errno));
> 		exit(1);
> 	}
> 
> 	for (i = 0; i < 26; i++)
> 		buf[i] = 'a' + i;
> 
> 	mapped_mem = mmap(0, 26, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
> 	if (mapped_mem == 0) {
	    ^^^^^^^^^^^^^^^ minor reminder: you should check against MAP_FAILED here (-1 not 0)

> 		perror("mmap");
> 		exit(1);
> 	}
> 	write(fd, buf, 26);
> 	lseek(fd, 1, SEEK_SET);
> 	write(fd, mapped_mem, 25);

very funny test really (it's like those games leading to an absurd), so
you actually notice that copy_user reads and writes 4 byte at once till
the last few bytes in the buffer :). I don't think anybody cares about
those kind of semantics of physical page overlapping in a "word" range
between pagecache destination and buffers source that is again the same
phys page.


If we use mmx unrolled loops faster on the athlons the "word"
granularity of copy_user would be 40bytes so it would be even more
noticeable (by luck in your above testcase then it would actually write
what you expect because the string is shorter and it wouldn't trigger a
roll of the unrolled loop).

An easy fix to resurrect the "expected" semantics of write, is to read
userspace and write pagecache in 1byte units, but that's slow, and I
don't think it really worth.  Let's declare this case undefined, it's
not a security matter and it's not useful either I think.


> 	msync(mapped_mem, 26, MS_SYNC);

other minor side note: msync actually cannot make differences to this case.

> 	munmap(mapped_mem, 26);
> 	close(fd);
> 
> 	{
> 		char *p = malloc(strlen(filename) + 20);
> 		sprintf(p, "cat %s ; echo", filename);
> 		system(p);
> 	}
> 	exit(0);
> }
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


Andrea

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-21 22:04                         ` Linus Torvalds
@ 2002-05-21 22:21                           ` Pavel Machek
  2002-05-22 13:47                             ` Alan Cox
  0 siblings, 1 reply; 83+ messages in thread
From: Pavel Machek @ 2002-05-21 22:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Alan Cox, Pavel Machek, Rusty Russell, linux-kernel

Hi!

> > Pavel makes a reasonable point that copy_*_user may elect
> > to copy the data in something other than strictly ascending
> > user virtual addresses.  In which case it's not possible to return
> > a sane "how much was copied" number.
> 
> I don't agree that that is true.
> 
> Do you have _any_ reasonable implementation taht would do that_

I believe I seen some strange memcpy for PPro (or something that
obscure) that done out-of-order accesses to trigger prefetch logic. I
can't find it any more, so I can't be sure...
							Pavel
-- 
Casualities in World Trade Center: ~3k dead inside the building,
cryptography in U.S.A. and free speech in Czech Republic.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-21 21:46                       ` Andrew Morton
@ 2002-05-21 22:04                         ` Linus Torvalds
  2002-05-21 22:21                           ` Pavel Machek
  2002-05-22  0:47                         ` Andrea Arcangeli
                                           ` (2 subsequent siblings)
  3 siblings, 1 reply; 83+ messages in thread
From: Linus Torvalds @ 2002-05-21 22:04 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Alan Cox, Pavel Machek, Rusty Russell, linux-kernel


On Tue, 21 May 2002, Andrew Morton wrote:
> 
> Pavel makes a reasonable point that copy_*_user may elect
> to copy the data in something other than strictly ascending
> user virtual addresses.  In which case it's not possible to return
> a sane "how much was copied" number.

I don't agree that that is true.

Do you have _any_ reasonable implementation taht would do that_

> And copy_*_user is buggy at present: it doesn't correctly handle
> the case where the source and destination of the copy are overlapping
> in the same physical page.  Example code below.

So we have memcpy() semantics for read()/write(), big deal. 

The same way you aren't supposed to use memcpy() for overlapping areas, 
you're not supposed to read/write into such areas, for all the same 
reasons.

> One fix is to
> do the copy with descending addresses if src<dest or whatever.

No. That wouldn't work anyway, because the addresses are totally different 
kinds.

> But then how to return the number of bytes??

The way we do now, which is the CORRECT way.

Stop this idiocy. 

The current interface is quite well-defined, and has good semantics. Every 
single argument against it has been totally bogus, with no redeeming 
values.

			Linus


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-21 21:44                     ` Alan Cox
@ 2002-05-21 21:46                       ` Andrew Morton
  2002-05-21 22:04                         ` Linus Torvalds
                                           ` (3 more replies)
  2002-05-22  4:57                       ` Rusty Russell
  1 sibling, 4 replies; 83+ messages in thread
From: Andrew Morton @ 2002-05-21 21:46 UTC (permalink / raw)
  To: Alan Cox; +Cc: Pavel Machek, Linus Torvalds, Rusty Russell, linux-kernel

Alan Cox wrote:
> 
> > So if you pass bad pointer to read(), why would you expect "number of
> > bytes read" return? Its true that kernel can't simply not return
> 
> Because the standard says either you return the errorcode and no data
> is transferred or for a partial I/O you return how much was done. Its
> not neccessarily about accuracy either. If you do a 4k copy_from_user and
> error after for some reason returning -Esomething thats fine providing you
> didnt do anything that consumed data or shifted the file position etc

Is it safe to stick a nose in here with some irrelevancies?

Pavel makes a reasonable point that copy_*_user may elect
to copy the data in something other than strictly ascending
user virtual addresses.  In which case it's not possible to return
a sane "how much was copied" number.

And copy_*_user is buggy at present: it doesn't correctly handle
the case where the source and destination of the copy are overlapping
in the same physical page.  Example code below.  One fix is to
do the copy with descending addresses if src<dest or whatever.
But then how to return the number of bytes??

Also, I see all these noises from x86 gurus about how copy_*_user()
could be sped up heaps with fancy CPU-specific features.  Could
someone actually alight from butt and code that up?


akpm-1:/usr/src/ext3/tools> ./copy-user-test foo
aabcddfghhjkllnopprsttvwxx

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <sys/mman.h>

int main(int argc, char *argv[])
{
	int fd;
	char *filename;
	char *mapped_mem;
	char buf[26];
	int i;

	if (argc != 2) {
		fprintf(stderr, "Usage; %s filename\n", argv[0]);
		exit(1);
	}

	filename = argv[1];
	fd = open(filename, O_RDWR|O_TRUNC|O_CREAT, 0666);
	if (fd < 0) {
		fprintf(stderr, "%s: Cannot open `%s': %s\n",
			argv[0], filename, strerror(errno));
		exit(1);
	}

	for (i = 0; i < 26; i++)
		buf[i] = 'a' + i;

	mapped_mem = mmap(0, 26, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
	if (mapped_mem == 0) {
		perror("mmap");
		exit(1);
	}
	write(fd, buf, 26);
	lseek(fd, 1, SEEK_SET);
	write(fd, mapped_mem, 25);
	msync(mapped_mem, 26, MS_SYNC);
	munmap(mapped_mem, 26);
	close(fd);

	{
		char *p = malloc(strlen(filename) + 20);
		sprintf(p, "cat %s ; echo", filename);
		system(p);
	}
	exit(0);
}

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-21 21:17                   ` Pavel Machek
  2002-05-21 21:25                     ` Linus Torvalds
@ 2002-05-21 21:44                     ` Alan Cox
  2002-05-21 21:46                       ` Andrew Morton
  2002-05-22  4:57                       ` Rusty Russell
  2002-05-22 18:43                     ` Marco Colombo
  2 siblings, 2 replies; 83+ messages in thread
From: Alan Cox @ 2002-05-21 21:44 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Linus Torvalds, Rusty Russell, linux-kernel, alan

> So if you pass bad pointer to read(), why would you expect "number of
> bytes read" return? Its true that kernel can't simply not return

Because the standard says either you return the errorcode and no data
is transferred or for a partial I/O you return how much was done. Its
not neccessarily about accuracy either. If you do a 4k copy_from_user and
error after for some reason returning -Esomething thats fine providing you
didnt do anything that consumed data or shifted the file position etc

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-21 21:17                   ` Pavel Machek
@ 2002-05-21 21:25                     ` Linus Torvalds
  2002-05-21 21:44                     ` Alan Cox
  2002-05-22 18:43                     ` Marco Colombo
  2 siblings, 0 replies; 83+ messages in thread
From: Linus Torvalds @ 2002-05-21 21:25 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Rusty Russell, linux-kernel, alan


On Tue, 21 May 2002, Pavel Machek wrote:
> 
> If you pass bad pointer to memcpy(), you don't expect any reasonable
> return value, right?

Actually, if I pass a bad pointer to memcpy(), and I have a handler for
the SIGSEGV that fixes up the thing, yes, by golly, I _do_ expect memcpy()  
to get the right value.

If it doesn't, then the system is BROKEN.

Face it Pavel, you're WRONG. You're so incredibly wrong that this is not 
worth even discussing. Linux does it right now, and I won't break that 
correct behaviour just because somebody has a incorrect and silly view of 
what is readable.

Face it, copy_to/from_user does the RIGHT THING, and stop whining about 
it.

		Linus


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-21 20:47                 ` Linus Torvalds
@ 2002-05-21 21:17                   ` Pavel Machek
  2002-05-21 21:25                     ` Linus Torvalds
                                       ` (2 more replies)
  0 siblings, 3 replies; 83+ messages in thread
From: Pavel Machek @ 2002-05-21 21:17 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Rusty Russell, linux-kernel, alan

Hi!

> > I thought POSIX made it explicit that you may SIGSEGV or EFAULT at your
> > option. If that SUS rule is stupid, we should just drop it.
> > 
> > Performance advantage from MMX-copy-to-user is probably well worth it.
> 
> Stop this STUPID "it speeds things up" argument.
> 
> It's not TRUE.
> 
> We still have to do exactly the same things we do right now, because even
> if we SIGSEGV we still have to return the right number of bytes
> read/written. 
> 
> SIGSEGV doesn't mean that the system call wouldn't complete. It removes 
> _none_ of the kernel fixup handling, because the SIGSEGV won't be 
> delivered until we return to user mode anyway. So please stop mixing these 
> two issues up.

If you pass bad pointer to memcpy(), you don't expect any reasonable
return value, right?

So if you pass bad pointer to read(), why would you expect "number of
bytes read" return? Its true that kernel can't simply not return
anything, but giving SIGSEGV and returning -EFAULT seems pretty
reasonable to me. If they really want to, they might extract number of
bytes read from address SIGSEGV occured at [but that's dirty hack, and
people will hopefully realise that and not rely on it].
								Pavel
-- 
Casualities in World Trade Center: ~3k dead inside the building,
cryptography in U.S.A. and free speech in Czech Republic.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-16 23:53               ` Pavel Machek
@ 2002-05-21 20:47                 ` Linus Torvalds
  2002-05-21 21:17                   ` Pavel Machek
  0 siblings, 1 reply; 83+ messages in thread
From: Linus Torvalds @ 2002-05-21 20:47 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Rusty Russell, linux-kernel, alan


On Thu, 16 May 2002, Pavel Machek wrote:
> I thought POSIX made it explicit that you may SIGSEGV or EFAULT at your
> option. If that SUS rule is stupid, we should just drop it.
> 
> Performance advantage from MMX-copy-to-user is probably well worth it.

Stop this STUPID "it speeds things up" argument.

It's not TRUE.

We still have to do exactly the same things we do right now, because even
if we SIGSEGV we still have to return the right number of bytes
read/written. 

SIGSEGV doesn't mean that the system call wouldn't complete. It removes 
_none_ of the kernel fixup handling, because the SIGSEGV won't be 
delivered until we return to user mode anyway. So please stop mixing these 
two issues up.

There are two completely orthogonal issues:

 - Use SIGSEGV on system calls or not.

   Using SIGSEGV makes the system call vs library routine issue more 
   regular, but it does not change the fact that the system call has to 
   return _something_.

 - system call return value for partially successful read/write. 

   We already have the exact same issue wrt something like SIGINT, and 
   nobody sane would ever suggest that we always return the "whole" thing
   if we're interrupted by an external signal.

   Similarly, it's naive and stupid to suggest we return success if we get 
   interrupted by a SIGSEGV/EFAULT.

On the first issue (SIGSEGV) I'm certainly open to trying that out, 
although I'm fairly certain there was _some_ reason we didn't do this a 
few years ago.

On the second issue, absolutely _nobody_ has shown any reason why we 
should break the existing code that does this correctly, and I've shown 
reasons why breaking it is STUPID.

			Linus


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-21  8:33         ` Christoph Hellwig
@ 2002-05-21 19:02           ` Albert D. Cahalan
  0 siblings, 0 replies; 83+ messages in thread
From: Albert D. Cahalan @ 2002-05-21 19:02 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Arnaldo Carvalho de Melo, Linus Torvalds, Denis Vlasenko,
	Pete Zaitcev, linux-kernel

Christoph Hellwig writes:

------
> FreeBSD has:
> /* return 0 on success, EFAULT on failure */
> int copyin(const void *udaddr, void *kaddr, size_t len);
> int copyout(const void *kaddr, void *udaddr, size_t len);

return copyin(x,y,z);                /* want EFAULT */
return copyin(x,y,z) ? -1 : 0;       /* want -1 */
return copyin(x,y,z);                /* want non-zero */

FreeBSD behavior might be best, considering where we
are most likely to share drivers.

------
> System V and derivates have:
> /* return 0 on success, -1 on failure */
> int  copyin(const  void  *userbuf, void *driverbuf, size_t cn);
> int  copyout(const  void *driverbuf, void *userbuf, size_t cn);

System V behavior is the easiest to use:

return copyin(x,y,z) & EFAULT;       /* want EFAULT */
return copyin(x,y,z);                /* want -1 */
return copyin(x,y,z);                /* want non-zero */

------
> OSF/1 has:
> /* return 0 on success, some non-specified error on failure) */
> int copyin(caddr_t user_src, caddr_t kernel_dest, u_int bcount);
> int copyout(caddr_t kernel_src, caddr_t user_dest, u_int bcount);

return copyin(x,y,z) ? EFAULT : 0;   /* want EFAULT */
return copyin(x,y,z) ? -1 : 0;       /* want -1 */
return copyin(x,y,z);                /* want non-zero */

Yuck... but good if it makes the assembly any faster.

------
With -EFAULT on an error:

return -copyin(x,y,z);       /* want EFAULT */
return copyin(x,y,z)>>31;    /* want -1 (rely on gcc's sign awareness) */
return copyin(x,y,z);        /* want non-zero */

Well, I like it.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
       [not found]   ` <20020520112232.A8983@devserv.devel.redhat.com>
@ 2002-05-21 10:57     ` Denis Vlasenko
  2002-05-21  6:21       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 83+ messages in thread
From: Denis Vlasenko @ 2002-05-21 10:57 UTC (permalink / raw)
  To: Pete Zaitcev; +Cc: linux-kernel

On 20 May 2002 13:22, you wrote:
> > Can you tell me what's wrong with copy_from_user? How did you used it
> > wrongly?
>
> Denis, I agree with the essense of Rusty's argument, which is that
> copy_to_user is easy to misuse in the following way:
>
> xxx_ioctl (..., cmd, arg) {
> 	return copy_to_user(....);
> }
>
> Since copy_to_user returns a number of residue bytes instead of
> -EINVAL, such statement confuses the caller.
> Rusty found something about 54 instances of this.

Oh. Do you think a pair of

copy_to_user_or_EINVAL(...)
copy_to_user_return_residue(...)

will help avoid such bugs?
It is possible to audit kernel once, move it to new functions
and deprecate/delete old one.
--
vda

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-21  6:21       ` Arnaldo Carvalho de Melo
@ 2002-05-21  8:33         ` Christoph Hellwig
  2002-05-21 19:02           ` Albert D. Cahalan
  2002-05-22 14:27         ` Denis Vlasenko
  1 sibling, 1 reply; 83+ messages in thread
From: Christoph Hellwig @ 2002-05-21  8:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Linus Torvalds, Denis Vlasenko,
	Pete Zaitcev, linux-kernel

On Tue, May 21, 2002 at 03:21:18AM -0300, Arnaldo Carvalho de Melo wrote:
> stay, but something like:
> 
> #define copyin(...) (copy_from_user(...) ? -EFAULT : 0)
> #define copyout(...) (copy_to_user(...) ? -EFAULT : 0)
> 
> Like several drivers already have (with these names or with other names)
> would be something interesting, that way we could clean up the ones that
> use this construct and all the others that use the longer
> 'copy_{to,from}_user(...) ? -EFAULT : 0' construct. If the powers that be
> accept this, I'd do the work 8)
> 
> Is it *BSD that have copyin/copyout with this semantic? If so it'd even
> have an extra bonus to make porting a little bit faster... 8)

FreeBSD has:
/* return 0 on success, EFAULT on failure */
int copyin(const void *udaddr, void *kaddr, size_t len);
int copyout(const void *kaddr, void *udaddr, size_t len);

System V and derivates have:
/* return 0 on success, -1 on failure */
int  copyin(const  void  *userbuf, void *driverbuf, size_t cn);
int  copyout(const  void *driverbuf, void *userbuf, size_t cn);

OSF/1 has:
/* return 0 on success, some non-specified error on failure) */
int copyin(caddr_t user_src, caddr_t kernel_dest, u_int bcount);
int copyout(caddr_t kernel_src, caddr_t user_dest, u_int bcount);


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-21 10:57     ` Denis Vlasenko
@ 2002-05-21  6:21       ` Arnaldo Carvalho de Melo
  2002-05-21  8:33         ` Christoph Hellwig
  2002-05-22 14:27         ` Denis Vlasenko
  0 siblings, 2 replies; 83+ messages in thread
From: Arnaldo Carvalho de Melo @ 2002-05-21  6:21 UTC (permalink / raw)
  To: Linus Torvalds, Denis Vlasenko; +Cc: Pete Zaitcev, linux-kernel

Em Tue, May 21, 2002 at 08:57:28AM -0200, Denis Vlasenko escreveu:
> On 20 May 2002 13:22, you wrote:
> > > Can you tell me what's wrong with copy_from_user? How did you used it
> > > wrongly?
> >
> > Denis, I agree with the essense of Rusty's argument, which is that
> > copy_to_user is easy to misuse in the following way:
> >
> > xxx_ioctl (..., cmd, arg) {
> > 	return copy_to_user(....);
> > }
> >
> > Since copy_to_user returns a number of residue bytes instead of
> > -EINVAL, such statement confuses the caller.
> > Rusty found something about 54 instances of this.
> 
> Oh. Do you think a pair of
> 
> copy_to_user_or_EINVAL(...)
> copy_to_user_return_residue(...)
> 
> will help avoid such bugs?
> It is possible to audit kernel once, move it to new functions
> and deprecate/delete old one.

As Linus and others pointed out, copy_{to_from}_user has its uses and will
stay, but something like:

#define copyin(...) (copy_from_user(...) ? -EFAULT : 0)
#define copyout(...) (copy_to_user(...) ? -EFAULT : 0)

Like several drivers already have (with these names or with other names)
would be something interesting, that way we could clean up the ones that
use this construct and all the others that use the longer
'copy_{to,from}_user(...) ? -EFAULT : 0' construct. If the powers that be
accept this, I'd do the work 8)

Is it *BSD that have copyin/copyout with this semantic? If so it'd even
have an extra bonus to make porting a little bit faster... 8)

- Arnaldo

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-20  4:53           ` Rusty Russell
  2002-05-19 20:12             ` Arnaldo Carvalho de Melo
@ 2002-05-20 16:00             ` Linus Torvalds
  1 sibling, 0 replies; 83+ messages in thread
From: Linus Torvalds @ 2002-05-20 16:00 UTC (permalink / raw)
  To: Rusty Russell; +Cc: linux-kernel, alan



On Mon, 20 May 2002, Rusty Russell wrote:
>
> Not quite:
> 	copy_from_user(xxx);
>
> Is my suggestion.  No error return.

The fact is, that that would still make you have to audit all the users,
AND you'd be left up shit creek for the users who _need_ the error return,
so now you not only have to fix all existing broken stuff, you have to fix
the _correct_ stuff too some strange way. I agree with returning SIGSEGV,
but it is NOT a _replacement_ for getting the right error return from
read/write.

So what's your point? You want to dumb down the interfaces until you can't
make mistakes, and only idiots will be able to use the system.

As long as you continue to push an interface that DOES NOT WORK, there's
no way you can win this argument. read()/write() _needs_ to work, and
that's not a "warm and fuzzy" kind of thing you can play with.

		Linus


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
       [not found] ` <E179fAd-0005vs-00@wagner.rustcorp.com.au.suse.lists.linux.kernel>
@ 2002-05-20 10:59   ` Andi Kleen
  0 siblings, 0 replies; 83+ messages in thread
From: Andi Kleen @ 2002-05-20 10:59 UTC (permalink / raw)
  To: Rusty Russell; +Cc: linux-kernel

Rusty Russell <rusty@rustcorp.com.au> writes:

> In message <Pine.LNX.4.44.0205191951460.22433-100000@home.transmeta.com> you wr
> ite:
> > 	ret = copy_from_user(xxx);
> > 	if (ret)
> > 		return ret;
> > 
> > which is apparently your suggestion.
> 
> Not quite:
> 	copy_from_user(xxx);
> 
> Is my suggestion.  No error return.

I don't think it is a good idea. Consider a driver that does lots 
of small copy_from_user() from user space for a longer write (e.g. TCP
to a small MSS) If xxx was faulting it would eat a lot of cycles with 
faulting again and again.

-Andi


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-20  2:54         ` Linus Torvalds
@ 2002-05-20  4:53           ` Rusty Russell
  2002-05-19 20:12             ` Arnaldo Carvalho de Melo
  2002-05-20 16:00             ` Linus Torvalds
  0 siblings, 2 replies; 83+ messages in thread
From: Rusty Russell @ 2002-05-20  4:53 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, alan

In message <Pine.LNX.4.44.0205191951460.22433-100000@home.transmeta.com> you wr
ite:
> 	ret = copy_from_user(xxx);
> 	if (ret)
> 		return ret;
> 
> which is apparently your suggestion.

Not quite:
	copy_from_user(xxx);

Is my suggestion.  No error return.

> So a lot of people didn't get it? Arnaldo seems to have fixed a lot of
> them already

Yeah, thanks to my kernel audit.  But I won't be auditing all 5,500
every release (I promised Alan I'd do 2.4 though: I'm waiting for the
next Marcelo kernel).

> and maybe you who apparently care can add _documentation_,
> but the fact is that there is no reason to make a less powerful interface.

It's been documented in the kernel docs.  It's also in the device
driver book.  And people still get it wrong because it's "special".

Please please please, Linus: to me this is like the min & max macros:
you didn't want a programmer trap in there, but everyone else
disagreed.  If there's any sane way we can get rid of this trap (which
has shown to cause real bugs), I would weigh it very carefully.

Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-20  2:06       ` Rusty Russell
@ 2002-05-20  2:54         ` Linus Torvalds
  2002-05-20  4:53           ` Rusty Russell
  0 siblings, 1 reply; 83+ messages in thread
From: Linus Torvalds @ 2002-05-20  2:54 UTC (permalink / raw)
  To: Rusty Russell; +Cc: linux-kernel, alan



On Mon, 20 May 2002, Rusty Russell wrote:
>
> If read always returns the amount read (ignoring any copy_to_user
> errors), then you can repeat it by seeking backwards[1] and redoing the
> read.

No.

> So copy_to_user can simply deliver a SIGSEGV and return "success", and
> everything will work (except sockets, pipes, etc).

I don't mind the SIGSEGV, but I refuse to make a stupid change that has
absolutely _zero_ reason for it.

The current "copy_to/from_user()" is perfectly fine. It's very simple to
do

	if (copy_from_user(xxx))
		return -EFAULT;

and it is not AT ALL simpler to do

	ret = copy_from_user(xxx);
	if (ret)
		return ret;

which is apparently your suggestion.

So a lot of people didn't get it? Arnaldo seems to have fixed a lot of
them already, and maybe you who apparently care can add _documentation_,
but the fact is that there is no reason to make a less powerful interface.

		Linus


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-19 18:29     ` Linus Torvalds
  2002-05-19 19:57       ` Roman Zippel
@ 2002-05-20  2:06       ` Rusty Russell
  2002-05-20  2:54         ` Linus Torvalds
  1 sibling, 1 reply; 83+ messages in thread
From: Rusty Russell @ 2002-05-20  2:06 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Rusty Russell, linux-kernel, alan

In message <Pine.LNX.4.44.0205191125120.3104-100000@home.transmeta.com> you wri
te:
> 
> 
> On Sat, 18 May 2002, Benjamin Herrenschmidt wrote:
> >
> > Looking at generic_file_write(), it ignore the count returned by
> > copy_from_user and always commit a write for the whole requested
> > count, regardless of how much could actually be read from userland.
> > The result of copy_from_user is only used as an error condition.
> 
> And this is exactly what makes it re-startable.

If read always returns the amount read (ignoring any copy_to_user
errors), then you can repeat it by seeking backwards[1] and redoing the
read.

So copy_to_user can simply deliver a SIGSEGV and return "success", and
everything will work (except sockets, pipes, etc).

Is this satisfactory?  I'd really like to get rid of 5,500 code paths
in the kernel...

BTW, SuSv3/POSIX.1.2001 says it's OK,
Rusty.
[1] No, this won't work on pipes & sockets, but the whole idea won't
work on many devices anyway...
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-19 20:23       ` Edgar Toernig
@ 2002-05-19 22:44         ` Alan Cox
  0 siblings, 0 replies; 83+ messages in thread
From: Alan Cox @ 2002-05-19 22:44 UTC (permalink / raw)
  To: Edgar Toernig; +Cc: Linus Torvalds, Andi Kleen, linux-kernel, rusty, alan

> > Oh, like read() or write() for regular files? Yup, they exist.
> 
> And that would be?  I've never seen such an abomination.  Sure,
> mapping pages on SEGV is use, but passing only partially mapped
> buffers to read/write on purpose???

Some of the storage systems do this. They may try to use the host page
size but on some platforms the host page size is variable so that isnt
feasible.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-18 16:14     ` Linus Torvalds
  2002-05-19  2:10       ` Rusty Russell
@ 2002-05-19 20:23       ` Edgar Toernig
  2002-05-19 22:44         ` Alan Cox
  1 sibling, 1 reply; 83+ messages in thread
From: Edgar Toernig @ 2002-05-19 20:23 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andi Kleen, linux-kernel, rusty, alan

Linus Torvalds wrote:
> 
> On 18 May 2002, Andi Kleen wrote:
> >
> > Are you sure they even exist ?
> 
> Oh, like read() or write() for regular files? Yup, they exist.

And that would be?  I've never seen such an abomination.  Sure,
mapping pages on SEGV is use, but passing only partially mapped
buffers to read/write on purpose???

Ciao, ET.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-20  4:53           ` Rusty Russell
@ 2002-05-19 20:12             ` Arnaldo Carvalho de Melo
  2002-05-20 16:00             ` Linus Torvalds
  1 sibling, 0 replies; 83+ messages in thread
From: Arnaldo Carvalho de Melo @ 2002-05-19 20:12 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Linus Torvalds, linux-kernel, alan

Em Mon, May 20, 2002 at 02:53:18PM +1000, Rusty Russell escreveu:
> > So a lot of people didn't get it? Arnaldo seems to have fixed a lot of
> > them already
> 
> Yeah, thanks to my kernel audit.  But I won't be auditing all 5,500
> every release (I promised Alan I'd do 2.4 though: I'm waiting for the
> next Marcelo kernel).

Yeah, that put the needed pressure for the patches to get accepted ;) I and
others had done most of that in the past but patches were getting lost in the
noise, now they are getting in much more easily 8)

http://kerneljanitors.org/TODO had that listed for a long time 8)

Anyway, thanks again for the audit! :-)

And I'm all for something that is more easy to use, as I, like you, don't
want to keep auditing for the very same thing over and over again.

- Arnaldo

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-19 18:29     ` Linus Torvalds
@ 2002-05-19 19:57       ` Roman Zippel
  2002-05-20  2:06       ` Rusty Russell
  1 sibling, 0 replies; 83+ messages in thread
From: Roman Zippel @ 2002-05-19 19:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Benjamin Herrenschmidt, Rusty Russell, linux-kernel, alan

Hi,

On Sun, 19 May 2002, Linus Torvalds wrote:

> A faulting write will fill some subsequent memory area with zeroes, but a
> subsequent write can complete the original one.
> 
> It has to _commit_ the whole area, because it uses the pre-fault size
> information to optimize away reads etc, ie if you do a
> 
> 	write(fd, buf, 4096);
> 
> at a page-aligned offset, the write code knows that it shouldn't read the
> old contents because they get overwritten.
> 
> Which is why we need to commit the whole 4096 bytes, even if we only
> actually were able to get a single byte from user space.

I basically agree, but I think there is a special case: writing at the end
of the file. Instead of writing zeros, we have to truncate the file,
otherwise you can't restart append writes. Currently we get this wrong.

bye, Roman


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-18 21:47   ` Benjamin Herrenschmidt
  2002-05-19 12:22     ` Alan Cox
@ 2002-05-19 18:29     ` Linus Torvalds
  2002-05-19 19:57       ` Roman Zippel
  2002-05-20  2:06       ` Rusty Russell
  1 sibling, 2 replies; 83+ messages in thread
From: Linus Torvalds @ 2002-05-19 18:29 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Rusty Russell, linux-kernel, alan



On Sat, 18 May 2002, Benjamin Herrenschmidt wrote:
>
> Looking at generic_file_write(), it ignore the count returned by
> copy_from_user and always commit a write for the whole requested
> count, regardless of how much could actually be read from userland.
> The result of copy_from_user is only used as an error condition.

And this is exactly what makes it re-startable.

A faulting write will fill some subsequent memory area with zeroes, but a
subsequent write can complete the original one.

It has to _commit_ the whole area, because it uses the pre-fault size
information to optimize away reads etc, ie if you do a

	write(fd, buf, 4096);

at a page-aligned offset, the write code knows that it shouldn't read the
old contents because they get overwritten.

Which is why we need to commit the whole 4096 bytes, even if we only
actually were able to get a single byte from user space.

But by then telling user space that we couldn't actually write more than 1
byte, we give user space the _ability_ to re-start the write with the
missing 4095 bytes.

> generic_file_read() on the other hand seems to be ok.

That one doesn't have any of the same issues.

		Linus


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-18 21:47   ` Benjamin Herrenschmidt
@ 2002-05-19 12:22     ` Alan Cox
  2002-05-19 18:29     ` Linus Torvalds
  1 sibling, 0 replies; 83+ messages in thread
From: Alan Cox @ 2002-05-19 12:22 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Linus Torvalds, Rusty Russell, linux-kernel, alan

> Looking at generic_file_write(), it ignore the count returned by
> copy_from_user and always commit a write for the whole requested
> count, regardless of how much could actually be read from userland.

It has to commit the write for the entire block. That is needed to get
the disk sectors correct versus another reader. The error reporting may
not be berfect however

Alan

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-19  5:23 ` Linus Torvalds
  2002-05-17  0:00   ` Pavel Machek
  2002-05-18 21:47   ` Benjamin Herrenschmidt
@ 2002-05-19 11:41   ` Alan Cox
  2 siblings, 0 replies; 83+ messages in thread
From: Alan Cox @ 2002-05-19 11:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Rusty Russell, linux-kernel, alan

> Oh, read() has to return the right value, but we should _also_ do a
> SIGSEGV, in my opinion (it would also catch all those programs that didn't
> expect it).
> 
> However, that apparently flies in the face of UNIX history and apparently
> some standard (whether it was POSIX or SuS or something else, I can't
> remember, but that discussion came up earlier..)

Unix history I think

Posix doesnt care - indeed it can be that a posix system has no memory
protection or kernel/user divide. SuS seems to simply leave passing bogus
addresses as undefined

Alan

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-19  3:38 Rusty Russell
@ 2002-05-19  5:23 ` Linus Torvalds
  2002-05-17  0:00   ` Pavel Machek
                     ` (2 more replies)
  0 siblings, 3 replies; 83+ messages in thread
From: Linus Torvalds @ 2002-05-19  5:23 UTC (permalink / raw)
  To: Rusty Russell; +Cc: linux-kernel, alan



On Sun, 19 May 2002, Rusty Russell wrote:
> > returns 0, but a signal is delivered.  Then the only places which need
> > to be clever are the mount option copying, and the signal delivery
> > code for SIGSEGV (which uses copy_to_user).
>
> Sorry, this doesn't work here either: this would return the wrong
> value from read.

Oh, read() has to return the right value, but we should _also_ do a
SIGSEGV, in my opinion (it would also catch all those programs that didn't
expect it).

However, that apparently flies in the face of UNIX history and apparently
some standard (whether it was POSIX or SuS or something else, I can't
remember, but that discussion came up earlier..)

> Of course, everyone who does more than one copy_to_user should be
> checking that return value, and not doing:
>
> 	if (copy_to_user(uptr....)
> 	   || copy_to_user(uptr+10,....)
> 		return -EFAULT
>
> So that such gc schemes actually work,

Note that _most_ system calls are simply just re-startable, ie if your
"stat()" system call dies half-way and returns EFAULT, you can re-start it
without having to know how much of the "stat" structure you might have
filled in. So for many things a plain -EFAULT is plenty good enough, and
your "copy_to/from_user() should return 0/-EFAULT" would work for them.

But read (and particularly write) are _not_ re-startable without the
knowledge of how much was written, because they change f_pos and other
things ("write()" in particular changes a _lot_ of "other things", namely
the data in the file itself, of course).

There are other system calls that aren't re-startable, but basically
read/write are the "big ones", and thus Linux should try its best to make
them work in an environment that requires restartability. Most programs
can live without various random ioctl's and special system calls, but very
very few programs/environments can live without read/write.

("restartable" here doesn't mean that the _kernel_ would re-start them,
but that a "gc-aware library" can make wrappers around them and correctly
restart them internally, if you see my point - kind of like how stdio
already handles the issue of EINTR returns for read/write, which is
actually very similar in nature).

		Linus


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-19  4:01             ` Rusty Russell
@ 2002-05-19  4:02               ` Larry McVoy
  2002-05-16 23:56                 ` Pavel Machek
  2002-05-16 23:56                 ` Pavel Machek
  0 siblings, 2 replies; 83+ messages in thread
From: Larry McVoy @ 2002-05-19  4:02 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Larry McVoy, linux-kernel, alan

On Sun, May 19, 2002 at 02:01:25PM +1000, Rusty Russell wrote:
> But as we all know, it is harder to remove a feature from Linux, than
> to get the camel book through the eye of a needle (or something).

It's possible that I'm too tired to have grasped this, but if I have,
you're all wet.  In all cases, read needs to return the number of bytes
successfully moved.  If you ask for N and 1/2 of the way through N you
are going to get a fault, and you return SEGFAULT, now how can I ever
find out that N/2 bytes actually made it out to me?  I want to know that.
If you are arguing that return N/2 is wrong, you are incorrect. 
-- 
---
Larry McVoy            	 lm at bitmover.com           http://www.bitmover.com/lm 

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-19  3:05           ` Larry McVoy
@ 2002-05-19  4:01             ` Rusty Russell
  2002-05-19  4:02               ` Larry McVoy
  0 siblings, 1 reply; 83+ messages in thread
From: Rusty Russell @ 2002-05-19  4:01 UTC (permalink / raw)
  To: Larry McVoy; +Cc: linux-kernel, alan

In message <20020518200540.N8794@work.bitmover.com> you write:
> On Sat, May 18, 2002 at 08:01:48PM -0700, Linus Torvalds wrote:
> > On Sun, 19 May 2002, Rusty Russell wrote:
> > >
> > > Huh?  No, you ask for 2000 bytes into a buffer that can only take 1000
> > > bytes without hitting an unmapped page, returning EFAULT or giving a
> > > SIGSEGV is perfectly acceptable.
> > 
> > Bzzt, wrong answer.
> 
> Linus is absolutely right.  The correct semantics are to return the number
> of bytes read, if they are greater than zero, and on the next read return
> the error.  This has been a corner case in read for a long time in various
> Unix versions, and Linus has it right.  I went through this back at Sun
> and we explored all the different ways, and the bottom line is that you
> first ACK that you moved some data and then you NAK on the next read.

It's interesting to look at this backwards:

	Imagine if copy_to_user returned void and delivered a SIGSEGV
	on fault, and always had.

	Now, to fix this, we'd want to add new code paths to the 5,500
	callers throughout the kernel.

I'm pretty sure everyone would balk.  They'd say "sorry, you're going
to have to wrap your syscalls somehow".

But as we all know, it is harder to remove a feature from Linux, than
to get the camel book through the eye of a needle (or something).

Oh well,
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
@ 2002-05-19  3:38 Rusty Russell
  2002-05-19  5:23 ` Linus Torvalds
  0 siblings, 1 reply; 83+ messages in thread
From: Rusty Russell @ 2002-05-19  3:38 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, alan

> Um, what about delivering a SIGSEGV?  So, copy_to/from_user always
> returns 0, but a signal is delivered.  Then the only places which need
> to be clever are the mount option copying, and the signal delivery
> code for SIGSEGV (which uses copy_to_user).

Sorry, this doesn't work here either: this would return the wrong
value from read.

Of course, everyone who does more than one copy_to_user should be
checking that return value, and not doing:

	if (copy_to_user(uptr....) 
	   || copy_to_user(uptr+10,....) 
		return -EFAULT

So that such gc schemes actually work,
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-19  3:31           ` Rusty Russell
@ 2002-05-19  3:34             ` Linus Torvalds
  2002-05-16 23:53               ` Pavel Machek
  0 siblings, 1 reply; 83+ messages in thread
From: Linus Torvalds @ 2002-05-19  3:34 UTC (permalink / raw)
  To: Rusty Russell; +Cc: linux-kernel, alan



On Sun, 19 May 2002, Rusty Russell wrote:
>
> Um, what about delivering a SIGSEGV?  So, copy_to/from_user always
> returns 0, but a signal is delivered.

That doesn't help. It's against some stupid SUS rule, I'm afraid.

(And THAT is a stupid rule, I 100% agree with. It means that some things
return -EFAULT, and other things do SIGSEGV, and the only difference is
whether something is a system call or is implemented as a library thing.
UNIX should always just have segfaulted, but there you are..)

		Linus


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-19  3:01         ` Linus Torvalds
  2002-05-19  3:05           ` Larry McVoy
@ 2002-05-19  3:31           ` Rusty Russell
  2002-05-19  3:34             ` Linus Torvalds
  1 sibling, 1 reply; 83+ messages in thread
From: Rusty Russell @ 2002-05-19  3:31 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, alan

In message <Pine.LNX.4.44.0205181958140.30454-100000@home.transmeta.com> you wr
ite:
> 
> 
> On Sun, 19 May 2002, Rusty Russell wrote:
> >
> > Huh?  No, you ask for 2000 bytes into a buffer that can only take 1000
> > bytes without hitting an unmapped page, returning EFAULT or giving a
> > SIGSEGV is perfectly acceptable.
> 
> Bzzt, wrong answer.
> 
> Partial reads/writes are perfectly possible and non-buggy for any system
> that uses variations of mmap/mprotect to implement user-level memory
> management, for example persistant data-bases, garbage collection etc.
> 
> Which means that if half of a buffer used for "read()" just happens to be
> marked non-writable for some GC purpose, the kernel HAS to have the
> ability return the right answer (which in this case is to say "I could
> only read 1000 bytes"). Because anything else just doesn't give the GC
> library (or whatever) any way to recover nicely.

Um, what about delivering a SIGSEGV?  So, copy_to/from_user always
returns 0, but a signal is delivered.  Then the only places which need
to be clever are the mount option copying, and the signal delivery
code for SIGSEGV (which uses copy_to_user).

This has the benefit of not breaking existing kernel code, whichever
interpretation of the return value is used.

> > As a coder, I'd *really* prefer that to hiding the bug!
> 
> Rusty, face it, you're wrong on this one. Just drop it.

That's certainly possible,
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-19  3:01         ` Linus Torvalds
@ 2002-05-19  3:05           ` Larry McVoy
  2002-05-19  4:01             ` Rusty Russell
  2002-05-19  3:31           ` Rusty Russell
  1 sibling, 1 reply; 83+ messages in thread
From: Larry McVoy @ 2002-05-19  3:05 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Rusty Russell, linux-kernel, alan

On Sat, May 18, 2002 at 08:01:48PM -0700, Linus Torvalds wrote:
> On Sun, 19 May 2002, Rusty Russell wrote:
> >
> > Huh?  No, you ask for 2000 bytes into a buffer that can only take 1000
> > bytes without hitting an unmapped page, returning EFAULT or giving a
> > SIGSEGV is perfectly acceptable.
> 
> Bzzt, wrong answer.

Linus is absolutely right.  The correct semantics are to return the number
of bytes read, if they are greater than zero, and on the next read return
the error.  This has been a corner case in read for a long time in various
Unix versions, and Linus has it right.  I went through this back at Sun
and we explored all the different ways, and the bottom line is that you
first ACK that you moved some data and then you NAK on the next read.
-- 
---
Larry McVoy            	 lm at bitmover.com           http://www.bitmover.com/lm 

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-19  2:10       ` Rusty Russell
@ 2002-05-19  3:01         ` Linus Torvalds
  2002-05-19  3:05           ` Larry McVoy
  2002-05-19  3:31           ` Rusty Russell
  0 siblings, 2 replies; 83+ messages in thread
From: Linus Torvalds @ 2002-05-19  3:01 UTC (permalink / raw)
  To: Rusty Russell; +Cc: linux-kernel, alan



On Sun, 19 May 2002, Rusty Russell wrote:
>
> Huh?  No, you ask for 2000 bytes into a buffer that can only take 1000
> bytes without hitting an unmapped page, returning EFAULT or giving a
> SIGSEGV is perfectly acceptable.

Bzzt, wrong answer.

Partial reads/writes are perfectly possible and non-buggy for any system
that uses variations of mmap/mprotect to implement user-level memory
management, for example persistant data-bases, garbage collection etc.

Which means that if half of a buffer used for "read()" just happens to be
marked non-writable for some GC purpose, the kernel HAS to have the
ability return the right answer (which in this case is to say "I could
only read 1000 bytes"). Because anything else just doesn't give the GC
library (or whatever) any way to recover nicely.

> As a coder, I'd *really* prefer that to hiding the bug!

Rusty, face it, you're wrong on this one. Just drop it.

		Linus


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-18 16:14     ` Linus Torvalds
@ 2002-05-19  2:10       ` Rusty Russell
  2002-05-19  3:01         ` Linus Torvalds
  2002-05-19 20:23       ` Edgar Toernig
  1 sibling, 1 reply; 83+ messages in thread
From: Rusty Russell @ 2002-05-19  2:10 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, rusty, alan

In message <Pine.LNX.4.44.0205180910570.26742-100000@home.transmeta.com> you wr
ite:
> 
> 
> On 18 May 2002, Andi Kleen wrote:
> >
> > Are you sure they even exist ?
> 
> Oh, like read() or write() for regular files? Yup, they exist.

Huh?  No, you ask for 2000 bytes into a buffer that can only take 1000
bytes without hitting an unmapped page, returning EFAULT or giving a
SIGSEGV is perfectly acceptable.

As a coder, I'd *really* prefer that to hiding the bug!

There's only one case which really actually cares for a valid reason:
the hack in copying mount options.

Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-19  5:23 ` Linus Torvalds
  2002-05-17  0:00   ` Pavel Machek
@ 2002-05-18 21:47   ` Benjamin Herrenschmidt
  2002-05-19 12:22     ` Alan Cox
  2002-05-19 18:29     ` Linus Torvalds
  2002-05-19 11:41   ` Alan Cox
  2 siblings, 2 replies; 83+ messages in thread
From: Benjamin Herrenschmidt @ 2002-05-18 21:47 UTC (permalink / raw)
  To: Linus Torvalds, Rusty Russell; +Cc: linux-kernel, alan

>But read (and particularly write) are _not_ re-startable without the
>knowledge of how much was written, because they change f_pos and other
>things ("write()" in particular changes a _lot_ of "other things", namely
>the data in the file itself, of course).

Looking at generic_file_write(), it ignore the count returned by
copy_from_user and always commit a write for the whole requested
count, regardless of how much could actually be read from userland.
The result of copy_from_user is only used as an error condition.

generic_file_read() on the other hand seems to be ok.

>There are other system calls that aren't re-startable, but basically
>read/write are the "big ones", and thus Linux should try its best to make
>them work in an environment that requires restartability. Most programs
>can live without various random ioctl's and special system calls, but very
>very few programs/environments can live without read/write.
>
>("restartable" here doesn't mean that the _kernel_ would re-start them,
>but that a "gc-aware library" can make wrappers around them and correctly
>restart them internally, if you see my point - kind of like how stdio
>already handles the issue of EINTR returns for read/write, which is
>actually very similar in nature).



^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-18 10:16   ` Andi Kleen
@ 2002-05-18 16:14     ` Linus Torvalds
  2002-05-19  2:10       ` Rusty Russell
  2002-05-19 20:23       ` Edgar Toernig
  0 siblings, 2 replies; 83+ messages in thread
From: Linus Torvalds @ 2002-05-18 16:14 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, rusty, alan



On 18 May 2002, Andi Kleen wrote:
>
> Are you sure they even exist ?

Oh, like read() or write() for regular files? Yup, they exist.

		Linus


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
       [not found] ` <Pine.LNX.4.44.0205171936220.1524-100000@home.transmeta.com.suse.lists.linux.kernel>
@ 2002-05-18 10:16   ` Andi Kleen
  2002-05-18 16:14     ` Linus Torvalds
  0 siblings, 1 reply; 83+ messages in thread
From: Andi Kleen @ 2002-05-18 10:16 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, rusty, alan

Linus Torvalds <torvalds@transmeta.com> writes:

> On Fri, 17 May 2002, Rusty Russell wrote:
> >
> > Sorry I wasn't clear: I'm saying *replace*, not add,
> 
> Ok, let _me_ be clear: replacing them with an inferior product that cannot
> tell you partial copies is not going to happen. Not now, not ever. You
> would break all the users who _require_ knowing about a read() that only
> gave you 5 out of 50 bytes.

Are you sure they even exist ? As far as I can see near everybody relies
on zeroing of target on exception instead.

At least for the SSE optimized copy_*_user always would be much better,
because optimizing the miss count is painful from an unrolled loop
and cannot be even done accurately (8 bytes accuracy is best with 8 byte
loads/stored).  With that in mind I think the byte count is broken by 
design because it cannot be correctly implemented unless you do byte copies.

I remember TCP was given as the prime user when this interface was 
introduced in 2.1, but TCP does not use the byte count currently and never has
(in fact the primary memory copy interface of TCP - csum_copy_* - does not 
even support it)

-Andi 

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-18  1:05   ` Rusty Russell
@ 2002-05-18  2:57     ` Alan Cox
  2002-05-16 23:27       ` Pavel Machek
  0 siblings, 1 reply; 83+ messages in thread
From: Alan Cox @ 2002-05-18  2:57 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Pete Zaitcev, linux-kernel

> > and foolish to program without reading interface specifications
> > or the code. It did not occur to me to shift the blame onto
> > copy_from_user creators.
> 
> Please send me your mailing address.  I shall send you a copy of
> "Design of Everyday Things" (Donald A Norman).  You should not blame
> yourself for others' bad design.

By extrapolation perhaps you'd also care to reimplement it in terms of
exceptions, refcount the objects with an object based primitive to do the
transfers, garbage collect to remove memory leaks and implement strings and
variable size buffers as base objects.

Thats where your argument goes

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-17 17:36 ` Pete Zaitcev
@ 2002-05-18  1:05   ` Rusty Russell
  2002-05-18  2:57     ` Alan Cox
  0 siblings, 1 reply; 83+ messages in thread
From: Rusty Russell @ 2002-05-18  1:05 UTC (permalink / raw)
  To: Pete Zaitcev; +Cc: linux-kernel

In message <200205171736.g4HHant04061@devserv.devel.redhat.com> you write:
> >[...]
> > We could do that, or, we could fix the actual problem, which is the
> > HUGE FUCKING BEARTRAP WHICH CATCHES EVERY SINGLE NEW PROGRAMMER ON THE
> > WAY THROUGH.
> 
> It is but one of many crooked interfaces. For example, Linux
> has outb() arguments swapped relatively to all other environments.

Yes, and they should all be fixed.  But the one which is never found
by the compiler or any simple testing is a clear winner in the "trap
for programmers" category.

> I think it may be the best to have Corbet to update the O'Reily
> book with a chapter of common traps and add a @-comment near
> the copy_from_user.
> 
> In the interest of full disclosure, I must admit that I used
> copy_from_user wrong once, many years ago. The lesson which
> I extracted was different though. I decided that I was arrogant
> and foolish to program without reading interface specifications
> or the code. It did not occur to me to shift the blame onto
> copy_from_user creators.

Please send me your mailing address.  I shall send you a copy of
"Design of Everyday Things" (Donald A Norman).  You should not blame
yourself for others' bad design.

Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
       [not found] <mailman.1021642692.12772.linux-kernel2news@redhat.com>
@ 2002-05-17 17:36 ` Pete Zaitcev
  2002-05-18  1:05   ` Rusty Russell
       [not found] ` <200205191212.g4JCCLY25867@Port.imtp.ilyichevsk.odessa.ua>
  1 sibling, 1 reply; 83+ messages in thread
From: Pete Zaitcev @ 2002-05-17 17:36 UTC (permalink / raw)
  To: Rusty Russell; +Cc: linux-kernel

>[...]
> We could do that, or, we could fix the actual problem, which is the
> HUGE FUCKING BEARTRAP WHICH CATCHES EVERY SINGLE NEW PROGRAMMER ON THE
> WAY THROUGH.

It is but one of many crooked interfaces. For example, Linux
has outb() arguments swapped relatively to all other environments.
I think it may be the best to have Corbet to update the O'Reily
book with a chapter of common traps and add a @-comment near
the copy_from_user.

In the interest of full disclosure, I must admit that I used
copy_from_user wrong once, many years ago. The lesson which
I extracted was different though. I decided that I was arrogant
and foolish to program without reading interface specifications
or the code. It did not occur to me to shift the blame onto
copy_from_user creators.

-- Pete

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-19  5:23 ` Linus Torvalds
@ 2002-05-17  0:00   ` Pavel Machek
  2002-05-18 21:47   ` Benjamin Herrenschmidt
  2002-05-19 11:41   ` Alan Cox
  2 siblings, 0 replies; 83+ messages in thread
From: Pavel Machek @ 2002-05-17  0:00 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Rusty Russell, linux-kernel, alan

Hi!

> However, that apparently flies in the face of UNIX history and apparently
> some standard (whether it was POSIX or SuS or something else, I can't
> remember, but that discussion came up earlier..)

As far as I can remember, discussion was that standards *do* allow that.
I actually ran my system with -EFAULT -> SIGSEGV [it is one-liner!] and
it worked perfectly well.
								Pavel
-- 
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-19  4:02               ` Larry McVoy
  2002-05-16 23:56                 ` Pavel Machek
@ 2002-05-16 23:56                 ` Pavel Machek
  1 sibling, 0 replies; 83+ messages in thread
From: Pavel Machek @ 2002-05-16 23:56 UTC (permalink / raw)
  To: Larry McVoy, Rusty Russell, Larry McVoy, linux-kernel, alan

Hi!

> > But as we all know, it is harder to remove a feature from Linux, than
> > to get the camel book through the eye of a needle (or something).
> 
> It's possible that I'm too tired to have grasped this, but if I have,
> you're all wet.  In all cases, read needs to return the number of bytes
> successfully moved.  If you ask for N and 1/2 of the way through N you
> are going to get a fault, and you return SEGFAULT, now how can I ever
> find out that N/2 bytes actually made it out to me?  I want to know that.
> If you are arguing that return N/2 is wrong, you are incorrect. 

Imagine "I reallyu want to know how much memory memcpy() copied!". It is
as unreasonable as that.
								Pavel
-- 
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-19  4:02               ` Larry McVoy
@ 2002-05-16 23:56                 ` Pavel Machek
  2002-05-16 23:56                 ` Pavel Machek
  1 sibling, 0 replies; 83+ messages in thread
From: Pavel Machek @ 2002-05-16 23:56 UTC (permalink / raw)
  To: Larry McVoy, Rusty Russell, Larry McVoy, linux-kernel, alan

Hi!

> > But as we all know, it is harder to remove a feature from Linux, than
> > to get the camel book through the eye of a needle (or something).
> 
> It's possible that I'm too tired to have grasped this, but if I have,
> you're all wet.  In all cases, read needs to return the number of bytes
> successfully moved.  If you ask for N and 1/2 of the way through N you
> are going to get a fault, and you return SEGFAULT, now how can I ever
> find out that N/2 bytes actually made it out to me?  I want to know that.
> If you are arguing that return N/2 is wrong, you are incorrect. 

You passed bad pointer to the kernel. You are broken. If you want to know
that N/2 was delivered, fix your code not to pass bad pointers.
								Pavel
-- 
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-19  3:34             ` Linus Torvalds
@ 2002-05-16 23:53               ` Pavel Machek
  2002-05-21 20:47                 ` Linus Torvalds
  0 siblings, 1 reply; 83+ messages in thread
From: Pavel Machek @ 2002-05-16 23:53 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Rusty Russell, linux-kernel, alan

Hi!

> > Um, what about delivering a SIGSEGV?  So, copy_to/from_user always
> > returns 0, but a signal is delivered.
> 
> That doesn't help. It's against some stupid SUS rule, I'm afraid.
> 
> (And THAT is a stupid rule, I 100% agree with. It means that some things
> return -EFAULT, and other things do SIGSEGV, and the only difference is
> whether something is a system call or is implemented as a library thing.
> UNIX should always just have segfaulted, but there you are..)

I thought POSIX made it explicit that you may SIGSEGV or EFAULT at your
option. If that SUS rule is stupid, we should just drop it.

Performance advantage from MMX-copy-to-user is probably well worth it.

Ouch, and your read example. Imagine you do read (fd, buf, 12000), and first
page of buf is there, second is not and third is [could happen in your GC
case]. What if copy-to-user decides to first write byte 11045 of buffer, 
then byte 17, then byte 4875 (and fault)? I think kernel *has* right to 
do that. What will it use as return value?

I think such GC library is seriously broken.
								Pavel
-- 
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: AUDIT: copy_from_user is a deathtrap.
  2002-05-18  2:57     ` Alan Cox
@ 2002-05-16 23:27       ` Pavel Machek
  0 siblings, 0 replies; 83+ messages in thread
From: Pavel Machek @ 2002-05-16 23:27 UTC (permalink / raw)
  To: Alan Cox; +Cc: Rusty Russell, Pete Zaitcev, linux-kernel

Hi!

> > > and foolish to program without reading interface specifications
> > > or the code. It did not occur to me to shift the blame onto
> > > copy_from_user creators.
> > 
> > Please send me your mailing address.  I shall send you a copy of
> > "Design of Everyday Things" (Donald A Norman).  You should not blame
> > yourself for others' bad design.
> 
> By extrapolation perhaps you'd also care to reimplement it in terms of
> exceptions, refcount the objects with an object based primitive to do the
> transfers, garbage collect to remove memory leaks and implement strings and
> variable size buffers as base objects.

Actually, no. 0 vs. -ERRNO is common in kernel, objects are not.
									Pavel
-- 
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.


^ permalink raw reply	[flat|nested] 83+ messages in thread

end of thread, other threads:[~2002-05-23 11:26 UTC | newest]

Thread overview: 83+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-05-17  9:27 AUDIT: copy_from_user is a deathtrap Rusty Russell
2002-05-17  9:21 ` David S. Miller
2002-05-17  9:49   ` Rusty Russell
2002-05-17  9:35     ` David S. Miller
2002-05-17 12:26       ` Rusty Russell
2002-05-17 17:42         ` Denis Vlasenko
2002-05-17 12:17     ` Alan Cox
2002-05-17 12:21       ` Rusty Russell
2002-05-17 12:58         ` Alan Cox
2002-05-17 12:58           ` Rusty Russell
2002-05-17 13:13             ` John Levon
2002-05-17 14:52             ` Alan Cox
2002-05-18  1:26               ` Rusty Russell
2002-05-17 17:58             ` Denis Vlasenko
2002-05-18  2:37     ` Linus Torvalds
2002-05-18 15:06       ` John Alvord
2002-05-17 10:20 ` Christoph Hellwig
     [not found] <mailman.1021642692.12772.linux-kernel2news@redhat.com>
2002-05-17 17:36 ` Pete Zaitcev
2002-05-18  1:05   ` Rusty Russell
2002-05-18  2:57     ` Alan Cox
2002-05-16 23:27       ` Pavel Machek
     [not found] ` <200205191212.g4JCCLY25867@Port.imtp.ilyichevsk.odessa.ua>
     [not found]   ` <20020520112232.A8983@devserv.devel.redhat.com>
2002-05-21 10:57     ` Denis Vlasenko
2002-05-21  6:21       ` Arnaldo Carvalho de Melo
2002-05-21  8:33         ` Christoph Hellwig
2002-05-21 19:02           ` Albert D. Cahalan
2002-05-22 14:27         ` Denis Vlasenko
     [not found] <E178eMm-0000NO-00@wagner.rustcorp.com.au.suse.lists.linux.kernel>
     [not found] ` <Pine.LNX.4.44.0205171936220.1524-100000@home.transmeta.com.suse.lists.linux.kernel>
2002-05-18 10:16   ` Andi Kleen
2002-05-18 16:14     ` Linus Torvalds
2002-05-19  2:10       ` Rusty Russell
2002-05-19  3:01         ` Linus Torvalds
2002-05-19  3:05           ` Larry McVoy
2002-05-19  4:01             ` Rusty Russell
2002-05-19  4:02               ` Larry McVoy
2002-05-16 23:56                 ` Pavel Machek
2002-05-16 23:56                 ` Pavel Machek
2002-05-19  3:31           ` Rusty Russell
2002-05-19  3:34             ` Linus Torvalds
2002-05-16 23:53               ` Pavel Machek
2002-05-21 20:47                 ` Linus Torvalds
2002-05-21 21:17                   ` Pavel Machek
2002-05-21 21:25                     ` Linus Torvalds
2002-05-21 21:44                     ` Alan Cox
2002-05-21 21:46                       ` Andrew Morton
2002-05-21 22:04                         ` Linus Torvalds
2002-05-21 22:21                           ` Pavel Machek
2002-05-22 13:47                             ` Alan Cox
2002-05-22 14:13                               ` Pavel Machek
2002-05-22 14:54                                 ` Alan Cox
2002-05-22 14:42                                   ` Pavel Machek
2002-05-22 15:27                                     ` Alan Cox
2002-05-22 18:58                                   ` Kasper Dupont
2002-05-22 22:02                                     ` Alan Cox
2002-05-23  3:54                                   ` Rusty Russell
2002-05-23 11:15                                     ` Edgar Toernig
2002-05-22 16:09                                 ` Linus Torvalds
2002-05-22 20:28                                   ` Pavel Machek
2002-05-22  0:47                         ` Andrea Arcangeli
2002-05-22  5:01                         ` Rusty Russell
2002-05-22  6:28                         ` Rusty Russell
2002-05-22  4:57                       ` Rusty Russell
2002-05-22 13:30                         ` Alan Cox
2002-05-22 18:43                     ` Marco Colombo
2002-05-19 20:23       ` Edgar Toernig
2002-05-19 22:44         ` Alan Cox
2002-05-19  3:38 Rusty Russell
2002-05-19  5:23 ` Linus Torvalds
2002-05-17  0:00   ` Pavel Machek
2002-05-18 21:47   ` Benjamin Herrenschmidt
2002-05-19 12:22     ` Alan Cox
2002-05-19 18:29     ` Linus Torvalds
2002-05-19 19:57       ` Roman Zippel
2002-05-20  2:06       ` Rusty Russell
2002-05-20  2:54         ` Linus Torvalds
2002-05-20  4:53           ` Rusty Russell
2002-05-19 20:12             ` Arnaldo Carvalho de Melo
2002-05-20 16:00             ` Linus Torvalds
2002-05-19 11:41   ` Alan Cox
     [not found] <Pine.LNX.4.44.0205191951460.22433-100000@home.transmeta.com.suse.lists.linux.kernel>
     [not found] ` <E179fAd-0005vs-00@wagner.rustcorp.com.au.suse.lists.linux.kernel>
2002-05-20 10:59   ` Andi Kleen
2002-05-22 10:08 Petr Vandrovec
2002-05-22 16:23 ` Denis Vlasenko
2002-05-22 13:40 Petr Vandrovec
2002-05-22 18:58 ` Denis Vlasenko
2002-05-22 14:13   ` Ruth Ivimey-Cook

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).