All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] devmem: handle partial kmem write/read
@ 2009-09-14  3:29 Wu Fengguang
  2009-09-14  4:34 ` Wu Fengguang
  2009-09-15  7:58 ` Question: how to handle too big f_pos " KAMEZAWA Hiroyuki
  0 siblings, 2 replies; 32+ messages in thread
From: Wu Fengguang @ 2009-09-14  3:29 UTC (permalink / raw)
  To: Andrew Morton
  Cc: mtosatti, gregkh, broonie, johannes, avi, andi, linux-kernel

Current vwrite silently ignores vwrite() to vmap holes.
Better behavior should be stop the write and return
on such holes.

Also return on partial read, which may happen in future
(eg. hit hwpoison pages).

CC: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 drivers/char/mem.c |   30 ++++++++++++++++++------------
 1 file changed, 18 insertions(+), 12 deletions(-)

--- linux-mm.orig/drivers/char/mem.c	2009-09-14 10:59:50.000000000 +0800
+++ linux-mm/drivers/char/mem.c	2009-09-14 11:06:25.000000000 +0800
@@ -444,18 +444,22 @@ static ssize_t read_kmem(struct file *fi
 		if (!kbuf)
 			return -ENOMEM;
 		while (count > 0) {
+			unsigned long n;
+
 			sz = size_inside_page(p, count);
-			sz = vread(kbuf, (char *)p, sz);
-			if (!sz)
+			n = vread(kbuf, (char *)p, sz);
+			if (!n)
 				break;
-			if (copy_to_user(buf, kbuf, sz)) {
+			if (copy_to_user(buf, kbuf, n)) {
 				free_page((unsigned long)kbuf);
 				return -EFAULT;
 			}
-			count -= sz;
-			buf += sz;
-			read += sz;
-			p += sz;
+			count -= n;
+			buf += n;
+			read += n;
+			p += n;
+			if (n < sz)
+				break;
 		}
 		free_page((unsigned long)kbuf);
 	}
@@ -551,11 +555,13 @@ static ssize_t write_kmem(struct file * 
 				free_page((unsigned long)kbuf);
 				return -EFAULT;
 			}
-			sz = vwrite(kbuf, (char *)p, sz);
-			count -= sz;
-			buf += sz;
-			virtr += sz;
-			p += sz;
+			n = vwrite(kbuf, (char *)p, sz);
+			count -= n;
+			buf += n;
+			virtr += n;
+			p += n;
+			if (n < sz)
+				break;
 		}
 		free_page((unsigned long)kbuf);
 	}

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] devmem: handle partial kmem write/read
  2009-09-14  3:29 [PATCH] devmem: handle partial kmem write/read Wu Fengguang
@ 2009-09-14  4:34 ` Wu Fengguang
  2009-09-15  0:24   ` KAMEZAWA Hiroyuki
  2009-09-15  7:58 ` Question: how to handle too big f_pos " KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 32+ messages in thread
From: Wu Fengguang @ 2009-09-14  4:34 UTC (permalink / raw)
  To: Andrew Morton
  Cc: mtosatti, gregkh, broonie, johannes, avi, andi, linux-kernel,
	KAMEZAWA Hiroyuki, WANG Cong, Mike Smith, Nick Piggin

Hi Kame,

This patch needs more work. I first intent to fix a bug:

                        sz = vwrite(kbuf, (char *)p, sz);
                        p += sz;
                }

So if the returned len is 0, the kbuf/p pointers will mismatch. 

Then I realize it changed the write behavior. The current vwrite()
behavior is strange, it returns 0 if _whole range_ is hole, otherwise
ignores the hole silently. So holes can be treated differently even in
the original code.

I'm not really sure about the right behavior. KAME-san, do you have
any suggestions?

Thanks,
Fengguang

On Mon, Sep 14, 2009 at 11:29:01AM +0800, Wu Fengguang wrote:
> Current vwrite silently ignores vwrite() to vmap holes.
> Better behavior should be stop the write and return
> on such holes.
> 
> Also return on partial read, which may happen in future
> (eg. hit hwpoison pages).
> 
> CC: Andi Kleen <andi@firstfloor.org>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> ---
>  drivers/char/mem.c |   30 ++++++++++++++++++------------
>  1 file changed, 18 insertions(+), 12 deletions(-)
> 
> --- linux-mm.orig/drivers/char/mem.c	2009-09-14 10:59:50.000000000 +0800
> +++ linux-mm/drivers/char/mem.c	2009-09-14 11:06:25.000000000 +0800
> @@ -444,18 +444,22 @@ static ssize_t read_kmem(struct file *fi
>  		if (!kbuf)
>  			return -ENOMEM;
>  		while (count > 0) {
> +			unsigned long n;
> +
>  			sz = size_inside_page(p, count);
> -			sz = vread(kbuf, (char *)p, sz);
> -			if (!sz)
> +			n = vread(kbuf, (char *)p, sz);
> +			if (!n)
>  				break;
> -			if (copy_to_user(buf, kbuf, sz)) {
> +			if (copy_to_user(buf, kbuf, n)) {
>  				free_page((unsigned long)kbuf);
>  				return -EFAULT;
>  			}
> -			count -= sz;
> -			buf += sz;
> -			read += sz;
> -			p += sz;
> +			count -= n;
> +			buf += n;
> +			read += n;
> +			p += n;
> +			if (n < sz)
> +				break;
>  		}
>  		free_page((unsigned long)kbuf);
>  	}
> @@ -551,11 +555,13 @@ static ssize_t write_kmem(struct file * 
>  				free_page((unsigned long)kbuf);
>  				return -EFAULT;
>  			}
> -			sz = vwrite(kbuf, (char *)p, sz);
> -			count -= sz;
> -			buf += sz;
> -			virtr += sz;
> -			p += sz;
> +			n = vwrite(kbuf, (char *)p, sz);
> +			count -= n;
> +			buf += n;
> +			virtr += n;
> +			p += n;
> +			if (n < sz)
> +				break;
>  		}
>  		free_page((unsigned long)kbuf);
>  	}

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] devmem: handle partial kmem write/read
  2009-09-14  4:34 ` Wu Fengguang
@ 2009-09-15  0:24   ` KAMEZAWA Hiroyuki
  2009-09-15  1:52     ` KAMEZAWA Hiroyuki
  2009-09-15  2:02     ` Wu Fengguang
  0 siblings, 2 replies; 32+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-09-15  0:24 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, mtosatti, gregkh, broonie, johannes, avi, andi,
	linux-kernel, WANG Cong, Mike Smith, Nick Piggin

On Mon, 14 Sep 2009 12:34:44 +0800
Wu Fengguang <fengguang.wu@intel.com> wrote:

> Hi Kame,
> 
> This patch needs more work. I first intent to fix a bug:
> 
>                         sz = vwrite(kbuf, (char *)p, sz);
>                         p += sz;
>                 }
> 
> So if the returned len is 0, the kbuf/p pointers will mismatch. 
> 
> Then I realize it changed the write behavior. The current vwrite()
> behavior is strange, it returns 0 if _whole range_ is hole, otherwise
> ignores the hole silently. So holes can be treated differently even in
> the original code.
> 
Ah, ok...

> I'm not really sure about the right behavior. KAME-san, do you have
> any suggestions?
> 
maybe following make sense.
=
written = vwrite(kbuf, (char *p), sz);
if (!written) // whole vmem was a hole
	written = sz;
==
needs fix.

Anyway, I wonder kmem is broken. It's should be totally rewritten.

For example, this doesn't check anything.
==
        if (p < (unsigned long) high_memory) {

==

But....are there users ?
If necessary, I'll write some...

Thanks,
-Kame


> Thanks,
> Fengguang
> 
> On Mon, Sep 14, 2009 at 11:29:01AM +0800, Wu Fengguang wrote:
> > Current vwrite silently ignores vwrite() to vmap holes.
> > Better behavior should be stop the write and return
> > on such holes.
> > 
> > Also return on partial read, which may happen in future
> > (eg. hit hwpoison pages).
> > 
> > CC: Andi Kleen <andi@firstfloor.org>
> > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> > ---
> >  drivers/char/mem.c |   30 ++++++++++++++++++------------
> >  1 file changed, 18 insertions(+), 12 deletions(-)
> > 
> > --- linux-mm.orig/drivers/char/mem.c	2009-09-14 10:59:50.000000000 +0800
> > +++ linux-mm/drivers/char/mem.c	2009-09-14 11:06:25.000000000 +0800
> > @@ -444,18 +444,22 @@ static ssize_t read_kmem(struct file *fi
> >  		if (!kbuf)
> >  			return -ENOMEM;
> >  		while (count > 0) {
> > +			unsigned long n;
> > +
> >  			sz = size_inside_page(p, count);
> > -			sz = vread(kbuf, (char *)p, sz);
> > -			if (!sz)
> > +			n = vread(kbuf, (char *)p, sz);
> > +			if (!n)
> >  				break;
> > -			if (copy_to_user(buf, kbuf, sz)) {
> > +			if (copy_to_user(buf, kbuf, n)) {
> >  				free_page((unsigned long)kbuf);
> >  				return -EFAULT;
> >  			}
> > -			count -= sz;
> > -			buf += sz;
> > -			read += sz;
> > -			p += sz;
> > +			count -= n;
> > +			buf += n;
> > +			read += n;
> > +			p += n;
> > +			if (n < sz)
> > +				break;
> >  		}
> >  		free_page((unsigned long)kbuf);
> >  	}
> > @@ -551,11 +555,13 @@ static ssize_t write_kmem(struct file * 
> >  				free_page((unsigned long)kbuf);
> >  				return -EFAULT;
> >  			}
> > -			sz = vwrite(kbuf, (char *)p, sz);
> > -			count -= sz;
> > -			buf += sz;
> > -			virtr += sz;
> > -			p += sz;
> > +			n = vwrite(kbuf, (char *)p, sz);
> > +			count -= n;
> > +			buf += n;
> > +			virtr += n;
> > +			p += n;
> > +			if (n < sz)
> > +				break;
> >  		}
> >  		free_page((unsigned long)kbuf);
> >  	}
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] devmem: handle partial kmem write/read
  2009-09-15  0:24   ` KAMEZAWA Hiroyuki
@ 2009-09-15  1:52     ` KAMEZAWA Hiroyuki
  2009-09-15  2:05       ` Wu Fengguang
  2009-09-15  2:02     ` Wu Fengguang
  1 sibling, 1 reply; 32+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-09-15  1:52 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Wu Fengguang, Andrew Morton, mtosatti, gregkh, broonie, johannes,
	avi, andi, linux-kernel, WANG Cong, Mike Smith, Nick Piggin

On Tue, 15 Sep 2009 09:24:48 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> On Mon, 14 Sep 2009 12:34:44 +0800
> Wu Fengguang <fengguang.wu@intel.com> wrote:
> 
> > Hi Kame,
> > 
> > This patch needs more work. I first intent to fix a bug:
> > 
> >                         sz = vwrite(kbuf, (char *)p, sz);
> >                         p += sz;
> >                 }
> > 
> > So if the returned len is 0, the kbuf/p pointers will mismatch. 
> > 
> > Then I realize it changed the write behavior. The current vwrite()
> > behavior is strange, it returns 0 if _whole range_ is hole, otherwise
> > ignores the hole silently. So holes can be treated differently even in
> > the original code.
> > 
> Ah, ok...
> 

I'd like to post a fix...but it seems you are woriking.
Hmm, I'll post a patch including your fix (and your sign.)
Is it ok ?

Thanks,
-Kame


> > I'm not really sure about the right behavior. KAME-san, do you have
> > any suggestions?
> > 
> maybe following make sense.
> =
> written = vwrite(kbuf, (char *p), sz);
> if (!written) // whole vmem was a hole
> 	written = sz;
> ==
> needs fix.
> 
> Anyway, I wonder kmem is broken. It's should be totally rewritten.
> 
> For example, this doesn't check anything.
> ==
>         if (p < (unsigned long) high_memory) {
> 
> ==
> 
> But....are there users ?
> If necessary, I'll write some...
> 
> Thanks,
> -Kame
> 
> 
> > Thanks,
> > Fengguang
> > 
> > On Mon, Sep 14, 2009 at 11:29:01AM +0800, Wu Fengguang wrote:
> > > Current vwrite silently ignores vwrite() to vmap holes.
> > > Better behavior should be stop the write and return
> > > on such holes.
> > > 
> > > Also return on partial read, which may happen in future
> > > (eg. hit hwpoison pages).
> > > 
> > > CC: Andi Kleen <andi@firstfloor.org>
> > > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> > > ---
> > >  drivers/char/mem.c |   30 ++++++++++++++++++------------
> > >  1 file changed, 18 insertions(+), 12 deletions(-)
> > > 
> > > --- linux-mm.orig/drivers/char/mem.c	2009-09-14 10:59:50.000000000 +0800
> > > +++ linux-mm/drivers/char/mem.c	2009-09-14 11:06:25.000000000 +0800
> > > @@ -444,18 +444,22 @@ static ssize_t read_kmem(struct file *fi
> > >  		if (!kbuf)
> > >  			return -ENOMEM;
> > >  		while (count > 0) {
> > > +			unsigned long n;
> > > +
> > >  			sz = size_inside_page(p, count);
> > > -			sz = vread(kbuf, (char *)p, sz);
> > > -			if (!sz)
> > > +			n = vread(kbuf, (char *)p, sz);
> > > +			if (!n)
> > >  				break;
> > > -			if (copy_to_user(buf, kbuf, sz)) {
> > > +			if (copy_to_user(buf, kbuf, n)) {
> > >  				free_page((unsigned long)kbuf);
> > >  				return -EFAULT;
> > >  			}
> > > -			count -= sz;
> > > -			buf += sz;
> > > -			read += sz;
> > > -			p += sz;
> > > +			count -= n;
> > > +			buf += n;
> > > +			read += n;
> > > +			p += n;
> > > +			if (n < sz)
> > > +				break;
> > >  		}
> > >  		free_page((unsigned long)kbuf);
> > >  	}
> > > @@ -551,11 +555,13 @@ static ssize_t write_kmem(struct file * 
> > >  				free_page((unsigned long)kbuf);
> > >  				return -EFAULT;
> > >  			}
> > > -			sz = vwrite(kbuf, (char *)p, sz);
> > > -			count -= sz;
> > > -			buf += sz;
> > > -			virtr += sz;
> > > -			p += sz;
> > > +			n = vwrite(kbuf, (char *)p, sz);
> > > +			count -= n;
> > > +			buf += n;
> > > +			virtr += n;
> > > +			p += n;
> > > +			if (n < sz)
> > > +				break;
> > >  		}
> > >  		free_page((unsigned long)kbuf);
> > >  	}
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> > 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] devmem: handle partial kmem write/read
  2009-09-15  0:24   ` KAMEZAWA Hiroyuki
  2009-09-15  1:52     ` KAMEZAWA Hiroyuki
@ 2009-09-15  2:02     ` Wu Fengguang
  2009-09-15  2:31       ` KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 32+ messages in thread
From: Wu Fengguang @ 2009-09-15  2:02 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, mtosatti, gregkh, broonie, johannes, avi, andi,
	linux-kernel, WANG Cong, Mike Smith, Nick Piggin

On Tue, Sep 15, 2009 at 08:24:48AM +0800, KAMEZAWA Hiroyuki wrote:
> On Mon, 14 Sep 2009 12:34:44 +0800
> Wu Fengguang <fengguang.wu@intel.com> wrote:
> 
> > Hi Kame,
> > 
> > This patch needs more work. I first intent to fix a bug:
> > 
> >                         sz = vwrite(kbuf, (char *)p, sz);
> >                         p += sz;
> >                 }
> > 
> > So if the returned len is 0, the kbuf/p pointers will mismatch. 
> > 
> > Then I realize it changed the write behavior. The current vwrite()
> > behavior is strange, it returns 0 if _whole range_ is hole, otherwise
> > ignores the hole silently. So holes can be treated differently even in
> > the original code.
> > 
> Ah, ok...
> 
> > I'm not really sure about the right behavior. KAME-san, do you have
> > any suggestions?
> > 
> maybe following make sense.
> =
> written = vwrite(kbuf, (char *p), sz);
> if (!written) // whole vmem was a hole
> 	written = sz;

Since the 0 return value won't be used at all, it would be simpler to
tell vwrite() return the untouched buflen/sz, like this. It will ignore
_all_ the holes silently. Need to update comments too.

--- linux-mm.orig/mm/vmalloc.c  2009-09-15 09:40:08.000000000 +0800                                                                  
+++ linux-mm/mm/vmalloc.c       2009-09-15 09:43:33.000000000 +0800                                                                  
@@ -1834,7 +1834,6 @@ long vwrite(char *buf, char *addr, unsig                                                                       
        struct vm_struct *tmp;                                                                                                       
        char *vaddr;                                                                                                                 
        unsigned long n, buflen;                                                                                                     
-       int copied = 0;                                                                                                              
                                                                                                                                     
        /* Don't allow overflow */                                                                                                   
        if ((unsigned long) addr + count < count)                                                                                    
@@ -1858,7 +1857,6 @@ long vwrite(char *buf, char *addr, unsig                                                                       
                        n = count;                                                                                                   
                if (!(tmp->flags & VM_IOREMAP)) {                                                                                    
                        aligned_vwrite(buf, addr, n);                                                                                
-                       copied++;                                                                                                    
                }                                                                                                                    
                buf += n;                                                                                                            
                addr += n;                                                                                                           
@@ -1866,8 +1864,6 @@ long vwrite(char *buf, char *addr, unsig                                                                       
        }                                                                                                                            
 finished:                                                                                                                           
        read_unlock(&vmlist_lock);                                                                                                   
-       if (!copied)                                                                                                                 
-               return 0;                                                                                                            
        return buflen;                                                                                                               
 }                                                                                                                                   

> ==
> needs fix.
> 
> Anyway, I wonder kmem is broken. It's should be totally rewritten.
> 
> For example, this doesn't check anything.
> ==
>         if (p < (unsigned long) high_memory) {
> 
> ==
> 
> But....are there users ?
> If necessary, I'll write some...

I'm trying to stop possible mem/kmem users to access hwpoison pages.. 
I'm not the user, but rather a tester ;)

Thanks,
Fengguang

> Thanks,
> -Kame
> 
> 
> > Thanks,
> > Fengguang
> > 
> > On Mon, Sep 14, 2009 at 11:29:01AM +0800, Wu Fengguang wrote:
> > > Current vwrite silently ignores vwrite() to vmap holes.
> > > Better behavior should be stop the write and return
> > > on such holes.
> > > 
> > > Also return on partial read, which may happen in future
> > > (eg. hit hwpoison pages).
> > > 
> > > CC: Andi Kleen <andi@firstfloor.org>
> > > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> > > ---
> > >  drivers/char/mem.c |   30 ++++++++++++++++++------------
> > >  1 file changed, 18 insertions(+), 12 deletions(-)
> > > 
> > > --- linux-mm.orig/drivers/char/mem.c	2009-09-14 10:59:50.000000000 +0800
> > > +++ linux-mm/drivers/char/mem.c	2009-09-14 11:06:25.000000000 +0800
> > > @@ -444,18 +444,22 @@ static ssize_t read_kmem(struct file *fi
> > >  		if (!kbuf)
> > >  			return -ENOMEM;
> > >  		while (count > 0) {
> > > +			unsigned long n;
> > > +
> > >  			sz = size_inside_page(p, count);
> > > -			sz = vread(kbuf, (char *)p, sz);
> > > -			if (!sz)
> > > +			n = vread(kbuf, (char *)p, sz);
> > > +			if (!n)
> > >  				break;
> > > -			if (copy_to_user(buf, kbuf, sz)) {
> > > +			if (copy_to_user(buf, kbuf, n)) {
> > >  				free_page((unsigned long)kbuf);
> > >  				return -EFAULT;
> > >  			}
> > > -			count -= sz;
> > > -			buf += sz;
> > > -			read += sz;
> > > -			p += sz;
> > > +			count -= n;
> > > +			buf += n;
> > > +			read += n;
> > > +			p += n;
> > > +			if (n < sz)
> > > +				break;
> > >  		}
> > >  		free_page((unsigned long)kbuf);
> > >  	}
> > > @@ -551,11 +555,13 @@ static ssize_t write_kmem(struct file * 
> > >  				free_page((unsigned long)kbuf);
> > >  				return -EFAULT;
> > >  			}
> > > -			sz = vwrite(kbuf, (char *)p, sz);
> > > -			count -= sz;
> > > -			buf += sz;
> > > -			virtr += sz;
> > > -			p += sz;
> > > +			n = vwrite(kbuf, (char *)p, sz);
> > > +			count -= n;
> > > +			buf += n;
> > > +			virtr += n;
> > > +			p += n;
> > > +			if (n < sz)
> > > +				break;
> > >  		}
> > >  		free_page((unsigned long)kbuf);
> > >  	}
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> > 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] devmem: handle partial kmem write/read
  2009-09-15  1:52     ` KAMEZAWA Hiroyuki
@ 2009-09-15  2:05       ` Wu Fengguang
  0 siblings, 0 replies; 32+ messages in thread
From: Wu Fengguang @ 2009-09-15  2:05 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, mtosatti, gregkh, broonie, johannes, avi, andi,
	linux-kernel, WANG Cong, Mike Smith, Nick Piggin

On Tue, Sep 15, 2009 at 09:52:55AM +0800, KAMEZAWA Hiroyuki wrote:
> On Tue, 15 Sep 2009 09:24:48 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> 
> > On Mon, 14 Sep 2009 12:34:44 +0800
> > Wu Fengguang <fengguang.wu@intel.com> wrote:
> > 
> > > Hi Kame,
> > > 
> > > This patch needs more work. I first intent to fix a bug:
> > > 
> > >                         sz = vwrite(kbuf, (char *)p, sz);
> > >                         p += sz;
> > >                 }
> > > 
> > > So if the returned len is 0, the kbuf/p pointers will mismatch. 
> > > 
> > > Then I realize it changed the write behavior. The current vwrite()
> > > behavior is strange, it returns 0 if _whole range_ is hole, otherwise
> > > ignores the hole silently. So holes can be treated differently even in
> > > the original code.
> > > 
> > Ah, ok...
> > 
> 
> I'd like to post a fix...but it seems you are woriking.
> Hmm, I'll post a patch including your fix (and your sign.)
> Is it ok ?

Sure OK. You can work based on my patches. Many of them are in -mm now
and there are three more. I'll show you in a series.

Thanks,
Fengguang

> > > I'm not really sure about the right behavior. KAME-san, do you have
> > > any suggestions?
> > > 
> > maybe following make sense.
> > =
> > written = vwrite(kbuf, (char *p), sz);
> > if (!written) // whole vmem was a hole
> > 	written = sz;
> > ==
> > needs fix.
> > 
> > Anyway, I wonder kmem is broken. It's should be totally rewritten.
> > 
> > For example, this doesn't check anything.
> > ==
> >         if (p < (unsigned long) high_memory) {
> > 
> > ==
> > 
> > But....are there users ?
> > If necessary, I'll write some...
> > 
> > Thanks,
> > -Kame
> > 
> > 
> > > Thanks,
> > > Fengguang
> > > 
> > > On Mon, Sep 14, 2009 at 11:29:01AM +0800, Wu Fengguang wrote:
> > > > Current vwrite silently ignores vwrite() to vmap holes.
> > > > Better behavior should be stop the write and return
> > > > on such holes.
> > > > 
> > > > Also return on partial read, which may happen in future
> > > > (eg. hit hwpoison pages).
> > > > 
> > > > CC: Andi Kleen <andi@firstfloor.org>
> > > > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> > > > ---
> > > >  drivers/char/mem.c |   30 ++++++++++++++++++------------
> > > >  1 file changed, 18 insertions(+), 12 deletions(-)
> > > > 
> > > > --- linux-mm.orig/drivers/char/mem.c	2009-09-14 10:59:50.000000000 +0800
> > > > +++ linux-mm/drivers/char/mem.c	2009-09-14 11:06:25.000000000 +0800
> > > > @@ -444,18 +444,22 @@ static ssize_t read_kmem(struct file *fi
> > > >  		if (!kbuf)
> > > >  			return -ENOMEM;
> > > >  		while (count > 0) {
> > > > +			unsigned long n;
> > > > +
> > > >  			sz = size_inside_page(p, count);
> > > > -			sz = vread(kbuf, (char *)p, sz);
> > > > -			if (!sz)
> > > > +			n = vread(kbuf, (char *)p, sz);
> > > > +			if (!n)
> > > >  				break;
> > > > -			if (copy_to_user(buf, kbuf, sz)) {
> > > > +			if (copy_to_user(buf, kbuf, n)) {
> > > >  				free_page((unsigned long)kbuf);
> > > >  				return -EFAULT;
> > > >  			}
> > > > -			count -= sz;
> > > > -			buf += sz;
> > > > -			read += sz;
> > > > -			p += sz;
> > > > +			count -= n;
> > > > +			buf += n;
> > > > +			read += n;
> > > > +			p += n;
> > > > +			if (n < sz)
> > > > +				break;
> > > >  		}
> > > >  		free_page((unsigned long)kbuf);
> > > >  	}
> > > > @@ -551,11 +555,13 @@ static ssize_t write_kmem(struct file * 
> > > >  				free_page((unsigned long)kbuf);
> > > >  				return -EFAULT;
> > > >  			}
> > > > -			sz = vwrite(kbuf, (char *)p, sz);
> > > > -			count -= sz;
> > > > -			buf += sz;
> > > > -			virtr += sz;
> > > > -			p += sz;
> > > > +			n = vwrite(kbuf, (char *)p, sz);
> > > > +			count -= n;
> > > > +			buf += n;
> > > > +			virtr += n;
> > > > +			p += n;
> > > > +			if (n < sz)
> > > > +				break;
> > > >  		}
> > > >  		free_page((unsigned long)kbuf);
> > > >  	}
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > Please read the FAQ at  http://www.tux.org/lkml/
> > > 
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> > 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] devmem: handle partial kmem write/read
  2009-09-15  2:02     ` Wu Fengguang
@ 2009-09-15  2:31       ` KAMEZAWA Hiroyuki
  2009-09-15  2:57         ` Wu Fengguang
  0 siblings, 1 reply; 32+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-09-15  2:31 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, mtosatti, gregkh, broonie, johannes, avi, andi,
	linux-kernel, WANG Cong, Mike Smith, Nick Piggin

On Tue, 15 Sep 2009 10:02:08 +0800
Wu Fengguang <fengguang.wu@intel.com> wrote:

> On Tue, Sep 15, 2009 at 08:24:48AM +0800, KAMEZAWA Hiroyuki wrote:
> > On Mon, 14 Sep 2009 12:34:44 +0800
> > Wu Fengguang <fengguang.wu@intel.com> wrote:
> > 
> > > Hi Kame,
> > > 
> > > This patch needs more work. I first intent to fix a bug:
> > > 
> > >                         sz = vwrite(kbuf, (char *)p, sz);
> > >                         p += sz;
> > >                 }
> > > 
> > > So if the returned len is 0, the kbuf/p pointers will mismatch. 
> > > 
> > > Then I realize it changed the write behavior. The current vwrite()
> > > behavior is strange, it returns 0 if _whole range_ is hole, otherwise
> > > ignores the hole silently. So holes can be treated differently even in
> > > the original code.
> > > 
> > Ah, ok...
> > 
> > > I'm not really sure about the right behavior. KAME-san, do you have
> > > any suggestions?
> > > 
> > maybe following make sense.
> > =
> > written = vwrite(kbuf, (char *p), sz);
> > if (!written) // whole vmem was a hole
> > 	written = sz;
> 
> Since the 0 return value won't be used at all, it would be simpler to
> tell vwrite() return the untouched buflen/sz, like this. It will ignore
> _all_ the holes silently. Need to update comments too.
> 

Hmm. IIUC the original kmem code returns immediately if vread/vwrite returns 0.
But it seems this depends on wrong assumption that vmap area is continuous,
at the first look.

In man(4) mem,kmem
==
       Byte  addresses  in  mem  are interpreted as physical memory addresses.
       References to nonexistent locations cause errors to be returned.
	.....
       The file kmem is the same as mem, except that the kernel virtual memory
       rather than physical memory is accessed.

==

Then, we have to return error for accesses to "nonexistent locations".

memory-hole in vmap area is ....."nonexistent" ? 
I think it's nonexistent if there are no overlaps between requested [pos, pos+len)
and registerred vmalloc area.

But, hmm, there are no way for users to know "existing vmalloc area".
Then, my above idea may be wrong.

Then, I'd like to modify as following,

  - If is_vmalloc_or_module_addr(requested address) is false, return -EFAULT.
  - If is_vmalloc_or_module_addr(requested address) is true, return no error.
    Even if specified range doesn't include no exsiting vmalloc area.

How do you think ?

Thanks,
-Kame
p.s. I wonder current /dev/kmem cannot read text area of kernel if it's not
directly mapped.



> --- linux-mm.orig/mm/vmalloc.c  2009-09-15 09:40:08.000000000 +0800                                                                  
> +++ linux-mm/mm/vmalloc.c       2009-09-15 09:43:33.000000000 +0800                                                                  
> @@ -1834,7 +1834,6 @@ long vwrite(char *buf, char *addr, unsig                                                                       
>         struct vm_struct *tmp;                                                                                                       
>         char *vaddr;                                                                                                                 
>         unsigned long n, buflen;                                                                                                     
> -       int copied = 0;                                                                                                              
>                                                                                                                                      
>         /* Don't allow overflow */                                                                                                   
>         if ((unsigned long) addr + count < count)                                                                                    
> @@ -1858,7 +1857,6 @@ long vwrite(char *buf, char *addr, unsig                                                                       
>                         n = count;                                                                                                   
>                 if (!(tmp->flags & VM_IOREMAP)) {                                                                                    
>                         aligned_vwrite(buf, addr, n);                                                                                
> -                       copied++;                                                                                                    
>                 }                                                                                                                    
>                 buf += n;                                                                                                            
>                 addr += n;                                                                                                           
> @@ -1866,8 +1864,6 @@ long vwrite(char *buf, char *addr, unsig                                                                       
>         }                                                                                                                            
>  finished:                                                                                                                           
>         read_unlock(&vmlist_lock);                                                                                                   
> -       if (!copied)                                                                                                                 
> -               return 0;                                                                                                            
>         return buflen;                                                                                                               
>  }                                                                                                                                   
> 
> > ==
> > needs fix.
> > 
> > Anyway, I wonder kmem is broken. It's should be totally rewritten.
> > 
> > For example, this doesn't check anything.
> > ==
> >         if (p < (unsigned long) high_memory) {
> > 
> > ==
> > 
> > But....are there users ?
> > If necessary, I'll write some...
> 
> I'm trying to stop possible mem/kmem users to access hwpoison pages.. 
> I'm not the user, but rather a tester ;)
> 
> Thanks,
> Fengguang
> 
> > Thanks,
> > -Kame
> > 
> > 
> > > Thanks,
> > > Fengguang
> > > 
> > > On Mon, Sep 14, 2009 at 11:29:01AM +0800, Wu Fengguang wrote:
> > > > Current vwrite silently ignores vwrite() to vmap holes.
> > > > Better behavior should be stop the write and return
> > > > on such holes.
> > > > 
> > > > Also return on partial read, which may happen in future
> > > > (eg. hit hwpoison pages).
> > > > 
> > > > CC: Andi Kleen <andi@firstfloor.org>
> > > > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> > > > ---
> > > >  drivers/char/mem.c |   30 ++++++++++++++++++------------
> > > >  1 file changed, 18 insertions(+), 12 deletions(-)
> > > > 
> > > > --- linux-mm.orig/drivers/char/mem.c	2009-09-14 10:59:50.000000000 +0800
> > > > +++ linux-mm/drivers/char/mem.c	2009-09-14 11:06:25.000000000 +0800
> > > > @@ -444,18 +444,22 @@ static ssize_t read_kmem(struct file *fi
> > > >  		if (!kbuf)
> > > >  			return -ENOMEM;
> > > >  		while (count > 0) {
> > > > +			unsigned long n;
> > > > +
> > > >  			sz = size_inside_page(p, count);
> > > > -			sz = vread(kbuf, (char *)p, sz);
> > > > -			if (!sz)
> > > > +			n = vread(kbuf, (char *)p, sz);
> > > > +			if (!n)
> > > >  				break;
> > > > -			if (copy_to_user(buf, kbuf, sz)) {
> > > > +			if (copy_to_user(buf, kbuf, n)) {
> > > >  				free_page((unsigned long)kbuf);
> > > >  				return -EFAULT;
> > > >  			}
> > > > -			count -= sz;
> > > > -			buf += sz;
> > > > -			read += sz;
> > > > -			p += sz;
> > > > +			count -= n;
> > > > +			buf += n;
> > > > +			read += n;
> > > > +			p += n;
> > > > +			if (n < sz)
> > > > +				break;
> > > >  		}
> > > >  		free_page((unsigned long)kbuf);
> > > >  	}
> > > > @@ -551,11 +555,13 @@ static ssize_t write_kmem(struct file * 
> > > >  				free_page((unsigned long)kbuf);
> > > >  				return -EFAULT;
> > > >  			}
> > > > -			sz = vwrite(kbuf, (char *)p, sz);
> > > > -			count -= sz;
> > > > -			buf += sz;
> > > > -			virtr += sz;
> > > > -			p += sz;
> > > > +			n = vwrite(kbuf, (char *)p, sz);
> > > > +			count -= n;
> > > > +			buf += n;
> > > > +			virtr += n;
> > > > +			p += n;
> > > > +			if (n < sz)
> > > > +				break;
> > > >  		}
> > > >  		free_page((unsigned long)kbuf);
> > > >  	}
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > Please read the FAQ at  http://www.tux.org/lkml/
> > > 
> 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] devmem: handle partial kmem write/read
  2009-09-15  2:31       ` KAMEZAWA Hiroyuki
@ 2009-09-15  2:57         ` Wu Fengguang
  0 siblings, 0 replies; 32+ messages in thread
From: Wu Fengguang @ 2009-09-15  2:57 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, mtosatti, gregkh, broonie, johannes, avi, andi,
	linux-kernel, WANG Cong, Mike Smith, Nick Piggin

[-- Attachment #1: Type: text/plain, Size: 10101 bytes --]

On Tue, Sep 15, 2009 at 10:31:13AM +0800, KAMEZAWA Hiroyuki wrote:
> On Tue, 15 Sep 2009 10:02:08 +0800
> Wu Fengguang <fengguang.wu@intel.com> wrote:
> 
> > On Tue, Sep 15, 2009 at 08:24:48AM +0800, KAMEZAWA Hiroyuki wrote:
> > > On Mon, 14 Sep 2009 12:34:44 +0800
> > > Wu Fengguang <fengguang.wu@intel.com> wrote:
> > > 
> > > > Hi Kame,
> > > > 
> > > > This patch needs more work. I first intent to fix a bug:
> > > > 
> > > >                         sz = vwrite(kbuf, (char *)p, sz);
> > > >                         p += sz;
> > > >                 }
> > > > 
> > > > So if the returned len is 0, the kbuf/p pointers will mismatch. 
> > > > 
> > > > Then I realize it changed the write behavior. The current vwrite()
> > > > behavior is strange, it returns 0 if _whole range_ is hole, otherwise
> > > > ignores the hole silently. So holes can be treated differently even in
> > > > the original code.
> > > > 
> > > Ah, ok...
> > > 
> > > > I'm not really sure about the right behavior. KAME-san, do you have
> > > > any suggestions?
> > > > 
> > > maybe following make sense.
> > > =
> > > written = vwrite(kbuf, (char *p), sz);
> > > if (!written) // whole vmem was a hole
> > > 	written = sz;
> > 
> > Since the 0 return value won't be used at all, it would be simpler to
> > tell vwrite() return the untouched buflen/sz, like this. It will ignore
> > _all_ the holes silently. Need to update comments too.
> > 
> 
> Hmm. IIUC the original kmem code returns immediately if vread/vwrite returns 0.

Agreed for vread. It seems vwrite don't do that for both kmem and
vmalloc part.

> But it seems this depends on wrong assumption that vmap area is continuous,
> at the first look.

You mean it is possible that not all pages are mapped in a vmlist addr,size range?
Is vmalloc areas guaranteed to be page aligned? Sorry for the dumb questions.

> In man(4) mem,kmem
> ==
>        Byte  addresses  in  mem  are interpreted as physical memory addresses.
>        References to nonexistent locations cause errors to be returned.
> 	.....
>        The file kmem is the same as mem, except that the kernel virtual memory
>        rather than physical memory is accessed.
> 
> ==
> 
> Then, we have to return error for accesses to "nonexistent locations".

That's one important factor to consider. On the other hand, the
original kmem read/write implementation don't return error code for
holes. Instead kmem read returns early, while kmem write ignores holes
but is buggy..

> memory-hole in vmap area is ....."nonexistent" ? 
> I think it's nonexistent if there are no overlaps between requested [pos, pos+len)
> and registerred vmalloc area.
> But, hmm, there are no way for users to know "existing vmalloc area".
> Then, my above idea may be wrong.
> 
> Then, I'd like to modify as following,
> 
>   - If is_vmalloc_or_module_addr(requested address) is false, return -EFAULT.
>   - If is_vmalloc_or_module_addr(requested address) is true, return no error.
>     Even if specified range doesn't include no exsiting vmalloc area.
> 
> How do you think ?

Looks reasonable to me. But it's good to hear more wide opinions..

> Thanks,
> -Kame
> p.s. I wonder current /dev/kmem cannot read text area of kernel if it's not
> directly mapped.

Attached is a small tool (from LKML) for reading 'modprobe_path' from kmem,
it's not text, but is close..

Thanks,
Fengguang

> 
> 
> > --- linux-mm.orig/mm/vmalloc.c  2009-09-15 09:40:08.000000000 +0800                                                                  
> > +++ linux-mm/mm/vmalloc.c       2009-09-15 09:43:33.000000000 +0800                                                                  
> > @@ -1834,7 +1834,6 @@ long vwrite(char *buf, char *addr, unsig                                                                       
> >         struct vm_struct *tmp;                                                                                                       
> >         char *vaddr;                                                                                                                 
> >         unsigned long n, buflen;                                                                                                     
> > -       int copied = 0;                                                                                                              
> >                                                                                                                                      
> >         /* Don't allow overflow */                                                                                                   
> >         if ((unsigned long) addr + count < count)                                                                                    
> > @@ -1858,7 +1857,6 @@ long vwrite(char *buf, char *addr, unsig                                                                       
> >                         n = count;                                                                                                   
> >                 if (!(tmp->flags & VM_IOREMAP)) {                                                                                    
> >                         aligned_vwrite(buf, addr, n);                                                                                
> > -                       copied++;                                                                                                    
> >                 }                                                                                                                    
> >                 buf += n;                                                                                                            
> >                 addr += n;                                                                                                           
> > @@ -1866,8 +1864,6 @@ long vwrite(char *buf, char *addr, unsig                                                                       
> >         }                                                                                                                            
> >  finished:                                                                                                                           
> >         read_unlock(&vmlist_lock);                                                                                                   
> > -       if (!copied)                                                                                                                 
> > -               return 0;                                                                                                            
> >         return buflen;                                                                                                               
> >  }                                                                                                                                   
> > 
> > > ==
> > > needs fix.
> > > 
> > > Anyway, I wonder kmem is broken. It's should be totally rewritten.
> > > 
> > > For example, this doesn't check anything.
> > > ==
> > >         if (p < (unsigned long) high_memory) {
> > > 
> > > ==
> > > 
> > > But....are there users ?
> > > If necessary, I'll write some...
> > 
> > I'm trying to stop possible mem/kmem users to access hwpoison pages.. 
> > I'm not the user, but rather a tester ;)
> > 
> > Thanks,
> > Fengguang
> > 
> > > Thanks,
> > > -Kame
> > > 
> > > 
> > > > Thanks,
> > > > Fengguang
> > > > 
> > > > On Mon, Sep 14, 2009 at 11:29:01AM +0800, Wu Fengguang wrote:
> > > > > Current vwrite silently ignores vwrite() to vmap holes.
> > > > > Better behavior should be stop the write and return
> > > > > on such holes.
> > > > > 
> > > > > Also return on partial read, which may happen in future
> > > > > (eg. hit hwpoison pages).
> > > > > 
> > > > > CC: Andi Kleen <andi@firstfloor.org>
> > > > > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> > > > > ---
> > > > >  drivers/char/mem.c |   30 ++++++++++++++++++------------
> > > > >  1 file changed, 18 insertions(+), 12 deletions(-)
> > > > > 
> > > > > --- linux-mm.orig/drivers/char/mem.c	2009-09-14 10:59:50.000000000 +0800
> > > > > +++ linux-mm/drivers/char/mem.c	2009-09-14 11:06:25.000000000 +0800
> > > > > @@ -444,18 +444,22 @@ static ssize_t read_kmem(struct file *fi
> > > > >  		if (!kbuf)
> > > > >  			return -ENOMEM;
> > > > >  		while (count > 0) {
> > > > > +			unsigned long n;
> > > > > +
> > > > >  			sz = size_inside_page(p, count);
> > > > > -			sz = vread(kbuf, (char *)p, sz);
> > > > > -			if (!sz)
> > > > > +			n = vread(kbuf, (char *)p, sz);
> > > > > +			if (!n)
> > > > >  				break;
> > > > > -			if (copy_to_user(buf, kbuf, sz)) {
> > > > > +			if (copy_to_user(buf, kbuf, n)) {
> > > > >  				free_page((unsigned long)kbuf);
> > > > >  				return -EFAULT;
> > > > >  			}
> > > > > -			count -= sz;
> > > > > -			buf += sz;
> > > > > -			read += sz;
> > > > > -			p += sz;
> > > > > +			count -= n;
> > > > > +			buf += n;
> > > > > +			read += n;
> > > > > +			p += n;
> > > > > +			if (n < sz)
> > > > > +				break;
> > > > >  		}
> > > > >  		free_page((unsigned long)kbuf);
> > > > >  	}
> > > > > @@ -551,11 +555,13 @@ static ssize_t write_kmem(struct file * 
> > > > >  				free_page((unsigned long)kbuf);
> > > > >  				return -EFAULT;
> > > > >  			}
> > > > > -			sz = vwrite(kbuf, (char *)p, sz);
> > > > > -			count -= sz;
> > > > > -			buf += sz;
> > > > > -			virtr += sz;
> > > > > -			p += sz;
> > > > > +			n = vwrite(kbuf, (char *)p, sz);
> > > > > +			count -= n;
> > > > > +			buf += n;
> > > > > +			virtr += n;
> > > > > +			p += n;
> > > > > +			if (n < sz)
> > > > > +				break;
> > > > >  		}
> > > > >  		free_page((unsigned long)kbuf);
> > > > >  	}
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > Please read the FAQ at  http://www.tux.org/lkml/
> > > > 
> > 

[-- Attachment #2: tmap.c --]
[-- Type: text/x-csrc, Size: 1385 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdarg.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>

#include <sys/types.h>
#include <sys/stat.h>
#include <sys/poll.h>
#include <sys/mman.h>

int page_size;
#define PAGE_SIZE page_size
#define PAGE_MASK (~(PAGE_SIZE-1))

void get_var (unsigned long addr) {
	off_t ptr = addr & ~(PAGE_MASK);
	off_t offset = addr & PAGE_MASK;
	int i = 0;
	char *map;
	static int kfd = -1;

	kfd = open("/dev/kmem",O_RDONLY);
	if (kfd < 0) {
		perror("open");
		exit(0);
	}

	map = mmap(NULL,PAGE_SIZE,PROT_READ,MAP_SHARED,kfd,offset);
	if (map == MAP_FAILED) {
		perror("mmap");
		exit(-1);
	}
	printf("%s\n",map+ptr);

	return;
}

int main(int argc, char **argv)
{
	FILE *fp;
	char addr_str[11]="0x";
	char var[51];
	unsigned long addr;
	char ch;
	int r;

	if (argc != 2) {
		fprintf(stderr,"usage: %s System.map\n",argv[0]);
		exit(-1);
	}

	if ((fp = fopen(argv[1],"r")) == NULL) {
		perror("fopen");
		exit(-1);
	}

	do {
		/* ffffffff81723880 D modprobe_path */
		r = fscanf(fp,"%16s %c %50s\n",&addr_str[2],&ch,var);
		if (strcmp(var,"modprobe_path")==0)
			break;
	} while(r > 0);
	if (r < 0) {
		printf("could not find modprobe_path\n");
		exit(-1);
	}
	page_size = getpagesize();
	addr = strtoul(addr_str,NULL,16);
	printf("found modprobe_path at (%s) %08lx\n",addr_str,addr);
	get_var(addr);

	return 0;
}


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Question: how to handle too big f_pos Re: [PATCH] devmem: handle partial kmem write/read
  2009-09-14  3:29 [PATCH] devmem: handle partial kmem write/read Wu Fengguang
  2009-09-14  4:34 ` Wu Fengguang
@ 2009-09-15  7:58 ` KAMEZAWA Hiroyuki
  2009-09-15  8:11   ` Wu Fengguang
                     ` (2 more replies)
  1 sibling, 3 replies; 32+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-09-15  7:58 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: viro, Andrew Morton, mtosatti, gregkh, broonie, johannes, avi,
	andi, linux-kernel

I'm writing a patch against /dev/kmem...I found a problem.

/dev/kmem (and /proc/<pid>/mem) puts virtual addres to f->f_pos.

And, f->f_pos is loff_t .... signed value (not unsigned)

Then, this check
rw_verify_area(READ, file, pos, count);
=>
        pos = *ppos;
        if (unlikely((pos < 0) || (loff_t) (pos + count) < 0))
                return retval;
	
always returns -EINVAL when I read /dev/kmem's valid address.

Then, how should I do for read /dev/kmem ? Any idea ?
(Added Al Viro to CC:)

What I'm really afraid of is /proc/<pid>/mem. IIUC, it's used by gdb to snoop
memory, (for example, at gcore).
I think gdb uses ptrace if it fails to read /proc/<pid>/mem but...it's deadly
slow.

Thanks,
-Kame

p.s. no problems in /proc/kcore..its offset is not bare addresss.





On Mon, 14 Sep 2009 11:29:01 +0800
Wu Fengguang <fengguang.wu@intel.com> wrote:

> Current vwrite silently ignores vwrite() to vmap holes.
> Better behavior should be stop the write and return
> on such holes.
> 
> Also return on partial read, which may happen in future
> (eg. hit hwpoison pages).
> 
> CC: Andi Kleen <andi@firstfloor.org>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> ---
>  drivers/char/mem.c |   30 ++++++++++++++++++------------
>  1 file changed, 18 insertions(+), 12 deletions(-)
> 
> --- linux-mm.orig/drivers/char/mem.c	2009-09-14 10:59:50.000000000 +0800
> +++ linux-mm/drivers/char/mem.c	2009-09-14 11:06:25.000000000 +0800
> @@ -444,18 +444,22 @@ static ssize_t read_kmem(struct file *fi
>  		if (!kbuf)
>  			return -ENOMEM;
>  		while (count > 0) {
> +			unsigned long n;
> +
>  			sz = size_inside_page(p, count);
> -			sz = vread(kbuf, (char *)p, sz);
> -			if (!sz)
> +			n = vread(kbuf, (char *)p, sz);
> +			if (!n)
>  				break;
> -			if (copy_to_user(buf, kbuf, sz)) {
> +			if (copy_to_user(buf, kbuf, n)) {
>  				free_page((unsigned long)kbuf);
>  				return -EFAULT;
>  			}
> -			count -= sz;
> -			buf += sz;
> -			read += sz;
> -			p += sz;
> +			count -= n;
> +			buf += n;
> +			read += n;
> +			p += n;
> +			if (n < sz)
> +				break;
>  		}
>  		free_page((unsigned long)kbuf);
>  	}
> @@ -551,11 +555,13 @@ static ssize_t write_kmem(struct file * 
>  				free_page((unsigned long)kbuf);
>  				return -EFAULT;
>  			}
> -			sz = vwrite(kbuf, (char *)p, sz);
> -			count -= sz;
> -			buf += sz;
> -			virtr += sz;
> -			p += sz;
> +			n = vwrite(kbuf, (char *)p, sz);
> +			count -= n;
> +			buf += n;
> +			virtr += n;
> +			p += n;
> +			if (n < sz)
> +				break;
>  		}
>  		free_page((unsigned long)kbuf);
>  	}
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Question: how to handle too big f_pos Re: [PATCH] devmem: handle partial kmem write/read
  2009-09-15  7:58 ` Question: how to handle too big f_pos " KAMEZAWA Hiroyuki
@ 2009-09-15  8:11   ` Wu Fengguang
  2009-09-15  9:52   ` Hugh Dickins
  2009-09-16  5:29   ` [RFC][PATCH][bugfix] more checks for negative f_pos handling (Was Re: Question: how to handle too big f_pos KAMEZAWA Hiroyuki
  2 siblings, 0 replies; 32+ messages in thread
From: Wu Fengguang @ 2009-09-15  8:11 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: viro, Andrew Morton, mtosatti, gregkh, broonie, johannes, avi,
	andi, linux-kernel

Good catch, Kame!

FYI mmap is OK. I succeeded in reading this address via kmem:

        ffffffff81723880 D modprobe_path

Thanks,
Fengguang

On Tue, Sep 15, 2009 at 03:58:52PM +0800, KAMEZAWA Hiroyuki wrote:
> I'm writing a patch against /dev/kmem...I found a problem.
> 
> /dev/kmem (and /proc/<pid>/mem) puts virtual addres to f->f_pos.
> 
> And, f->f_pos is loff_t .... signed value (not unsigned)
> 
> Then, this check
> rw_verify_area(READ, file, pos, count);
> =>
>         pos = *ppos;
>         if (unlikely((pos < 0) || (loff_t) (pos + count) < 0))
>                 return retval;
> 	
> always returns -EINVAL when I read /dev/kmem's valid address.
> 
> Then, how should I do for read /dev/kmem ? Any idea ?
> (Added Al Viro to CC:)
> 
> What I'm really afraid of is /proc/<pid>/mem. IIUC, it's used by gdb to snoop
> memory, (for example, at gcore).
> I think gdb uses ptrace if it fails to read /proc/<pid>/mem but...it's deadly
> slow.
> 
> Thanks,
> -Kame
> 
> p.s. no problems in /proc/kcore..its offset is not bare addresss.
> 
> 
> 
> 
> 
> On Mon, 14 Sep 2009 11:29:01 +0800
> Wu Fengguang <fengguang.wu@intel.com> wrote:
> 
> > Current vwrite silently ignores vwrite() to vmap holes.
> > Better behavior should be stop the write and return
> > on such holes.
> > 
> > Also return on partial read, which may happen in future
> > (eg. hit hwpoison pages).
> > 
> > CC: Andi Kleen <andi@firstfloor.org>
> > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> > ---
> >  drivers/char/mem.c |   30 ++++++++++++++++++------------
> >  1 file changed, 18 insertions(+), 12 deletions(-)
> > 
> > --- linux-mm.orig/drivers/char/mem.c	2009-09-14 10:59:50.000000000 +0800
> > +++ linux-mm/drivers/char/mem.c	2009-09-14 11:06:25.000000000 +0800
> > @@ -444,18 +444,22 @@ static ssize_t read_kmem(struct file *fi
> >  		if (!kbuf)
> >  			return -ENOMEM;
> >  		while (count > 0) {
> > +			unsigned long n;
> > +
> >  			sz = size_inside_page(p, count);
> > -			sz = vread(kbuf, (char *)p, sz);
> > -			if (!sz)
> > +			n = vread(kbuf, (char *)p, sz);
> > +			if (!n)
> >  				break;
> > -			if (copy_to_user(buf, kbuf, sz)) {
> > +			if (copy_to_user(buf, kbuf, n)) {
> >  				free_page((unsigned long)kbuf);
> >  				return -EFAULT;
> >  			}
> > -			count -= sz;
> > -			buf += sz;
> > -			read += sz;
> > -			p += sz;
> > +			count -= n;
> > +			buf += n;
> > +			read += n;
> > +			p += n;
> > +			if (n < sz)
> > +				break;
> >  		}
> >  		free_page((unsigned long)kbuf);
> >  	}
> > @@ -551,11 +555,13 @@ static ssize_t write_kmem(struct file * 
> >  				free_page((unsigned long)kbuf);
> >  				return -EFAULT;
> >  			}
> > -			sz = vwrite(kbuf, (char *)p, sz);
> > -			count -= sz;
> > -			buf += sz;
> > -			virtr += sz;
> > -			p += sz;
> > +			n = vwrite(kbuf, (char *)p, sz);
> > +			count -= n;
> > +			buf += n;
> > +			virtr += n;
> > +			p += n;
> > +			if (n < sz)
> > +				break;
> >  		}
> >  		free_page((unsigned long)kbuf);
> >  	}
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> > 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Question: how to handle too big f_pos Re: [PATCH] devmem: handle partial kmem write/read
  2009-09-15  7:58 ` Question: how to handle too big f_pos " KAMEZAWA Hiroyuki
  2009-09-15  8:11   ` Wu Fengguang
@ 2009-09-15  9:52   ` Hugh Dickins
  2009-09-16  5:29   ` [RFC][PATCH][bugfix] more checks for negative f_pos handling (Was Re: Question: how to handle too big f_pos KAMEZAWA Hiroyuki
  2 siblings, 0 replies; 32+ messages in thread
From: Hugh Dickins @ 2009-09-15  9:52 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Wu Fengguang, viro, Andrew Morton, mtosatti, gregkh, broonie,
	johannes, avi, andi, linux-kernel

On Tue, 15 Sep 2009, KAMEZAWA Hiroyuki wrote:

> I'm writing a patch against /dev/kmem...I found a problem.
> 
> /dev/kmem (and /proc/<pid>/mem) puts virtual addres to f->f_pos.
> 
> And, f->f_pos is loff_t .... signed value (not unsigned)
> 
> Then, this check
> rw_verify_area(READ, file, pos, count);
> =>
>         pos = *ppos;
>         if (unlikely((pos < 0) || (loff_t) (pos + count) < 0))
>                 return retval;
> 	
> always returns -EINVAL when I read /dev/kmem's valid address.
> 
> Then, how should I do for read /dev/kmem ? Any idea ?
> (Added Al Viro to CC:)

I believe what's supposed to happen is that special drivers like
/dev/mem and /dev/kmem provide their own .llseek method (they do).
But at some point along the way rw_verify_area() got "improved"
and now prevents them working on many configurations.

I noticed around 2.6.24-rc1, and have been building my debug kernels
with the patch below ever since then, as I sometimes like to peek and
poke into /dev/kmem to examine and try different things while running.

But whether it's the right patch is another matter.  Though this should
be independent of that, I've also got a patch at the drivers/char/mem.c
end (I'll post that shortly in the main thread, not appropriate here),
and have never found time to consider whether these hacks amount to
anything more than "works for me".

IIRC, this patch below only covers one aspect of the negative offset
problem.  Where the problems lie does change from time to time - back
in 2003 I used to use pread() and pwrite() to avoid llseek() issues;
but when I came back to it in 2007, I think I found those no longer
worked (on i386 or x86_64 or powerpc?), and lseek64() was the best bet.

Anyway, for what it's worth, ...

--- 2.6.31/fs/read_write.c	2009-09-09 23:13:59.000000000 +0100
+++ 2.6.31d/fs/read_write.c	2009-09-10 09:38:30.000000000 +0100
@@ -222,8 +222,13 @@ int rw_verify_area(int read_write, struc
 	if (unlikely((ssize_t) count < 0))
 		return retval;
 	pos = *ppos;
-	if (unlikely((pos < 0) || (loff_t) (pos + count) < 0))
-		return retval;
+	if (pos >= 0) {
+		if (unlikely((loff_t) (pos + count) < 0))
+			count = 1 + (size_t) LLONG_MAX - pos;
+	} else {
+		if (unlikely((loff_t) (pos + count) > 0))
+			count = - pos;
+	}
 
 	if (unlikely(inode->i_flock && mandatory_lock(inode))) {
 		retval = locks_mandatory_area(

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC][PATCH][bugfix] more checks for negative f_pos handling (Was Re: Question: how to handle too big f_pos
  2009-09-15  7:58 ` Question: how to handle too big f_pos " KAMEZAWA Hiroyuki
  2009-09-15  8:11   ` Wu Fengguang
  2009-09-15  9:52   ` Hugh Dickins
@ 2009-09-16  5:29   ` KAMEZAWA Hiroyuki
  2009-09-16  8:20     ` Américo Wang
  2009-09-17  5:53     ` [RFC][PATCH][bugfix] more checks for negative f_pos handling v2 KAMEZAWA Hiroyuki
  2 siblings, 2 replies; 32+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-09-16  5:29 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Wu Fengguang, viro, Andrew Morton, linux-kernel, hugh.dickins, oleg


The problem:
> I'm writing a patch against /dev/kmem...I found a problem.
> 
> /dev/kmem (and /proc/<pid>/mem) puts virtual addres to f->f_pos.
> 
> but f->f_pos is always negative and rw_verify_ara() returns -EINVAL always.

Changed CC: List. 

This is a trial to consider how to fix negative f_pos problem shown in above.

Hmm, even after this patch, x86's vsyscall area is not readable.
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0  [vsyscall]
But maybe no problems. (now, it cannot be read, anyway.)

I tested /dev/kmem on x86-64 and this works fine. I added a fix for
/proc/<pid>/mem because I know ia64's hugetlbe area is not readable
via /proc/<pid>/mem. (But I'm not sure other 64bit arch has this
kind of problems in /proc/<pid>/mem)

==
From: KAMEZAWA Hiruyoki <kamezawa.hiroyu@jp.fujitsu.com>

Modifying rw_verify_area()'s negative f_pos check.

Now, rw_verify_area() has this check
   if (unlikely((pos < 0) || (loff_t) (pos + count) < 0))
		return -EINVAL

And access to special files as /dev/mem,kmem, /proc/<pid>/mem
returns unexpected -EINVAL.
(For example, ia64 maps hugetlb at 0x8000000000000000- region)

This patch tries to make range check more precise by using
llseek ops defined per special files.

Signed-off-by: KAMEZAWA Hiruyoki <kamezawa.hiroyu@jp.fujitsu.com>
---
 fs/proc/base.c  |   22 +++++++++++++++++-----
 fs/read_write.c |   39 +++++++++++++++++++++++++++++++++++++--
 2 files changed, 54 insertions(+), 7 deletions(-)

Index: mmotm-2.6.31-Sep14/fs/read_write.c
===================================================================
--- mmotm-2.6.31-Sep14.orig/fs/read_write.c
+++ mmotm-2.6.31-Sep14/fs/read_write.c
@@ -205,6 +205,37 @@ bad:
 }
 #endif
 
+static int
+__verify_negative_pos_range(struct file *file, loff_t pos, size_t count)
+{
+	unsigned long long upos, end;
+	loff_t ret;
+
+	/* disallow overflow */
+	upos = (unsigned long long)pos;
+	end = upos + count;
+	if (end < pos)
+		return -EOVERFLOW;
+	/*
+	 * Sanity check...subsystem has to provide llseek for handle big pos.
+	 * Subsystem's llseek should verify f_pos's value comaparing with its
+	 * max file size.
+	 * Note1: generic file ops' llseek cannot handle negative pos.
+	 * Note2: should we take care of pos == -EINVAL ?
+	 * Note3: we check flags and ops here for avoiding taking locks in.
+	 * default_lseek.
+	 */
+	ret = -EINVAL;
+	if ((file->f_mode & FMODE_LSEEK) &&
+	    (file->f_op && file->f_op->llseek)) {
+		ret = vfs_llseek(file, 0, SEEK_CUR);
+		if (ret == pos)
+			return 0;
+	}
+
+	return (int)ret;
+}
+
 /*
  * rw_verify_area doesn't like huge counts. We limit
  * them to something that fits in "int" so that others
@@ -222,8 +253,12 @@ int rw_verify_area(int read_write, struc
 	if (unlikely((ssize_t) count < 0))
 		return retval;
 	pos = *ppos;
-	if (unlikely((pos < 0) || (loff_t) (pos + count) < 0))
-		return retval;
+	if (unlikely((pos < 0) || (loff_t) (pos + count) < 0)) {
+		/* some files requires special care */
+		retval = __verify_negative_pos_range(file, pos, count);
+		if (retval)
+			return retval;
+	}
 
 	if (unlikely(inode->i_flock && mandatory_lock(inode))) {
 		retval = locks_mandatory_area(
Index: mmotm-2.6.31-Sep14/fs/proc/base.c
===================================================================
--- mmotm-2.6.31-Sep14.orig/fs/proc/base.c
+++ mmotm-2.6.31-Sep14/fs/proc/base.c
@@ -903,18 +903,30 @@ out_no_task:
 
 loff_t mem_lseek(struct file *file, loff_t offset, int orig)
 {
+	struct task_struct *task = get_proc_task(file->f_path.dentry->d_inode);
+	unsigned long long new_offset = -EINVAL;
+
+	if (!task) /* lseek's spec doesn't allow -ESRCH but... */
+		return -ESRCH;
+
 	switch (orig) {
 	case 0:
-		file->f_pos = offset;
+		new_offset = offset;
 		break;
 	case 1:
-		file->f_pos += offset;
+		new_offset = (unsigned long long)f->f_pos + offset;
 		break;
 	default:
-		return -EINVAL;
+		new_offset = -EINVAL;
+		break;
 	}
-	force_successful_syscall_return();
-	return file->f_pos;
+	if (new_offset < (unsigned long long)TASK_SIZE_OF(task)) {
+		file->f_pos = (loff_t)new_offset;
+		force_successful_syscall_return();
+	} else
+		new_offset = -EINVAL;
+	put_task_struct(task);
+	return (loff_t)new_offset;
 }
 
 static const struct file_operations proc_mem_operations = {


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH][bugfix] more checks for negative f_pos handling (Was  Re: Question: how to handle too big f_pos
  2009-09-16  5:29   ` [RFC][PATCH][bugfix] more checks for negative f_pos handling (Was Re: Question: how to handle too big f_pos KAMEZAWA Hiroyuki
@ 2009-09-16  8:20     ` Américo Wang
  2009-09-16  8:44       ` KAMEZAWA Hiroyuki
  2009-09-17  5:53     ` [RFC][PATCH][bugfix] more checks for negative f_pos handling v2 KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 32+ messages in thread
From: Américo Wang @ 2009-09-16  8:20 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Wu Fengguang, viro, Andrew Morton, linux-kernel, hugh.dickins, oleg

On Wed, Sep 16, 2009 at 1:29 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
> The problem:
>> I'm writing a patch against /dev/kmem...I found a problem.
>>
>> /dev/kmem (and /proc/<pid>/mem) puts virtual addres to f->f_pos.
>>
>> but f->f_pos is always negative and rw_verify_ara() returns -EINVAL always.
>
> Changed CC: List.
>
> This is a trial to consider how to fix negative f_pos problem shown in above.
>
> Hmm, even after this patch, x86's vsyscall area is not readable.
> ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0  [vsyscall]
> But maybe no problems. (now, it cannot be read, anyway.)
>
> I tested /dev/kmem on x86-64 and this works fine. I added a fix for
> /proc/<pid>/mem because I know ia64's hugetlbe area is not readable
> via /proc/<pid>/mem. (But I'm not sure other 64bit arch has this
> kind of problems in /proc/<pid>/mem)
>
> ==
> From: KAMEZAWA Hiruyoki <kamezawa.hiroyu@jp.fujitsu.com>
>
> Modifying rw_verify_area()'s negative f_pos check.
>
> Now, rw_verify_area() has this check
>   if (unlikely((pos < 0) || (loff_t) (pos + count) < 0))
>                return -EINVAL
>
> And access to special files as /dev/mem,kmem, /proc/<pid>/mem
> returns unexpected -EINVAL.
> (For example, ia64 maps hugetlb at 0x8000000000000000- region)
>
> This patch tries to make range check more precise by using
> llseek ops defined per special files.
>
> Signed-off-by: KAMEZAWA Hiruyoki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  fs/proc/base.c  |   22 +++++++++++++++++-----
>  fs/read_write.c |   39 +++++++++++++++++++++++++++++++++++++--
>  2 files changed, 54 insertions(+), 7 deletions(-)
>
> Index: mmotm-2.6.31-Sep14/fs/read_write.c
> ===================================================================
> --- mmotm-2.6.31-Sep14.orig/fs/read_write.c
> +++ mmotm-2.6.31-Sep14/fs/read_write.c
> @@ -205,6 +205,37 @@ bad:
>  }
>  #endif
>
> +static int
> +__verify_negative_pos_range(struct file *file, loff_t pos, size_t count)
> +{
> +       unsigned long long upos, end;
> +       loff_t ret;
> +
> +       /* disallow overflow */
> +       upos = (unsigned long long)pos;
> +       end = upos + count;
> +       if (end < pos)
> +               return -EOVERFLOW;
> +       /*
> +        * Sanity check...subsystem has to provide llseek for handle big pos.
> +        * Subsystem's llseek should verify f_pos's value comaparing with its
> +        * max file size.
> +        * Note1: generic file ops' llseek cannot handle negative pos.
> +        * Note2: should we take care of pos == -EINVAL ?
> +        * Note3: we check flags and ops here for avoiding taking locks in.
> +        * default_lseek.
> +        */
> +       ret = -EINVAL;
> +       if ((file->f_mode & FMODE_LSEEK) &&
> +           (file->f_op && file->f_op->llseek)) {
> +               ret = vfs_llseek(file, 0, SEEK_CUR);
> +               if (ret == pos)
> +                       return 0;
> +       }
> +
> +       return (int)ret;
> +}
> +
>  /*
>  * rw_verify_area doesn't like huge counts. We limit
>  * them to something that fits in "int" so that others
> @@ -222,8 +253,12 @@ int rw_verify_area(int read_write, struc
>        if (unlikely((ssize_t) count < 0))
>                return retval;
>        pos = *ppos;
> -       if (unlikely((pos < 0) || (loff_t) (pos + count) < 0))
> -               return retval;
> +       if (unlikely((pos < 0) || (loff_t) (pos + count) < 0)) {
> +               /* some files requires special care */
> +               retval = __verify_negative_pos_range(file, pos, count);
> +               if (retval)
> +                       return retval;
> +       }
>
>        if (unlikely(inode->i_flock && mandatory_lock(inode))) {
>                retval = locks_mandatory_area(
> Index: mmotm-2.6.31-Sep14/fs/proc/base.c
> ===================================================================
> --- mmotm-2.6.31-Sep14.orig/fs/proc/base.c
> +++ mmotm-2.6.31-Sep14/fs/proc/base.c
> @@ -903,18 +903,30 @@ out_no_task:
>
>  loff_t mem_lseek(struct file *file, loff_t offset, int orig)
>  {
> +       struct task_struct *task = get_proc_task(file->f_path.dentry->d_inode);
> +       unsigned long long new_offset = -EINVAL;


Why not make 'new_offset' as loff_t? This can make your code easier.

> +
> +       if (!task) /* lseek's spec doesn't allow -ESRCH but... */


No worry, we have many ESRCH for proc files.

> +               return -ESRCH;
> +
>        switch (orig) {
>        case 0:
> -               file->f_pos = offset;
> +               new_offset = offset;
>                break;
>        case 1:
> -               file->f_pos += offset;
> +               new_offset = (unsigned long long)f->f_pos + offset;
>                break;
>        default:
> -               return -EINVAL;
> +               new_offset = -EINVAL;
> +               break;
>        }
> -       force_successful_syscall_return();
> -       return file->f_pos;
> +       if (new_offset < (unsigned long long)TASK_SIZE_OF(task)) {


Hmm, why this check?

> +               file->f_pos = (loff_t)new_offset;
> +               force_successful_syscall_return();
> +       } else
> +               new_offset = -EINVAL;
> +       put_task_struct(task);
> +       return (loff_t)new_offset;
>  }
>
>  static const struct file_operations proc_mem_operations = {

Thanks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH][bugfix] more checks for negative f_pos handling (Was  Re: Question: how to handle too big f_pos
  2009-09-16  8:20     ` Américo Wang
@ 2009-09-16  8:44       ` KAMEZAWA Hiroyuki
  2009-09-16  9:13         ` Américo Wang
  0 siblings, 1 reply; 32+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-09-16  8:44 UTC (permalink / raw)
  To: Américo Wang
  Cc: Wu Fengguang, viro, Andrew Morton, linux-kernel, hugh.dickins, oleg

Ah, sorry. I should CC: you.

On Wed, 16 Sep 2009 16:20:32 +0800
Américo Wang <xiyou.wangcong@gmail.com> wrote:

> On Wed, Sep 16, 2009 at 1:29 PM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> >
> > The problem:
> >> I'm writing a patch against /dev/kmem...I found a problem.
> >>
> >> /dev/kmem (and /proc/<pid>/mem) puts virtual addres to f->f_pos.
> >>
> >> but f->f_pos is always negative and rw_verify_ara() returns -EINVAL always.
> >
> > Changed CC: List.
> >
> > This is a trial to consider how to fix negative f_pos problem shown in above.
> >
> > Hmm, even after this patch, x86's vsyscall area is not readable.
> > ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0  [vsyscall]
> > But maybe no problems. (now, it cannot be read, anyway.)
> >
> > I tested /dev/kmem on x86-64 and this works fine. I added a fix for
> > /proc/<pid>/mem because I know ia64's hugetlbe area is not readable
> > via /proc/<pid>/mem. (But I'm not sure other 64bit arch has this
> > kind of problems in /proc/<pid>/mem)
> >
> > ==
> > From: KAMEZAWA Hiruyoki <kamezawa.hiroyu@jp.fujitsu.com>
> >
> > Modifying rw_verify_area()'s negative f_pos check.
> >
> > Now, rw_verify_area() has this check
> >   if (unlikely((pos < 0) || (loff_t) (pos + count) < 0))
> >                return -EINVAL
> >
> > And access to special files as /dev/mem,kmem, /proc/<pid>/mem
> > returns unexpected -EINVAL.
> > (For example, ia64 maps hugetlb at 0x8000000000000000- region)
> >
> > This patch tries to make range check more precise by using
> > llseek ops defined per special files.
> >
> > Signed-off-by: KAMEZAWA Hiruyoki <kamezawa.hiroyu@jp.fujitsu.com>
> > ---
> >  fs/proc/base.c  |   22 +++++++++++++++++-----
> >  fs/read_write.c |   39 +++++++++++++++++++++++++++++++++++++--
> >  2 files changed, 54 insertions(+), 7 deletions(-)
> >
> > Index: mmotm-2.6.31-Sep14/fs/read_write.c
> > ===================================================================
> > --- mmotm-2.6.31-Sep14.orig/fs/read_write.c
> > +++ mmotm-2.6.31-Sep14/fs/read_write.c
> > @@ -205,6 +205,37 @@ bad:
> >  }
> >  #endif
> >
> > +static int
> > +__verify_negative_pos_range(struct file *file, loff_t pos, size_t count)
> > +{
> > +       unsigned long long upos, end;
> > +       loff_t ret;
> > +
> > +       /* disallow overflow */
> > +       upos = (unsigned long long)pos;
> > +       end = upos + count;
> > +       if (end < pos)
> > +               return -EOVERFLOW;
> > +       /*
> > +        * Sanity check...subsystem has to provide llseek for handle big pos.
> > +        * Subsystem's llseek should verify f_pos's value comaparing with its
> > +        * max file size.
> > +        * Note1: generic file ops' llseek cannot handle negative pos.
> > +        * Note2: should we take care of pos == -EINVAL ?
> > +        * Note3: we check flags and ops here for avoiding taking locks in.
> > +        * default_lseek.
> > +        */
> > +       ret = -EINVAL;
> > +       if ((file->f_mode & FMODE_LSEEK) &&
> > +           (file->f_op && file->f_op->llseek)) {
> > +               ret = vfs_llseek(file, 0, SEEK_CUR);
> > +               if (ret == pos)
> > +                       return 0;
> > +       }
> > +
> > +       return (int)ret;
> > +}
> > +
> >  /*
> >  * rw_verify_area doesn't like huge counts. We limit
> >  * them to something that fits in "int" so that others
> > @@ -222,8 +253,12 @@ int rw_verify_area(int read_write, struc
> >        if (unlikely((ssize_t) count < 0))
> >                return retval;
> >        pos = *ppos;
> > -       if (unlikely((pos < 0) || (loff_t) (pos + count) < 0))
> > -               return retval;
> > +       if (unlikely((pos < 0) || (loff_t) (pos + count) < 0)) {
> > +               /* some files requires special care */
> > +               retval = __verify_negative_pos_range(file, pos, count);
> > +               if (retval)
> > +                       return retval;
> > +       }
> >
> >        if (unlikely(inode->i_flock && mandatory_lock(inode))) {
> >                retval = locks_mandatory_area(
> > Index: mmotm-2.6.31-Sep14/fs/proc/base.c
> > ===================================================================
> > --- mmotm-2.6.31-Sep14.orig/fs/proc/base.c
> > +++ mmotm-2.6.31-Sep14/fs/proc/base.c
> > @@ -903,18 +903,30 @@ out_no_task:
> >
> >  loff_t mem_lseek(struct file *file, loff_t offset, int orig)
> >  {
> > +       struct task_struct *task = get_proc_task(file->f_path.dentry->d_inode);
> > +       unsigned long long new_offset = -EINVAL;
> 
> 
> Why not make 'new_offset' as loff_t? This can make your code easier.
> 
loff_t is "long long", I wanted "unsigned long long" for showing
f_pos here is treated as "unsigned".



> > +
> > +       if (!task) /* lseek's spec doesn't allow -ESRCH but... */
> 
> 
> No worry, we have many ESRCH for proc files.
> 
I know ;)

> > +               return -ESRCH;
> > +
> >        switch (orig) {
> >        case 0:
> > -               file->f_pos = offset;
> > +               new_offset = offset;
> >                break;
> >        case 1:
> > -               file->f_pos += offset;
> > +               new_offset = (unsigned long long)f->f_pos + offset;
> >                break;
> >        default:
> > -               return -EINVAL;
> > +               new_offset = -EINVAL;
> > +               break;
> >        }
> > -       force_successful_syscall_return();
> > -       return file->f_pos;
> > +       if (new_offset < (unsigned long long)TASK_SIZE_OF(task)) {
> 
> 
> Hmm, why this check?
> 
2 reasons.

  1. If this lseek has to check something, this is it.
  2. On architecture where 32bit program can ran on 64bit,
     moving f_pos above 4G is out-of-range, for example.

But mem_read() will catch any bad f_pos, anyway. So, just making
allow all f_pos here is maybe a choice. Considering lseek,
providing this range check here is not so bad.

Thanks.
-Kame

> > +               file->f_pos = (loff_t)new_offset;
> > +               force_successful_syscall_return();
> > +       } else
> > +               new_offset = -EINVAL;
> > +       put_task_struct(task);
> > +       return (loff_t)new_offset;
> >  }
> >
> >  static const struct file_operations proc_mem_operations = {
> 
> Thanks.
> 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH][bugfix] more checks for negative f_pos handling (Was  Re: Question: how to handle too big f_pos
  2009-09-16  8:44       ` KAMEZAWA Hiroyuki
@ 2009-09-16  9:13         ` Américo Wang
  2009-09-16 12:06           ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 32+ messages in thread
From: Américo Wang @ 2009-09-16  9:13 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Wu Fengguang, viro, Andrew Morton, linux-kernel, hugh.dickins, oleg

On Wed, Sep 16, 2009 at 4:44 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> Ah, sorry. I should CC: you.


No problem. :)

>> > --- mmotm-2.6.31-Sep14.orig/fs/proc/base.c
>> > +++ mmotm-2.6.31-Sep14/fs/proc/base.c
>> > @@ -903,18 +903,30 @@ out_no_task:
>> >
>> >  loff_t mem_lseek(struct file *file, loff_t offset, int orig)
>> >  {
>> > +       struct task_struct *task = get_proc_task(file->f_path.dentry->d_inode);
>> > +       unsigned long long new_offset = -EINVAL;
>>
>>
>> Why not make 'new_offset' as loff_t? This can make your code easier.
>>
> loff_t is "long long", I wanted "unsigned long long" for showing
> f_pos here is treated as "unsigned".
>


Yeah, the same as for __verify_negative_pos_range(), right...


<snip>

>> >        }
>> > -       force_successful_syscall_return();
>> > -       return file->f_pos;
>> > +       if (new_offset < (unsigned long long)TASK_SIZE_OF(task)) {
>>
>>
>> Hmm, why this check?
>>
> 2 reasons.
>
>  1. If this lseek has to check something, this is it.
>  2. On architecture where 32bit program can ran on 64bit,
>     moving f_pos above 4G is out-of-range, for example.
>
> But mem_read() will catch any bad f_pos, anyway. So, just making
> allow all f_pos here is maybe a choice. Considering lseek,
> providing this range check here is not so bad.

Ok, I misunderstood the macro 'TASK_SIZE_OF', then no problem.

Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>

Thanks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH][bugfix] more checks for negative f_pos handling (Was  Re: Question: how to handle too big f_pos
  2009-09-16  9:13         ` Américo Wang
@ 2009-09-16 12:06           ` KAMEZAWA Hiroyuki
  2009-09-17  3:06             ` Américo Wang
  0 siblings, 1 reply; 32+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-09-16 12:06 UTC (permalink / raw)
  To: Am?rico_Wang
  Cc: KAMEZAWA Hiroyuki, Wu Fengguang, viro, Andrew Morton,
	linux-kernel, hugh.dickins, oleg

Américo_Wang wrote:
> On Wed, Sep 16, 2009 at 4:44 PM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>> Ah, sorry. I should CC: you.
>
>
> No problem. :)
>
>>> > --- mmotm-2.6.31-Sep14.orig/fs/proc/base.c
>>> > +++ mmotm-2.6.31-Sep14/fs/proc/base.c
>>> > @@ -903,18 +903,30 @@ out_no_task:
>>> >
>>> > ?loff_t mem_lseek(struct file *file, loff_t offset, int orig)
>>> > ?{
>>> > + ? ? ? struct task_struct *task =
>>> get_proc_task(file->f_path.dentry->d_inode);
>>> > + ? ? ? unsigned long long new_offset = -EINVAL;
>>>
>>>
>>> Why not make 'new_offset' as loff_t? This can make your code easier.
>>>
>> loff_t is "long long", I wanted "unsigned long long" for showing
>> f_pos here is treated as "unsigned".
>>
>
>
> Yeah, the same as for __verify_negative_pos_range(), right...
>
>
> <snip>
>
>>> > ? ? ? ?}
>>> > - ? ? ? force_successful_syscall_return();
>>> > - ? ? ? return file->f_pos;
>>> > + ? ? ? if (new_offset < (unsigned long long)TASK_SIZE_OF(task)) {
>>>
>>>
>>> Hmm, why this check?
>>>
>> 2 reasons.
>>
>> ?1. If this lseek has to check something, this is it.
>> ?2. On architecture where 32bit program can ran on 64bit,
>> ? ? moving f_pos above 4G is out-of-range, for example.
>>
>> But mem_read() will catch any bad f_pos, anyway. So, just making
>> allow all f_pos here is maybe a choice. Considering lseek,
>> providing this range check here is not so bad.
>
> Ok, I misunderstood the macro 'TASK_SIZE_OF', then no problem.
>
> Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
>
Ah, very sorry. I noticed I didn't handle pread/pwrite, splice, etc...
I'll do retry.

Sorry,
-Kame


> Thanks.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH][bugfix] more checks for negative f_pos handling (Was  Re: Question: how to handle too big f_pos
  2009-09-16 12:06           ` KAMEZAWA Hiroyuki
@ 2009-09-17  3:06             ` Américo Wang
  0 siblings, 0 replies; 32+ messages in thread
From: Américo Wang @ 2009-09-17  3:06 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Wu Fengguang, viro, Andrew Morton, linux-kernel, hugh.dickins, oleg

2009/9/16 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>:
> Am�+1rico_Wang wrote:
>> On Wed, Sep 16, 2009 at 4:44 PM, KAMEZAWA Hiroyuki
>> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>>> Ah, sorry. I should CC: you.
>>
>>
>> No problem. :)
>>
>>>> > --- mmotm-2.6.31-Sep14.orig/fs/proc/base.c
>>>> > +++ mmotm-2.6.31-Sep14/fs/proc/base.c
>>>> > @@ -903,18 +903,30 @@ out_no_task:
>>>> >
>>>> > ?loff_t mem_lseek(struct file *file, loff_t offset, int orig)
>>>> > ?{
>>>> > + ? ? ? struct task_struct *task =
>>>> get_proc_task(file->f_path.dentry->d_inode);
>>>> > + ? ? ? unsigned long long new_offset = -EINVAL;
>>>>
>>>>
>>>> Why not make 'new_offset' as loff_t? This can make your code easier.
>>>>
>>> loff_t is "long long", I wanted "unsigned long long" for showing
>>> f_pos here is treated as "unsigned".
>>>
>>
>>
>> Yeah, the same as for __verify_negative_pos_range(), right...
>>
>>
>> <snip>
>>
>>>> > ? ? ? ?}
>>>> > - ? ? ? force_successful_syscall_return();
>>>> > - ? ? ? return file->f_pos;
>>>> > + ? ? ? if (new_offset < (unsigned long long)TASK_SIZE_OF(task)) {
>>>>
>>>>
>>>> Hmm, why this check?
>>>>
>>> 2 reasons.
>>>
>>> ?1. If this lseek has to check something, this is it.
>>> ?2. On architecture where 32bit program can ran on 64bit,
>>> ? ? moving f_pos above 4G is out-of-range, for example.
>>>
>>> But mem_read() will catch any bad f_pos, anyway. So, just making
>>> allow all f_pos here is maybe a choice. Considering lseek,
>>> providing this range check here is not so bad.
>>
>> Ok, I misunderstood the macro 'TASK_SIZE_OF', then no problem.
>>
>> Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
>>
> Ah, very sorry. I noticed I didn't handle pread/pwrite, splice, etc...
> I'll do retry.

Yes? Not necessary...

In man page, it said /proc/<pid>/mem can be accessed via open(),
read(), fseek(), no pread(), no splice()...

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC][PATCH][bugfix] more checks for negative f_pos handling v2
  2009-09-16  5:29   ` [RFC][PATCH][bugfix] more checks for negative f_pos handling (Was Re: Question: how to handle too big f_pos KAMEZAWA Hiroyuki
  2009-09-16  8:20     ` Américo Wang
@ 2009-09-17  5:53     ` KAMEZAWA Hiroyuki
  2009-09-17  6:07       ` [RFC][PATCH][bugfix] more checks for negative f_pos handling v3 KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 32+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-09-17  5:53 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Wu Fengguang, viro, Andrew Morton, linux-kernel, hugh.dickins,
	oleg, xiyou.wangcong

Considering several approaches, I think this is the simplest..
Any idea ?

Thanks,
-Kame
==
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Now, rw_verify_area() checsk f_pos is negative or not. And if
negative, returns -EINVAL.

But, some special files as /dev/(k)mem and /proc/<pid>/mem etc..
has negative offsets. And we can't do any access via read/write
to the file(device).

This patch introduce a flag S_VERYBIG and allow negative file
offsets.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 drivers/char/mem.c |   23 +++++++++++++----------
 fs/proc/base.c     |    2 ++
 fs/read_write.c    |   20 ++++++++++++++++++--
 include/linux/fs.h |    2 ++
 4 files changed, 35 insertions(+), 12 deletions(-)

Index: mmotm-2.6.31-Sep14/fs/read_write.c
===================================================================
--- mmotm-2.6.31-Sep14.orig/fs/read_write.c
+++ mmotm-2.6.31-Sep14/fs/read_write.c
@@ -205,6 +205,19 @@ bad:
 }
 #endif
 
+static bool
+__negative_fpos_check(struct inode *inode, loff_t pos, size_t count)
+{
+	struct inode *inode;
+	loff_t isize;
+
+	if (pos + count < pos)
+		return -EOVERFLOW;
+	if (IS_VERYBIG(inode))
+		return 0;
+	return -EINVAL;
+}
+
 /*
  * rw_verify_area doesn't like huge counts. We limit
  * them to something that fits in "int" so that others
@@ -222,8 +235,11 @@ int rw_verify_area(int read_write, struc
 	if (unlikely((ssize_t) count < 0))
 		return retval;
 	pos = *ppos;
-	if (unlikely((pos < 0) || (loff_t) (pos + count) < 0))
-		return retval;
+	if (unlikely((pos < 0) || (loff_t) (pos + count) < 0)) {
+		retval = __negative_fpos_check(inode, pos, count))
+		if (retval)
+			return retval;
+	}
 
 	if (unlikely(inode->i_flock && mandatory_lock(inode))) {
 		retval = locks_mandatory_area(
Index: mmotm-2.6.31-Sep14/include/linux/fs.h
===================================================================
--- mmotm-2.6.31-Sep14.orig/include/linux/fs.h
+++ mmotm-2.6.31-Sep14/include/linux/fs.h
@@ -231,6 +231,7 @@ struct inodes_stat_t {
 #define S_NOCMTIME	128	/* Do not update file c/mtime */
 #define S_SWAPFILE	256	/* Do not truncate: swapon got its bmaps */
 #define S_PRIVATE	512	/* Inode is fs-internal */
+#define S_VERYBIG	1024	/* Allow file's loff_t can be negative */
 
 /*
  * Note that nosuid etc flags are inode-specific: setting some file-system
@@ -265,6 +266,7 @@ struct inodes_stat_t {
 #define IS_NOCMTIME(inode)	((inode)->i_flags & S_NOCMTIME)
 #define IS_SWAPFILE(inode)	((inode)->i_flags & S_SWAPFILE)
 #define IS_PRIVATE(inode)	((inode)->i_flags & S_PRIVATE)
+#define IS_VERYBIG(inode)	((inode)->i_flags & S_VERYBIG)
 
 /* the read-only stuff doesn't really belong here, but any other place is
    probably as bad and I don't want to create yet another include file. */
Index: mmotm-2.6.31-Sep14/drivers/char/mem.c
===================================================================
--- mmotm-2.6.31-Sep14.orig/drivers/char/mem.c
+++ mmotm-2.6.31-Sep14/drivers/char/mem.c
@@ -825,22 +825,23 @@ static const struct memdev {
 	const char *name;
 	const struct file_operations *fops;
 	struct backing_dev_info *dev_info;
+	bool	verybig;
 } devlist[] = {
-	[ 1] = { "mem", &mem_fops, &directly_mappable_cdev_bdi },
+	[1] = { "mem", &mem_fops, &directly_mappable_cdev_bdi, true },
 #ifdef CONFIG_DEVKMEM
-	[ 2] = { "kmem", &kmem_fops, &directly_mappable_cdev_bdi },
+	[2] = { "kmem", &kmem_fops, &directly_mappable_cdev_bdi, true },
 #endif
-	[ 3] = {"null", &null_fops, NULL },
+	[3] = {"null", &null_fops, NULL, false },
 #ifdef CONFIG_DEVPORT
-	[ 4] = { "port", &port_fops, NULL },
+	[4] = { "port", &port_fops, NULL, false },
 #endif
-	[ 5] = { "zero", &zero_fops, &zero_bdi },
-	[ 6] = { "full", &full_fops, NULL },
-	[ 7] = { "random", &random_fops, NULL },
-	[ 9] = { "urandom", &urandom_fops, NULL },
-	[11] = { "kmsg", &kmsg_fops, NULL },
+	[5] = { "zero", &zero_fops, &zero_bdi, false },
+	[6] = { "full", &full_fops, NULL, false },
+	[7] = { "random", &random_fops, NULL, false },
+	[9] = { "urandom", &urandom_fops, NULL, false },
+	[11] = { "kmsg", &kmsg_fops, NULL, false },
 #ifdef CONFIG_CRASH_DUMP
-	[12] = { "oldmem", &oldmem_fops, NULL },
+	[12] = { "oldmem", &oldmem_fops, NULL, false },
 #endif
 };
 
@@ -868,6 +869,8 @@ static int memory_open(struct inode *ino
 		ret = dev->fops->open(inode, filp);
 	else
 		ret = 0;
+	if (dev->verybig)
+		inode->i_flags |= S_VERYBIG;
 out:
 	unlock_kernel();
 	return ret;
Index: mmotm-2.6.31-Sep14/fs/proc/base.c
===================================================================
--- mmotm-2.6.31-Sep14.orig/fs/proc/base.c
+++ mmotm-2.6.31-Sep14/fs/proc/base.c
@@ -779,6 +779,8 @@ static const struct file_operations proc
 static int mem_open(struct inode* inode, struct file* file)
 {
 	file->private_data = (void*)((long)current->self_exec_id);
+	/* this file is read only and we can catch out-pf-range */
+	inode->i_flags |= S_VERYBIG;
 	return 0;
 }
 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC][PATCH][bugfix] more checks for negative f_pos handling v3
  2009-09-17  5:53     ` [RFC][PATCH][bugfix] more checks for negative f_pos handling v2 KAMEZAWA Hiroyuki
@ 2009-09-17  6:07       ` KAMEZAWA Hiroyuki
  2009-09-17  6:21         ` Wu Fengguang
  0 siblings, 1 reply; 32+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-09-17  6:07 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Wu Fengguang, viro, Andrew Morton, linux-kernel, hugh.dickins,
	oleg, xiyou.wangcong

Very sorry...reflesh miss.
==
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Now, rw_verify_area() checsk f_pos is negative or not. And if
negative, returns -EINVAL.

But, some special files as /dev/(k)mem and /proc/<pid>/mem etc..
has negative offsets. And we can't do any access via read/write
to the file(device).

This patch introduce a flag S_VERYBIG and allow negative file
offsets.

Changelog:
 - fixed bug in rw_verify_area (it cannot be compiled)

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 drivers/char/mem.c |   23 +++++++++++++----------
 fs/proc/base.c     |    2 ++
 fs/read_write.c    |   20 ++++++++++++++++++--
 include/linux/fs.h |    2 ++
 4 files changed, 35 insertions(+), 12 deletions(-)

Index: mmotm-2.6.31-Sep14/fs/read_write.c
===================================================================
--- mmotm-2.6.31-Sep14.orig/fs/read_write.c
+++ mmotm-2.6.31-Sep14/fs/read_write.c
@@ -205,6 +205,16 @@ bad:
 }
 #endif
 
+static bool
+__negative_fpos_check(struct inode *inode, loff_t pos, size_t count)
+{
+	if (pos + count < pos)
+		return -EOVERFLOW;
+	if (IS_VERYBIG(inode))
+		return 0;
+	return -EINVAL;
+}
+
 /*
  * rw_verify_area doesn't like huge counts. We limit
  * them to something that fits in "int" so that others
@@ -222,8 +232,11 @@ int rw_verify_area(int read_write, struc
 	if (unlikely((ssize_t) count < 0))
 		return retval;
 	pos = *ppos;
-	if (unlikely((pos < 0) || (loff_t) (pos + count) < 0))
-		return retval;
+	if (unlikely((pos < 0) || (loff_t) (pos + count) < 0)) {
+		retval = __negative_fpos_check(inode, pos, count);
+		if (retval)
+			return retval;
+	}
 
 	if (unlikely(inode->i_flock && mandatory_lock(inode))) {
 		retval = locks_mandatory_area(
Index: mmotm-2.6.31-Sep14/include/linux/fs.h
===================================================================
--- mmotm-2.6.31-Sep14.orig/include/linux/fs.h
+++ mmotm-2.6.31-Sep14/include/linux/fs.h
@@ -231,6 +231,7 @@ struct inodes_stat_t {
 #define S_NOCMTIME	128	/* Do not update file c/mtime */
 #define S_SWAPFILE	256	/* Do not truncate: swapon got its bmaps */
 #define S_PRIVATE	512	/* Inode is fs-internal */
+#define S_VERYBIG	1024	/* Allow file's loff_t can be negative */
 
 /*
  * Note that nosuid etc flags are inode-specific: setting some file-system
@@ -265,6 +266,7 @@ struct inodes_stat_t {
 #define IS_NOCMTIME(inode)	((inode)->i_flags & S_NOCMTIME)
 #define IS_SWAPFILE(inode)	((inode)->i_flags & S_SWAPFILE)
 #define IS_PRIVATE(inode)	((inode)->i_flags & S_PRIVATE)
+#define IS_VERYBIG(inode)	((inode)->i_flags & S_VERYBIG)
 
 /* the read-only stuff doesn't really belong here, but any other place is
    probably as bad and I don't want to create yet another include file. */
Index: mmotm-2.6.31-Sep14/drivers/char/mem.c
===================================================================
--- mmotm-2.6.31-Sep14.orig/drivers/char/mem.c
+++ mmotm-2.6.31-Sep14/drivers/char/mem.c
@@ -825,22 +825,23 @@ static const struct memdev {
 	const char *name;
 	const struct file_operations *fops;
 	struct backing_dev_info *dev_info;
+	bool	verybig;
 } devlist[] = {
-	[ 1] = { "mem", &mem_fops, &directly_mappable_cdev_bdi },
+	[1] = { "mem", &mem_fops, &directly_mappable_cdev_bdi, true },
 #ifdef CONFIG_DEVKMEM
-	[ 2] = { "kmem", &kmem_fops, &directly_mappable_cdev_bdi },
+	[2] = { "kmem", &kmem_fops, &directly_mappable_cdev_bdi, true },
 #endif
-	[ 3] = {"null", &null_fops, NULL },
+	[3] = {"null", &null_fops, NULL, false },
 #ifdef CONFIG_DEVPORT
-	[ 4] = { "port", &port_fops, NULL },
+	[4] = { "port", &port_fops, NULL, false },
 #endif
-	[ 5] = { "zero", &zero_fops, &zero_bdi },
-	[ 6] = { "full", &full_fops, NULL },
-	[ 7] = { "random", &random_fops, NULL },
-	[ 9] = { "urandom", &urandom_fops, NULL },
-	[11] = { "kmsg", &kmsg_fops, NULL },
+	[5] = { "zero", &zero_fops, &zero_bdi, false },
+	[6] = { "full", &full_fops, NULL, false },
+	[7] = { "random", &random_fops, NULL, false },
+	[9] = { "urandom", &urandom_fops, NULL, false },
+	[11] = { "kmsg", &kmsg_fops, NULL, false },
 #ifdef CONFIG_CRASH_DUMP
-	[12] = { "oldmem", &oldmem_fops, NULL },
+	[12] = { "oldmem", &oldmem_fops, NULL, false },
 #endif
 };
 
@@ -868,6 +869,8 @@ static int memory_open(struct inode *ino
 		ret = dev->fops->open(inode, filp);
 	else
 		ret = 0;
+	if (dev->verybig)
+		inode->i_flags |= S_VERYBIG;
 out:
 	unlock_kernel();
 	return ret;
Index: mmotm-2.6.31-Sep14/fs/proc/base.c
===================================================================
--- mmotm-2.6.31-Sep14.orig/fs/proc/base.c
+++ mmotm-2.6.31-Sep14/fs/proc/base.c
@@ -779,6 +779,8 @@ static const struct file_operations proc
 static int mem_open(struct inode* inode, struct file* file)
 {
 	file->private_data = (void*)((long)current->self_exec_id);
+	/* this file is read only and we can catch out-pf-range */
+	inode->i_flags |= S_VERYBIG;
 	return 0;
 }
 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH][bugfix] more checks for negative f_pos handling v3
  2009-09-17  6:07       ` [RFC][PATCH][bugfix] more checks for negative f_pos handling v3 KAMEZAWA Hiroyuki
@ 2009-09-17  6:21         ` Wu Fengguang
  2009-09-17  6:31           ` KAMEZAWA Hiroyuki
  2009-09-17  6:51           ` [RFC][PATCH][bugfix] more checks for negative f_pos handling v4 KAMEZAWA Hiroyuki
  0 siblings, 2 replies; 32+ messages in thread
From: Wu Fengguang @ 2009-09-17  6:21 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: viro, Andrew Morton, linux-kernel, hugh.dickins, oleg, xiyou.wangcong

On Thu, Sep 17, 2009 at 02:07:26PM +0800, KAMEZAWA Hiroyuki wrote:
> Very sorry...reflesh miss.
> ==
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> Now, rw_verify_area() checsk f_pos is negative or not. And if
> negative, returns -EINVAL.
> 
> But, some special files as /dev/(k)mem and /proc/<pid>/mem etc..
> has negative offsets. And we can't do any access via read/write
> to the file(device).
> 
> This patch introduce a flag S_VERYBIG and allow negative file
> offsets.
> 
> Changelog:
>  - fixed bug in rw_verify_area (it cannot be compiled)
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  drivers/char/mem.c |   23 +++++++++++++----------
>  fs/proc/base.c     |    2 ++
>  fs/read_write.c    |   20 ++++++++++++++++++--
>  include/linux/fs.h |    2 ++
>  4 files changed, 35 insertions(+), 12 deletions(-)
> 
> Index: mmotm-2.6.31-Sep14/fs/read_write.c
> ===================================================================
> --- mmotm-2.6.31-Sep14.orig/fs/read_write.c
> +++ mmotm-2.6.31-Sep14/fs/read_write.c
> @@ -205,6 +205,16 @@ bad:
>  }
>  #endif
>  
> +static bool
> +__negative_fpos_check(struct inode *inode, loff_t pos, size_t count)
> +{
> +	if (pos + count < pos)
> +		return -EOVERFLOW;
> +	if (IS_VERYBIG(inode))
> +		return 0;
> +	return -EINVAL;

This always return negative values for !IS_VERYBIG inodes,
looks weird at least.

> +}
> +
>  /*
>   * rw_verify_area doesn't like huge counts. We limit
>   * them to something that fits in "int" so that others
> @@ -222,8 +232,11 @@ int rw_verify_area(int read_write, struc
>  	if (unlikely((ssize_t) count < 0))
>  		return retval;
>  	pos = *ppos;
> -	if (unlikely((pos < 0) || (loff_t) (pos + count) < 0))
> -		return retval;
> +	if (unlikely((pos < 0) || (loff_t) (pos + count) < 0)) {
> +		retval = __negative_fpos_check(inode, pos, count);
> +		if (retval)
> +			return retval;
> +	}
>  
>  	if (unlikely(inode->i_flock && mandatory_lock(inode))) {
>  		retval = locks_mandatory_area(
> Index: mmotm-2.6.31-Sep14/include/linux/fs.h
> ===================================================================
> --- mmotm-2.6.31-Sep14.orig/include/linux/fs.h
> +++ mmotm-2.6.31-Sep14/include/linux/fs.h
> @@ -231,6 +231,7 @@ struct inodes_stat_t {
>  #define S_NOCMTIME	128	/* Do not update file c/mtime */
>  #define S_SWAPFILE	256	/* Do not truncate: swapon got its bmaps */
>  #define S_PRIVATE	512	/* Inode is fs-internal */
> +#define S_VERYBIG	1024	/* Allow file's loff_t can be negative */
>  
>  /*
>   * Note that nosuid etc flags are inode-specific: setting some file-system
> @@ -265,6 +266,7 @@ struct inodes_stat_t {
>  #define IS_NOCMTIME(inode)	((inode)->i_flags & S_NOCMTIME)
>  #define IS_SWAPFILE(inode)	((inode)->i_flags & S_SWAPFILE)
>  #define IS_PRIVATE(inode)	((inode)->i_flags & S_PRIVATE)
> +#define IS_VERYBIG(inode)	((inode)->i_flags & S_VERYBIG)
>  
>  /* the read-only stuff doesn't really belong here, but any other place is
>     probably as bad and I don't want to create yet another include file. */
> Index: mmotm-2.6.31-Sep14/drivers/char/mem.c
> ===================================================================
> --- mmotm-2.6.31-Sep14.orig/drivers/char/mem.c
> +++ mmotm-2.6.31-Sep14/drivers/char/mem.c
> @@ -825,22 +825,23 @@ static const struct memdev {
>  	const char *name;
>  	const struct file_operations *fops;
>  	struct backing_dev_info *dev_info;
> +	bool	verybig;
>  } devlist[] = {
> -	[ 1] = { "mem", &mem_fops, &directly_mappable_cdev_bdi },
> +	[1] = { "mem", &mem_fops, &directly_mappable_cdev_bdi, true },
>  #ifdef CONFIG_DEVKMEM
> -	[ 2] = { "kmem", &kmem_fops, &directly_mappable_cdev_bdi },
> +	[2] = { "kmem", &kmem_fops, &directly_mappable_cdev_bdi, true },
>  #endif
> -	[ 3] = {"null", &null_fops, NULL },
> +	[3] = {"null", &null_fops, NULL, false },
>  #ifdef CONFIG_DEVPORT
> -	[ 4] = { "port", &port_fops, NULL },
> +	[4] = { "port", &port_fops, NULL, false },
>  #endif
> -	[ 5] = { "zero", &zero_fops, &zero_bdi },
> -	[ 6] = { "full", &full_fops, NULL },
> -	[ 7] = { "random", &random_fops, NULL },
> -	[ 9] = { "urandom", &urandom_fops, NULL },
> -	[11] = { "kmsg", &kmsg_fops, NULL },
> +	[5] = { "zero", &zero_fops, &zero_bdi, false },
> +	[6] = { "full", &full_fops, NULL, false },
> +	[7] = { "random", &random_fops, NULL, false },
> +	[9] = { "urandom", &urandom_fops, NULL, false },
> +	[11] = { "kmsg", &kmsg_fops, NULL, false },
>  #ifdef CONFIG_CRASH_DUMP
> -	[12] = { "oldmem", &oldmem_fops, NULL },
> +	[12] = { "oldmem", &oldmem_fops, NULL, false },
>  #endif

How about make the fields aligned? That would look much nicer :)

Thanks,
Fengguang

>  };
>  
> @@ -868,6 +869,8 @@ static int memory_open(struct inode *ino
>  		ret = dev->fops->open(inode, filp);
>  	else
>  		ret = 0;
> +	if (dev->verybig)
> +		inode->i_flags |= S_VERYBIG;
>  out:
>  	unlock_kernel();
>  	return ret;
> Index: mmotm-2.6.31-Sep14/fs/proc/base.c
> ===================================================================
> --- mmotm-2.6.31-Sep14.orig/fs/proc/base.c
> +++ mmotm-2.6.31-Sep14/fs/proc/base.c
> @@ -779,6 +779,8 @@ static const struct file_operations proc
>  static int mem_open(struct inode* inode, struct file* file)
>  {
>  	file->private_data = (void*)((long)current->self_exec_id);
> +	/* this file is read only and we can catch out-pf-range */
> +	inode->i_flags |= S_VERYBIG;
>  	return 0;
>  }
>  

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH][bugfix] more checks for negative f_pos handling v3
  2009-09-17  6:21         ` Wu Fengguang
@ 2009-09-17  6:31           ` KAMEZAWA Hiroyuki
  2009-09-17  6:53             ` Wu Fengguang
  2009-09-17  6:51           ` [RFC][PATCH][bugfix] more checks for negative f_pos handling v4 KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 32+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-09-17  6:31 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: viro, Andrew Morton, linux-kernel, hugh.dickins, oleg, xiyou.wangcong

On Thu, 17 Sep 2009 14:21:24 +0800
Wu Fengguang <fengguang.wu@intel.com> wrote:

> On Thu, Sep 17, 2009 at 02:07:26PM +0800, KAMEZAWA Hiroyuki wrote:
> > Very sorry...reflesh miss.
> > ==
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > 
> > Now, rw_verify_area() checsk f_pos is negative or not. And if
> > negative, returns -EINVAL.
> > 
> > But, some special files as /dev/(k)mem and /proc/<pid>/mem etc..
> > has negative offsets. And we can't do any access via read/write
> > to the file(device).
> > 
> > This patch introduce a flag S_VERYBIG and allow negative file
> > offsets.
> > 
> > Changelog:
> >  - fixed bug in rw_verify_area (it cannot be compiled)
> > 
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > ---
> >  drivers/char/mem.c |   23 +++++++++++++----------
> >  fs/proc/base.c     |    2 ++
> >  fs/read_write.c    |   20 ++++++++++++++++++--
> >  include/linux/fs.h |    2 ++
> >  4 files changed, 35 insertions(+), 12 deletions(-)
> > 
> > Index: mmotm-2.6.31-Sep14/fs/read_write.c
> > ===================================================================
> > --- mmotm-2.6.31-Sep14.orig/fs/read_write.c
> > +++ mmotm-2.6.31-Sep14/fs/read_write.c
> > @@ -205,6 +205,16 @@ bad:
> >  }
> >  #endif
> >  
> > +static bool
> > +__negative_fpos_check(struct inode *inode, loff_t pos, size_t count)
> > +{
> > +	if (pos + count < pos)
> > +		return -EOVERFLOW;
> > +	if (IS_VERYBIG(inode))
> > +		return 0;
> > +	return -EINVAL;
> 
> This always return negative values for !IS_VERYBIG inodes,
> looks weird at least.
> 
I need to refresh my eyes, too...
Thanks, will fix.. return value is not bool, but int...




> > +}
> > +
> >  /*
> >   * rw_verify_area doesn't like huge counts. We limit
> >   * them to something that fits in "int" so that others
> > @@ -222,8 +232,11 @@ int rw_verify_area(int read_write, struc
> >  	if (unlikely((ssize_t) count < 0))
> >  		return retval;
> >  	pos = *ppos;
> > -	if (unlikely((pos < 0) || (loff_t) (pos + count) < 0))
> > -		return retval;
> > +	if (unlikely((pos < 0) || (loff_t) (pos + count) < 0)) {
> > +		retval = __negative_fpos_check(inode, pos, count);
> > +		if (retval)
> > +			return retval;
> > +	}
> >  
> >  	if (unlikely(inode->i_flock && mandatory_lock(inode))) {
> >  		retval = locks_mandatory_area(
> > Index: mmotm-2.6.31-Sep14/include/linux/fs.h
> > ===================================================================
> > --- mmotm-2.6.31-Sep14.orig/include/linux/fs.h
> > +++ mmotm-2.6.31-Sep14/include/linux/fs.h
> > @@ -231,6 +231,7 @@ struct inodes_stat_t {
> >  #define S_NOCMTIME	128	/* Do not update file c/mtime */
> >  #define S_SWAPFILE	256	/* Do not truncate: swapon got its bmaps */
> >  #define S_PRIVATE	512	/* Inode is fs-internal */
> > +#define S_VERYBIG	1024	/* Allow file's loff_t can be negative */
> >  
> >  /*
> >   * Note that nosuid etc flags are inode-specific: setting some file-system
> > @@ -265,6 +266,7 @@ struct inodes_stat_t {
> >  #define IS_NOCMTIME(inode)	((inode)->i_flags & S_NOCMTIME)
> >  #define IS_SWAPFILE(inode)	((inode)->i_flags & S_SWAPFILE)
> >  #define IS_PRIVATE(inode)	((inode)->i_flags & S_PRIVATE)
> > +#define IS_VERYBIG(inode)	((inode)->i_flags & S_VERYBIG)
> >  
> >  /* the read-only stuff doesn't really belong here, but any other place is
> >     probably as bad and I don't want to create yet another include file. */
> > Index: mmotm-2.6.31-Sep14/drivers/char/mem.c
> > ===================================================================
> > --- mmotm-2.6.31-Sep14.orig/drivers/char/mem.c
> > +++ mmotm-2.6.31-Sep14/drivers/char/mem.c
> > @@ -825,22 +825,23 @@ static const struct memdev {
> >  	const char *name;
> >  	const struct file_operations *fops;
> >  	struct backing_dev_info *dev_info;
> > +	bool	verybig;
> >  } devlist[] = {
> > -	[ 1] = { "mem", &mem_fops, &directly_mappable_cdev_bdi },
> > +	[1] = { "mem", &mem_fops, &directly_mappable_cdev_bdi, true },
> >  #ifdef CONFIG_DEVKMEM
> > -	[ 2] = { "kmem", &kmem_fops, &directly_mappable_cdev_bdi },
> > +	[2] = { "kmem", &kmem_fops, &directly_mappable_cdev_bdi, true },
> >  #endif
> > -	[ 3] = {"null", &null_fops, NULL },
> > +	[3] = {"null", &null_fops, NULL, false },
> >  #ifdef CONFIG_DEVPORT
> > -	[ 4] = { "port", &port_fops, NULL },
> > +	[4] = { "port", &port_fops, NULL, false },
> >  #endif
> > -	[ 5] = { "zero", &zero_fops, &zero_bdi },
> > -	[ 6] = { "full", &full_fops, NULL },
> > -	[ 7] = { "random", &random_fops, NULL },
> > -	[ 9] = { "urandom", &urandom_fops, NULL },
> > -	[11] = { "kmsg", &kmsg_fops, NULL },
> > +	[5] = { "zero", &zero_fops, &zero_bdi, false },
> > +	[6] = { "full", &full_fops, NULL, false },
> > +	[7] = { "random", &random_fops, NULL, false },
> > +	[9] = { "urandom", &urandom_fops, NULL, false },
> > +	[11] = { "kmsg", &kmsg_fops, NULL, false },
> >  #ifdef CONFIG_CRASH_DUMP
> > -	[12] = { "oldmem", &oldmem_fops, NULL },
> > +	[12] = { "oldmem", &oldmem_fops, NULL, false },
> >  #endif
> 
> How about make the fields aligned? That would look much nicer :)
> 
checkpatch.pl complains  agaisnt [ 5]...but ok, ajust "=" placement.

Thanks,
-kame


> Thanks,
> Fengguang
> 
> >  };
> >  
> > @@ -868,6 +869,8 @@ static int memory_open(struct inode *ino
> >  		ret = dev->fops->open(inode, filp);
> >  	else
> >  		ret = 0;
> > +	if (dev->verybig)
> > +		inode->i_flags |= S_VERYBIG;
> >  out:
> >  	unlock_kernel();
> >  	return ret;
> > Index: mmotm-2.6.31-Sep14/fs/proc/base.c
> > ===================================================================
> > --- mmotm-2.6.31-Sep14.orig/fs/proc/base.c
> > +++ mmotm-2.6.31-Sep14/fs/proc/base.c
> > @@ -779,6 +779,8 @@ static const struct file_operations proc
> >  static int mem_open(struct inode* inode, struct file* file)
> >  {
> >  	file->private_data = (void*)((long)current->self_exec_id);
> > +	/* this file is read only and we can catch out-pf-range */
> > +	inode->i_flags |= S_VERYBIG;
> >  	return 0;
> >  }
> >  
> 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC][PATCH][bugfix] more checks for negative f_pos handling v4
  2009-09-17  6:21         ` Wu Fengguang
  2009-09-17  6:31           ` KAMEZAWA Hiroyuki
@ 2009-09-17  6:51           ` KAMEZAWA Hiroyuki
  2009-09-17  7:14             ` Wu Fengguang
  1 sibling, 1 reply; 32+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-09-17  6:51 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: viro, Andrew Morton, linux-kernel, hugh.dickins, oleg, xiyou.wangcong

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Now, rw_verify_area() checsk f_pos is negative or not. And if
negative, returns -EINVAL.

But, some special files as /dev/(k)mem and /proc/<pid>/mem etc..
has negative offsets. And we can't do any access via read/write
to the file(device).

This patch introduce a flag S_VERYBIG and allow negative file
offsets for big files. (usual files don't allow it.)

Changelog: v3->v4
 - make changes in mem.c aligned.
 - change __negative_fpos_check() to return int. 
 - fixed bug in "pos" check.
 - added comments.

Changelog: v2->v3
 - fixed bug in rw_verify_area (it cannot be compiled)

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 drivers/char/mem.c |   23 +++++++++++++----------
 fs/proc/base.c     |    2 ++
 fs/read_write.c    |   22 ++++++++++++++++++++--
 include/linux/fs.h |    2 ++
 4 files changed, 37 insertions(+), 12 deletions(-)

Index: mmotm-2.6.31-Sep14/fs/read_write.c
===================================================================
--- mmotm-2.6.31-Sep14.orig/fs/read_write.c
+++ mmotm-2.6.31-Sep14/fs/read_write.c
@@ -205,6 +205,21 @@ bad:
 }
 #endif
 
+static int
+__negative_fpos_check(struct inode *inode, loff_t pos, size_t count)
+{
+	/*
+	 * pos or pos+count is negative here, check overflow.
+	 * too big "count" will be caught in rw_verify_area().
+	 */
+	if ((pos < 0) && (pos + count < pos))
+		return -EOVERFLOW;
+	/* If !VERYBIG inode, negative pos(pos+count) is not allowed */
+	if (!IS_VERYBIG(inode))
+		return -EINVAL;
+	return 0;
+}
+
 /*
  * rw_verify_area doesn't like huge counts. We limit
  * them to something that fits in "int" so that others
@@ -222,8 +237,11 @@ int rw_verify_area(int read_write, struc
 	if (unlikely((ssize_t) count < 0))
 		return retval;
 	pos = *ppos;
-	if (unlikely((pos < 0) || (loff_t) (pos + count) < 0))
-		return retval;
+	if (unlikely((pos < 0) || (loff_t) (pos + count) < 0)) {
+		retval = __negative_fpos_check(inode, pos, count);
+		if (retval)
+			return retval;
+	}
 
 	if (unlikely(inode->i_flock && mandatory_lock(inode))) {
 		retval = locks_mandatory_area(
Index: mmotm-2.6.31-Sep14/include/linux/fs.h
===================================================================
--- mmotm-2.6.31-Sep14.orig/include/linux/fs.h
+++ mmotm-2.6.31-Sep14/include/linux/fs.h
@@ -231,6 +231,7 @@ struct inodes_stat_t {
 #define S_NOCMTIME	128	/* Do not update file c/mtime */
 #define S_SWAPFILE	256	/* Do not truncate: swapon got its bmaps */
 #define S_PRIVATE	512	/* Inode is fs-internal */
+#define S_VERYBIG	1024	/* Allow file's loff_t can be negative */
 
 /*
  * Note that nosuid etc flags are inode-specific: setting some file-system
@@ -265,6 +266,7 @@ struct inodes_stat_t {
 #define IS_NOCMTIME(inode)	((inode)->i_flags & S_NOCMTIME)
 #define IS_SWAPFILE(inode)	((inode)->i_flags & S_SWAPFILE)
 #define IS_PRIVATE(inode)	((inode)->i_flags & S_PRIVATE)
+#define IS_VERYBIG(inode)	((inode)->i_flags & S_VERYBIG)
 
 /* the read-only stuff doesn't really belong here, but any other place is
    probably as bad and I don't want to create yet another include file. */
Index: mmotm-2.6.31-Sep14/drivers/char/mem.c
===================================================================
--- mmotm-2.6.31-Sep14.orig/drivers/char/mem.c
+++ mmotm-2.6.31-Sep14/drivers/char/mem.c
@@ -825,22 +825,23 @@ static const struct memdev {
 	const char *name;
 	const struct file_operations *fops;
 	struct backing_dev_info *dev_info;
+	bool	verybig;
 } devlist[] = {
-	[ 1] = { "mem", &mem_fops, &directly_mappable_cdev_bdi },
+	[1]  = { "mem", &mem_fops, &directly_mappable_cdev_bdi, true },
 #ifdef CONFIG_DEVKMEM
-	[ 2] = { "kmem", &kmem_fops, &directly_mappable_cdev_bdi },
+	[2]  = { "kmem", &kmem_fops, &directly_mappable_cdev_bdi, true },
 #endif
-	[ 3] = {"null", &null_fops, NULL },
+	[3]  = {"null", &null_fops, NULL, false },
 #ifdef CONFIG_DEVPORT
-	[ 4] = { "port", &port_fops, NULL },
+	[4]  = { "port", &port_fops, NULL, false },
 #endif
-	[ 5] = { "zero", &zero_fops, &zero_bdi },
-	[ 6] = { "full", &full_fops, NULL },
-	[ 7] = { "random", &random_fops, NULL },
-	[ 9] = { "urandom", &urandom_fops, NULL },
-	[11] = { "kmsg", &kmsg_fops, NULL },
+	[5]  = { "zero", &zero_fops, &zero_bdi, false },
+	[6]  = { "full", &full_fops, NULL, false },
+	[7]  = { "random", &random_fops, NULL, false },
+	[9]  = { "urandom", &urandom_fops, NULL, false },
+	[11] = { "kmsg", &kmsg_fops, NULL, false },
 #ifdef CONFIG_CRASH_DUMP
-	[12] = { "oldmem", &oldmem_fops, NULL },
+	[12] = { "oldmem", &oldmem_fops, NULL, false },
 #endif
 };
 
@@ -868,6 +869,8 @@ static int memory_open(struct inode *ino
 		ret = dev->fops->open(inode, filp);
 	else
 		ret = 0;
+	if (dev->verybig)
+		inode->i_flags |= S_VERYBIG;
 out:
 	unlock_kernel();
 	return ret;
Index: mmotm-2.6.31-Sep14/fs/proc/base.c
===================================================================
--- mmotm-2.6.31-Sep14.orig/fs/proc/base.c
+++ mmotm-2.6.31-Sep14/fs/proc/base.c
@@ -779,6 +779,8 @@ static const struct file_operations proc
 static int mem_open(struct inode* inode, struct file* file)
 {
 	file->private_data = (void*)((long)current->self_exec_id);
+	/* this file is read only and we can catch out-pf-range */
+	inode->i_flags |= S_VERYBIG;
 	return 0;
 }
 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH][bugfix] more checks for negative f_pos handling v3
  2009-09-17  6:31           ` KAMEZAWA Hiroyuki
@ 2009-09-17  6:53             ` Wu Fengguang
  0 siblings, 0 replies; 32+ messages in thread
From: Wu Fengguang @ 2009-09-17  6:53 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: viro, Andrew Morton, linux-kernel, hugh.dickins, oleg, xiyou.wangcong

On Thu, Sep 17, 2009 at 02:31:19PM +0800, KAMEZAWA Hiroyuki wrote:
> > How about make the fields aligned? That would look much nicer :)
> > 
> checkpatch.pl complains  agaisnt [ 5]...but ok, ajust "=" placement.

How about this? checkpatch won't complain on it :)

} devlist[] = {
	[1] = {     "mem",     &mem_fops, &directly_mappable_cdev_bdi,  true },
#ifdef CONFIG_DEVKMEM
	[2] = {    "kmem",    &kmem_fops, &directly_mappable_cdev_bdi,  true },
#endif
	[3] = {    "null",    &null_fops,                        NULL, false },
#ifdef CONFIG_DEVPORT
	[4] = {    "port",    &port_fops,                        NULL, false },
#endif
	[5] = {    "zero",    &zero_fops,                   &zero_bdi, false },
	[6] = {    "full",    &full_fops,                        NULL, false },
	[7] = {  "random",  &random_fops,                        NULL, false },
	[9] = { "urandom", &urandom_fops,                        NULL, false },
	[11] = {   "kmsg",    &kmsg_fops,                        NULL, false },
#ifdef CONFIG_CRASH_DUMP
	[12] = { "oldmem",  &oldmem_fops,                        NULL, false },
#endif
};

Thanks,
Fengguang


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH][bugfix] more checks for negative f_pos handling v4
  2009-09-17  6:51           ` [RFC][PATCH][bugfix] more checks for negative f_pos handling v4 KAMEZAWA Hiroyuki
@ 2009-09-17  7:14             ` Wu Fengguang
  2009-09-17  7:23               ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 32+ messages in thread
From: Wu Fengguang @ 2009-09-17  7:14 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: viro, Andrew Morton, linux-kernel, hugh.dickins, oleg, xiyou.wangcong

On Thu, Sep 17, 2009 at 02:51:00PM +0800, KAMEZAWA Hiroyuki wrote:
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> Now, rw_verify_area() checsk f_pos is negative or not. And if
> negative, returns -EINVAL.
> 
> But, some special files as /dev/(k)mem and /proc/<pid>/mem etc..
> has negative offsets. And we can't do any access via read/write
> to the file(device).
> 
> This patch introduce a flag S_VERYBIG and allow negative file
> offsets for big files. (usual files don't allow it.)
> 
> Changelog: v3->v4
>  - make changes in mem.c aligned.
>  - change __negative_fpos_check() to return int. 
>  - fixed bug in "pos" check.
>  - added comments.
> 
> Changelog: v2->v3
>  - fixed bug in rw_verify_area (it cannot be compiled)
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  drivers/char/mem.c |   23 +++++++++++++----------
>  fs/proc/base.c     |    2 ++
>  fs/read_write.c    |   22 ++++++++++++++++++++--
>  include/linux/fs.h |    2 ++
>  4 files changed, 37 insertions(+), 12 deletions(-)
> 
> Index: mmotm-2.6.31-Sep14/fs/read_write.c
> ===================================================================
> --- mmotm-2.6.31-Sep14.orig/fs/read_write.c
> +++ mmotm-2.6.31-Sep14/fs/read_write.c
> @@ -205,6 +205,21 @@ bad:
>  }
>  #endif
>  
> +static int
> +__negative_fpos_check(struct inode *inode, loff_t pos, size_t count)
> +{
> +	/*
> +	 * pos or pos+count is negative here, check overflow.
> +	 * too big "count" will be caught in rw_verify_area().
> +	 */
> +	if ((pos < 0) && (pos + count < pos))
> +		return -EOVERFLOW;

This returns -EOVERFLOW when pos=-10 and count=1. What's the intention?
Just to return a different error code other than -EINVAL?

Also it seems you did two behavior changes at the same time: the above
-EOVERFLOW and the below IS_VERYBIG(). Are they tightly coupled?

> +	/* If !VERYBIG inode, negative pos(pos+count) is not allowed */
> +	if (!IS_VERYBIG(inode))
> +		return -EINVAL;
> +	return 0;
> +}
> +
>  /*
>   * rw_verify_area doesn't like huge counts. We limit
>   * them to something that fits in "int" so that others
> @@ -222,8 +237,11 @@ int rw_verify_area(int read_write, struc
>  	if (unlikely((ssize_t) count < 0))
>  		return retval;
>  	pos = *ppos;
> -	if (unlikely((pos < 0) || (loff_t) (pos + count) < 0))
> -		return retval;
> +	if (unlikely((pos < 0) || (loff_t) (pos + count) < 0)) {
> +		retval = __negative_fpos_check(inode, pos, count);
> +		if (retval)
> +			return retval;
> +	}

This could look more nicer:

        retval = __negative_fpos_check(inode, pos, count);
        if (retval)
                return retval;

But they are all minor issues :)

Thanks,
Fengguang

>  
>  	if (unlikely(inode->i_flock && mandatory_lock(inode))) {
>  		retval = locks_mandatory_area(
> Index: mmotm-2.6.31-Sep14/include/linux/fs.h
> ===================================================================
> --- mmotm-2.6.31-Sep14.orig/include/linux/fs.h
> +++ mmotm-2.6.31-Sep14/include/linux/fs.h
> @@ -231,6 +231,7 @@ struct inodes_stat_t {
>  #define S_NOCMTIME	128	/* Do not update file c/mtime */
>  #define S_SWAPFILE	256	/* Do not truncate: swapon got its bmaps */
>  #define S_PRIVATE	512	/* Inode is fs-internal */
> +#define S_VERYBIG	1024	/* Allow file's loff_t can be negative */
>  
>  /*
>   * Note that nosuid etc flags are inode-specific: setting some file-system
> @@ -265,6 +266,7 @@ struct inodes_stat_t {
>  #define IS_NOCMTIME(inode)	((inode)->i_flags & S_NOCMTIME)
>  #define IS_SWAPFILE(inode)	((inode)->i_flags & S_SWAPFILE)
>  #define IS_PRIVATE(inode)	((inode)->i_flags & S_PRIVATE)
> +#define IS_VERYBIG(inode)	((inode)->i_flags & S_VERYBIG)
>  
>  /* the read-only stuff doesn't really belong here, but any other place is
>     probably as bad and I don't want to create yet another include file. */
> Index: mmotm-2.6.31-Sep14/drivers/char/mem.c
> ===================================================================
> --- mmotm-2.6.31-Sep14.orig/drivers/char/mem.c
> +++ mmotm-2.6.31-Sep14/drivers/char/mem.c
> @@ -825,22 +825,23 @@ static const struct memdev {
>  	const char *name;
>  	const struct file_operations *fops;
>  	struct backing_dev_info *dev_info;
> +	bool	verybig;
>  } devlist[] = {
> -	[ 1] = { "mem", &mem_fops, &directly_mappable_cdev_bdi },
> +	[1]  = { "mem", &mem_fops, &directly_mappable_cdev_bdi, true },
>  #ifdef CONFIG_DEVKMEM
> -	[ 2] = { "kmem", &kmem_fops, &directly_mappable_cdev_bdi },
> +	[2]  = { "kmem", &kmem_fops, &directly_mappable_cdev_bdi, true },
>  #endif
> -	[ 3] = {"null", &null_fops, NULL },
> +	[3]  = {"null", &null_fops, NULL, false },
>  #ifdef CONFIG_DEVPORT
> -	[ 4] = { "port", &port_fops, NULL },
> +	[4]  = { "port", &port_fops, NULL, false },
>  #endif
> -	[ 5] = { "zero", &zero_fops, &zero_bdi },
> -	[ 6] = { "full", &full_fops, NULL },
> -	[ 7] = { "random", &random_fops, NULL },
> -	[ 9] = { "urandom", &urandom_fops, NULL },
> -	[11] = { "kmsg", &kmsg_fops, NULL },
> +	[5]  = { "zero", &zero_fops, &zero_bdi, false },
> +	[6]  = { "full", &full_fops, NULL, false },
> +	[7]  = { "random", &random_fops, NULL, false },
> +	[9]  = { "urandom", &urandom_fops, NULL, false },
> +	[11] = { "kmsg", &kmsg_fops, NULL, false },
>  #ifdef CONFIG_CRASH_DUMP
> -	[12] = { "oldmem", &oldmem_fops, NULL },
> +	[12] = { "oldmem", &oldmem_fops, NULL, false },
>  #endif
>  };
>  
> @@ -868,6 +869,8 @@ static int memory_open(struct inode *ino
>  		ret = dev->fops->open(inode, filp);
>  	else
>  		ret = 0;
> +	if (dev->verybig)
> +		inode->i_flags |= S_VERYBIG;
>  out:
>  	unlock_kernel();
>  	return ret;
> Index: mmotm-2.6.31-Sep14/fs/proc/base.c
> ===================================================================
> --- mmotm-2.6.31-Sep14.orig/fs/proc/base.c
> +++ mmotm-2.6.31-Sep14/fs/proc/base.c
> @@ -779,6 +779,8 @@ static const struct file_operations proc
>  static int mem_open(struct inode* inode, struct file* file)
>  {
>  	file->private_data = (void*)((long)current->self_exec_id);
> +	/* this file is read only and we can catch out-pf-range */
> +	inode->i_flags |= S_VERYBIG;
>  	return 0;
>  }
>  

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH][bugfix] more checks for negative f_pos handling v4
  2009-09-17  7:14             ` Wu Fengguang
@ 2009-09-17  7:23               ` KAMEZAWA Hiroyuki
  2009-09-17  7:30                 ` Wu Fengguang
  2009-09-17  9:42                 ` Wu Fengguang
  0 siblings, 2 replies; 32+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-09-17  7:23 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: viro, Andrew Morton, linux-kernel, hugh.dickins, oleg, xiyou.wangcong

On Thu, 17 Sep 2009 15:14:28 +0800
Wu Fengguang <fengguang.wu@intel.com> wrote:

> On Thu, Sep 17, 2009 at 02:51:00PM +0800, KAMEZAWA Hiroyuki wrote:
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > 
> > Now, rw_verify_area() checsk f_pos is negative or not. And if
> > negative, returns -EINVAL.
> > 
> > But, some special files as /dev/(k)mem and /proc/<pid>/mem etc..
> > has negative offsets. And we can't do any access via read/write
> > to the file(device).
> > 
> > This patch introduce a flag S_VERYBIG and allow negative file
> > offsets for big files. (usual files don't allow it.)
> > 
> > Changelog: v3->v4
> >  - make changes in mem.c aligned.
> >  - change __negative_fpos_check() to return int. 
> >  - fixed bug in "pos" check.
> >  - added comments.
> > 
> > Changelog: v2->v3
> >  - fixed bug in rw_verify_area (it cannot be compiled)
> > 
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > ---
> >  drivers/char/mem.c |   23 +++++++++++++----------
> >  fs/proc/base.c     |    2 ++
> >  fs/read_write.c    |   22 ++++++++++++++++++++--
> >  include/linux/fs.h |    2 ++
> >  4 files changed, 37 insertions(+), 12 deletions(-)
> > 
> > Index: mmotm-2.6.31-Sep14/fs/read_write.c
> > ===================================================================
> > --- mmotm-2.6.31-Sep14.orig/fs/read_write.c
> > +++ mmotm-2.6.31-Sep14/fs/read_write.c
> > @@ -205,6 +205,21 @@ bad:
> >  }
> >  #endif
> >  
> > +static int
> > +__negative_fpos_check(struct inode *inode, loff_t pos, size_t count)
> > +{
> > +	/*
> > +	 * pos or pos+count is negative here, check overflow.
> > +	 * too big "count" will be caught in rw_verify_area().
> > +	 */
> > +	if ((pos < 0) && (pos + count < pos))
> > +		return -EOVERFLOW;
> 
> This returns -EOVERFLOW when pos=-10 and count=1. What's the intention?
  Hmm ?

  pos+count=-9 > -10 ? it's ok. no -EOVERFLOW

  pos=-10, count=11, 
  pos+count=1 > -10, then overflow.

> Just to return a different error code other than -EINVAL?
> 
For showing what this "if" checks. EINVAL is better ?

Thanks,
-Kame


> Also it seems you did two behavior changes at the same time: the above
> -EOVERFLOW and the below IS_VERYBIG(). Are they tightly coupled?
> 
> > +	/* If !VERYBIG inode, negative pos(pos+count) is not allowed */
> > +	if (!IS_VERYBIG(inode))
> > +		return -EINVAL;
> > +	return 0;
> > +}
> > +
> >  /*
> >   * rw_verify_area doesn't like huge counts. We limit
> >   * them to something that fits in "int" so that others
> > @@ -222,8 +237,11 @@ int rw_verify_area(int read_write, struc
> >  	if (unlikely((ssize_t) count < 0))
> >  		return retval;
> >  	pos = *ppos;
> > -	if (unlikely((pos < 0) || (loff_t) (pos + count) < 0))
> > -		return retval;
> > +	if (unlikely((pos < 0) || (loff_t) (pos + count) < 0)) {
> > +		retval = __negative_fpos_check(inode, pos, count);
> > +		if (retval)
> > +			return retval;
> > +	}
> 
> This could look more nicer:
> 
>         retval = __negative_fpos_check(inode, pos, count);
>         if (retval)
>                 return retval;
> 
> But they are all minor issues :)
> 
> Thanks,
> Fengguang
> 
> >  
> >  	if (unlikely(inode->i_flock && mandatory_lock(inode))) {
> >  		retval = locks_mandatory_area(
> > Index: mmotm-2.6.31-Sep14/include/linux/fs.h
> > ===================================================================
> > --- mmotm-2.6.31-Sep14.orig/include/linux/fs.h
> > +++ mmotm-2.6.31-Sep14/include/linux/fs.h
> > @@ -231,6 +231,7 @@ struct inodes_stat_t {
> >  #define S_NOCMTIME	128	/* Do not update file c/mtime */
> >  #define S_SWAPFILE	256	/* Do not truncate: swapon got its bmaps */
> >  #define S_PRIVATE	512	/* Inode is fs-internal */
> > +#define S_VERYBIG	1024	/* Allow file's loff_t can be negative */
> >  
> >  /*
> >   * Note that nosuid etc flags are inode-specific: setting some file-system
> > @@ -265,6 +266,7 @@ struct inodes_stat_t {
> >  #define IS_NOCMTIME(inode)	((inode)->i_flags & S_NOCMTIME)
> >  #define IS_SWAPFILE(inode)	((inode)->i_flags & S_SWAPFILE)
> >  #define IS_PRIVATE(inode)	((inode)->i_flags & S_PRIVATE)
> > +#define IS_VERYBIG(inode)	((inode)->i_flags & S_VERYBIG)
> >  
> >  /* the read-only stuff doesn't really belong here, but any other place is
> >     probably as bad and I don't want to create yet another include file. */
> > Index: mmotm-2.6.31-Sep14/drivers/char/mem.c
> > ===================================================================
> > --- mmotm-2.6.31-Sep14.orig/drivers/char/mem.c
> > +++ mmotm-2.6.31-Sep14/drivers/char/mem.c
> > @@ -825,22 +825,23 @@ static const struct memdev {
> >  	const char *name;
> >  	const struct file_operations *fops;
> >  	struct backing_dev_info *dev_info;
> > +	bool	verybig;
> >  } devlist[] = {
> > -	[ 1] = { "mem", &mem_fops, &directly_mappable_cdev_bdi },
> > +	[1]  = { "mem", &mem_fops, &directly_mappable_cdev_bdi, true },
> >  #ifdef CONFIG_DEVKMEM
> > -	[ 2] = { "kmem", &kmem_fops, &directly_mappable_cdev_bdi },
> > +	[2]  = { "kmem", &kmem_fops, &directly_mappable_cdev_bdi, true },
> >  #endif
> > -	[ 3] = {"null", &null_fops, NULL },
> > +	[3]  = {"null", &null_fops, NULL, false },
> >  #ifdef CONFIG_DEVPORT
> > -	[ 4] = { "port", &port_fops, NULL },
> > +	[4]  = { "port", &port_fops, NULL, false },
> >  #endif
> > -	[ 5] = { "zero", &zero_fops, &zero_bdi },
> > -	[ 6] = { "full", &full_fops, NULL },
> > -	[ 7] = { "random", &random_fops, NULL },
> > -	[ 9] = { "urandom", &urandom_fops, NULL },
> > -	[11] = { "kmsg", &kmsg_fops, NULL },
> > +	[5]  = { "zero", &zero_fops, &zero_bdi, false },
> > +	[6]  = { "full", &full_fops, NULL, false },
> > +	[7]  = { "random", &random_fops, NULL, false },
> > +	[9]  = { "urandom", &urandom_fops, NULL, false },
> > +	[11] = { "kmsg", &kmsg_fops, NULL, false },
> >  #ifdef CONFIG_CRASH_DUMP
> > -	[12] = { "oldmem", &oldmem_fops, NULL },
> > +	[12] = { "oldmem", &oldmem_fops, NULL, false },
> >  #endif
> >  };
> >  
> > @@ -868,6 +869,8 @@ static int memory_open(struct inode *ino
> >  		ret = dev->fops->open(inode, filp);
> >  	else
> >  		ret = 0;
> > +	if (dev->verybig)
> > +		inode->i_flags |= S_VERYBIG;
> >  out:
> >  	unlock_kernel();
> >  	return ret;
> > Index: mmotm-2.6.31-Sep14/fs/proc/base.c
> > ===================================================================
> > --- mmotm-2.6.31-Sep14.orig/fs/proc/base.c
> > +++ mmotm-2.6.31-Sep14/fs/proc/base.c
> > @@ -779,6 +779,8 @@ static const struct file_operations proc
> >  static int mem_open(struct inode* inode, struct file* file)
> >  {
> >  	file->private_data = (void*)((long)current->self_exec_id);
> > +	/* this file is read only and we can catch out-pf-range */
> > +	inode->i_flags |= S_VERYBIG;
> >  	return 0;
> >  }
> >  
> 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH][bugfix] more checks for negative f_pos handling v4
  2009-09-17  7:23               ` KAMEZAWA Hiroyuki
@ 2009-09-17  7:30                 ` Wu Fengguang
  2009-09-17  9:42                 ` Wu Fengguang
  1 sibling, 0 replies; 32+ messages in thread
From: Wu Fengguang @ 2009-09-17  7:30 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: viro, Andrew Morton, linux-kernel, hugh.dickins, oleg, xiyou.wangcong

On Thu, Sep 17, 2009 at 03:23:24PM +0800, KAMEZAWA Hiroyuki wrote:
> On Thu, 17 Sep 2009 15:14:28 +0800
> Wu Fengguang <fengguang.wu@intel.com> wrote:
> 
> > On Thu, Sep 17, 2009 at 02:51:00PM +0800, KAMEZAWA Hiroyuki wrote:
> > > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > > 
> > > Now, rw_verify_area() checsk f_pos is negative or not. And if
> > > negative, returns -EINVAL.
> > > 
> > > But, some special files as /dev/(k)mem and /proc/<pid>/mem etc..
> > > has negative offsets. And we can't do any access via read/write
> > > to the file(device).
> > > 
> > > This patch introduce a flag S_VERYBIG and allow negative file
> > > offsets for big files. (usual files don't allow it.)
> > > 
> > > Changelog: v3->v4
> > >  - make changes in mem.c aligned.
> > >  - change __negative_fpos_check() to return int. 
> > >  - fixed bug in "pos" check.
> > >  - added comments.
> > > 
> > > Changelog: v2->v3
> > >  - fixed bug in rw_verify_area (it cannot be compiled)
> > > 
> > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > > ---
> > >  drivers/char/mem.c |   23 +++++++++++++----------
> > >  fs/proc/base.c     |    2 ++
> > >  fs/read_write.c    |   22 ++++++++++++++++++++--
> > >  include/linux/fs.h |    2 ++
> > >  4 files changed, 37 insertions(+), 12 deletions(-)
> > > 
> > > Index: mmotm-2.6.31-Sep14/fs/read_write.c
> > > ===================================================================
> > > --- mmotm-2.6.31-Sep14.orig/fs/read_write.c
> > > +++ mmotm-2.6.31-Sep14/fs/read_write.c
> > > @@ -205,6 +205,21 @@ bad:
> > >  }
> > >  #endif
> > >  
> > > +static int
> > > +__negative_fpos_check(struct inode *inode, loff_t pos, size_t count)
> > > +{
> > > +	/*
> > > +	 * pos or pos+count is negative here, check overflow.
> > > +	 * too big "count" will be caught in rw_verify_area().
> > > +	 */
> > > +	if ((pos < 0) && (pos + count < pos))
> > > +		return -EOVERFLOW;
> > 
> > This returns -EOVERFLOW when pos=-10 and count=1. What's the intention?
>   Hmm ?
> 
>   pos+count=-9 > -10 ? it's ok. no -EOVERFLOW
> 
>   pos=-10, count=11, 
>   pos+count=1 > -10, then overflow.

Ah yes, was confused..

> > Just to return a different error code other than -EINVAL?
> > 
> For showing what this "if" checks. EINVAL is better ?

..so please ignore this question.

Thanks,
Fengguang

> 
> > Also it seems you did two behavior changes at the same time: the above
> > -EOVERFLOW and the below IS_VERYBIG(). Are they tightly coupled?
> > 
> > > +	/* If !VERYBIG inode, negative pos(pos+count) is not allowed */
> > > +	if (!IS_VERYBIG(inode))
> > > +		return -EINVAL;
> > > +	return 0;
> > > +}
> > > +
> > >  /*
> > >   * rw_verify_area doesn't like huge counts. We limit
> > >   * them to something that fits in "int" so that others
> > > @@ -222,8 +237,11 @@ int rw_verify_area(int read_write, struc
> > >  	if (unlikely((ssize_t) count < 0))
> > >  		return retval;
> > >  	pos = *ppos;
> > > -	if (unlikely((pos < 0) || (loff_t) (pos + count) < 0))
> > > -		return retval;
> > > +	if (unlikely((pos < 0) || (loff_t) (pos + count) < 0)) {
> > > +		retval = __negative_fpos_check(inode, pos, count);
> > > +		if (retval)
> > > +			return retval;
> > > +	}
> > 
> > This could look more nicer:
> > 
> >         retval = __negative_fpos_check(inode, pos, count);
> >         if (retval)
> >                 return retval;
> > 
> > But they are all minor issues :)
> > 
> > Thanks,
> > Fengguang
> > 
> > >  
> > >  	if (unlikely(inode->i_flock && mandatory_lock(inode))) {
> > >  		retval = locks_mandatory_area(
> > > Index: mmotm-2.6.31-Sep14/include/linux/fs.h
> > > ===================================================================
> > > --- mmotm-2.6.31-Sep14.orig/include/linux/fs.h
> > > +++ mmotm-2.6.31-Sep14/include/linux/fs.h
> > > @@ -231,6 +231,7 @@ struct inodes_stat_t {
> > >  #define S_NOCMTIME	128	/* Do not update file c/mtime */
> > >  #define S_SWAPFILE	256	/* Do not truncate: swapon got its bmaps */
> > >  #define S_PRIVATE	512	/* Inode is fs-internal */
> > > +#define S_VERYBIG	1024	/* Allow file's loff_t can be negative */
> > >  
> > >  /*
> > >   * Note that nosuid etc flags are inode-specific: setting some file-system
> > > @@ -265,6 +266,7 @@ struct inodes_stat_t {
> > >  #define IS_NOCMTIME(inode)	((inode)->i_flags & S_NOCMTIME)
> > >  #define IS_SWAPFILE(inode)	((inode)->i_flags & S_SWAPFILE)
> > >  #define IS_PRIVATE(inode)	((inode)->i_flags & S_PRIVATE)
> > > +#define IS_VERYBIG(inode)	((inode)->i_flags & S_VERYBIG)
> > >  
> > >  /* the read-only stuff doesn't really belong here, but any other place is
> > >     probably as bad and I don't want to create yet another include file. */
> > > Index: mmotm-2.6.31-Sep14/drivers/char/mem.c
> > > ===================================================================
> > > --- mmotm-2.6.31-Sep14.orig/drivers/char/mem.c
> > > +++ mmotm-2.6.31-Sep14/drivers/char/mem.c
> > > @@ -825,22 +825,23 @@ static const struct memdev {
> > >  	const char *name;
> > >  	const struct file_operations *fops;
> > >  	struct backing_dev_info *dev_info;
> > > +	bool	verybig;
> > >  } devlist[] = {
> > > -	[ 1] = { "mem", &mem_fops, &directly_mappable_cdev_bdi },
> > > +	[1]  = { "mem", &mem_fops, &directly_mappable_cdev_bdi, true },
> > >  #ifdef CONFIG_DEVKMEM
> > > -	[ 2] = { "kmem", &kmem_fops, &directly_mappable_cdev_bdi },
> > > +	[2]  = { "kmem", &kmem_fops, &directly_mappable_cdev_bdi, true },
> > >  #endif
> > > -	[ 3] = {"null", &null_fops, NULL },
> > > +	[3]  = {"null", &null_fops, NULL, false },
> > >  #ifdef CONFIG_DEVPORT
> > > -	[ 4] = { "port", &port_fops, NULL },
> > > +	[4]  = { "port", &port_fops, NULL, false },
> > >  #endif
> > > -	[ 5] = { "zero", &zero_fops, &zero_bdi },
> > > -	[ 6] = { "full", &full_fops, NULL },
> > > -	[ 7] = { "random", &random_fops, NULL },
> > > -	[ 9] = { "urandom", &urandom_fops, NULL },
> > > -	[11] = { "kmsg", &kmsg_fops, NULL },
> > > +	[5]  = { "zero", &zero_fops, &zero_bdi, false },
> > > +	[6]  = { "full", &full_fops, NULL, false },
> > > +	[7]  = { "random", &random_fops, NULL, false },
> > > +	[9]  = { "urandom", &urandom_fops, NULL, false },
> > > +	[11] = { "kmsg", &kmsg_fops, NULL, false },
> > >  #ifdef CONFIG_CRASH_DUMP
> > > -	[12] = { "oldmem", &oldmem_fops, NULL },
> > > +	[12] = { "oldmem", &oldmem_fops, NULL, false },
> > >  #endif
> > >  };
> > >  
> > > @@ -868,6 +869,8 @@ static int memory_open(struct inode *ino
> > >  		ret = dev->fops->open(inode, filp);
> > >  	else
> > >  		ret = 0;
> > > +	if (dev->verybig)
> > > +		inode->i_flags |= S_VERYBIG;
> > >  out:
> > >  	unlock_kernel();
> > >  	return ret;
> > > Index: mmotm-2.6.31-Sep14/fs/proc/base.c
> > > ===================================================================
> > > --- mmotm-2.6.31-Sep14.orig/fs/proc/base.c
> > > +++ mmotm-2.6.31-Sep14/fs/proc/base.c
> > > @@ -779,6 +779,8 @@ static const struct file_operations proc
> > >  static int mem_open(struct inode* inode, struct file* file)
> > >  {
> > >  	file->private_data = (void*)((long)current->self_exec_id);
> > > +	/* this file is read only and we can catch out-pf-range */
> > > +	inode->i_flags |= S_VERYBIG;
> > >  	return 0;
> > >  }
> > >  
> > 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH][bugfix] more checks for negative f_pos handling v4
  2009-09-17  7:23               ` KAMEZAWA Hiroyuki
  2009-09-17  7:30                 ` Wu Fengguang
@ 2009-09-17  9:42                 ` Wu Fengguang
  2009-09-17 10:54                   ` KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 32+ messages in thread
From: Wu Fengguang @ 2009-09-17  9:42 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: viro, Andrew Morton, linux-kernel, hugh.dickins, oleg, xiyou.wangcong

On Thu, Sep 17, 2009 at 03:23:24PM +0800, KAMEZAWA Hiroyuki wrote:
> On Thu, 17 Sep 2009 15:14:28 +0800
> Wu Fengguang <fengguang.wu@intel.com> wrote:
> 
> > On Thu, Sep 17, 2009 at 02:51:00PM +0800, KAMEZAWA Hiroyuki wrote:
> > > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > > 
> > > Now, rw_verify_area() checsk f_pos is negative or not. And if
> > > negative, returns -EINVAL.
> > > 
> > > But, some special files as /dev/(k)mem and /proc/<pid>/mem etc..
> > > has negative offsets. And we can't do any access via read/write
> > > to the file(device).
> > > 
> > > This patch introduce a flag S_VERYBIG and allow negative file
> > > offsets for big files. (usual files don't allow it.)
> > > 
> > > Changelog: v3->v4
> > >  - make changes in mem.c aligned.
> > >  - change __negative_fpos_check() to return int. 
> > >  - fixed bug in "pos" check.
> > >  - added comments.
> > > 
> > > Changelog: v2->v3
> > >  - fixed bug in rw_verify_area (it cannot be compiled)
> > > 
> > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > > ---
> > >  drivers/char/mem.c |   23 +++++++++++++----------
> > >  fs/proc/base.c     |    2 ++
> > >  fs/read_write.c    |   22 ++++++++++++++++++++--
> > >  include/linux/fs.h |    2 ++
> > >  4 files changed, 37 insertions(+), 12 deletions(-)
> > > 
> > > Index: mmotm-2.6.31-Sep14/fs/read_write.c
> > > ===================================================================
> > > --- mmotm-2.6.31-Sep14.orig/fs/read_write.c
> > > +++ mmotm-2.6.31-Sep14/fs/read_write.c
> > > @@ -205,6 +205,21 @@ bad:
> > >  }
> > >  #endif
> > >  
> > > +static int
> > > +__negative_fpos_check(struct inode *inode, loff_t pos, size_t count)
> > > +{
> > > +	/*
> > > +	 * pos or pos+count is negative here, check overflow.
> > > +	 * too big "count" will be caught in rw_verify_area().
> > > +	 */
> > > +	if ((pos < 0) && (pos + count < pos))
> > > +		return -EOVERFLOW;
> > 
> > This returns -EOVERFLOW when pos=-10 and count=1. What's the intention?
>   Hmm ?
> 
>   pos+count=-9 > -10 ? it's ok. no -EOVERFLOW
> 
>   pos=-10, count=11, 
>   pos+count=1 > -10, then overflow.

Hmm, it seems less confusing to do

static int __negative_fpos_check(struct inode *inode,
                                 unsigned long pos,
                                 unsigned long count)
{
        if (pos + count < pos)
                return -EOVERFLOW;
        ...
}

Thanks,
Fengguang


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH][bugfix] more checks for negative f_pos handling v4
  2009-09-17  9:42                 ` Wu Fengguang
@ 2009-09-17 10:54                   ` KAMEZAWA Hiroyuki
  2009-09-17 10:58                     ` KAMEZAWA Hiroyuki
  2009-09-17 12:40                     ` Wu Fengguang
  0 siblings, 2 replies; 32+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-09-17 10:54 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: KAMEZAWA Hiroyuki, viro, Andrew Morton, linux-kernel,
	hugh.dickins, oleg, xiyou.wangcong

Wu Fengguang wrote:
> On Thu, Sep 17, 2009 at 03:23:24PM +0800, KAMEZAWA Hiroyuki wrote:
>> On Thu, 17 Sep 2009 15:14:28 +0800
>> Wu Fengguang <fengguang.wu@intel.com> wrote:
>>
>> > On Thu, Sep 17, 2009 at 02:51:00PM +0800, KAMEZAWA Hiroyuki wrote:
>> > > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> > >
>> > > Now, rw_verify_area() checsk f_pos is negative or not. And if
>> > > negative, returns -EINVAL.
>> > >
>> > > But, some special files as /dev/(k)mem and /proc/<pid>/mem etc..
>> > > has negative offsets. And we can't do any access via read/write
>> > > to the file(device).
>> > >
>> > > This patch introduce a flag S_VERYBIG and allow negative file
>> > > offsets for big files. (usual files don't allow it.)
>> > >
>> > > Changelog: v3->v4
>> > >  - make changes in mem.c aligned.
>> > >  - change __negative_fpos_check() to return int.
>> > >  - fixed bug in "pos" check.
>> > >  - added comments.
>> > >
>> > > Changelog: v2->v3
>> > >  - fixed bug in rw_verify_area (it cannot be compiled)
>> > >
>> > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> > > ---
>> > >  drivers/char/mem.c |   23 +++++++++++++----------
>> > >  fs/proc/base.c     |    2 ++
>> > >  fs/read_write.c    |   22 ++++++++++++++++++++--
>> > >  include/linux/fs.h |    2 ++
>> > >  4 files changed, 37 insertions(+), 12 deletions(-)
>> > >
>> > > Index: mmotm-2.6.31-Sep14/fs/read_write.c
>> > > ===================================================================
>> > > --- mmotm-2.6.31-Sep14.orig/fs/read_write.c
>> > > +++ mmotm-2.6.31-Sep14/fs/read_write.c
>> > > @@ -205,6 +205,21 @@ bad:
>> > >  }
>> > >  #endif
>> > >
>> > > +static int
>> > > +__negative_fpos_check(struct inode *inode, loff_t pos, size_t
>> count)
>> > > +{
>> > > +	/*
>> > > +	 * pos or pos+count is negative here, check overflow.
>> > > +	 * too big "count" will be caught in rw_verify_area().
>> > > +	 */
>> > > +	if ((pos < 0) && (pos + count < pos))
>> > > +		return -EOVERFLOW;
>> >
>> > This returns -EOVERFLOW when pos=-10 and count=1. What's the
>> intention?
>>   Hmm ?
>>
>>   pos+count=-9 > -10 ? it's ok. no -EOVERFLOW
>>
>>   pos=-10, count=11,
>>   pos+count=1 > -10, then overflow.
>
> Hmm, it seems less confusing to do
>
> static int __negative_fpos_check(struct inode *inode,
>                                  unsigned long pos,
>                                  unsigned long count)
> {
>         if (pos + count < pos)
>                 return -EOVERFLOW;
>         ...
> }
>
have to avoid pos == LONGLONGMAX case.

Thanks,
-Kame


> Thanks,
> Fengguang
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH][bugfix] more checks for negative f_pos handling v4
  2009-09-17 10:54                   ` KAMEZAWA Hiroyuki
@ 2009-09-17 10:58                     ` KAMEZAWA Hiroyuki
  2009-09-17 12:40                     ` Wu Fengguang
  1 sibling, 0 replies; 32+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-09-17 10:58 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Wu Fengguang, KAMEZAWA Hiroyuki, viro, Andrew Morton,
	linux-kernel, hugh.dickins, oleg, xiyou.wangcong

KAMEZAWA Hiroyuki wrote:
> Wu Fengguang wrote:
>> On Thu, Sep 17, 2009 at 03:23:24PM +0800, KAMEZAWA Hiroyuki wrote:
>>> On Thu, 17 Sep 2009 15:14:28 +0800
>>> Wu Fengguang <fengguang.wu@intel.com> wrote:
>>>
>>> > On Thu, Sep 17, 2009 at 02:51:00PM +0800, KAMEZAWA Hiroyuki wrote:
>>> > > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>>> > >
>>> > > Now, rw_verify_area() checsk f_pos is negative or not. And if
>>> > > negative, returns -EINVAL.
>>> > >
>>> > > But, some special files as /dev/(k)mem and /proc/<pid>/mem etc..
>>> > > has negative offsets. And we can't do any access via read/write
>>> > > to the file(device).
>>> > >
>>> > > This patch introduce a flag S_VERYBIG and allow negative file
>>> > > offsets for big files. (usual files don't allow it.)
>>> > >
>>> > > Changelog: v3->v4
>>> > >  - make changes in mem.c aligned.
>>> > >  - change __negative_fpos_check() to return int.
>>> > >  - fixed bug in "pos" check.
>>> > >  - added comments.
>>> > >
>>> > > Changelog: v2->v3
>>> > >  - fixed bug in rw_verify_area (it cannot be compiled)
>>> > >
>>> > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>>> > > ---
>>> > >  drivers/char/mem.c |   23 +++++++++++++----------
>>> > >  fs/proc/base.c     |    2 ++
>>> > >  fs/read_write.c    |   22 ++++++++++++++++++++--
>>> > >  include/linux/fs.h |    2 ++
>>> > >  4 files changed, 37 insertions(+), 12 deletions(-)
>>> > >
>>> > > Index: mmotm-2.6.31-Sep14/fs/read_write.c
>>> > > ===================================================================
>>> > > --- mmotm-2.6.31-Sep14.orig/fs/read_write.c
>>> > > +++ mmotm-2.6.31-Sep14/fs/read_write.c
>>> > > @@ -205,6 +205,21 @@ bad:
>>> > >  }
>>> > >  #endif
>>> > >
>>> > > +static int
>>> > > +__negative_fpos_check(struct inode *inode, loff_t pos, size_t
>>> count)
>>> > > +{
>>> > > +	/*
>>> > > +	 * pos or pos+count is negative here, check overflow.
>>> > > +	 * too big "count" will be caught in rw_verify_area().
>>> > > +	 */
>>> > > +	if ((pos < 0) && (pos + count < pos))
>>> > > +		return -EOVERFLOW;
>>> >
>>> > This returns -EOVERFLOW when pos=-10 and count=1. What's the
>>> intention?
>>>   Hmm ?
>>>
>>>   pos+count=-9 > -10 ? it's ok. no -EOVERFLOW
>>>
>>>   pos=-10, count=11,
>>>   pos+count=1 > -10, then overflow.
>>
>> Hmm, it seems less confusing to do
>>
>> static int __negative_fpos_check(struct inode *inode,
>>                                  unsigned long pos,
>>                                  unsigned long count)
>> {
>>         if (pos + count < pos)
>>                 return -EOVERFLOW;
>>         ...
>> }
>>
> have to avoid pos == LONGLONGMAX case.
>
Ah, you ask me to do cast from loff_t to unsigned long long ?

Not making much difference, I think. This is usual math.
But ok, I don't want to explain again.
If I post v5, I'll do.

Thanks,
-Kame


> Thanks,
> -Kame
>
>
>> Thanks,
>> Fengguang
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel"
>> in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
>>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH][bugfix] more checks for negative f_pos handling v4
  2009-09-17 10:54                   ` KAMEZAWA Hiroyuki
  2009-09-17 10:58                     ` KAMEZAWA Hiroyuki
@ 2009-09-17 12:40                     ` Wu Fengguang
  2009-09-18  0:02                       ` KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 32+ messages in thread
From: Wu Fengguang @ 2009-09-17 12:40 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: viro, Andrew Morton, linux-kernel, hugh.dickins, oleg, xiyou.wangcong

On Thu, Sep 17, 2009 at 06:54:00PM +0800, KAMEZAWA Hiroyuki wrote:
> Wu Fengguang wrote:
> > On Thu, Sep 17, 2009 at 03:23:24PM +0800, KAMEZAWA Hiroyuki wrote:
> >> On Thu, 17 Sep 2009 15:14:28 +0800
> >> Wu Fengguang <fengguang.wu@intel.com> wrote:
> >>
> >> > On Thu, Sep 17, 2009 at 02:51:00PM +0800, KAMEZAWA Hiroyuki wrote:
> >> > > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> >> > >
> >> > > Now, rw_verify_area() checsk f_pos is negative or not. And if
> >> > > negative, returns -EINVAL.
> >> > >
> >> > > But, some special files as /dev/(k)mem and /proc/<pid>/mem etc..
> >> > > has negative offsets. And we can't do any access via read/write
> >> > > to the file(device).
> >> > >
> >> > > This patch introduce a flag S_VERYBIG and allow negative file
> >> > > offsets for big files. (usual files don't allow it.)
> >> > >
> >> > > Changelog: v3->v4
> >> > >  - make changes in mem.c aligned.
> >> > >  - change __negative_fpos_check() to return int.
> >> > >  - fixed bug in "pos" check.
> >> > >  - added comments.
> >> > >
> >> > > Changelog: v2->v3
> >> > >  - fixed bug in rw_verify_area (it cannot be compiled)
> >> > >
> >> > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> >> > > ---
> >> > >  drivers/char/mem.c |   23 +++++++++++++----------
> >> > >  fs/proc/base.c     |    2 ++
> >> > >  fs/read_write.c    |   22 ++++++++++++++++++++--
> >> > >  include/linux/fs.h |    2 ++
> >> > >  4 files changed, 37 insertions(+), 12 deletions(-)
> >> > >
> >> > > Index: mmotm-2.6.31-Sep14/fs/read_write.c
> >> > > ===================================================================
> >> > > --- mmotm-2.6.31-Sep14.orig/fs/read_write.c
> >> > > +++ mmotm-2.6.31-Sep14/fs/read_write.c
> >> > > @@ -205,6 +205,21 @@ bad:
> >> > >  }
> >> > >  #endif
> >> > >
> >> > > +static int
> >> > > +__negative_fpos_check(struct inode *inode, loff_t pos, size_t
> >> count)
> >> > > +{
> >> > > +	/*
> >> > > +	 * pos or pos+count is negative here, check overflow.
> >> > > +	 * too big "count" will be caught in rw_verify_area().
> >> > > +	 */
> >> > > +	if ((pos < 0) && (pos + count < pos))
> >> > > +		return -EOVERFLOW;
> >> >
> >> > This returns -EOVERFLOW when pos=-10 and count=1. What's the
> >> intention?
> >>   Hmm ?
> >>
> >>   pos+count=-9 > -10 ? it's ok. no -EOVERFLOW
> >>
> >>   pos=-10, count=11,
> >>   pos+count=1 > -10, then overflow.
> >
> > Hmm, it seems less confusing to do
> >
> > static int __negative_fpos_check(struct inode *inode,
> >                                  unsigned long pos,
> >                                  unsigned long count)
> > {
> >         if (pos + count < pos)
> >                 return -EOVERFLOW;
> >         ...
> > }
> >
> have to avoid pos == LONGLONGMAX case.

Ah sorry, didn't know loff_t could be long long..

Thanks,
Fengguang


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH][bugfix] more checks for negative f_pos handling v4
  2009-09-17 12:40                     ` Wu Fengguang
@ 2009-09-18  0:02                       ` KAMEZAWA Hiroyuki
  2009-09-18  2:25                         ` Américo Wang
  0 siblings, 1 reply; 32+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-09-18  0:02 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: viro, Andrew Morton, linux-kernel, hugh.dickins, oleg, xiyou.wangcong

On Thu, 17 Sep 2009 20:40:39 +0800
Wu Fengguang <fengguang.wu@intel.com> wrote:

> > >         if (pos + count < pos)
> > >                 return -EOVERFLOW;
> > >         ...
> > > }
> > >
> > have to avoid pos == LONGLONGMAX case.
> 
> Ah sorry, didn't know loff_t could be long long..
> 
Thank you. I'll restart this after merge window.
(And maybe I had to CC fsdevel...)

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH][bugfix] more checks for negative f_pos handling v4
  2009-09-18  0:02                       ` KAMEZAWA Hiroyuki
@ 2009-09-18  2:25                         ` Américo Wang
  0 siblings, 0 replies; 32+ messages in thread
From: Américo Wang @ 2009-09-18  2:25 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Wu Fengguang, viro, Andrew Morton, linux-kernel, hugh.dickins, oleg

On Fri, Sep 18, 2009 at 8:02 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Thu, 17 Sep 2009 20:40:39 +0800
> Wu Fengguang <fengguang.wu@intel.com> wrote:
>
>> > >         if (pos + count < pos)
>> > >                 return -EOVERFLOW;
>> > >         ...
>> > > }
>> > >
>> > have to avoid pos == LONGLONGMAX case.
>>
>> Ah sorry, didn't know loff_t could be long long..
>>
> Thank you. I'll restart this after merge window.
> (And maybe I had to CC fsdevel...)

Yes, please do so. I would like to hear from Al..

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2009-09-18  2:25 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-09-14  3:29 [PATCH] devmem: handle partial kmem write/read Wu Fengguang
2009-09-14  4:34 ` Wu Fengguang
2009-09-15  0:24   ` KAMEZAWA Hiroyuki
2009-09-15  1:52     ` KAMEZAWA Hiroyuki
2009-09-15  2:05       ` Wu Fengguang
2009-09-15  2:02     ` Wu Fengguang
2009-09-15  2:31       ` KAMEZAWA Hiroyuki
2009-09-15  2:57         ` Wu Fengguang
2009-09-15  7:58 ` Question: how to handle too big f_pos " KAMEZAWA Hiroyuki
2009-09-15  8:11   ` Wu Fengguang
2009-09-15  9:52   ` Hugh Dickins
2009-09-16  5:29   ` [RFC][PATCH][bugfix] more checks for negative f_pos handling (Was Re: Question: how to handle too big f_pos KAMEZAWA Hiroyuki
2009-09-16  8:20     ` Américo Wang
2009-09-16  8:44       ` KAMEZAWA Hiroyuki
2009-09-16  9:13         ` Américo Wang
2009-09-16 12:06           ` KAMEZAWA Hiroyuki
2009-09-17  3:06             ` Américo Wang
2009-09-17  5:53     ` [RFC][PATCH][bugfix] more checks for negative f_pos handling v2 KAMEZAWA Hiroyuki
2009-09-17  6:07       ` [RFC][PATCH][bugfix] more checks for negative f_pos handling v3 KAMEZAWA Hiroyuki
2009-09-17  6:21         ` Wu Fengguang
2009-09-17  6:31           ` KAMEZAWA Hiroyuki
2009-09-17  6:53             ` Wu Fengguang
2009-09-17  6:51           ` [RFC][PATCH][bugfix] more checks for negative f_pos handling v4 KAMEZAWA Hiroyuki
2009-09-17  7:14             ` Wu Fengguang
2009-09-17  7:23               ` KAMEZAWA Hiroyuki
2009-09-17  7:30                 ` Wu Fengguang
2009-09-17  9:42                 ` Wu Fengguang
2009-09-17 10:54                   ` KAMEZAWA Hiroyuki
2009-09-17 10:58                     ` KAMEZAWA Hiroyuki
2009-09-17 12:40                     ` Wu Fengguang
2009-09-18  0:02                       ` KAMEZAWA Hiroyuki
2009-09-18  2:25                         ` Américo Wang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.