linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Oops on 2.4.x invalid procfs i_ino value
@ 2004-12-17 22:49 Brent Casavant
  2004-12-18  0:38 ` William Lee Irwin III
  0 siblings, 1 reply; 7+ messages in thread
From: Brent Casavant @ 2004-12-17 22:49 UTC (permalink / raw)
  To: linux-kernel; +Cc: wli, mingo

I've run into a number of crashes while closing procfs stat files
when a system is under load.  I think I've found the problem, but
am a little unsure how to proceed.  This all happens to be on a
2.4.21 based kernel, but by my brief code inspection I think the
problem still exists on more recent 2.4.x kernels.

In procfs the fake_ino() macro is used to construct the inode number
for each entry.

	#define fake_ino(pid,ino) (((pid)<<16)|(ino))

In particular this is used in proc_pid_make_inode:

	inode->i_ino = fake_ino(task->pid, ino);

Note that a pid may be more than 16 bits in width (e.g. in IA64), and
we're trying to stuff it into the upper 16 bits of the inode number.
This isn't usually a problem, except when the lower 16 bits of the
inode happen to be 0 (i.e. pids that are a multiple of 65536).

Why does zero matter?  Glad you asked.

In proc_delete_inode there is a check to see if the inode is is
a "proper" (whatever that means) procfs inode.  The whole function
is:

static void proc_delete_inode(struct inode *inode)
{
	struct proc_dir_entry *de = inode->u.generic_ip;

	inode->i_state = I_CLEAR;

	if (PROC_INODE_PROPER(inode)) {
		proc_pid_delete_inode(inode);
		return;
	}
	if (de) {
		if (de->owner)
			__MOD_DEC_USE_COUNT(de->owner);
		de_put(de);
	}
}

PROC_INODE_PROPER() is:

	#define PROC_INODE_PROPER(inode) ((inode)->i_ino & ~0xffff)

In other words, it checks whether the top 16 bits of the inode number
(equivalent to the bottom 16 bits of the pid) are non-zero.

Thus closing a proc entry for any task with a pid that is a multiple of
65536 will fail this check, skip proc_pid_delete_inode, and call
__MOD_DEC_USE_COUNT, more than likely causing a panic on an invalid
memory access, and minimally corrupting something in memory otherwise.

I don't have a solution coded up (mostly because I'm a bit bleary
eyed after looking at crash dumps all day) -- but are there any
thoughts on how to go about addressing this one?  An obvious workaround
is setting kernel.pid_max to 65535, but that's only a workaround, not
a solution.

On a related note, if it matters, on about half the crash dumps I've
looked at, I see a pid of 0 has been assigned to a user process,
tripping this same problem.  I suspect there's another bug somewhere
that's allowing a pid of 0 to be chosen in the first place -- but I
don't totally discount that this problem may lay in SGI's patches to
this particular kernel -- I'll need to take a more thorough look.

Thanks,
Brent

-- 
Brent Casavant                          If you had nothing to fear,
bcasavan@sgi.com                        how then could you be brave?
Silicon Graphics, Inc.                    -- Queen Dama, Source Wars

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Oops on 2.4.x invalid procfs i_ino value
  2004-12-17 22:49 Oops on 2.4.x invalid procfs i_ino value Brent Casavant
@ 2004-12-18  0:38 ` William Lee Irwin III
  2004-12-18  0:47   ` William Lee Irwin III
  0 siblings, 1 reply; 7+ messages in thread
From: William Lee Irwin III @ 2004-12-18  0:38 UTC (permalink / raw)
  To: Brent Casavant; +Cc: linux-kernel, mingo

On Fri, Dec 17, 2004 at 04:49:44PM -0600, Brent Casavant wrote:
> Thus closing a proc entry for any task with a pid that is a multiple of
> 65536 will fail this check, skip proc_pid_delete_inode, and call
> __MOD_DEC_USE_COUNT, more than likely causing a panic on an invalid
> memory access, and minimally corrupting something in memory otherwise.
> I don't have a solution coded up (mostly because I'm a bit bleary
> eyed after looking at crash dumps all day) -- but are there any
> thoughts on how to go about addressing this one?  An obvious workaround
> is setting kernel.pid_max to 65535, but that's only a workaround, not
> a solution.
> On a related note, if it matters, on about half the crash dumps I've
> looked at, I see a pid of 0 has been assigned to a user process,
> tripping this same problem.  I suspect there's another bug somewhere
> that's allowing a pid of 0 to be chosen in the first place -- but I
> don't totally discount that this problem may lay in SGI's patches to
> this particular kernel -- I'll need to take a more thorough look.

That's rather ominous. I'll pore over pid.c and see what's going on.
Also, does the pid.c in your kernel version match 2.6.x-CURRENT?


-- wli

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Oops on 2.4.x invalid procfs i_ino value
  2004-12-18  0:38 ` William Lee Irwin III
@ 2004-12-18  0:47   ` William Lee Irwin III
  2004-12-20 22:35     ` Brent Casavant
  0 siblings, 1 reply; 7+ messages in thread
From: William Lee Irwin III @ 2004-12-18  0:47 UTC (permalink / raw)
  To: Brent Casavant; +Cc: linux-kernel, mingo

On Fri, Dec 17, 2004 at 04:49:44PM -0600, Brent Casavant wrote:
>> On a related note, if it matters, on about half the crash dumps I've
>> looked at, I see a pid of 0 has been assigned to a user process,
>> tripping this same problem.  I suspect there's another bug somewhere
>> that's allowing a pid of 0 to be chosen in the first place -- but I
>> don't totally discount that this problem may lay in SGI's patches to
>> this particular kernel -- I'll need to take a more thorough look.

On Fri, Dec 17, 2004 at 04:38:35PM -0800, William Lee Irwin III wrote:
> That's rather ominous. I'll pore over pid.c and see what's going on.
> Also, does the pid.c in your kernel version match 2.6.x-CURRENT?

Ouch, 2.4.21; this will be trouble. So next, what patches atop 2.4.21?


-- wli

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Oops on 2.4.x invalid procfs i_ino value
  2004-12-18  0:47   ` William Lee Irwin III
@ 2004-12-20 22:35     ` Brent Casavant
  2004-12-22 15:46       ` Marcelo Tosatti
  2004-12-27 21:40       ` William Lee Irwin III
  0 siblings, 2 replies; 7+ messages in thread
From: Brent Casavant @ 2004-12-20 22:35 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: linux-kernel, mingo

On Fri, 17 Dec 2004, William Lee Irwin III wrote:

> On Fri, Dec 17, 2004 at 04:49:44PM -0600, Brent Casavant wrote:
> >> On a related note, if it matters, on about half the crash dumps I've
> >> looked at, I see a pid of 0 has been assigned to a user process,
> >> tripping this same problem.  I suspect there's another bug somewhere
> >> that's allowing a pid of 0 to be chosen in the first place -- but I
> >> don't totally discount that this problem may lay in SGI's patches to
> >> this particular kernel -- I'll need to take a more thorough look.
> 
> On Fri, Dec 17, 2004 at 04:38:35PM -0800, William Lee Irwin III wrote:
> > That's rather ominous. I'll pore over pid.c and see what's going on.
> > Also, does the pid.c in your kernel version match 2.6.x-CURRENT?
> 
> Ouch, 2.4.21; this will be trouble. So next, what patches atop 2.4.21?

I wouldn't worry about the pid=0 issue -- I think it's most likely
due to the PAGG patches (http://oss.sgi.com/projects/pagg) causing
some sort of problem at process teardown (all the pid=0 processes are
in the process of exiting).

I'm more concerned about the (0 == pid & 0xffff) bug, which is present
in the unpatched mainline 2.4.x kernel.  It seems that the easiest fix
is marking such pids as in-use at pidmap allocation, so that they are
never assigned to real tasks.  I've got the code almost done, but need
to port it to top-of-tree before submitting a patch.

Brent

-- 
Brent Casavant                          If you had nothing to fear,
bcasavan@sgi.com                        how then could you be brave?
Silicon Graphics, Inc.                    -- Queen Dama, Source Wars

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Oops on 2.4.x invalid procfs i_ino value
  2004-12-20 22:35     ` Brent Casavant
@ 2004-12-22 15:46       ` Marcelo Tosatti
  2004-12-27 19:04         ` Marcelo Tosatti
  2004-12-27 21:40       ` William Lee Irwin III
  1 sibling, 1 reply; 7+ messages in thread
From: Marcelo Tosatti @ 2004-12-22 15:46 UTC (permalink / raw)
  To: Brent Casavant; +Cc: William Lee Irwin III, linux-kernel, mingo, Al Viro

On Mon, Dec 20, 2004 at 04:35:18PM -0600, Brent Casavant wrote:
> On Fri, 17 Dec 2004, William Lee Irwin III wrote:
> 
> > On Fri, Dec 17, 2004 at 04:49:44PM -0600, Brent Casavant wrote:
> > >> On a related note, if it matters, on about half the crash dumps I've
> > >> looked at, I see a pid of 0 has been assigned to a user process,
> > >> tripping this same problem.  I suspect there's another bug somewhere
> > >> that's allowing a pid of 0 to be chosen in the first place -- but I
> > >> don't totally discount that this problem may lay in SGI's patches to
> > >> this particular kernel -- I'll need to take a more thorough look.
> > 
> > On Fri, Dec 17, 2004 at 04:38:35PM -0800, William Lee Irwin III wrote:
> > > That's rather ominous. I'll pore over pid.c and see what's going on.
> > > Also, does the pid.c in your kernel version match 2.6.x-CURRENT?
> > 
> > Ouch, 2.4.21; this will be trouble. So next, what patches atop 2.4.21?
> 
> I wouldn't worry about the pid=0 issue -- I think it's most likely
> due to the PAGG patches (http://oss.sgi.com/projects/pagg) causing
> some sort of problem at process teardown (all the pid=0 processes are
> in the process of exiting).
> 
> I'm more concerned about the (0 == pid & 0xffff) bug, which is present
> in the unpatched mainline 2.4.x kernel.  It seems that the easiest fix
> is marking such pids as in-use at pidmap allocation, so that they are
> never assigned to real tasks.  I've got the code almost done, but need
> to port it to top-of-tree before submitting a patch.

Hi Brent,

Wouldnt it be feasible to have another "procfs inode type" to indicate such 
lower 16-bit zeroed pid's with a new type PROC_PID_INO_ZERO16BIT (or a better
name) and have fake_ino() handle these case by then using the upper 16-bits on
the inode for this "special" pid's.

And have proc_pid_make_inode() and related code handle this new type? No?

I'm not a big fan of making such pids unuseable for real tasks, so it would be 
nice if we could come up a fix for the buggy proc inode logic.

Thanks for finding this out!


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Oops on 2.4.x invalid procfs i_ino value
  2004-12-22 15:46       ` Marcelo Tosatti
@ 2004-12-27 19:04         ` Marcelo Tosatti
  0 siblings, 0 replies; 7+ messages in thread
From: Marcelo Tosatti @ 2004-12-27 19:04 UTC (permalink / raw)
  To: Brent Casavant; +Cc: William Lee Irwin III, linux-kernel, mingo, Al Viro

On Wed, Dec 22, 2004 at 01:46:27PM -0200, Marcelo Tosatti wrote:
> On Mon, Dec 20, 2004 at 04:35:18PM -0600, Brent Casavant wrote:
> > On Fri, 17 Dec 2004, William Lee Irwin III wrote:
> > 
> > > On Fri, Dec 17, 2004 at 04:49:44PM -0600, Brent Casavant wrote:
> > > >> On a related note, if it matters, on about half the crash dumps I've
> > > >> looked at, I see a pid of 0 has been assigned to a user process,
> > > >> tripping this same problem.  I suspect there's another bug somewhere
> > > >> that's allowing a pid of 0 to be chosen in the first place -- but I
> > > >> don't totally discount that this problem may lay in SGI's patches to
> > > >> this particular kernel -- I'll need to take a more thorough look.
> > > 
> > > On Fri, Dec 17, 2004 at 04:38:35PM -0800, William Lee Irwin III wrote:
> > > > That's rather ominous. I'll pore over pid.c and see what's going on.
> > > > Also, does the pid.c in your kernel version match 2.6.x-CURRENT?
> > > 
> > > Ouch, 2.4.21; this will be trouble. So next, what patches atop 2.4.21?
> > 
> > I wouldn't worry about the pid=0 issue -- I think it's most likely
> > due to the PAGG patches (http://oss.sgi.com/projects/pagg) causing
> > some sort of problem at process teardown (all the pid=0 processes are
> > in the process of exiting).
> > 
> > I'm more concerned about the (0 == pid & 0xffff) bug, which is present
> > in the unpatched mainline 2.4.x kernel.  It seems that the easiest fix
> > is marking such pids as in-use at pidmap allocation, so that they are
> > never assigned to real tasks.  I've got the code almost done, but need
> > to port it to top-of-tree before submitting a patch.
> 
> Hi Brent,
> 
> Wouldnt it be feasible to have another "procfs inode type" to indicate such 
> lower 16-bit zeroed pid's with a new type PROC_PID_INO_ZERO16BIT (or a better
> name) and have fake_ino() handle these case by then using the upper 16-bits on
> the inode for this "special" pid's.
> 
> And have proc_pid_make_inode() and related code handle this new type? No?
> 
> I'm not a big fan of making such pids unuseable for real tasks, so it would be 
> nice if we could come up a fix for the buggy proc inode logic.

Actually, while talking to wli on IRC:

#define PID_MAX 0x8000

So it shouldnt be a problem at all in mainline v2.4.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Oops on 2.4.x invalid procfs i_ino value
  2004-12-20 22:35     ` Brent Casavant
  2004-12-22 15:46       ` Marcelo Tosatti
@ 2004-12-27 21:40       ` William Lee Irwin III
  1 sibling, 0 replies; 7+ messages in thread
From: William Lee Irwin III @ 2004-12-27 21:40 UTC (permalink / raw)
  To: Brent Casavant; +Cc: linux-kernel, mingo

On Fri, 17 Dec 2004, William Lee Irwin III wrote:
>> Ouch, 2.4.21; this will be trouble. So next, what patches atop 2.4.21?

On Mon, Dec 20, 2004 at 04:35:18PM -0600, Brent Casavant wrote:
> I wouldn't worry about the pid=0 issue -- I think it's most likely
> due to the PAGG patches (http://oss.sgi.com/projects/pagg) causing
> some sort of problem at process teardown (all the pid=0 processes are
> in the process of exiting).
> I'm more concerned about the (0 == pid & 0xffff) bug, which is present
> in the unpatched mainline 2.4.x kernel.  It seems that the easiest fix
> is marking such pids as in-use at pidmap allocation, so that they are
> never assigned to real tasks.  I've got the code almost done, but need
> to port it to top-of-tree before submitting a patch.

I see no 0 == pid & 0xffff nor any pidmap in unpatched 2.4.x. Also,
please notice that pid & ~(PID_MAX-1) a.k.a. pid & ~0x7fff a.k.a.
pid & 0xffff8000? And so it appears numerous checks of this form are
already there.

Perhaps the pristine sources are not as pristine as one had hoped?


-- wli

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2004-12-27 21:40 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-12-17 22:49 Oops on 2.4.x invalid procfs i_ino value Brent Casavant
2004-12-18  0:38 ` William Lee Irwin III
2004-12-18  0:47   ` William Lee Irwin III
2004-12-20 22:35     ` Brent Casavant
2004-12-22 15:46       ` Marcelo Tosatti
2004-12-27 19:04         ` Marcelo Tosatti
2004-12-27 21:40       ` William Lee Irwin III

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).