* Oops on 2.4.x invalid procfs i_ino value
@ 2004-12-17 22:49 Brent Casavant
2004-12-18 0:38 ` William Lee Irwin III
0 siblings, 1 reply; 7+ messages in thread
From: Brent Casavant @ 2004-12-17 22:49 UTC (permalink / raw)
To: linux-kernel; +Cc: wli, mingo
I've run into a number of crashes while closing procfs stat files
when a system is under load. I think I've found the problem, but
am a little unsure how to proceed. This all happens to be on a
2.4.21 based kernel, but by my brief code inspection I think the
problem still exists on more recent 2.4.x kernels.
In procfs the fake_ino() macro is used to construct the inode number
for each entry.
#define fake_ino(pid,ino) (((pid)<<16)|(ino))
In particular this is used in proc_pid_make_inode:
inode->i_ino = fake_ino(task->pid, ino);
Note that a pid may be more than 16 bits in width (e.g. in IA64), and
we're trying to stuff it into the upper 16 bits of the inode number.
This isn't usually a problem, except when the lower 16 bits of the
inode happen to be 0 (i.e. pids that are a multiple of 65536).
Why does zero matter? Glad you asked.
In proc_delete_inode there is a check to see if the inode is is
a "proper" (whatever that means) procfs inode. The whole function
is:
static void proc_delete_inode(struct inode *inode)
{
struct proc_dir_entry *de = inode->u.generic_ip;
inode->i_state = I_CLEAR;
if (PROC_INODE_PROPER(inode)) {
proc_pid_delete_inode(inode);
return;
}
if (de) {
if (de->owner)
__MOD_DEC_USE_COUNT(de->owner);
de_put(de);
}
}
PROC_INODE_PROPER() is:
#define PROC_INODE_PROPER(inode) ((inode)->i_ino & ~0xffff)
In other words, it checks whether the top 16 bits of the inode number
(equivalent to the bottom 16 bits of the pid) are non-zero.
Thus closing a proc entry for any task with a pid that is a multiple of
65536 will fail this check, skip proc_pid_delete_inode, and call
__MOD_DEC_USE_COUNT, more than likely causing a panic on an invalid
memory access, and minimally corrupting something in memory otherwise.
I don't have a solution coded up (mostly because I'm a bit bleary
eyed after looking at crash dumps all day) -- but are there any
thoughts on how to go about addressing this one? An obvious workaround
is setting kernel.pid_max to 65535, but that's only a workaround, not
a solution.
On a related note, if it matters, on about half the crash dumps I've
looked at, I see a pid of 0 has been assigned to a user process,
tripping this same problem. I suspect there's another bug somewhere
that's allowing a pid of 0 to be chosen in the first place -- but I
don't totally discount that this problem may lay in SGI's patches to
this particular kernel -- I'll need to take a more thorough look.
Thanks,
Brent
--
Brent Casavant If you had nothing to fear,
bcasavan@sgi.com how then could you be brave?
Silicon Graphics, Inc. -- Queen Dama, Source Wars
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Oops on 2.4.x invalid procfs i_ino value
2004-12-17 22:49 Oops on 2.4.x invalid procfs i_ino value Brent Casavant
@ 2004-12-18 0:38 ` William Lee Irwin III
2004-12-18 0:47 ` William Lee Irwin III
0 siblings, 1 reply; 7+ messages in thread
From: William Lee Irwin III @ 2004-12-18 0:38 UTC (permalink / raw)
To: Brent Casavant; +Cc: linux-kernel, mingo
On Fri, Dec 17, 2004 at 04:49:44PM -0600, Brent Casavant wrote:
> Thus closing a proc entry for any task with a pid that is a multiple of
> 65536 will fail this check, skip proc_pid_delete_inode, and call
> __MOD_DEC_USE_COUNT, more than likely causing a panic on an invalid
> memory access, and minimally corrupting something in memory otherwise.
> I don't have a solution coded up (mostly because I'm a bit bleary
> eyed after looking at crash dumps all day) -- but are there any
> thoughts on how to go about addressing this one? An obvious workaround
> is setting kernel.pid_max to 65535, but that's only a workaround, not
> a solution.
> On a related note, if it matters, on about half the crash dumps I've
> looked at, I see a pid of 0 has been assigned to a user process,
> tripping this same problem. I suspect there's another bug somewhere
> that's allowing a pid of 0 to be chosen in the first place -- but I
> don't totally discount that this problem may lay in SGI's patches to
> this particular kernel -- I'll need to take a more thorough look.
That's rather ominous. I'll pore over pid.c and see what's going on.
Also, does the pid.c in your kernel version match 2.6.x-CURRENT?
-- wli
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Oops on 2.4.x invalid procfs i_ino value
2004-12-18 0:38 ` William Lee Irwin III
@ 2004-12-18 0:47 ` William Lee Irwin III
2004-12-20 22:35 ` Brent Casavant
0 siblings, 1 reply; 7+ messages in thread
From: William Lee Irwin III @ 2004-12-18 0:47 UTC (permalink / raw)
To: Brent Casavant; +Cc: linux-kernel, mingo
On Fri, Dec 17, 2004 at 04:49:44PM -0600, Brent Casavant wrote:
>> On a related note, if it matters, on about half the crash dumps I've
>> looked at, I see a pid of 0 has been assigned to a user process,
>> tripping this same problem. I suspect there's another bug somewhere
>> that's allowing a pid of 0 to be chosen in the first place -- but I
>> don't totally discount that this problem may lay in SGI's patches to
>> this particular kernel -- I'll need to take a more thorough look.
On Fri, Dec 17, 2004 at 04:38:35PM -0800, William Lee Irwin III wrote:
> That's rather ominous. I'll pore over pid.c and see what's going on.
> Also, does the pid.c in your kernel version match 2.6.x-CURRENT?
Ouch, 2.4.21; this will be trouble. So next, what patches atop 2.4.21?
-- wli
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Oops on 2.4.x invalid procfs i_ino value
2004-12-18 0:47 ` William Lee Irwin III
@ 2004-12-20 22:35 ` Brent Casavant
2004-12-22 15:46 ` Marcelo Tosatti
2004-12-27 21:40 ` William Lee Irwin III
0 siblings, 2 replies; 7+ messages in thread
From: Brent Casavant @ 2004-12-20 22:35 UTC (permalink / raw)
To: William Lee Irwin III; +Cc: linux-kernel, mingo
On Fri, 17 Dec 2004, William Lee Irwin III wrote:
> On Fri, Dec 17, 2004 at 04:49:44PM -0600, Brent Casavant wrote:
> >> On a related note, if it matters, on about half the crash dumps I've
> >> looked at, I see a pid of 0 has been assigned to a user process,
> >> tripping this same problem. I suspect there's another bug somewhere
> >> that's allowing a pid of 0 to be chosen in the first place -- but I
> >> don't totally discount that this problem may lay in SGI's patches to
> >> this particular kernel -- I'll need to take a more thorough look.
>
> On Fri, Dec 17, 2004 at 04:38:35PM -0800, William Lee Irwin III wrote:
> > That's rather ominous. I'll pore over pid.c and see what's going on.
> > Also, does the pid.c in your kernel version match 2.6.x-CURRENT?
>
> Ouch, 2.4.21; this will be trouble. So next, what patches atop 2.4.21?
I wouldn't worry about the pid=0 issue -- I think it's most likely
due to the PAGG patches (http://oss.sgi.com/projects/pagg) causing
some sort of problem at process teardown (all the pid=0 processes are
in the process of exiting).
I'm more concerned about the (0 == pid & 0xffff) bug, which is present
in the unpatched mainline 2.4.x kernel. It seems that the easiest fix
is marking such pids as in-use at pidmap allocation, so that they are
never assigned to real tasks. I've got the code almost done, but need
to port it to top-of-tree before submitting a patch.
Brent
--
Brent Casavant If you had nothing to fear,
bcasavan@sgi.com how then could you be brave?
Silicon Graphics, Inc. -- Queen Dama, Source Wars
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Oops on 2.4.x invalid procfs i_ino value
2004-12-20 22:35 ` Brent Casavant
@ 2004-12-22 15:46 ` Marcelo Tosatti
2004-12-27 19:04 ` Marcelo Tosatti
2004-12-27 21:40 ` William Lee Irwin III
1 sibling, 1 reply; 7+ messages in thread
From: Marcelo Tosatti @ 2004-12-22 15:46 UTC (permalink / raw)
To: Brent Casavant; +Cc: William Lee Irwin III, linux-kernel, mingo, Al Viro
On Mon, Dec 20, 2004 at 04:35:18PM -0600, Brent Casavant wrote:
> On Fri, 17 Dec 2004, William Lee Irwin III wrote:
>
> > On Fri, Dec 17, 2004 at 04:49:44PM -0600, Brent Casavant wrote:
> > >> On a related note, if it matters, on about half the crash dumps I've
> > >> looked at, I see a pid of 0 has been assigned to a user process,
> > >> tripping this same problem. I suspect there's another bug somewhere
> > >> that's allowing a pid of 0 to be chosen in the first place -- but I
> > >> don't totally discount that this problem may lay in SGI's patches to
> > >> this particular kernel -- I'll need to take a more thorough look.
> >
> > On Fri, Dec 17, 2004 at 04:38:35PM -0800, William Lee Irwin III wrote:
> > > That's rather ominous. I'll pore over pid.c and see what's going on.
> > > Also, does the pid.c in your kernel version match 2.6.x-CURRENT?
> >
> > Ouch, 2.4.21; this will be trouble. So next, what patches atop 2.4.21?
>
> I wouldn't worry about the pid=0 issue -- I think it's most likely
> due to the PAGG patches (http://oss.sgi.com/projects/pagg) causing
> some sort of problem at process teardown (all the pid=0 processes are
> in the process of exiting).
>
> I'm more concerned about the (0 == pid & 0xffff) bug, which is present
> in the unpatched mainline 2.4.x kernel. It seems that the easiest fix
> is marking such pids as in-use at pidmap allocation, so that they are
> never assigned to real tasks. I've got the code almost done, but need
> to port it to top-of-tree before submitting a patch.
Hi Brent,
Wouldnt it be feasible to have another "procfs inode type" to indicate such
lower 16-bit zeroed pid's with a new type PROC_PID_INO_ZERO16BIT (or a better
name) and have fake_ino() handle these case by then using the upper 16-bits on
the inode for this "special" pid's.
And have proc_pid_make_inode() and related code handle this new type? No?
I'm not a big fan of making such pids unuseable for real tasks, so it would be
nice if we could come up a fix for the buggy proc inode logic.
Thanks for finding this out!
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Oops on 2.4.x invalid procfs i_ino value
2004-12-22 15:46 ` Marcelo Tosatti
@ 2004-12-27 19:04 ` Marcelo Tosatti
0 siblings, 0 replies; 7+ messages in thread
From: Marcelo Tosatti @ 2004-12-27 19:04 UTC (permalink / raw)
To: Brent Casavant; +Cc: William Lee Irwin III, linux-kernel, mingo, Al Viro
On Wed, Dec 22, 2004 at 01:46:27PM -0200, Marcelo Tosatti wrote:
> On Mon, Dec 20, 2004 at 04:35:18PM -0600, Brent Casavant wrote:
> > On Fri, 17 Dec 2004, William Lee Irwin III wrote:
> >
> > > On Fri, Dec 17, 2004 at 04:49:44PM -0600, Brent Casavant wrote:
> > > >> On a related note, if it matters, on about half the crash dumps I've
> > > >> looked at, I see a pid of 0 has been assigned to a user process,
> > > >> tripping this same problem. I suspect there's another bug somewhere
> > > >> that's allowing a pid of 0 to be chosen in the first place -- but I
> > > >> don't totally discount that this problem may lay in SGI's patches to
> > > >> this particular kernel -- I'll need to take a more thorough look.
> > >
> > > On Fri, Dec 17, 2004 at 04:38:35PM -0800, William Lee Irwin III wrote:
> > > > That's rather ominous. I'll pore over pid.c and see what's going on.
> > > > Also, does the pid.c in your kernel version match 2.6.x-CURRENT?
> > >
> > > Ouch, 2.4.21; this will be trouble. So next, what patches atop 2.4.21?
> >
> > I wouldn't worry about the pid=0 issue -- I think it's most likely
> > due to the PAGG patches (http://oss.sgi.com/projects/pagg) causing
> > some sort of problem at process teardown (all the pid=0 processes are
> > in the process of exiting).
> >
> > I'm more concerned about the (0 == pid & 0xffff) bug, which is present
> > in the unpatched mainline 2.4.x kernel. It seems that the easiest fix
> > is marking such pids as in-use at pidmap allocation, so that they are
> > never assigned to real tasks. I've got the code almost done, but need
> > to port it to top-of-tree before submitting a patch.
>
> Hi Brent,
>
> Wouldnt it be feasible to have another "procfs inode type" to indicate such
> lower 16-bit zeroed pid's with a new type PROC_PID_INO_ZERO16BIT (or a better
> name) and have fake_ino() handle these case by then using the upper 16-bits on
> the inode for this "special" pid's.
>
> And have proc_pid_make_inode() and related code handle this new type? No?
>
> I'm not a big fan of making such pids unuseable for real tasks, so it would be
> nice if we could come up a fix for the buggy proc inode logic.
Actually, while talking to wli on IRC:
#define PID_MAX 0x8000
So it shouldnt be a problem at all in mainline v2.4.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Oops on 2.4.x invalid procfs i_ino value
2004-12-20 22:35 ` Brent Casavant
2004-12-22 15:46 ` Marcelo Tosatti
@ 2004-12-27 21:40 ` William Lee Irwin III
1 sibling, 0 replies; 7+ messages in thread
From: William Lee Irwin III @ 2004-12-27 21:40 UTC (permalink / raw)
To: Brent Casavant; +Cc: linux-kernel, mingo
On Fri, 17 Dec 2004, William Lee Irwin III wrote:
>> Ouch, 2.4.21; this will be trouble. So next, what patches atop 2.4.21?
On Mon, Dec 20, 2004 at 04:35:18PM -0600, Brent Casavant wrote:
> I wouldn't worry about the pid=0 issue -- I think it's most likely
> due to the PAGG patches (http://oss.sgi.com/projects/pagg) causing
> some sort of problem at process teardown (all the pid=0 processes are
> in the process of exiting).
> I'm more concerned about the (0 == pid & 0xffff) bug, which is present
> in the unpatched mainline 2.4.x kernel. It seems that the easiest fix
> is marking such pids as in-use at pidmap allocation, so that they are
> never assigned to real tasks. I've got the code almost done, but need
> to port it to top-of-tree before submitting a patch.
I see no 0 == pid & 0xffff nor any pidmap in unpatched 2.4.x. Also,
please notice that pid & ~(PID_MAX-1) a.k.a. pid & ~0x7fff a.k.a.
pid & 0xffff8000? And so it appears numerous checks of this form are
already there.
Perhaps the pristine sources are not as pristine as one had hoped?
-- wli
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2004-12-27 21:40 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-12-17 22:49 Oops on 2.4.x invalid procfs i_ino value Brent Casavant
2004-12-18 0:38 ` William Lee Irwin III
2004-12-18 0:47 ` William Lee Irwin III
2004-12-20 22:35 ` Brent Casavant
2004-12-22 15:46 ` Marcelo Tosatti
2004-12-27 19:04 ` Marcelo Tosatti
2004-12-27 21:40 ` William Lee Irwin III
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).