* kexec "problem"
2004-02-24 16:03 Latest AIO patchset Suparna Bhattacharya
@ 2004-02-24 11:02 ` Carlos Silva
2004-02-24 17:10 ` Randy.Dunlap
2004-02-27 0:54 ` kexec "problem" [and patch updates] Randy.Dunlap
2004-02-25 18:45 ` Latest AIO patchset Hayim Shaul
1 sibling, 2 replies; 15+ messages in thread
From: Carlos Silva @ 2004-02-24 11:02 UTC
To: linux-kernel
hi guys,
i have just compiled a kernel with the kexec patch. compiled kexec-tools
and when i try to load a kernel, it gives me this:
# ./do-kexec.sh /boot/bzImage-2.6.2-g
kexec_load failed: Invalid argument
entry = 0x91764
nr_segments = 2
segment[0].buf = 0x80b3480
segment[0].bufsz = 1880
segment[0].mem = 0x90000
segment[0].memsz = 1880
segment[1].buf = 0x40001008
segment[1].bufsz = 19795a
segment[1].mem = 0x100000
segment[1].memsz = 19795a
anyone tried to run kexec and actually did it? i'm trying with kernel 2.6.3
* Latest AIO patchset
@ 2004-02-24 16:03 Suparna Bhattacharya
2004-02-24 11:02 ` kexec "problem" Carlos Silva
2004-02-25 18:45 ` Latest AIO patchset Hayim Shaul
0 siblings, 2 replies; 15+ messages in thread
From: Suparna Bhattacharya @ 2004-02-24 16:03 UTC
To: akpm, linux-aio; +Cc: linux-kernel
The latest set of AIO patches are being maintained at:
http://www.kernel.org/pub/linux/kernel/people/suparna/aio/
The patches have been reduced to a minimal set that addresses
the most relevant blocking points. Please let me know if you
think there is a case for bringing in any of the additional
patches.
(The patches need to be applied in the order mentioned in the
'series' file)
A new addition to the patchset is Chris Mason's nice and simple
implementation of AIO support for pipes using the retry
infrastructure. He also fixed some problems in the AIO cancel
logic to make it play well with retries.
Besides this and some re-organization and cleaning up, there
are a couple of changes since the last set of patches in -mm, that
are worth a mention:
- Upfront readahead is now clipped to the readahead limit for
the device (ra_pages) and happens only for AIO. This helps
address the sendfile regression seen by Felix von Leitner.
If your AIO read requests are likely to exceed the default
readahead size, then use hdparm -a <new size> to tune it.
The patchset also currently includes Ram Pai's adaptive
lazy readahead code which is required for good streaming AIO
read performance.
- David Brownell's suggestion of enabling the fops to set up
their own retry methods. This should make his USB gadgetfs
AIO patch co-exist smoothly with fsaio.
Some basic results are up comparing streaming non-cached random
AIO read/write throughputs for a single ext3 file using aio-stress
for various io sizes with and without these patches, and also
comparisons with O_DIRECT AIO throughputs. The short summary has
been a doubling of throughput using the fsaio patches, which is
also close to the results seen with O_DIRECT AIO.
As usual, feedback, bug fixes, test results, etc. are welcome.
Regards
Suparna
--
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India
* Re: kexec "problem"
2004-02-24 11:02 ` kexec "problem" Carlos Silva
@ 2004-02-24 17:10 ` Randy.Dunlap
2004-02-24 17:24 ` Carlos Silva
2004-02-27 0:54 ` kexec "problem" [and patch updates] Randy.Dunlap
1 sibling, 1 reply; 15+ messages in thread
From: Randy.Dunlap @ 2004-02-24 17:10 UTC
To: Carlos Silva; +Cc: linux-kernel
On Tue, 24 Feb 2004 11:02:21 -0000 (WET) Carlos Silva wrote:
| hi guys,
|
| i have just compiled a kernel with the kexec patch. compiled kexec-tools
| and when i try to load a kernel, it gives me this:
| # ./do-kexec.sh /boot/bzImage-2.6.2-g
| kexec_load failed: Invalid argument
| entry = 0x91764
| nr_segments = 2
| segment[0].buf = 0x80b3480
| segment[0].bufsz = 1880
| segment[0].mem = 0x90000
| segment[0].memsz = 1880
| segment[1].buf = 0x40001008
| segment[1].bufsz = 19795a
| segment[1].mem = 0x100000
| segment[1].memsz = 19795a
|
| anyone tried to run kexec and actually did it? i'm trying with kernel 2.6.3
I haven't updated for 2.6.3 yet, or even tested it...
but I'll get to it soon.
--
~Randy
* Re: kexec "problem"
2004-02-24 17:10 ` Randy.Dunlap
@ 2004-02-24 17:24 ` Carlos Silva
0 siblings, 0 replies; 15+ messages in thread
From: Carlos Silva @ 2004-02-24 17:24 UTC
To: Randy.Dunlap; +Cc: linux-kernel
> On Tue, 24 Feb 2004 11:02:21 -0000 (WET) Carlos Silva wrote:
>
> | hi guys,
> |
> | i have just compiled a kernel with the kexec patch. compiled kexec-tools
> | and when i try to load a kernel, it gives me this:
> | # ./do-kexec.sh /boot/bzImage-2.6.2-g
> | kexec_load failed: Invalid argument
> | entry = 0x91764
> | nr_segments = 2
> | segment[0].buf = 0x80b3480
> | segment[0].bufsz = 1880
> | segment[0].mem = 0x90000
> | segment[0].memsz = 1880
> | segment[1].buf = 0x40001008
> | segment[1].bufsz = 19795a
> | segment[1].mem = 0x100000
> | segment[1].memsz = 19795a
> |
> | anyone tried to run kexec and actually did it? i'm trying with kernel
> 2.6.3
>
> I haven't updated for 2.6.3 yet, or even tested it...
> but I'll get to it soon.
>
> --
> ~Randy
>
just changed line 13 on kexec.h from:
"long kexec_load(void *entry, unsigned long nr_segments, struct
kexec_segment *segments, unsigned long);"
to:
"long kexec_load(void *entry, unsigned long nr_segments, struct
kexec_segment *segments, unsigned long flags);"
and it gives me this error:
# ./do-kexec.sh /boot/bzImage-2.6.2-g
kexec_load failed: Function not implemented
entry = 0x91764
nr_segments = 2
segment[0].buf = 0x80b3480
segment[0].bufsz = 1880
segment[0].mem = 0x90000
segment[0].memsz = 1880
segment[1].buf = 0x40001008
segment[1].bufsz = 19795a
segment[1].mem = 0x100000
segment[1].memsz = 19795a
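A note on the two error strings in this thread: they look like the C library's text for errno values, where "Invalid argument" is EINVAL and "Function not implemented" is ENOSYS. On Linux, ENOSYS usually means the running kernel has no handler for the syscall number that was invoked. A minimal sketch, assuming kexec-tools prints strerror(errno); the helper name describe_kexec_errno is hypothetical and not part of kexec-tools:

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper (not part of kexec-tools): map the errno values
 * seen in this thread to a short hint about where to look next. */
const char *describe_kexec_errno(int err)
{
    switch (err) {
    case ENOSYS:
        /* strerror() text: "Function not implemented" */
        return "syscall number not handled by the running kernel";
    case EINVAL:
        /* strerror() text: "Invalid argument" */
        return "kernel rejected one of the arguments";
    default:
        /* Fall back to the C library's own message. */
        return strerror(err);
    }
}
```

If the hint for ENOSYS applies, the first thing to compare would be the syscall number compiled into the tool against the one the patched kernel actually registers.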
* Re: Latest AIO patchset
2004-02-24 16:03 Latest AIO patchset Suparna Bhattacharya
2004-02-24 11:02 ` kexec "problem" Carlos Silva
@ 2004-02-25 18:45 ` Hayim Shaul
2004-02-26 0:27 ` Benjamin LaHaise
1 sibling, 1 reply; 15+ messages in thread
From: Hayim Shaul @ 2004-02-25 18:45 UTC
To: Suparna Bhattacharya; +Cc: akpm, linux-aio, linux-kernel
>
> Some basic results are up comparing streaming non-cached random
> AIO read/write throughputs for a single ext3 file using aio-stress
> for various io sizes with and without these patches, and also
> comparisons with O_DIRECT AIO throughputs. The short summary has
> been a doubling of throughput using the fsaio patches, which is
> also close to the results seen with O_DIRECT AIO.
>
> As usual, feedback, bug fixes, test results, etc. are welcome.
>
What exactly is the O_DIRECT flag? When I add this flag to the open func
it fails.
More specifically, this call fails
open("filename", O_RDWR | O_DIRECT | O_LARGEFILE | O_CREAT, S_IRWXU);
but this one succeeds
open("filename", O_RDWR | O_LARGEFILE | O_CREAT, S_IRWXU);
I'm running linux 2.6.0 with libaio 0.3.92.
Hayim.
* Re: Latest AIO patchset
2004-02-25 18:45 ` Latest AIO patchset Hayim Shaul
@ 2004-02-26 0:27 ` Benjamin LaHaise
2004-02-26 13:30 ` Hayim Shaul
0 siblings, 1 reply; 15+ messages in thread
From: Benjamin LaHaise @ 2004-02-26 0:27 UTC
To: Hayim Shaul; +Cc: Suparna Bhattacharya, akpm, linux-aio, linux-kernel
On Wed, Feb 25, 2004 at 08:45:29PM +0200, Hayim Shaul wrote:
> What exactly is the O_DIRECT flag? When I add this flag to the open func
> it fails.
>
> More specifically, this call fails
> open("filename", O_RDWR | O_DIRECT | O_LARGEFILE | O_CREAT, S_IRWXU);
>
> but this one succeeds
> open("filename", O_RDWR | O_LARGEFILE | O_CREAT, S_IRWXU);
>
> I'm running linux 2.6.0 with libaio 0.3.92.
Which filesystem? Not all support O_DIRECT.
-ben
* Re: Latest AIO patchset
2004-02-26 0:27 ` Benjamin LaHaise
@ 2004-02-26 13:30 ` Hayim Shaul
2004-02-26 16:45 ` Daniel McNeil
0 siblings, 1 reply; 15+ messages in thread
From: Hayim Shaul @ 2004-02-26 13:30 UTC
To: Benjamin LaHaise; +Cc: Suparna Bhattacharya, akpm, linux-aio, linux-kernel
On Wed, 25 Feb 2004, Benjamin LaHaise wrote:
> On Wed, Feb 25, 2004 at 08:45:29PM +0200, Hayim Shaul wrote:
> > What exactly is the O_DIRECT flag? When I add this flag to the open func
> > it fails.
> >
> > More specifically, this call fails
> > open("filename", O_RDWR | O_DIRECT | O_LARGEFILE | O_CREAT, S_IRWXU);
> >
> > but this one succeeds
> > open("filename", O_RDWR | O_LARGEFILE | O_CREAT, S_IRWXU);
> >
> > I'm running linux 2.6.0 with libaio 0.3.92.
>
> Which filesystem? Not all support O_DIRECT.
>
ext3
I think ext3 does support it.
Actually, I was wrong. open() does succeed. It returns a valid fd,
but after writing and exiting, the file is still zero size.
Removing O_DIRECT from the same program writes quite a lot to the file.
Hayim.
* Re: Latest AIO patchset
2004-02-26 13:30 ` Hayim Shaul
@ 2004-02-26 16:45 ` Daniel McNeil
0 siblings, 0 replies; 15+ messages in thread
From: Daniel McNeil @ 2004-02-26 16:45 UTC
To: Hayim Shaul
Cc: Benjamin LaHaise, Suparna Bhattacharya, Andrew Morton, linux-aio,
Linux Kernel Mailing List
Are you checking the return value of write()?
O_DIRECT has alignment requirements since it writes directly from
user space to disk, bypassing the page cache. Also, the size
of the write has to be a multiple of 512 (for 2.6).
Try using posix_memalign() with the page size as the alignment argument
to allocate the data buffer. O_DIRECT does work on ext3.
Daniel
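The advice above can be sketched roughly as follows. This is an illustration under stated assumptions, not code from the thread: the helper names round_up_to and direct_write are hypothetical, the 512-byte size granularity is the figure quoted for 2.6 and may be larger on other devices or filesystems, and O_LARGEFILE is left out for brevity.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* glibc only exposes O_DIRECT with this defined */
#endif
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Round len up to the next multiple of align (align must be a power of two). */
size_t round_up_to(size_t len, size_t align)
{
    return (len + align - 1) & ~(align - 1);
}

/* Hypothetical helper: write len bytes (zero-padded to a 512-byte multiple)
 * to path using O_DIRECT. Returns the byte count from write(), or -1. */
ssize_t direct_write(const char *path, const void *data, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t total = round_up_to(len, 512);
    void *buf = NULL;
    ssize_t n;
    int fd;

    /* Page-aligned buffer, as suggested above. */
    if (page <= 0 || posix_memalign(&buf, (size_t)page, total) != 0)
        return -1;
    memset(buf, 0, total);
    memcpy(buf, data, len);

    fd = open(path, O_RDWR | O_DIRECT | O_CREAT, S_IRWXU);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    /* Always check this value: a misaligned buffer or size typically
     * makes write() return -1 with errno set to EINVAL. */
    n = write(fd, buf, total);

    close(fd);
    free(buf);
    return n;
}
```

A caller that ignores the return value of write() would see exactly the symptom reported here: open() succeeds, yet the file stays empty.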
On Thu, 2004-02-26 at 05:30, Hayim Shaul wrote:
> On Wed, 25 Feb 2004, Benjamin LaHaise wrote:
>
> > On Wed, Feb 25, 2004 at 08:45:29PM +0200, Hayim Shaul wrote:
> > > What exactly is the O_DIRECT flag? When I add this flag to the open func
> > > it fails.
> > >
> > > More specifically, this call fails
> > > open("filename", O_RDWR | O_DIRECT | O_LARGEFILE | O_CREAT, S_IRWXU);
> > >
> > > but this one succeeds
> > > open("filename", O_RDWR | O_LARGEFILE | O_CREAT, S_IRWXU);
> > >
> > > I'm running linux 2.6.0 with libaio 0.3.92.
> >
> > Which filesystem? Not all support O_DIRECT.
> >
>
> ext3
> I think ext3 does support it.
>
> Actually, I was wrong. open() does succeed. It returns a valid fd,
> but after writing and exiting, the file is still zero size.
>
> Removing O_DIRECT from the same program writes quite a lot to the file.
>
> Hayim.
>
* Re: kexec "problem" [and patch updates]
2004-02-24 11:02 ` kexec "problem" Carlos Silva
2004-02-24 17:10 ` Randy.Dunlap
@ 2004-02-27 0:54 ` Randy.Dunlap
2004-02-27 8:00 ` [Fastboot] " Eric W. Biederman
1 sibling, 1 reply; 15+ messages in thread
From: Randy.Dunlap @ 2004-02-27 0:54 UTC
To: Carlos Silva; +Cc: linux-kernel, fastboot
On Tue, 24 Feb 2004 11:02:21 -0000 (WET) Carlos Silva wrote:
| hi guys,
|
| i have just compiled a kernel with the kexec patch. compiled kexec-tools
| and when i try to load a kernel, it gives me this:
| # ./do-kexec.sh /boot/bzImage-2.6.2-g
| kexec_load failed: Invalid argument
| entry = 0x91764
| nr_segments = 2
| segment[0].buf = 0x80b3480
| segment[0].bufsz = 1880
| segment[0].mem = 0x90000
| segment[0].memsz = 1880
| segment[1].buf = 0x40001008
| segment[1].bufsz = 19795a
| segment[1].mem = 0x100000
| segment[1].memsz = 19795a
|
| anyone tried to run kexec and actually did it? i'm trying with kernel 2.6.3
| -
I updated the kexec patch for 2.6.2 and 2.6.3.
It works fine on 2.6.2. It works for me on 2.6.3 if not SMP.
If the kernel is built for SMP, when running kexec, I get a
BUG in arch/i386/kernel/smp.c at line 359.
I'm testing various workarounds for that BUG now.
--
~Randy
kexec updates are at:
http://developer.osdl.org/rddunlap/kexec/2.6.2/
and
http://developer.osdl.org/rddunlap/kexec/2.6.3/
* Re: [Fastboot] Re: kexec "problem" [and patch updates]
2004-02-27 0:54 ` kexec "problem" [and patch updates] Randy.Dunlap
@ 2004-02-27 8:00 ` Eric W. Biederman
2004-02-27 19:32 ` Randy.Dunlap
0 siblings, 1 reply; 15+ messages in thread
From: Eric W. Biederman @ 2004-02-27 8:00 UTC
To: Randy.Dunlap; +Cc: Carlos Silva, fastboot, linux-kernel
"Randy.Dunlap" <rddunlap@osdl.org> writes:
> On Tue, 24 Feb 2004 11:02:21 -0000 (WET) Carlos Silva wrote:
>
> | hi guys,
> |
> | i have just compiled a kernel with the kexec patch. compiled kexec-tools
> | and when i try to load a kernel, it gives me this:
> | # ./do-kexec.sh /boot/bzImage-2.6.2-g
> | kexec_load failed: Invalid argument
> | entry = 0x91764
> | nr_segments = 2
> | segment[0].buf = 0x80b3480
> | segment[0].bufsz = 1880
> | segment[0].mem = 0x90000
> | segment[0].memsz = 1880
> | segment[1].buf = 0x40001008
> | segment[1].bufsz = 19795a
> | segment[1].mem = 0x100000
> | segment[1].memsz = 19795a
> |
> | anyone tried to run kexec and actually did it? i'm trying with kernel 2.6.3
> | -
>
> I updated the kexec patch for 2.6.2 and 2.6.3.
> It works fine on 2.6.2. It works for me on 2.6.3 if not SMP.
> If the kernel is built for SMP, when running kexec, I get a
> BUG in arch/i386/kernel/smp.c at line 359.
> I'm testing various workarounds for that BUG now.
I will eyeball it...
Is it the kernel that is shutting down, or the kernel that is being
brought up that has problems?
The back trace from the BUG would be interesting.
As I see it flush_tlb_others is being called when we have shutdown
cpus and the kernel still thinks we have the mm present on foreign
cpus.
So it appears we simply have a case that was not anticipated
by the authors of that code. So we need to adjust either
the code we are calling or cpu_vm_mask so it does not list
other cpus after we have shut them down.
At least that is what it looks like at first glance.
Eric
* Re: [Fastboot] Re: kexec "problem" [and patch updates]
2004-02-27 8:00 ` [Fastboot] " Eric W. Biederman
@ 2004-02-27 19:32 ` Randy.Dunlap
2004-02-28 10:41 ` Eric W. Biederman
0 siblings, 1 reply; 15+ messages in thread
From: Randy.Dunlap @ 2004-02-27 19:32 UTC
To: Eric W. Biederman; +Cc: r3pek, fastboot, linux-kernel
On 27 Feb 2004 01:00:04 -0700 Eric W. Biederman wrote:
| "Randy.Dunlap" <rddunlap@osdl.org> writes:
|
| > On Tue, 24 Feb 2004 11:02:21 -0000 (WET) Carlos Silva wrote:
| >
| > | hi guys,
| > |
| > | i have just compiled a kernel with the kexec patch. compiled kexec-tools
| > | and when i try to load a kernel, it gives me this:
| > | # ./do-kexec.sh /boot/bzImage-2.6.2-g
| > | kexec_load failed: Invalid argument
| > | entry = 0x91764
| > | nr_segments = 2
| > | segment[0].buf = 0x80b3480
| > | segment[0].bufsz = 1880
| > | segment[0].mem = 0x90000
| > | segment[0].memsz = 1880
| > | segment[1].buf = 0x40001008
| > | segment[1].bufsz = 19795a
| > | segment[1].mem = 0x100000
| > | segment[1].memsz = 19795a
| > |
| > | anyone tried to run kexec and actually did it? i'm trying with kernel 2.6.3
| > | -
| >
| > I updated the kexec patch for 2.6.2 and 2.6.3.
| > It works fine on 2.6.2. It works for me on 2.6.3 if not SMP.
| > If the kernel is built for SMP, when running kexec, I get a
| > BUG in arch/i386/kernel/smp.c at line 359.
| > I'm testing various workarounds for that BUG now.
|
| I will eyeball it...
|
| Is it the kernel that is shutting down, or the kernel that is being
| brought up that has problems?
the kernel that is shutting down.
| The back trace from the BUG would be interesting.
see below. my bad. i should have included it.
| As I see it flush_tlb_others is being called when we have shutdown
| cpus and the kernel still thinks we have the mm present on foreign
| cpus.
Martin Bligh thinks that there is a tlb race here.
I printed the 2 cpu masks on my dual-proc machine and saw
0 in one of them and 0xc in the other one.
| So it appears we simply have a case that was not anticipated
| by the authors of that code. So we need to adjust either
| the code we are calling or cpu_vm_mask so it does not list
| other cpus after we have shut them down.
|
| At least that is what it looks like at first glance.
Thanks,
--
~Randy
Feb 25 15:52:21 gargoyle kernel: ------------[ cut here ]------------
Feb 25 15:52:21 gargoyle kernel: kernel BUG at arch/i386/kernel/smp.c:359!
Feb 25 15:52:21 gargoyle kernel: invalid operand: 0000 [#1]
Feb 25 15:52:21 gargoyle kernel: CPU: 1
Feb 25 15:52:21 gargoyle kernel: EIP: 0060:[<c011673d>] Not tainted
Feb 25 15:52:21 gargoyle kernel: EFLAGS: 00010206
Feb 25 15:52:21 gargoyle kernel: EIP is at flush_tlb_others+0x141/0x15c
Feb 25 15:52:21 gargoyle kernel: eax: 00000000 ebx: c043c9e0 ecx: c043c9e0 edx: 0000000c
Feb 25 15:52:21 gargoyle kernel: esi: f53effc4 edi: 00851da8 ebp: f5449ebc esp: f5449ea8
Feb 25 15:52:21 gargoyle kernel: ds: 007b es: 007b ss: 0068
Feb 25 15:52:21 gargoyle kernel: Process kexec (pid: 1095, threadinfo=f5448000 task=f54719b0)
Feb 25 15:52:21 gargoyle kernel: Stack: f5449ed4 c014eae3 c1851d58 353f0000 00000000 f5449ed4 c01167e9 0000000c
Feb 25 15:52:21 gargoyle kernel: c043c9e0 ffffffff 0000000c f5449f20 c0150084 c043c9e0 f61d7610 003f1000
Feb 25 15:52:21 gargoyle kernel: 003f1000 35000000 35000000 00400000 c0101354 c043ca14 c043c9e0 353f1000
Feb 25 15:52:21 gargoyle kernel: Call Trace:
Feb 25 15:52:21 gargoyle kernel: [<c014eae3>] pte_alloc_map+0xd9/0x12e
Feb 25 15:52:21 gargoyle kernel: [<c01167e9>] flush_tlb_mm+0x47/0x8c
Feb 25 15:52:21 gargoyle kernel: [<c0150084>] remap_page_range+0x1ae/0x218
Feb 25 15:52:21 gargoyle kernel: [<c013dae7>] identity_map_pages+0xf7/0x130
Feb 25 15:52:21 gargoyle kernel: [<c013dbd4>] kimage_alloc_reboot_code_pages+0xb4/0x164
Feb 25 15:52:21 gargoyle kernel: [<c013d928>] kimage_alloc+0x100/0x186
Feb 25 15:52:21 gargoyle kernel: [<c013e496>] sys_kexec_load+0x9c/0xff
Feb 25 15:52:21 gargoyle kernel: [<c0109637>] syscall_call+0x7/0xb
Feb 25 15:52:21 gargoyle kernel:
Feb 25 15:52:21 gargoyle kernel: Code: 0f 0b 67 01 ee df 3d c0 e9 d7 fe ff ff 0f 0b 64 01 ee df 3d
* Re: [Fastboot] Re: kexec "problem" [and patch updates]
2004-02-27 19:32 ` Randy.Dunlap
@ 2004-02-28 10:41 ` Eric W. Biederman
2004-03-04 13:03 ` Hariprasad Nellitheertha
0 siblings, 1 reply; 15+ messages in thread
From: Eric W. Biederman @ 2004-02-28 10:41 UTC
To: Randy.Dunlap; +Cc: r3pek, fastboot, linux-kernel
"Randy.Dunlap" <rddunlap@osdl.org> writes:
> On 27 Feb 2004 01:00:04 -0700 Eric W. Biederman wrote:
>
> | > It works fine on 2.6.2. It works for me on 2.6.3 if not SMP.
> | > If the kernel is built for SMP, when running kexec, I get a
> | > BUG in arch/i386/kernel/smp.c at line 359.
> | > I'm testing various workarounds for that BUG now.
> |
> | I will eyeball it...
> |
> | Is it the kernel that is shutting down, or the kernel that is being
> | brought up that has problems?
>
> the kernel that is shutting down.
>
> | The back trace from the BUG would be interesting.
>
> see below. my bad. i should have included it.
>
> | As I see it flush_tlb_others is being called when we have shutdown
> | cpus and the kernel still thinks we have the mm present on foreign
> | cpus.
>
> Martin Bligh thinks that there is a tlb race here.
> I printed the 2 cpu masks on my dual-proc machine and saw
> 0 in one of them and 0xc in the other one.
Ouch we have both cpus running when this happens, and we have not
started any shutdown whatsoever. This is the bit that sets up
the page tables for later use...
I think identity_map_pages will have problems with a kernel that does
the 4G/4G split, and it has known issues on some other architectures,
because they treat init_mm specially. So the proper solution may be
to simply rewrite identity_map_pages.
Before we do that in the short term we need to see if
identity_map_pages is actually doing anything bad. You are
not using the 4G/4G split so that is not the cause. So either
init_mm is now special in some way, or we have hit a generic kernel
bug.
So this may indeed be a tlb race. But it is init_mm->cpu_vm_mask and
cpu_online_map that are different. With the implication being
that init_mm->cpu_vm_mask has cpus set that are not in cpu_online_map?
Very weird especially on SMP.
Without attribution I have a hard time making sense of which cpumask
is which so I can't draw any conclusions. But I find it very
interesting that it is bits 2 and 3 that are set. I wonder if
there is any mixup between logical cpu identities and apic ids.
Eric
* Re: [Fastboot] Re: kexec "problem" [and patch updates]
2004-02-28 10:41 ` Eric W. Biederman
@ 2004-03-04 13:03 ` Hariprasad Nellitheertha
2004-03-08 0:32 ` Eric W. Biederman
2004-03-08 18:35 ` Randy.Dunlap
0 siblings, 2 replies; 15+ messages in thread
From: Hariprasad Nellitheertha @ 2004-03-04 13:03 UTC
To: Eric W. Biederman; +Cc: Randy.Dunlap, r3pek, fastboot, linux-kernel
Hello,
I recreated this on a UNI system running an SMP kernel as well.
The problem is because we now initialize cpu_vm_mask for init_mm with
CPU_MASK_ALL (from 2.6.3 onwards) which makes all bits in cpumask 1.
Hence BUG_ON(!cpus_equal(cpumask, tmp)) fails. The change to set
cpu_vm_mask to CPU_MASK_ALL was done to remove tlb flush optimizations
for ppc64. On UNI kernels, CPU_MASK_ALL is 1 and hence the problem
does not occur.
I made a small patch which fixes this problem. The change is, essentially,
to use "tmp" instead of "cpumask". This ensures that only the (other) online
cpus are sent the IPI.
I have done some testing with this patch. Kexec loads fine and I haven't seen
anything untoward.
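The mask arithmetic described above can be modelled in userspace. This is a sketch, not kernel code: cpumask_model_t and both function names are hypothetical, and a plain unsigned long stands in for the kernel's cpumask_t.

```c
#include <stdbool.h>

/* Userspace model only: one bit per CPU in an unsigned long. */
typedef unsigned long cpumask_model_t;

/* The condition the old BUG_ON tested: true when the requested mask
 * names at least one CPU that is not online. */
bool names_offline_cpus(cpumask_model_t requested, cpumask_model_t online)
{
    cpumask_model_t tmp = requested & online;  /* cpus_and(tmp, cpumask, cpu_online_map) */
    return tmp != requested;                   /* !cpus_equal(cpumask, tmp) */
}

/* What the patch uses instead: only CPUs that are both requested
 * and online are targeted by the flush. */
cpumask_model_t flush_targets(cpumask_model_t requested, cpumask_model_t online)
{
    return requested & online;
}
```

With CPUs 0 and 1 online (0x3) and a request of every bit except the caller's own bit 0, names_offline_cpus() is true, which is the case that hit the BUG, while flush_targets() yields 0x2, so only CPU 1 would be sent the IPI.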
Comments please.
Regards, Hari
diff -Naur linux-2.6.3-before/arch/i386/kernel/smp.c linux-2.6.3/arch/i386/kernel/smp.c
--- linux-2.6.3-before/arch/i386/kernel/smp.c 2004-02-18 09:27:15.000000000 +0530
+++ linux-2.6.3/arch/i386/kernel/smp.c 2004-03-04 14:16:43.000000000 +0530
@@ -356,7 +356,8 @@
 	BUG_ON(cpus_empty(cpumask));
 
 	cpus_and(tmp, cpumask, cpu_online_map);
-	BUG_ON(!cpus_equal(cpumask, tmp));
+	if(cpus_empty(tmp))
+		return;
 	BUG_ON(cpu_isset(smp_processor_id(), cpumask));
 	BUG_ON(!mm);
 
@@ -371,12 +372,12 @@
 	flush_mm = mm;
 	flush_va = va;
 #if NR_CPUS <= BITS_PER_LONG
-	atomic_set_mask(cpumask, &flush_cpumask);
+	atomic_set_mask(tmp, &flush_cpumask);
 #else
 	{
 		int k;
 		unsigned long *flush_mask = (unsigned long *)&flush_cpumask;
-		unsigned long *cpu_mask = (unsigned long *)&cpumask;
+		unsigned long *cpu_mask = (unsigned long *)&tmp;
 		for (k = 0; k < BITS_TO_LONGS(NR_CPUS); ++k)
 			atomic_set_mask(cpu_mask[k], &flush_mask[k]);
 	}
@@ -385,7 +386,7 @@
 	 * We have to send the IPI only to
 	 * CPUs affected.
 	 */
-	send_IPI_mask(cpumask, INVALIDATE_TLB_VECTOR);
+	send_IPI_mask(tmp, INVALIDATE_TLB_VECTOR);
 
 	while (!cpus_empty(flush_cpumask))
 		/* nothing. lockup detection does not belong here */
On Sat, Feb 28, 2004 at 03:41:33AM -0700, Eric W. Biederman wrote:
> "Randy.Dunlap" <rddunlap@osdl.org> writes:
>
> > On 27 Feb 2004 01:00:04 -0700 Eric W. Biederman wrote:
> >
> > | > It works fine on 2.6.2. It works for me on 2.6.3 if not SMP.
> > | > If the kernel is built for SMP, when running kexec, I get a
> > | > BUG in arch/i386/kernel/smp.c at line 359.
> > | > I'm testing various workarounds for that BUG now.
> > |
> > | I will eyeball it...
> > |
> > | Is it the kernel that is shutting down, or the kernel that is being
> > | brought up that has problems?
> >
> > the kernel that is shutting down.
> >
> > | The back trace from the BUG would be interesting.
> >
> > see below. my bad. i should have included it.
> >
> > | As I see it flush_tlb_others is being called when we have shutdown
> > | cpus and the kernel still thinks we have the mm present on foreign
> > | cpus.
> >
> > Martin Bligh thinks that there is a tlb race here.
> > I printed the 2 cpu masks on my dual-proc machine and saw
> > 0 in one of them and 0xc in the other one.
>
> Ouch we have both cpus running when this happens, and we have not
> started any shutdown whatsoever. This is the bit that sets up
> the page tables for later use...
>
> I think identity_map_pages will have problems with a kernel that does
> the 4G/4G split, and it has known issues on some other architectures,
> because they treat init_mm specially. So the proper solution may be
> to simply rewrite identity_map_pages.
>
> Before we do that in the short term we need to see if
> identity_map_pages is actually doing anything bad. You are
> not using the 4G/4G split so that is not the cause. So either
> init_mm is now special in some way, or we have hit a generic kernel
> bug.
>
> So this may indeed be a tlb race. But it is init_mm->cpu_vm_mask and
> cpu_online_map that are different. With the implication being
> that init_mm->cpu_vm_mask has cpus set that are not in cpu_online_map?
> Very weird especially on SMP.
>
> Without attribution I have a hard time making sense of which cpumask
> is which so I can't draw any conclusions. But I find it very
> interesting that it is bits 2 and 3 that are set. I wonder if
> there is any mixup between logical cpu identities and apic ids.
>
> Eric
--
Hariprasad Nellitheertha
Linux Technology Center
India Software Labs
IBM India, Bangalore
* Re: [Fastboot] Re: kexec "problem" [and patch updates]
2004-03-04 13:03 ` Hariprasad Nellitheertha
@ 2004-03-08 0:32 ` Eric W. Biederman
2004-03-08 18:35 ` Randy.Dunlap
1 sibling, 0 replies; 15+ messages in thread
From: Eric W. Biederman @ 2004-03-08 0:32 UTC
To: hari; +Cc: Randy.Dunlap, r3pek, fastboot, linux-kernel
Hariprasad Nellitheertha <hari@in.ibm.com> writes:
> Hello,
>
> I recreated this on a UNI system running an SMP kernel as well.
>
> The problem is because we now initialize cpu_vm_mask for init_mm with
> CPU_MASK_ALL (from 2.6.3 onwards) which makes all bits in cpumask 1.
> Hence BUG_ON(!cpus_equal(cpumask, tmp)) fails. The change to set
> cpu_vm_mask to CPU_MASK_ALL was done to remove tlb flush optimizations
> for ppc64. On UNI kernels, CPU_MASK_ALL is 1 and hence the problem
> does not occur.
So the problem is that CPU_MASK_ALL includes cpus that are not currently
online. So it has gone from being wrong by including too few cpus
to being wrong by including too many cpus.
> I made a small patch which fixes this problem. The change is, essentially,
> to use "tmp" instead of "cpumask". This ensures that only the (other) online
> cpus are sent the IPI.
>
> I have done some testing with this patch. Kexec loads fine and I haven't seen
> anything untoward.
>
> Comments please.
Any chance we can fix this right and get a proper value in cpu_vm_mask
for init_mm? All that needs to happen is that each cpu as it is
started up is included in cpu_vm_mask.
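One way to picture this suggestion, as a userspace model rather than kernel code (struct cpu_model, model_cpu_up and model_mask_is_sane are hypothetical names; a real change would live in the SMP bring-up path):

```c
/* Userspace model: one bit per CPU. */
struct cpu_model {
    unsigned long online;        /* stands in for cpu_online_map */
    unsigned long init_vm_mask;  /* stands in for init_mm's cpu_vm_mask */
};

/* Mark a CPU as started and record it in init_mm's mask in the same step,
 * so the mask never names a CPU that is not online. */
void model_cpu_up(struct cpu_model *m, unsigned int cpu)
{
    unsigned long bit = 1UL << cpu;

    m->online |= bit;
    m->init_vm_mask |= bit;
}

/* The property the original BUG_ON depended on: every CPU named in the
 * mask is also online. Returns nonzero when that holds. */
int model_mask_is_sane(const struct cpu_model *m)
{
    return (m->init_vm_mask & ~m->online) == 0;
}
```

Starting from an empty mask and adding CPUs only as they come up keeps the property true by construction, in contrast to starting from an all-ones mask.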
The reason kexec sees this is that it is possibly the only generic
modifier of init_mm.
If fixing this needs to be kexec specific we need to simply remove
using init_mm.
Eric
* Re: [Fastboot] Re: kexec "problem" [and patch updates]
2004-03-04 13:03 ` Hariprasad Nellitheertha
2004-03-08 0:32 ` Eric W. Biederman
@ 2004-03-08 18:35 ` Randy.Dunlap
1 sibling, 0 replies; 15+ messages in thread
From: Randy.Dunlap @ 2004-03-08 18:35 UTC
To: hari; +Cc: ebiederm, r3pek, fastboot, linux-kernel
On Thu, 4 Mar 2004 18:33:10 +0530 Hariprasad Nellitheertha wrote:
| Hello,
|
| I recreated this on a UNI system running an SMP kernel as well.
|
| The problem is because we now initialize cpu_vm_mask for init_mm with
| CPU_MASK_ALL (from 2.6.3 onwards) which makes all bits in cpumask 1.
| Hence BUG_ON(!cpus_equal(cpumask, tmp)) fails. The change to set
| cpu_vm_mask to CPU_MASK_ALL was done to remove tlb flush optimizations
| for ppc64. On UNI kernels, CPU_MASK_ALL is 1 and hence the problem
| does not occur.
|
| I made a small patch which fixes this problem. The change is, essentially,
| to use "tmp" instead of "cpumask". This ensures that only the (other) online
| cpus are sent the IPI.
|
| I have done some testing with this patch. Kexec loads fine and I haven't seen
| anything untoward.
Yes, that does work well... Thanks for the patch.
Is this satisfactory for pushing into the mainline kernel, or
should kexec use another method to solve this problem?
--
~Randy
| Comments please.
|
| Regards, Hari
|
|
| diff -Naur linux-2.6.3-before/arch/i386/kernel/smp.c linux-2.6.3/arch/i386/kernel/smp.c
| --- linux-2.6.3-before/arch/i386/kernel/smp.c 2004-02-18 09:27:15.000000000 +0530
| +++ linux-2.6.3/arch/i386/kernel/smp.c 2004-03-04 14:16:43.000000000 +0530
| @@ -356,7 +356,8 @@
| BUG_ON(cpus_empty(cpumask));
|
| cpus_and(tmp, cpumask, cpu_online_map);
| - BUG_ON(!cpus_equal(cpumask, tmp));
| + if(cpus_empty(tmp))
| + return;
| BUG_ON(cpu_isset(smp_processor_id(), cpumask));
| BUG_ON(!mm);
|
| @@ -371,12 +372,12 @@
| flush_mm = mm;
| flush_va = va;
| #if NR_CPUS <= BITS_PER_LONG
| - atomic_set_mask(cpumask, &flush_cpumask);
| + atomic_set_mask(tmp, &flush_cpumask);
| #else
| {
| int k;
| unsigned long *flush_mask = (unsigned long *)&flush_cpumask;
| - unsigned long *cpu_mask = (unsigned long *)&cpumask;
| + unsigned long *cpu_mask = (unsigned long *)&tmp;
| for (k = 0; k < BITS_TO_LONGS(NR_CPUS); ++k)
| atomic_set_mask(cpu_mask[k], &flush_mask[k]);
| }
| @@ -385,7 +386,7 @@
| * We have to send the IPI only to
| * CPUs affected.
| */
| - send_IPI_mask(cpumask, INVALIDATE_TLB_VECTOR);
| + send_IPI_mask(tmp, INVALIDATE_TLB_VECTOR);
|
| while (!cpus_empty(flush_cpumask))
| /* nothing. lockup detection does not belong here */
|
|
| On Sat, Feb 28, 2004 at 03:41:33AM -0700, Eric W. Biederman wrote:
| > "Randy.Dunlap" <rddunlap@osdl.org> writes:
| >
| > > On 27 Feb 2004 01:00:04 -0700 Eric W. Biederman wrote:
| > >
| > > | > It works fine on 2.6.2. It works for me on 2.6.3 if not SMP.
| > > | > If the kernel is built for SMP, when running kexec, I get a
| > > | > BUG in arch/i386/kernel/smp.c at line 359.
| > > | > I'm testing various workarounds for that BUG now.
| > > |
| > > | I will eyeball it...
| > > |
| > > | Is it the kernel that is shutting down, or the kernel that is being
| > > | brought up that has problems?
| > >
| > > the kernel that is shutting down.
| > >
| > > | The back trace from the BUG would be interesting.
| > >
| > > see below. my bad. i should have included it.
| > >
| > > | As I see it flush_tlb_others is being called when we have shutdown
| > > | cpus and the kernel still thinks we have the mm present on foreign
| > > | cpus.
| > >
| > > Martin Bligh thinks that there is a tlb race here.
| > > I printed the 2 cpu masks on my dual-proc machine and saw
| > > 0 in one of them and 0xc in the other one.
| >
| > Ouch we have both cpus running when this happens, and we have not
| > started any shutdown whatsoever. This is the bit that sets up
| > the page tables for later use...
| >
| > I think identity_map_pages will have problems with a kernel that does
| > the 4G/4G split, and it has known issues on some other architectures,
| > because they treat init_mm specially. So the proper solution may be
| > to simply rewrite identity_map_pages.
| >
| > Before we do that in the short term we need to see if
| > identity_map_pages is actually doing anything bad. You are
| > not using the 4G/4G split so that is not the cause. So either
| > init_mm is now special in some way, or we have hit a generic kernel
| > bug.
| >
| > So this may indeed be a tlb race. But it is init_mm->cpu_vm_mask and
| > cpu_online_map that are different. With the implication being
| > that init_mm->cpu_vm_mask has cpus set that are not in cpu_online_map?
| > Very weird especially on SMP.
| >
| > Without attribution I have a hard time making sense of which cpumask
| > is which so I can't draw any conclusions. But I find it very
| > interesting that it is bits 2 and 3 that are set. I wonder if
| > there is any mixup between logical cpu identities and apic ids.
| >
| > Eric
|
| --
| Hariprasad Nellitheertha
| Linux Technology Center
| India Software Labs
| IBM India, Bangalore
| -
Thread overview: 15+ messages
2004-02-24 16:03 Latest AIO patchset Suparna Bhattacharya
2004-02-24 11:02 ` kexec "problem" Carlos Silva
2004-02-24 17:10 ` Randy.Dunlap
2004-02-24 17:24 ` Carlos Silva
2004-02-27 0:54 ` kexec "problem" [and patch updates] Randy.Dunlap
2004-02-27 8:00 ` [Fastboot] " Eric W. Biederman
2004-02-27 19:32 ` Randy.Dunlap
2004-02-28 10:41 ` Eric W. Biederman
2004-03-04 13:03 ` Hariprasad Nellitheertha
2004-03-08 0:32 ` Eric W. Biederman
2004-03-08 18:35 ` Randy.Dunlap
2004-02-25 18:45 ` Latest AIO patchset Hayim Shaul
2004-02-26 0:27 ` Benjamin LaHaise
2004-02-26 13:30 ` Hayim Shaul
2004-02-26 16:45 ` Daniel McNeil