* kexec "problem"
  2004-02-24 16:03 Latest AIO patchset Suparna Bhattacharya
@ 2004-02-24 11:02 ` Carlos Silva
  2004-02-24 17:10   ` Randy.Dunlap
  2004-02-27  0:54   ` kexec "problem" [and patch updates] Randy.Dunlap
  2004-02-25 18:45 ` Latest AIO patchset Hayim Shaul
  1 sibling, 2 replies; 15+ messages in thread
From: Carlos Silva @ 2004-02-24 11:02 UTC
  To: linux-kernel

hi guys,

i have just compiled a kernel with the kexec patch. compiled kexec-tools
and when i try to load a kernel, it gives me this:
# ./do-kexec.sh /boot/bzImage-2.6.2-g
kexec_load failed: Invalid argument
entry       = 0x91764
nr_segments = 2
segment[0].buf   = 0x80b3480
segment[0].bufsz = 1880
segment[0].mem   = 0x90000
segment[0].memsz = 1880
segment[1].buf   = 0x40001008
segment[1].bufsz = 19795a
segment[1].mem   = 0x100000
segment[1].memsz = 19795a

has anyone tried kexec and actually gotten it to work? i'm trying with kernel 2.6.3


* Latest AIO patchset
@ 2004-02-24 16:03 Suparna Bhattacharya
  2004-02-24 11:02 ` kexec "problem" Carlos Silva
  2004-02-25 18:45 ` Latest AIO patchset Hayim Shaul
  0 siblings, 2 replies; 15+ messages in thread
From: Suparna Bhattacharya @ 2004-02-24 16:03 UTC
  To: akpm, linux-aio; +Cc: linux-kernel


The latest set of AIO patches is being maintained at:
http://www.kernel.org/pub/linux/kernel/people/suparna/aio/

The patches have been reduced to a minimal set that addresses
the most relevant blocking points. Please let me know if you
think there is a case for bringing in any of the additional 
patches.
(The patches need to be applied in the order listed in the
'series' file.)

A new addition to the patchset is Chris Mason's nice and simple
implementation of AIO support for pipes using the retry 
infrastructure. He also fixed some problems in the AIO cancel
logic to make it play well with retries.

Besides this and some reorganization and cleanup, there
are a couple of changes since the last set of patches in -mm that
are worth a mention:

- Upfront readahead is now clipped to the readahead limit for
  the device (ra_pages) and happens only for AIO. This helps
  address the sendfile regression seen by Felix von Leitner.
  If your AIO read requests are likely to exceed the default 
  readahead size, then use hdparm -a <new size> to tune it.
  The patchset also currently includes Ram Pai's adaptive
  lazy readahead code which is required for good streaming AIO
  read performance.
- David Brownell's suggestion of enabling the fops to set up
  their own retry methods. This should make his USB gadgetfs
  AIO patch co-exist smoothly with fsaio.

Some basic results are up, comparing streaming non-cached random
AIO read/write throughputs for a single ext3 file using aio-stress
for various I/O sizes with and without these patches, along with
comparisons against O_DIRECT AIO throughputs. In short, the fsaio
patches doubled throughput, bringing it close to the results seen
with O_DIRECT AIO.

As usual, feedback, bug fixes, test results, etc. are welcome.

Regards
Suparna

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India
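For readers new to the interface these patches extend, here is a minimal sketch of one asynchronous read (IOCB_CMD_PREAD) submitted through io_setup/io_submit/io_getevents. It assumes a modern Linux kernel and its <linux/aio_abi.h> ABI, and calls the syscalls via syscall(2) rather than libaio. The wrapper names (sys_io_setup and friends) and aio_read_once are hypothetical helpers for illustration, not code from the patchset. On kernels without buffered-AIO support, a read like this may simply complete synchronously inside io_submit.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/aio_abi.h>   /* aio_context_t, struct iocb, struct io_event, IOCB_CMD_PREAD */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>     /* SYS_io_setup and friends */
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical thin wrappers: glibc does not provide these calls itself. */
static long sys_io_setup(unsigned nr, aio_context_t *ctx)
{
    return syscall(SYS_io_setup, (unsigned long)nr, ctx);
}

static long sys_io_destroy(aio_context_t ctx)
{
    return syscall(SYS_io_destroy, ctx);
}

static long sys_io_submit(aio_context_t ctx, long n, struct iocb **iocbs)
{
    return syscall(SYS_io_submit, ctx, n, iocbs);
}

static long sys_io_getevents(aio_context_t ctx, long min_nr, long nr,
                             struct io_event *events)
{
    /* NULL timeout: block until min_nr events have completed */
    return syscall(SYS_io_getevents, ctx, min_nr, nr, events, NULL);
}

/*
 * Submit one asynchronous read of up to len bytes at offset 0 of fd, then
 * wait for its completion event. Returns bytes read, or -1 with errno set.
 */
static ssize_t aio_read_once(int fd, void *buf, size_t len)
{
    assert(fd >= 0 && buf != NULL);

    aio_context_t ctx = 0;               /* io_setup requires a zeroed context */
    if (sys_io_setup(1, &ctx) < 0)
        return -1;

    struct iocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = (uint32_t)fd;
    cb.aio_lio_opcode = IOCB_CMD_PREAD;
    cb.aio_buf = (uint64_t)(uintptr_t)buf;
    cb.aio_nbytes = len;
    cb.aio_offset = 0;

    struct iocb *batch[1] = { &cb };
    struct io_event ev;
    ssize_t result = -1;
    if (sys_io_submit(ctx, 1, batch) == 1 &&
        sys_io_getevents(ctx, 1, 1, &ev) == 1) {
        if (ev.res < 0)
            errno = (int)-ev.res;        /* per-request errors come back negated */
        else
            result = (ssize_t)ev.res;
    }

    int saved = errno;
    sys_io_destroy(ctx);
    errno = saved;
    return result;
}
```

For a small regular file, aio_read_once(fd, buf, sizeof buf) should return the file's length. The same iocb/io_event pattern extends to submitting several requests in one io_submit() call.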



* Re: kexec "problem"
  2004-02-24 11:02 ` kexec "problem" Carlos Silva
@ 2004-02-24 17:10   ` Randy.Dunlap
  2004-02-24 17:24     ` Carlos Silva
  2004-02-27  0:54   ` kexec "problem" [and patch updates] Randy.Dunlap
  1 sibling, 1 reply; 15+ messages in thread
From: Randy.Dunlap @ 2004-02-24 17:10 UTC
  To: Carlos Silva; +Cc: linux-kernel

On Tue, 24 Feb 2004 11:02:21 -0000 (WET) Carlos Silva wrote:

| hi guys,
| 
| i have just compiled a kernel with the kexec patch. compiled kexec-tools
| and when i try to load a kernel, it gives me this:
| # ./do-kexec.sh /boot/bzImage-2.6.2-g
| kexec_load failed: Invalid argument
| entry       = 0x91764
| nr_segments = 2
| segment[0].buf   = 0x80b3480
| segment[0].bufsz = 1880
| segment[0].mem   = 0x90000
| segment[0].memsz = 1880
| segment[1].buf   = 0x40001008
| segment[1].bufsz = 19795a
| segment[1].mem   = 0x100000
| segment[1].memsz = 19795a
| 
| anyone tried to run kexec and actually did it? i'm trying with kernel 2.6.3

I haven't updated for 2.6.3 yet, or even tested it...
but I'll get to it soon.

--
~Randy


* Re: kexec "problem"
  2004-02-24 17:10   ` Randy.Dunlap
@ 2004-02-24 17:24     ` Carlos Silva
  0 siblings, 0 replies; 15+ messages in thread
From: Carlos Silva @ 2004-02-24 17:24 UTC
  To: Randy.Dunlap; +Cc: linux-kernel

> On Tue, 24 Feb 2004 11:02:21 -0000 (WET) Carlos Silva wrote:
>
> | hi guys,
> |
> | i have just compiled a kernel with the kexec patch. compiled kexec-tools
> | and when i try to load a kernel, it gives me this:
> | # ./do-kexec.sh /boot/bzImage-2.6.2-g
> | kexec_load failed: Invalid argument
> | entry       = 0x91764
> | nr_segments = 2
> | segment[0].buf   = 0x80b3480
> | segment[0].bufsz = 1880
> | segment[0].mem   = 0x90000
> | segment[0].memsz = 1880
> | segment[1].buf   = 0x40001008
> | segment[1].bufsz = 19795a
> | segment[1].mem   = 0x100000
> | segment[1].memsz = 19795a
> |
> | anyone tried to run kexec and actually did it? i'm trying with kernel 2.6.3
>
> I haven't updated for 2.6.3 yet, or even tested it...
> but I'll get to it soon.
>
> --
> ~Randy
>


I just changed line 13 of kexec.h from:
"long kexec_load(void *entry, unsigned long nr_segments, struct kexec_segment *segments, unsigned long);"
to:
"long kexec_load(void *entry, unsigned long nr_segments, struct kexec_segment *segments, unsigned long flags);"

and it gives me this error:
# ./do-kexec.sh /boot/bzImage-2.6.2-g
kexec_load failed: Function not implemented
entry       = 0x91764
nr_segments = 2
segment[0].buf   = 0x80b3480
segment[0].bufsz = 1880
segment[0].mem   = 0x90000
segment[0].memsz = 1880
segment[1].buf   = 0x40001008
segment[1].bufsz = 19795a
segment[1].mem   = 0x100000
segment[1].memsz = 19795a



* Re: Latest AIO patchset
  2004-02-24 16:03 Latest AIO patchset Suparna Bhattacharya
  2004-02-24 11:02 ` kexec "problem" Carlos Silva
@ 2004-02-25 18:45 ` Hayim Shaul
  2004-02-26  0:27   ` Benjamin LaHaise
  1 sibling, 1 reply; 15+ messages in thread
From: Hayim Shaul @ 2004-02-25 18:45 UTC
  To: Suparna Bhattacharya; +Cc: akpm, linux-aio, linux-kernel

> 
> Some basic results are up comparing streaming non-cached random 
> AIO read/write throughputs for a single ext3 file using aio-stress 
> for various io sizes with and without these patches, and also 
> comparsions with O_DIRECT AIO throughputs. The short summary has
> been a doubling of throughput using the fsaio patches, which is
> also close to the results seen with O_DIRECT AIO.
> 
> As usual feedback, bug fixes, test results etc are welcome.
> 

What exactly is the O_DIRECT flag? When I add this flag to the open() call,
it fails.

More specifically, this call fails:
  open("filename", O_RDWR | O_DIRECT | O_LARGEFILE | O_CREAT, S_IRWXU);

but this one succeeds:
  open("filename", O_RDWR | O_LARGEFILE | O_CREAT, S_IRWXU);

I'm running linux 2.6.0 with libaio 0.3.92.

   Hayim.



* Re: Latest AIO patchset
  2004-02-25 18:45 ` Latest AIO patchset Hayim Shaul
@ 2004-02-26  0:27   ` Benjamin LaHaise
  2004-02-26 13:30     ` Hayim Shaul
  0 siblings, 1 reply; 15+ messages in thread
From: Benjamin LaHaise @ 2004-02-26  0:27 UTC
  To: Hayim Shaul; +Cc: Suparna Bhattacharya, akpm, linux-aio, linux-kernel

On Wed, Feb 25, 2004 at 08:45:29PM +0200, Hayim Shaul wrote:
> What exactly is the O_DIRECT flag? When I add this flag to the open func
> it fails.
> 
> More specificaly, this function fails
>   open("filename", O_RDWR | O_DIRECT | O_LARGEFILE | O_CREAT, S_IRWXU);   
> 
> but this one succeeds
>   open("filename", O_RDWR | O_LARGEFILE | O_CREAT, S_IRWXU);   
> 
> I'm running linux 2.6.0 with libaio 0.3.92.

Which filesystem?  Not all support O_DIRECT.

		-ben


* Re: Latest AIO patchset
  2004-02-26  0:27   ` Benjamin LaHaise
@ 2004-02-26 13:30     ` Hayim Shaul
  2004-02-26 16:45       ` Daniel McNeil
  0 siblings, 1 reply; 15+ messages in thread
From: Hayim Shaul @ 2004-02-26 13:30 UTC
  To: Benjamin LaHaise; +Cc: Suparna Bhattacharya, akpm, linux-aio, linux-kernel

On Wed, 25 Feb 2004, Benjamin LaHaise wrote:

> On Wed, Feb 25, 2004 at 08:45:29PM +0200, Hayim Shaul wrote:
> > What exactly is the O_DIRECT flag? When I add this flag to the open func
> > it fails.
> > 
> > More specificaly, this function fails
> >   open("filename", O_RDWR | O_DIRECT | O_LARGEFILE | O_CREAT, S_IRWXU);   
> > 
> > but this one succeeds
> >   open("filename", O_RDWR | O_LARGEFILE | O_CREAT, S_IRWXU);   
> > 
> > I'm running linux 2.6.0 with libaio 0.3.92.
> 
> Which filesystem?  Not all support O_DIRECT.
> 

ext3
I think it does support O_DIRECT.

Actually, I was wrong: open does succeed. It returns a valid fd,
but after writing and exiting, the file still has zero size.

Removing O_DIRECT, the same program writes quite a lot to the file.

Hayim.



* Re: Latest AIO patchset
  2004-02-26 13:30     ` Hayim Shaul
@ 2004-02-26 16:45       ` Daniel McNeil
  0 siblings, 0 replies; 15+ messages in thread
From: Daniel McNeil @ 2004-02-26 16:45 UTC
  To: Hayim Shaul
  Cc: Benjamin LaHaise, Suparna Bhattacharya, Andrew Morton, linux-aio,
	Linux Kernel Mailing List

Are you checking the return value of write()?

O_DIRECT has alignment requirements, since it writes directly from
user space to disk, bypassing the page cache.  Also, the size
of the write has to be a multiple of 512 (for 2.6).
Try using posix_memalign() with the page size as the alignment argument
to allocate the data buffer.  O_DIRECT does work on ext3.

Daniel
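Daniel's advice can be sketched in C as follows. This is a minimal sketch, assuming Linux with glibc: the buffer comes from posix_memalign() aligned to the page size, the transfer length is rounded up to a whole number of pages (which also satisfies the 512-byte rule), the write starts at file offset 0, and the return value of write() is checked. The helpers round_up and direct_write are hypothetical names for illustration, and trimming the zero padding with ftruncate() afterwards is one possible choice, not something proposed in the thread.

```c
#define _GNU_SOURCE          /* exposes O_DIRECT in <fcntl.h> */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Round len up to a multiple of align; align must be a power of two. */
static size_t round_up(size_t len, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (len + align - 1) & ~(align - 1);
}

/*
 * Write len bytes from data to path with O_DIRECT: page-aligned buffer,
 * transfer size rounded up to a whole number of pages, file offset 0.
 * The zero padding is trimmed afterwards with ftruncate().
 * Returns len on success, or -1 with errno set.
 */
static ssize_t direct_write(const char *path, const void *data, size_t len)
{
    size_t align = (size_t)sysconf(_SC_PAGESIZE);
    size_t padded = round_up(len ? len : 1, align);   /* at least one page */
    void *buf = NULL;

    int err = posix_memalign(&buf, align, padded);
    if (err != 0) {
        errno = err;                 /* posix_memalign returns the error code */
        return -1;
    }
    memset(buf, 0, padded);
    memcpy(buf, data, len);

    ssize_t result = -1;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, S_IRUSR | S_IWUSR);
    if (fd >= 0) {
        ssize_t n = write(fd, buf, padded);  /* always check this with O_DIRECT */
        if (n >= 0 && (size_t)n != padded)
            errno = EIO;                     /* treat a short write as an error */
        else if (n >= 0 && ftruncate(fd, (off_t)len) == 0)
            result = (ssize_t)len;           /* success: drop the padding */
        err = errno;
        close(fd);
        errno = err;
    }

    err = errno;
    free(buf);
    errno = err;
    return result;
}
```

A call such as direct_write("filename", data, len) either returns len or fails with errno set. Per the open(2) and write(2) man pages, EINVAL from open() means the filesystem does not support O_DIRECT, while EINVAL from write() points at a misaligned buffer, length, or offset.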




* Re: kexec "problem" [and patch updates]
  2004-02-24 11:02 ` kexec "problem" Carlos Silva
  2004-02-24 17:10   ` Randy.Dunlap
@ 2004-02-27  0:54   ` Randy.Dunlap
  2004-02-27  8:00     ` [Fastboot] " Eric W. Biederman
  1 sibling, 1 reply; 15+ messages in thread
From: Randy.Dunlap @ 2004-02-27  0:54 UTC
  To: Carlos Silva; +Cc: linux-kernel, fastboot

On Tue, 24 Feb 2004 11:02:21 -0000 (WET) Carlos Silva wrote:

| hi guys,
| 
| i have just compiled a kernel with the kexec patch. compiled kexec-tools
| and when i try to load a kernel, it gives me this:
| # ./do-kexec.sh /boot/bzImage-2.6.2-g
| kexec_load failed: Invalid argument
| entry       = 0x91764
| nr_segments = 2
| segment[0].buf   = 0x80b3480
| segment[0].bufsz = 1880
| segment[0].mem   = 0x90000
| segment[0].memsz = 1880
| segment[1].buf   = 0x40001008
| segment[1].bufsz = 19795a
| segment[1].mem   = 0x100000
| segment[1].memsz = 19795a
| 
| anyone tried to run kexec and actually did it? i'm trying with kernel 2.6.3
| -

I updated the kexec patch for 2.6.2 and 2.6.3.
It works fine on 2.6.2.  It works for me on 2.6.3 if not SMP.
If the kernel is built for SMP, when running kexec, I get a
BUG in arch/i386/kernel/smp.c at line 359.
I'm testing various workarounds for that BUG now.

--
~Randy

kexec updates are at:
http://developer.osdl.org/rddunlap/kexec/2.6.2/
and
http://developer.osdl.org/rddunlap/kexec/2.6.3/


* Re: [Fastboot] Re: kexec "problem" [and patch updates]
  2004-02-27  0:54   ` kexec "problem" [and patch updates] Randy.Dunlap
@ 2004-02-27  8:00     ` Eric W. Biederman
  2004-02-27 19:32       ` Randy.Dunlap
  0 siblings, 1 reply; 15+ messages in thread
From: Eric W. Biederman @ 2004-02-27  8:00 UTC
  To: Randy.Dunlap; +Cc: Carlos Silva, fastboot, linux-kernel

"Randy.Dunlap" <rddunlap@osdl.org> writes:

> On Tue, 24 Feb 2004 11:02:21 -0000 (WET) Carlos Silva wrote:
> 
> | hi guys,
> | 
> | i have just compiled a kernel with the kexec patch. compiled kexec-tools
> | and when i try to load a kernel, it gives me this:
> | # ./do-kexec.sh /boot/bzImage-2.6.2-g
> | kexec_load failed: Invalid argument
> | entry       = 0x91764
> | nr_segments = 2
> | segment[0].buf   = 0x80b3480
> | segment[0].bufsz = 1880
> | segment[0].mem   = 0x90000
> | segment[0].memsz = 1880
> | segment[1].buf   = 0x40001008
> | segment[1].bufsz = 19795a
> | segment[1].mem   = 0x100000
> | segment[1].memsz = 19795a
> | 
> | anyone tried to run kexec and actually did it? i'm trying with kernel 2.6.3
> | -
> 
> I updated the kexec patch for 2.6.2 and 2.6.3.
> It works fine on 2.6.2.  It works for me on 2.6.3 if not SMP.
> If the kernel is built for SMP, when running kexec, I get a
> BUG in arch/i386/kernel/smp.c at line 359.
> I'm testing various workarounds for that BUG now.

I will eyeball it...

Is it the kernel that is shutting down, or the kernel that is being
brought up that has problems?

The back trace from the BUG would be interesting.

As I see it, flush_tlb_others is being called after we have shut down
cpus, and the kernel still thinks the mm is present on foreign
cpus.

So it appears we simply have a case that was not anticipated
by the authors of that code.  So we need to adjust either
the code we are calling or cpu_vm_mask so it does not list
other cpus after we have shut them down.

At least, that is what it looks like at first glance.

Eric


* Re: [Fastboot] Re: kexec "problem" [and patch updates]
  2004-02-27  8:00     ` [Fastboot] " Eric W. Biederman
@ 2004-02-27 19:32       ` Randy.Dunlap
  2004-02-28 10:41         ` Eric W. Biederman
  0 siblings, 1 reply; 15+ messages in thread
From: Randy.Dunlap @ 2004-02-27 19:32 UTC
  To: Eric W. Biederman; +Cc: r3pek, fastboot, linux-kernel

On 27 Feb 2004 01:00:04 -0700 Eric W. Biederman wrote:

| "Randy.Dunlap" <rddunlap@osdl.org> writes:
| 
| > On Tue, 24 Feb 2004 11:02:21 -0000 (WET) Carlos Silva wrote:
| > 
| > | hi guys,
| > | 
| > | i have just compiled a kernel with the kexec patch. compiled kexec-tools
| > | and when i try to load a kernel, it gives me this:
| > | # ./do-kexec.sh /boot/bzImage-2.6.2-g
| > | kexec_load failed: Invalid argument
| > | entry       = 0x91764
| > | nr_segments = 2
| > | segment[0].buf   = 0x80b3480
| > | segment[0].bufsz = 1880
| > | segment[0].mem   = 0x90000
| > | segment[0].memsz = 1880
| > | segment[1].buf   = 0x40001008
| > | segment[1].bufsz = 19795a
| > | segment[1].mem   = 0x100000
| > | segment[1].memsz = 19795a
| > | 
| > | anyone tried to run kexec and actually did it? i'm trying with kernel 2.6.3
| > | -
| > 
| > I updated the kexec patch for 2.6.2 and 2.6.3.
| > It works fine on 2.6.2.  It works for me on 2.6.3 if not SMP.
| > If the kernel is built for SMP, when running kexec, I get a
| > BUG in arch/i386/kernel/smp.c at line 359.
| > I'm testing various workarounds for that BUG now.
| 
| I will eyeball it...
| 
| Is it the kernel that is shutting down, or the kernel that is being
| brought up that has problems?

the kernel that is shutting down.

| The back trace from the BUG would be interesting.

see below.  my bad.  i should have included it.

| As I see it flush_tlb_others is being called when we have shutdown
| cpus and the kernel still thinks we have the mm present on foreign
| cpus.

Martin Bligh thinks that there is a tlb race here.
I printed the 2 cpu masks on my dual-proc machine and saw
0 in one of them and 0xc in the other one.

| So it appears we simply have a case that was not anticipated
| by the authors of that code.  So we need to adjust either
| the code we are calling or cpu_vm_mask so it does not list
| other cpus after we have shut them down.
| 
| At least that it what it looks like at first glance.

Thanks,
--
~Randy


Feb 25 15:52:21 gargoyle kernel: ------------[ cut here ]------------
Feb 25 15:52:21 gargoyle kernel: kernel BUG at arch/i386/kernel/smp.c:359!
Feb 25 15:52:21 gargoyle kernel: invalid operand: 0000 [#1]
Feb 25 15:52:21 gargoyle kernel: CPU:    1
Feb 25 15:52:21 gargoyle kernel: EIP:    0060:[<c011673d>]    Not tainted
Feb 25 15:52:21 gargoyle kernel: EFLAGS: 00010206
Feb 25 15:52:21 gargoyle kernel: EIP is at flush_tlb_others+0x141/0x15c
Feb 25 15:52:21 gargoyle kernel: eax: 00000000   ebx: c043c9e0   ecx: c043c9e0   edx: 0000000c
Feb 25 15:52:21 gargoyle kernel: esi: f53effc4   edi: 00851da8   ebp: f5449ebc   esp: f5449ea8
Feb 25 15:52:21 gargoyle kernel: ds: 007b   es: 007b   ss: 0068
Feb 25 15:52:21 gargoyle kernel: Process kexec (pid: 1095, threadinfo=f5448000 task=f54719b0)
Feb 25 15:52:21 gargoyle kernel: Stack: f5449ed4 c014eae3 c1851d58 353f0000 00000000 f5449ed4 c01167e9 0000000c 
Feb 25 15:52:21 gargoyle kernel:        c043c9e0 ffffffff 0000000c f5449f20 c0150084 c043c9e0 f61d7610 003f1000 
Feb 25 15:52:21 gargoyle kernel:        003f1000 35000000 35000000 00400000 c0101354 c043ca14 c043c9e0 353f1000 
Feb 25 15:52:21 gargoyle kernel: Call Trace:
Feb 25 15:52:21 gargoyle kernel:  [<c014eae3>] pte_alloc_map+0xd9/0x12e
Feb 25 15:52:21 gargoyle kernel:  [<c01167e9>] flush_tlb_mm+0x47/0x8c
Feb 25 15:52:21 gargoyle kernel:  [<c0150084>] remap_page_range+0x1ae/0x218
Feb 25 15:52:21 gargoyle kernel:  [<c013dae7>] identity_map_pages+0xf7/0x130
Feb 25 15:52:21 gargoyle kernel:  [<c013dbd4>] kimage_alloc_reboot_code_pages+0xb4/0x164
Feb 25 15:52:21 gargoyle kernel:  [<c013d928>] kimage_alloc+0x100/0x186
Feb 25 15:52:21 gargoyle kernel:  [<c013e496>] sys_kexec_load+0x9c/0xff
Feb 25 15:52:21 gargoyle kernel:  [<c0109637>] syscall_call+0x7/0xb
Feb 25 15:52:21 gargoyle kernel: 
Feb 25 15:52:21 gargoyle kernel: Code: 0f 0b 67 01 ee df 3d c0 e9 d7 fe ff ff 0f 0b 64 01 ee df 3d 


* Re: [Fastboot] Re: kexec "problem" [and patch updates]
  2004-02-27 19:32       ` Randy.Dunlap
@ 2004-02-28 10:41         ` Eric W. Biederman
  2004-03-04 13:03           ` Hariprasad Nellitheertha
  0 siblings, 1 reply; 15+ messages in thread
From: Eric W. Biederman @ 2004-02-28 10:41 UTC
  To: Randy.Dunlap; +Cc: r3pek, fastboot, linux-kernel

"Randy.Dunlap" <rddunlap@osdl.org> writes:

> On 27 Feb 2004 01:00:04 -0700 Eric W. Biederman wrote:
> 
> | > It works fine on 2.6.2.  It works for me on 2.6.3 if not SMP.
> | > If the kernel is built for SMP, when running kexec, I get a
> | > BUG in arch/i386/kernel/smp.c at line 359.
> | > I'm testing various workarounds for that BUG now.
> | 
> | I will eyeball it...
> | 
> | Is it the kernel that is shutting down, or the kernel that is being
> | brought up that has problems?
> 
> the kernel that is shutting down.
> 
> | The back trace from the BUG would be interesting.
> 
> see below.  my bad.  i should have included it.
> 
> | As I see it flush_tlb_others is being called when we have shutdown
> | cpus and the kernel still thinks we have the mm present on foreign
> | cpus.
> 
> Martin Bligh thinks that there is a tlb race here.
> I printed the 2 cpu masks on my dual-proc macine and saw
> 0 in one of them and 0xc in the other one.

Ouch, we have both cpus running when this happens, and we have not
started any shutdown whatsoever.  This is the bit that sets up
the page tables for later use...

I think identity_map_pages will have problems with a kernel that does
the 4G/4G split, and it has known issues on some other architectures,
because they treat init_mm specially.  So the proper solution may be
to simply rewrite identity_map_pages. 

Before we do that, in the short term we need to see if
identity_map_pages is actually doing anything bad.  You are
not using the 4G/4G split, so that is not the cause.  So either
init_mm is now special in some way, or we have hit a generic kernel
bug.

So this may indeed be a tlb race.  But it is init_mm->cpu_vm_mask and
cpu_online_map that are different, with the implication being
that init_mm->cpu_vm_mask has cpus set that are not in cpu_online_map?
Very weird, especially on SMP.

Without attribution I have a hard time making sense of which cpumask
is which so I can't draw any conclusions.  But I find it very
interesting that it is bits 2 and 3 that are set.  I wonder if
there is any mixup between logical cpu identities and apic ids.

Eric


* Re: [Fastboot] Re: kexec "problem" [and patch updates]
  2004-02-28 10:41         ` Eric W. Biederman
@ 2004-03-04 13:03           ` Hariprasad Nellitheertha
  2004-03-08  0:32             ` Eric W. Biederman
  2004-03-08 18:35             ` Randy.Dunlap
  0 siblings, 2 replies; 15+ messages in thread
From: Hariprasad Nellitheertha @ 2004-03-04 13:03 UTC
  To: Eric W. Biederman; +Cc: Randy.Dunlap, r3pek, fastboot, linux-kernel

Hello,

I reproduced this on a UNI (uniprocessor) system running an SMP kernel as well.

The problem is that we now initialize cpu_vm_mask for init_mm with
CPU_MASK_ALL (from 2.6.3 onwards), which sets all bits in cpumask to 1.
Hence BUG_ON(!cpus_equal(cpumask, tmp)) fires. The change to set
cpu_vm_mask to CPU_MASK_ALL was done to remove tlb flush optimizations
for ppc64. On UNI kernels, CPU_MASK_ALL is 1, and hence the problem
does not occur.

I made a small patch which fixes this problem. The change is, essentially,
to use "tmp" instead of "cpumask". This ensures that only the (other) online 
cpus are sent the IPI. 

I have done some testing with this patch. Kexec loads fine and I haven't seen
anything untoward. 

Comments please.

Regards, Hari


diff -Naur linux-2.6.3-before/arch/i386/kernel/smp.c linux-2.6.3/arch/i386/kernel/smp.c
--- linux-2.6.3-before/arch/i386/kernel/smp.c	2004-02-18 09:27:15.000000000 +0530
+++ linux-2.6.3/arch/i386/kernel/smp.c	2004-03-04 14:16:43.000000000 +0530
@@ -356,7 +356,8 @@
 	BUG_ON(cpus_empty(cpumask));
 
 	cpus_and(tmp, cpumask, cpu_online_map);
-	BUG_ON(!cpus_equal(cpumask, tmp));
+	if(cpus_empty(tmp))
+		return;
 	BUG_ON(cpu_isset(smp_processor_id(), cpumask));
 	BUG_ON(!mm);
 
@@ -371,12 +372,12 @@
 	flush_mm = mm;
 	flush_va = va;
 #if NR_CPUS <= BITS_PER_LONG
-	atomic_set_mask(cpumask, &flush_cpumask);
+	atomic_set_mask(tmp, &flush_cpumask);
 #else
 	{
 		int k;
 		unsigned long *flush_mask = (unsigned long *)&flush_cpumask;
-		unsigned long *cpu_mask = (unsigned long *)&cpumask;
+		unsigned long *cpu_mask = (unsigned long *)&tmp;
 		for (k = 0; k < BITS_TO_LONGS(NR_CPUS); ++k)
 			atomic_set_mask(cpu_mask[k], &flush_mask[k]);
 	}
@@ -385,7 +386,7 @@
 	 * We have to send the IPI only to
 	 * CPUs affected.
 	 */
-	send_IPI_mask(cpumask, INVALIDATE_TLB_VECTOR);
+	send_IPI_mask(tmp, INVALIDATE_TLB_VECTOR);
 
 	while (!cpus_empty(flush_cpumask))
 		/* nothing. lockup detection does not belong here */



-- 
Hariprasad Nellitheertha
Linux Technology Center
India Software Labs
IBM India, Bangalore
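The mask arithmetic Hari describes can be modeled in user space. This is a minimal sketch, not kernel code: it assumes a single unsigned long as the cpumask (the NR_CPUS <= BITS_PER_LONG branch of the patch) and a hypothetical NR_CPUS_MODEL of 8. The helper names are made up to mirror flush_tlb_mm(), which hands flush_tlb_others() the mm's cpu_vm_mask with the calling CPU cleared, and the check inside flush_tlb_others() before and after the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Model cpumask: one bit per CPU in a single unsigned long. */
typedef unsigned long cpumask_model;

#define NR_CPUS_MODEL      8                              /* hypothetical config */
#define CPU_MASK_ALL_MODEL ((1UL << NR_CPUS_MODEL) - 1)   /* like CPU_MASK_ALL */

/* What flush_tlb_mm() passes on: cpu_vm_mask with the calling CPU cleared. */
static cpumask_model others(cpumask_model cpu_vm_mask, int self)
{
    assert(self >= 0 && self < (int)(sizeof(cpumask_model) * CHAR_BIT));
    return cpu_vm_mask & ~(1UL << self);
}

/* 2.6.3 behaviour: BUG_ON(!cpus_equal(cpumask, tmp)) with tmp = cpumask & online.
 * Returns false exactly where the kernel would hit the BUG. */
static bool old_check_passes(cpumask_model cpumask, cpumask_model online)
{
    cpumask_model tmp = cpumask & online;   /* cpus_and(tmp, cpumask, cpu_online_map) */
    return tmp == cpumask;                  /* cpus_equal(cpumask, tmp) */
}

/* Patched behaviour: the IPI goes only to the online subset; 0 means return early. */
static cpumask_model patched_ipi_targets(cpumask_model cpumask, cpumask_model online)
{
    return cpumask & online;
}
```

With two CPUs online (0x3), the call running on CPU 1, and cpu_vm_mask covering all eight model CPUs, the "others" mask is 0xfd: the old check fails, while the patched code targets only CPU 0. With an SMP kernel on one CPU (online 0x1, caller CPU 0), the online subset is empty, which is the case the new early return handles.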


* Re: [Fastboot] Re: kexec "problem" [and patch updates]
  2004-03-04 13:03           ` Hariprasad Nellitheertha
@ 2004-03-08  0:32             ` Eric W. Biederman
  2004-03-08 18:35             ` Randy.Dunlap
  1 sibling, 0 replies; 15+ messages in thread
From: Eric W. Biederman @ 2004-03-08  0:32 UTC
  To: hari; +Cc: Randy.Dunlap, r3pek, fastboot, linux-kernel

Hariprasad Nellitheertha <hari@in.ibm.com> writes:

> Hello,
> 
> I recreated this on a UNI system running an SMP kernel as well. 
> 
> The problem is because we now initialize cpu_vm_mask for init_mm with 
> CPU_MASK_ALL (from 2.6.3 onwards) which makes all bits in cpumask 1. 
> Hence BUG_ON(!cpus_equal(cpumask,tmp) fails. The change to set 
> cpu_vm_mask to CPU_MASK_ALL was done to remove tlb flush optimizations 
> for ppc64. On UNI kernels, CPU_MASK_ALL is 1 and hence the problem 
> does not occur.

So the problem is that CPU_MASK_ALL includes cpus that are not currently
online.  So it has gone from being wrong by including too few cpus
to being wrong by including too many cpus.
 
> I made a small patch which fixes this problem. The change is, essentially,
> to use "tmp" instead of "cpumask". This ensures that only the (other) online 
> cpus are sent the IPI. 
> 
> I have done some testing with this patch. Kexec loads fine and I haven't seen
> anything untoward. 
> 
> Comments please.

Any chance we can fix this right and get a proper value in cpu_vm_mask
for init_mm?  All that needs to happen is that each cpu is added to
cpu_vm_mask as it is started up.

The reason kexec sees this is that it is possibly the only generic
modifier of init_mm.

If the fix needs to be kexec-specific, we should simply stop
using init_mm.

Eric
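Eric's alternative, keeping init_mm's cpu_vm_mask in step with the CPUs that have actually been started, can be modeled the same way. This is a minimal sketch under stated assumptions: the boot CPU is CPU 0, model_cpu_up() is a hypothetical stand-in for the secondary-CPU bring-up path, and the mask is a single unsigned long. No real kernel hook or API is implied.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

typedef unsigned long cpumask_model;

struct mm_model {
    cpumask_model cpu_vm_mask;
};

/* Boot state: only CPU 0 is online, and init_mm lists only CPU 0. */
static cpumask_model online_model = 1UL << 0;
static struct mm_model init_mm_model = { .cpu_vm_mask = 1UL << 0 };

/* Hypothetical bring-up hook: mark the CPU online and add it to init_mm. */
static void model_cpu_up(int cpu)
{
    assert(cpu >= 0 && cpu < (int)(sizeof(cpumask_model) * CHAR_BIT));
    online_model |= 1UL << cpu;
    init_mm_model.cpu_vm_mask |= 1UL << cpu;
}

/* The unmodified check: every "other" CPU listed in init_mm must be online. */
static bool strict_check_passes(int self)
{
    assert(self >= 0 && self < (int)(sizeof(cpumask_model) * CHAR_BIT));
    cpumask_model cpumask = init_mm_model.cpu_vm_mask & ~(1UL << self);
    return (cpumask & online_model) == cpumask;
}
```

With this invariant (cpu_vm_mask never names a CPU that is not online), the original strict BUG_ON in flush_tlb_others() holds without the patch. A kernel that can take CPUs offline would presumably also need to clear the bit on the way down.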


* Re: [Fastboot] Re: kexec "problem" [and patch updates]
  2004-03-04 13:03           ` Hariprasad Nellitheertha
  2004-03-08  0:32             ` Eric W. Biederman
@ 2004-03-08 18:35             ` Randy.Dunlap
  1 sibling, 0 replies; 15+ messages in thread
From: Randy.Dunlap @ 2004-03-08 18:35 UTC
  To: hari; +Cc: ebiederm, r3pek, fastboot, linux-kernel

On Thu, 4 Mar 2004 18:33:10 +0530 Hariprasad Nellitheertha wrote:

| Hello,
| 
| I recreated this on a UNI system running an SMP kernel as well. 
| 
| The problem is because we now initialize cpu_vm_mask for init_mm with 
| CPU_MASK_ALL (from 2.6.3 onwards) which makes all bits in cpumask 1. 
| Hence BUG_ON(!cpus_equal(cpumask,tmp) fails. The change to set 
| cpu_vm_mask to CPU_MASK_ALL was done to remove tlb flush optimizations 
| for ppc64. On UNI kernels, CPU_MASK_ALL is 1 and hence the problem 
| does not occur.
| 
| I made a small patch which fixes this problem. The change is, essentially,
| to use "tmp" instead of "cpumask". This ensures that only the (other) online 
| cpus are sent the IPI. 
| 
| I have done some testing with this patch. Kexec loads fine and I haven't seen
| anything untoward. 

Yes, that does work well... Thanks for the patch.

Is this satisfactory for pushing into the mainline kernel, or
should kexec use another method to solve this problem?

--
~Randy




Thread overview: 15+ messages
2004-02-24 16:03 Latest AIO patchset Suparna Bhattacharya
2004-02-24 11:02 ` kexec "problem" Carlos Silva
2004-02-24 17:10   ` Randy.Dunlap
2004-02-24 17:24     ` Carlos Silva
2004-02-27  0:54   ` kexec "problem" [and patch updates] Randy.Dunlap
2004-02-27  8:00     ` [Fastboot] " Eric W. Biederman
2004-02-27 19:32       ` Randy.Dunlap
2004-02-28 10:41         ` Eric W. Biederman
2004-03-04 13:03           ` Hariprasad Nellitheertha
2004-03-08  0:32             ` Eric W. Biederman
2004-03-08 18:35             ` Randy.Dunlap
2004-02-25 18:45 ` Latest AIO patchset Hayim Shaul
2004-02-26  0:27   ` Benjamin LaHaise
2004-02-26 13:30     ` Hayim Shaul
2004-02-26 16:45       ` Daniel McNeil
