linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* divide error: bdi_dirty_limit+0x5a/0x9e
@ 2012-09-24 10:23 Borislav Petkov
  2012-09-24 10:38 ` Srivatsa S. Bhat
  2012-09-24 14:23 ` Jan Kara
  0 siblings, 2 replies; 25+ messages in thread
From: Borislav Petkov @ 2012-09-24 10:23 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Fengguang Wu, Jan Kara, Peter Zijlstra, Andrew Morton,
	Johannes Weiner, Conny Seidel

Hi all,

we're able to trigger the oops below when doing CPU hotplug tests.

Disassembling the code section of the oops gives

   0:   1a 00                   sbb    (%rax),%al
   2:   b8 64 00 00 00          mov    $0x64,%eax
   7:   2b 05 5c a4 28 01       sub    0x128a45c(%rip),%eax        # 0x128a469
   d:   be 64 00 00 00          mov    $0x64,%esi
  12:   31 d2                   xor    %edx,%edx
  14:   8b 7d e0                mov    -0x20(%rbp),%edi
  17:   48 0f af c3             imul   %rbx,%rax
  1b:   48 f7 f6                div    %rsi
  1e:   31 d2                   xor    %edx,%edx
  20:   48 89 c1                mov    %rax,%rcx
  23:   48 0f af 4d e8          imul   -0x18(%rbp),%rcx
  28:   48 89 c8                mov    %rcx,%rax
  2b:*  48 f7 f7                div    %rdi     <-- trapping instruction
  2e:   31 d2                   xor    %edx,%edx
  30:   48 89 c1                mov    %rax,%rcx
  33:   41 8b 84 24 4c 01 00    mov    0x14c(%r12),%eax
  3a:   00 
  3b:   48 0f af c3             imul   %rbx,%rax
  3f:   48                      rex.W

in bdi_dirty_limit. The .s file contains then (annotations mine):

.globl bdi_dirty_limit
        .type   bdi_dirty_limit, @function
bdi_dirty_limit:
        pushq   %rbp    #
        movq    %rsp, %rbp      #,
        pushq   %r12    #
        pushq   %rbx    #
        subq    $48, %rsp       #,
        call    mcount
        movq    %rsi, %rbx      # dirty, dirty
        leaq    -32(%rbp), %rcx #, tmp65
        leaq    -24(%rbp), %rdx #, tmp66
        leaq    280(%rdi), %rsi #, tmp67
        movq    %rdi, %r12      # bdi, bdi
        movq    $writeout_completions, %rdi     #,
        call    fprop_fraction_percpu   #
        movl    $100, %eax      #, tmp69
        subl    bdi_min_ratio(%rip), %eax       # bdi_min_ratio, tmp70
        movl    $100, %esi      #, tmp75
        xorl    %edx, %edx      #
        mov     -32(%rbp), %edi # denominator, denominator
        imulq   %rbx, %rax      # dirty, tmp71
        divq    %rsi    # tmp75
        xorl    %edx, %edx      #			# most-significant part of bdi_dirty is already zeroed here
        movq    %rax, %rcx      # tmp71, tmp73
        imulq   -24(%rbp), %rcx # numerator, tmp73	# bdi_dirty *= numerator
        movq    %rcx, %rax      # tmp73,		# move bdi_dirty in place for next insn
        divq    %rdi		# denominator		<--- TRAP
        xorl    %edx, %edx      #
        movq    %rax, %rcx      #, tmp78
	...

and from looking at the register dump below, the dividend, which should
be in %rdx:%rax is 0 and the divisor (denominator) we've got from
bdi_writeout_fraction and is in %rdi is also 0. Which is strange because
fprop_fraction_percpu guards for division by zero by setting denominator
to 1 if it were zero but what about the case where den > num? Can that
even happen?

And also, what happens if num is 0? Which it kinda is by looking at %rcx
where there's copy of it.

I'll let people more knowledgeable with the code explain what actually
happens.

unsigned long bdi_dirty_limit(struct backing_dev_info *bdi, unsigned long dirty)
{
        u64 bdi_dirty;
        long numerator, denominator;

        /*
         * Calculate this BDI's share of the dirty ratio.
         */
        bdi_writeout_fraction(bdi, &numerator, &denominator);

        bdi_dirty = (dirty * (100 - bdi_min_ratio)) / 100;
        bdi_dirty *= numerator;
        do_div(bdi_dirty, denominator);

        bdi_dirty += (dirty * bdi->min_ratio) / 100;
        if (bdi_dirty > (dirty * bdi->max_ratio) / 100)
                bdi_dirty = dirty * bdi->max_ratio / 100;

        return bdi_dirty;
}

Sep 23 17:41:08 lemure kernel: [ 381.245776] divide error: 0000 [#1] SMP
Sep 23 17:41:08 lemure kernel: [ 381.249725] Modules linked in: cpufreq_conservative cpufreq_userspace cpufreq_powersave i2c_piix4 tpm_tis rtc_cmos powernow_k8 tpm fam15
h_power k10temp tpm_bios mperf serio_raw crc32c_intel ghash_clmulni_intel
Sep 23 17:41:08 lemure kernel: [ 381.268531] CPU 0
Sep 23 17:41:08 lemure kernel: [ 381.270377] Pid: 6644, comm: flush-8:0 Not tainted 3.6.0-rc6-e5e77cf9-linus+ #1 AMD
Sep 23 17:41:08 lemure kernel: [ 381.279067] RIP: 0010:[<ffffffff810e8bc2>] [<ffffffff810e8bc2>] bdi_dirty_limit+0x5a/0x9e
Sep 23 17:41:08 lemure kernel: [ 381.287330] RSP: 0018:ffff88041ad03d40 EFLAGS: 00010246
Sep 23 17:41:08 lemure kernel: [ 381.292631] RAX: 0000000000000000 RBX: 00000000000621c3 RCX: 0000000000000000
Sep 23 17:41:08 lemure kernel: [ 381.299751] RDX: 0000000000000000 RSI: 0000000000000064 RDI: 0000000000000000
Sep 23 17:41:08 lemure kernel: [ 381.306870] RBP: ffff88041ad03d80 R08: 0000000000000008 R09: ffffffff8211e520
Sep 23 17:41:08 lemure kernel: [ 381.313989] R10: ffff88041ad03d10 R11: ffff88041ad03d10 R12: ffff88041a2d0158
Sep 23 17:41:08 lemure kernel: [ 381.321109] R13: ffff88041a2d0158 R14: ffff88041a2d02b0 R15: 0000000000000000
Sep 23 17:41:08 lemure kernel: [ 381.328228] FS: 00007f3db8ea7700(0000) GS:ffff88042ec00000(0000) knlGS:0000000000000000
Sep 23 17:41:08 lemure kernel: [ 381.336298] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Sep 23 17:41:08 lemure kernel: [ 381.342034] CR2: 0000000000d84270 CR3: 0000000418ce4000 CR4: 00000000000407f0
Sep 23 17:41:08 lemure kernel: [ 381.349151] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 23 17:41:08 lemure kernel: [ 381.356263] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep 23 17:41:08 lemure kernel: [ 381.363384] Process flush-8:0 (pid: 6644, threadinfo ffff88041ad02000, task ffff8804198826c0)
Sep 23 17:41:08 lemure kernel: [ 381.371884] Stack:
Sep 23 17:41:08 lemure kernel: [ 381.373890] ffff88041ad03d80 ffffffff810e8e7a 0000000100013eb3 0000000000000000
Sep 23 17:41:08 lemure kernel: [ 381.381330] 2000000000000000 0000000000000000 fffffffffffffff7 0000000000000000
Sep 23 17:41:08 lemure kernel: [ 381.388769] ffff88041ad03dc0 ffffffff8114f9bd 000000010000c983 00000000000c4386
Sep 23 17:41:08 lemure kernel: [ 381.396208] Call Trace:
Sep 23 17:41:09 lemure kernel: [ 381.398654] [<ffffffff810e8e7a>] ? global_dirty_limits+0x3c/0x130
Sep 23 17:41:09 lemure kernel: [ 381.404823] [<ffffffff8114f9bd>] over_bground_thresh+0x5c/0x76
Sep 23 17:41:09 lemure kernel: [ 381.410729] [<ffffffff811503aa>] wb_do_writeback+0x193/0x1e9
Sep 23 17:41:09 lemure kernel: [ 381.416464] [<ffffffff811504ca>] bdi_writeback_thread+0xca/0x1ec
Sep 23 17:41:09 lemure kernel: [ 381.422545] [<ffffffff81150400>] ? wb_do_writeback+0x1e9/0x1e9
Sep 23 17:41:09 lemure kernel: [ 381.428455] [<ffffffff8105e75b>] kthread+0x8d/0x95
Sep 23 17:41:09 lemure kernel: [ 381.433323] [<ffffffff81940474>] kernel_thread_helper+0x4/0x10
Sep 23 17:41:09 lemure kernel: [ 381.439231] [<ffffffff8105e6ce>] ? kthread_freezable_should_stop+0x62/0x62
Sep 23 17:41:09 lemure kernel: [ 381.446178] [<ffffffff81940470>] ? gs_change+0xb/0xb
Sep 23 17:41:09 lemure kernel: [ 381.451217] Code: 1a 00 b8 64 00 00 00 2b 05 5c a4 28 01 be 64 00 00 00 31 d2 8b 7d e0 48 0f af c3 48 f7 f6 31 d2 48 89 c1 48 0f af 4d e8 48 89 c8 <48> f7 f7 31 d2 48 89 c1 41 8b 84 24 4c 01 00 00 48 0f af c3 48
Sep 23 17:41:10 lemure kernel: [ 381.471131] RIP [<ffffffff810e8bc2>] bdi_dirty_limit+0x5a/0x9e
Sep 23 17:41:10 lemure kernel: [ 381.477057] RSP <ffff88041ad03d40>
Sep 23 17:41:10 lemure kernel: [ 381.480604] ---[ end trace 703f173ed75f76a9 ]---

Thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: divide error: bdi_dirty_limit+0x5a/0x9e
  2012-09-24 10:23 divide error: bdi_dirty_limit+0x5a/0x9e Borislav Petkov
@ 2012-09-24 10:38 ` Srivatsa S. Bhat
  2012-09-24 11:05   ` Borislav Petkov
  2012-09-24 14:23 ` Jan Kara
  1 sibling, 1 reply; 25+ messages in thread
From: Srivatsa S. Bhat @ 2012-09-24 10:38 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-mm, linux-kernel, Fengguang Wu, Jan Kara, Peter Zijlstra,
	Andrew Morton, Johannes Weiner, Conny Seidel

On 09/24/2012 03:53 PM, Borislav Petkov wrote:
> Hi all,
> 
> we're able to trigger the oops below when doing CPU hotplug tests.
> 

I hit this problem as well, which I reported here, a few days ago:
https://lkml.org/lkml/2012/9/13/222

<snip>

> 	...
> 
> and from looking at the register dump below, the dividend, which should
> be in %rdx:%rax is 0 and the divisor (denominator) we've got from
> bdi_writeout_fraction and is in %rdi is also 0. Which is strange because
> fprop_fraction_percpu guards for division by zero by setting denominator
> to 1 if it were zero but what about the case where den > num? Can that
> even happen?
> 
> And also, what happens if num is 0? Which it kinda is by looking at %rcx
> where there's copy of it.
>

Going by the usage of percpu_counter_read_positive() (which is used to get
both the values of num and den), the least value that num or den can have
is zero. So, the C code to guard against divide-by-zero looks OK to me.
Which unfortunately keeps the mystery unsolved :(

Regards,
Srivatsa S. Bhat
 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: divide error: bdi_dirty_limit+0x5a/0x9e
  2012-09-24 10:38 ` Srivatsa S. Bhat
@ 2012-09-24 11:05   ` Borislav Petkov
  2012-09-24 11:13     ` Srivatsa S. Bhat
  0 siblings, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2012-09-24 11:05 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Borislav Petkov, linux-mm, linux-kernel, Fengguang Wu, Jan Kara,
	Peter Zijlstra, Andrew Morton, Johannes Weiner, Conny Seidel

On Mon, Sep 24, 2012 at 04:08:33PM +0530, Srivatsa S. Bhat wrote:
> On 09/24/2012 03:53 PM, Borislav Petkov wrote:
> > Hi all,
> > 
> > we're able to trigger the oops below when doing CPU hotplug tests.
> > 
> 
> I hit this problem as well, which I reported here, a few days ago:
> https://lkml.org/lkml/2012/9/13/222

Ok, your case shows even more info:

[  526.024180] divide error: 0000 [#1] SMP 
[  526.028144] Modules linked in: ipv6 cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf fuse loop dm_mod iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm cdc_ether pcspkr usbnet shpchp pci_hotplug i2c_i801 i2c_core ioatdma mii crc32c_intel serio_raw microcode lpc_ich mfd_core i7core_edac bnx2 dca edac_core tpm_tis tpm sg tpm_bios rtc_cmos button uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd ext3 mbcache jbd fan processor mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
[  526.028145] CPU 9 
[  526.028145] Pid: 2235, comm: flush-8:0 Not tainted 3.6.0-rc1-tglx-hotplug-0.0.0.28.36b5ec9-default #1 IBM IBM System x -[7870C4Q]-/68Y8033 
[  526.028145] RIP: 0010:[<ffffffff811276f6>]  [<ffffffff811276f6>] bdi_dirty_limit+0x66/0xc0
[  526.028145] RSP: 0018:ffff8811530bfcc0  EFLAGS: 00010206
[  526.028145] RAX: 0000000000b9877e RBX: 00000000001a8112 RCX: 28f5c28f5c28f5c3
[  526.028145] RDX: 0000000000000000 RSI: 0000000000b9877e RDI: 0000000000000000

%rax contains something != 0 but %rdi definitely is 0.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: divide error: bdi_dirty_limit+0x5a/0x9e
  2012-09-24 11:05   ` Borislav Petkov
@ 2012-09-24 11:13     ` Srivatsa S. Bhat
  2012-09-24 11:34       ` Fengguang Wu
  0 siblings, 1 reply; 25+ messages in thread
From: Srivatsa S. Bhat @ 2012-09-24 11:13 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-mm, linux-kernel, Fengguang Wu, Jan Kara, Peter Zijlstra,
	Andrew Morton, Johannes Weiner, Conny Seidel, Paul E. McKenney

On 09/24/2012 04:35 PM, Borislav Petkov wrote:
> On Mon, Sep 24, 2012 at 04:08:33PM +0530, Srivatsa S. Bhat wrote:
>> On 09/24/2012 03:53 PM, Borislav Petkov wrote:
>>> Hi all,
>>>
>>> we're able to trigger the oops below when doing CPU hotplug tests.
>>>
>>
>> I hit this problem as well, which I reported here, a few days ago:
>> https://lkml.org/lkml/2012/9/13/222
> 
> Ok, your case shows even more info:
> 
> [  526.024180] divide error: 0000 [#1] SMP 
> [  526.028144] Modules linked in: ipv6 cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf fuse loop dm_mod iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm cdc_ether pcspkr usbnet shpchp pci_hotplug i2c_i801 i2c_core ioatdma mii crc32c_intel serio_raw microcode lpc_ich mfd_core i7core_edac bnx2 dca edac_core tpm_tis tpm sg tpm_bios rtc_cmos button uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd ext3 mbcache jbd fan processor mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
> [  526.028145] CPU 9 
> [  526.028145] Pid: 2235, comm: flush-8:0 Not tainted 3.6.0-rc1-tglx-hotplug-0.0.0.28.36b5ec9-default #1 IBM IBM System x -[7870C4Q]-/68Y8033 
> [  526.028145] RIP: 0010:[<ffffffff811276f6>]  [<ffffffff811276f6>] bdi_dirty_limit+0x66/0xc0
> [  526.028145] RSP: 0018:ffff8811530bfcc0  EFLAGS: 00010206
> [  526.028145] RAX: 0000000000b9877e RBX: 00000000001a8112 RCX: 28f5c28f5c28f5c3
> [  526.028145] RDX: 0000000000000000 RSI: 0000000000b9877e RDI: 0000000000000000
> 
> %rax contains something != 0 but %rdi definitely is 0.
> 

Yep.. So I tried putting a BUG_ON(!den) in fprop_fraction_percpu() to
catch if we really got the code wrong somehow.. but unfortunately, with
that added, I haven't been successful in reproducing the bug :(

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: divide error: bdi_dirty_limit+0x5a/0x9e
  2012-09-24 11:13     ` Srivatsa S. Bhat
@ 2012-09-24 11:34       ` Fengguang Wu
  2012-09-24 11:51         ` Srivatsa S. Bhat
  2012-09-24 12:20         ` Borislav Petkov
  0 siblings, 2 replies; 25+ messages in thread
From: Fengguang Wu @ 2012-09-24 11:34 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Borislav Petkov, linux-mm, linux-kernel, Jan Kara,
	Peter Zijlstra, Andrew Morton, Johannes Weiner, Conny Seidel,
	Paul E. McKenney

On Mon, Sep 24, 2012 at 04:43:11PM +0530, Srivatsa S. Bhat wrote:
> On 09/24/2012 04:35 PM, Borislav Petkov wrote:
> > On Mon, Sep 24, 2012 at 04:08:33PM +0530, Srivatsa S. Bhat wrote:
> >> On 09/24/2012 03:53 PM, Borislav Petkov wrote:
> >>> Hi all,
> >>>
> >>> we're able to trigger the oops below when doing CPU hotplug tests.
> >>>
> >>
> >> I hit this problem as well, which I reported here, a few days ago:
> >> https://lkml.org/lkml/2012/9/13/222
> > 
> > Ok, your case shows even more info:
> > 
> > [  526.024180] divide error: 0000 [#1] SMP 
> > [  526.028144] Modules linked in: ipv6 cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf fuse loop dm_mod iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm cdc_ether pcspkr usbnet shpchp pci_hotplug i2c_i801 i2c_core ioatdma mii crc32c_intel serio_raw microcode lpc_ich mfd_core i7core_edac bnx2 dca edac_core tpm_tis tpm sg tpm_bios rtc_cmos button uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd ext3 mbcache jbd fan processor mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
> > [  526.028145] CPU 9 
> > [  526.028145] Pid: 2235, comm: flush-8:0 Not tainted 3.6.0-rc1-tglx-hotplug-0.0.0.28.36b5ec9-default #1 IBM IBM System x -[7870C4Q]-/68Y8033 
> > [  526.028145] RIP: 0010:[<ffffffff811276f6>]  [<ffffffff811276f6>] bdi_dirty_limit+0x66/0xc0
> > [  526.028145] RSP: 0018:ffff8811530bfcc0  EFLAGS: 00010206
> > [  526.028145] RAX: 0000000000b9877e RBX: 00000000001a8112 RCX: 28f5c28f5c28f5c3
> > [  526.028145] RDX: 0000000000000000 RSI: 0000000000b9877e RDI: 0000000000000000
> > 
> > %rax contains something != 0 but %rdi definitely is 0.
> > 
> 
> Yep.. So I tried putting a BUG_ON(!den) in fprop_fraction_percpu() to
> catch if we really got the code wrong somehow.. but unfortunately, with
> that added, I haven't been successful in reproducing the bug :(

Will you test such a line? At least the generic do_div() only uses the
lower 32bits for division.

        WARN_ON(!(den & 0xffffffff));

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: divide error: bdi_dirty_limit+0x5a/0x9e
  2012-09-24 11:34       ` Fengguang Wu
@ 2012-09-24 11:51         ` Srivatsa S. Bhat
  2012-09-24 12:20         ` Borislav Petkov
  1 sibling, 0 replies; 25+ messages in thread
From: Srivatsa S. Bhat @ 2012-09-24 11:51 UTC (permalink / raw)
  To: Fengguang Wu
  Cc: Borislav Petkov, linux-mm, linux-kernel, Jan Kara,
	Peter Zijlstra, Andrew Morton, Johannes Weiner, Conny Seidel,
	Paul E. McKenney

On 09/24/2012 05:04 PM, Fengguang Wu wrote:
> On Mon, Sep 24, 2012 at 04:43:11PM +0530, Srivatsa S. Bhat wrote:
>> On 09/24/2012 04:35 PM, Borislav Petkov wrote:
>>> On Mon, Sep 24, 2012 at 04:08:33PM +0530, Srivatsa S. Bhat wrote:
>>>> On 09/24/2012 03:53 PM, Borislav Petkov wrote:
>>>>> Hi all,
>>>>>
>>>>> we're able to trigger the oops below when doing CPU hotplug tests.
>>>>>
>>>>
>>>> I hit this problem as well, which I reported here, a few days ago:
>>>> https://lkml.org/lkml/2012/9/13/222
>>>
>>> Ok, your case shows even more info:
>>>
>>> [  526.024180] divide error: 0000 [#1] SMP 
>>> [  526.028144] Modules linked in: ipv6 cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf fuse loop dm_mod iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm cdc_ether pcspkr usbnet shpchp pci_hotplug i2c_i801 i2c_core ioatdma mii crc32c_intel serio_raw microcode lpc_ich mfd_core i7core_edac bnx2 dca edac_core tpm_tis tpm sg tpm_bios rtc_cmos button uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd ext3 mbcache jbd fan processor mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
>>> [  526.028145] CPU 9 
>>> [  526.028145] Pid: 2235, comm: flush-8:0 Not tainted 3.6.0-rc1-tglx-hotplug-0.0.0.28.36b5ec9-default #1 IBM IBM System x -[7870C4Q]-/68Y8033 
>>> [  526.028145] RIP: 0010:[<ffffffff811276f6>]  [<ffffffff811276f6>] bdi_dirty_limit+0x66/0xc0
>>> [  526.028145] RSP: 0018:ffff8811530bfcc0  EFLAGS: 00010206
>>> [  526.028145] RAX: 0000000000b9877e RBX: 00000000001a8112 RCX: 28f5c28f5c28f5c3
>>> [  526.028145] RDX: 0000000000000000 RSI: 0000000000b9877e RDI: 0000000000000000
>>>
>>> %rax contains something != 0 but %rdi definitely is 0.
>>>
>>
>> Yep.. So I tried putting a BUG_ON(!den) in fprop_fraction_percpu() to
>> catch if we really got the code wrong somehow.. but unfortunately, with
>> that added, I haven't been successful in reproducing the bug :(
> 
> Will you test such a line? At least the generic do_div() only uses the
> lower 32bits for division.
> 
>         WARN_ON(!(den & 0xffffffff));
> 

Sure, I'll test this and report back. Thanks for the suggestion!

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: divide error: bdi_dirty_limit+0x5a/0x9e
  2012-09-24 11:34       ` Fengguang Wu
  2012-09-24 11:51         ` Srivatsa S. Bhat
@ 2012-09-24 12:20         ` Borislav Petkov
  2012-09-24 12:29           ` Fengguang Wu
  1 sibling, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2012-09-24 12:20 UTC (permalink / raw)
  To: Fengguang Wu
  Cc: Srivatsa S. Bhat, Borislav Petkov, linux-mm, linux-kernel,
	Jan Kara, Peter Zijlstra, Andrew Morton, Johannes Weiner,
	Conny Seidel, Paul E. McKenney

On Mon, Sep 24, 2012 at 07:34:47PM +0800, Fengguang Wu wrote:
> Will you test such a line? At least the generic do_div() only uses the
> lower 32bits for division.
> 
>         WARN_ON(!(den & 0xffffffff));

But, but, the asm output says:

  28:   48 89 c8                mov    %rcx,%rax
  2b:*  48 f7 f7                div    %rdi     <-- trapping instruction
  2e:   31 d2                   xor    %edx,%edx

and this version of DIV does an unsigned division of RDX:RAX by the
contents of a *64-bit register* ... in our case %rdi.

Srivatsa's oops  shows the same:

  28:   48 89 f0                mov    %rsi,%rax
  2b:*  48 f7 f7                div    %rdi     <-- trapping instruction
  2e:   41 8b 94 24 74 02 00    mov    0x274(%r12),%edx

Right?

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: divide error: bdi_dirty_limit+0x5a/0x9e
  2012-09-24 12:20         ` Borislav Petkov
@ 2012-09-24 12:29           ` Fengguang Wu
  2012-09-24 12:56             ` Borislav Petkov
  0 siblings, 1 reply; 25+ messages in thread
From: Fengguang Wu @ 2012-09-24 12:29 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Srivatsa S. Bhat, linux-mm, linux-kernel, Jan Kara,
	Peter Zijlstra, Andrew Morton, Johannes Weiner, Conny Seidel,
	Paul E. McKenney

On Mon, Sep 24, 2012 at 02:20:53PM +0200, Borislav Petkov wrote:
> On Mon, Sep 24, 2012 at 07:34:47PM +0800, Fengguang Wu wrote:
> > Will you test such a line? At least the generic do_div() only uses the
> > lower 32bits for division.
> > 
> >         WARN_ON(!(den & 0xffffffff));
> 
> But, but, the asm output says:
> 
>   28:   48 89 c8                mov    %rcx,%rax
>   2b:*  48 f7 f7                div    %rdi     <-- trapping instruction
>   2e:   31 d2                   xor    %edx,%edx
> 
> and this version of DIV does an unsigned division of RDX:RAX by the
> contents of a *64-bit register* ... in our case %rdi.
> 
> Srivatsa's oops  shows the same:
> 
>   28:   48 89 f0                mov    %rsi,%rax
>   2b:*  48 f7 f7                div    %rdi     <-- trapping instruction
>   2e:   41 8b 94 24 74 02 00    mov    0x274(%r12),%edx
> 
> Right?

Right, that's why I said "at least". As for x86, I'm as clueless as you..

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: divide error: bdi_dirty_limit+0x5a/0x9e
  2012-09-24 12:29           ` Fengguang Wu
@ 2012-09-24 12:56             ` Borislav Petkov
  2012-09-24 18:54               ` Srivatsa S. Bhat
  0 siblings, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2012-09-24 12:56 UTC (permalink / raw)
  To: Fengguang Wu
  Cc: Srivatsa S. Bhat, linux-mm, linux-kernel, Jan Kara,
	Peter Zijlstra, Andrew Morton, Johannes Weiner, Conny Seidel,
	Paul E. McKenney

On Mon, Sep 24, 2012 at 08:29:00PM +0800, Fengguang Wu wrote:
> On Mon, Sep 24, 2012 at 02:20:53PM +0200, Borislav Petkov wrote:
> > On Mon, Sep 24, 2012 at 07:34:47PM +0800, Fengguang Wu wrote:
> > > Will you test such a line? At least the generic do_div() only uses the
> > > lower 32bits for division.
> > > 
> > >         WARN_ON(!(den & 0xffffffff));
> > 
> > But, but, the asm output says:
> > 
> >   28:   48 89 c8                mov    %rcx,%rax
> >   2b:*  48 f7 f7                div    %rdi     <-- trapping instruction
> >   2e:   31 d2                   xor    %edx,%edx
> > 
> > and this version of DIV does an unsigned division of RDX:RAX by the
> > contents of a *64-bit register* ... in our case %rdi.
> > 
> > Srivatsa's oops  shows the same:
> > 
> >   28:   48 89 f0                mov    %rsi,%rax
> >   2b:*  48 f7 f7                div    %rdi     <-- trapping instruction
> >   2e:   41 8b 94 24 74 02 00    mov    0x274(%r12),%edx
> > 
> > Right?
> 
> Right, that's why I said "at least". As for x86, I'm as clueless as you..

Right, both oopses are on x86 so I don't think it is the bitness of the
division.

Another thing those two have in common is that both happen when a CPU
comes online. Srivatsa's is when CPU9 comes online (oops is detected on
CPU9) and in our case CPU4 comes online but the oops says CPU0.

So it has to be hotplug-related.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: divide error: bdi_dirty_limit+0x5a/0x9e
  2012-09-24 10:23 divide error: bdi_dirty_limit+0x5a/0x9e Borislav Petkov
  2012-09-24 10:38 ` Srivatsa S. Bhat
@ 2012-09-24 14:23 ` Jan Kara
  2012-09-24 14:36   ` Borislav Petkov
  1 sibling, 1 reply; 25+ messages in thread
From: Jan Kara @ 2012-09-24 14:23 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-mm, linux-kernel, Fengguang Wu, Jan Kara, Peter Zijlstra,
	Andrew Morton, Johannes Weiner, Conny Seidel

[-- Attachment #1: Type: text/plain, Size: 8821 bytes --]

  Hello,

On Mon 24-09-12 12:23:24, Borislav Petkov wrote:
> we're able to trigger the oops below when doing CPU hotplug tests.
  Thanks for detailed report.

> Disassembling the code section of the oops gives
> 
>    0:   1a 00                   sbb    (%rax),%al
>    2:   b8 64 00 00 00          mov    $0x64,%eax
>    7:   2b 05 5c a4 28 01       sub    0x128a45c(%rip),%eax        # 0x128a469
>    d:   be 64 00 00 00          mov    $0x64,%esi
>   12:   31 d2                   xor    %edx,%edx
>   14:   8b 7d e0                mov    -0x20(%rbp),%edi
>   17:   48 0f af c3             imul   %rbx,%rax
>   1b:   48 f7 f6                div    %rsi
>   1e:   31 d2                   xor    %edx,%edx
>   20:   48 89 c1                mov    %rax,%rcx
>   23:   48 0f af 4d e8          imul   -0x18(%rbp),%rcx
>   28:   48 89 c8                mov    %rcx,%rax
>   2b:*  48 f7 f7                div    %rdi     <-- trapping instruction
>   2e:   31 d2                   xor    %edx,%edx
>   30:   48 89 c1                mov    %rax,%rcx
>   33:   41 8b 84 24 4c 01 00    mov    0x14c(%r12),%eax
>   3a:   00 
>   3b:   48 0f af c3             imul   %rbx,%rax
>   3f:   48                      rex.W
> 
> in bdi_dirty_limit. The .s file contains then (annotations mine):
> 
> .globl bdi_dirty_limit
>         .type   bdi_dirty_limit, @function
> bdi_dirty_limit:
>         pushq   %rbp    #
>         movq    %rsp, %rbp      #,
>         pushq   %r12    #
>         pushq   %rbx    #
>         subq    $48, %rsp       #,
>         call    mcount
>         movq    %rsi, %rbx      # dirty, dirty
>         leaq    -32(%rbp), %rcx #, tmp65
>         leaq    -24(%rbp), %rdx #, tmp66
>         leaq    280(%rdi), %rsi #, tmp67
>         movq    %rdi, %r12      # bdi, bdi
>         movq    $writeout_completions, %rdi     #,
>         call    fprop_fraction_percpu   #
>         movl    $100, %eax      #, tmp69
>         subl    bdi_min_ratio(%rip), %eax       # bdi_min_ratio, tmp70
>         movl    $100, %esi      #, tmp75
>         xorl    %edx, %edx      #
>         mov     -32(%rbp), %edi # denominator, denominator
>         imulq   %rbx, %rax      # dirty, tmp71
>         divq    %rsi    # tmp75
>         xorl    %edx, %edx      #			# most-significant part of bdi_dirty is already zeroed here
>         movq    %rax, %rcx      # tmp71, tmp73
>         imulq   -24(%rbp), %rcx # numerator, tmp73	# bdi_dirty *= numerator
>         movq    %rcx, %rax      # tmp73,		# move bdi_dirty in place for next insn
>         divq    %rdi		# denominator		<--- TRAP
>         xorl    %edx, %edx      #
>         movq    %rax, %rcx      #, tmp78
> 	...
> 
> and from looking at the register dump below, the dividend, which should
> be in %rdx:%rax is 0 and the divisor (denominator) we've got from
> bdi_writeout_fraction and is in %rdi is also 0. Which is strange because
> fprop_fraction_percpu guards for division by zero by setting denominator
> to 1 if it were zero but what about the case where den > num? Can that
> even happen?
> 
> And also, what happens if num is 0? Which it kinda is by looking at %rcx
> where there's copy of it.
  fprop_fraction_percpu() does:
        do {
                seq = read_seqcount_begin(&p->sequence);
                fprop_reflect_period_percpu(p, pl);
                num = percpu_counter_read_positive(&pl->events);
                den = percpu_counter_read_positive(&p->events);
        } while (read_seqcount_retry(&p->sequence, seq));

        /*
         * Make fraction <= 1 and denominator > 0 even in presence of
         * percpu
         * counter errors
         */
        if (den <= num) {
                if (num)
                        den = num;
                else
                        den = 1;
        }
        *denominator = den;
        *numerator = num;

  So after initial loop, num and den are >= 0 because
percpu_counter_read_positive() asserts that. If den == 0, then the
condition is true and thus we always set den to value >= 1. So at least in
the theoretical model of computation what you observe cannot happen :).

  Because of use of percpu_counter_read_positive() it also doesn't seem like
some catch with sign extension (we always deal with non-negative numbers)
and because you are on a 64-bit machine, s64 fits into long without.
However, do_div() assumes divisor is 32-bit and we can indeed observe that
in the disassembly where we prepare the divisor as:
         mov     -32(%rbp), %edi # denominator, denominator
(32-bit move insn used). I'm not quite sure if I read the stack in the dump
correctly but -32(%rbp) seems to be 0x2000000000000000 which would fit what
we see.

But I'm currently at a loss how (1 << 61) got to
writeout_completions->events->counter. Either it could be some memory
corruption (unlikely since more people see this) or there's a bug lurking
somewhere. Hum, maybe it could be a sign issue after all. Can you try
running with attached patch? Does it report anything?

> Sep 23 17:41:08 lemure kernel: [ 381.245776] divide error: 0000 [#1] SMP
> Sep 23 17:41:08 lemure kernel: [ 381.249725] Modules linked in: cpufreq_conservative cpufreq_userspace cpufreq_powersave i2c_piix4 tpm_tis rtc_cmos powernow_k8 tpm fam15
> h_power k10temp tpm_bios mperf serio_raw crc32c_intel ghash_clmulni_intel
> Sep 23 17:41:08 lemure kernel: [ 381.268531] CPU 0
> Sep 23 17:41:08 lemure kernel: [ 381.270377] Pid: 6644, comm: flush-8:0 Not tainted 3.6.0-rc6-e5e77cf9-linus+ #1 AMD
> Sep 23 17:41:08 lemure kernel: [ 381.279067] RIP: 0010:[<ffffffff810e8bc2>] [<ffffffff810e8bc2>] bdi_dirty_limit+0x5a/0x9e
> Sep 23 17:41:08 lemure kernel: [ 381.287330] RSP: 0018:ffff88041ad03d40 EFLAGS: 00010246
> Sep 23 17:41:08 lemure kernel: [ 381.292631] RAX: 0000000000000000 RBX: 00000000000621c3 RCX: 0000000000000000
> Sep 23 17:41:08 lemure kernel: [ 381.299751] RDX: 0000000000000000 RSI: 0000000000000064 RDI: 0000000000000000
> Sep 23 17:41:08 lemure kernel: [ 381.306870] RBP: ffff88041ad03d80 R08: 0000000000000008 R09: ffffffff8211e520
> Sep 23 17:41:08 lemure kernel: [ 381.313989] R10: ffff88041ad03d10 R11: ffff88041ad03d10 R12: ffff88041a2d0158
> Sep 23 17:41:08 lemure kernel: [ 381.321109] R13: ffff88041a2d0158 R14: ffff88041a2d02b0 R15: 0000000000000000
> Sep 23 17:41:08 lemure kernel: [ 381.328228] FS: 00007f3db8ea7700(0000) GS:ffff88042ec00000(0000) knlGS:0000000000000000
> Sep 23 17:41:08 lemure kernel: [ 381.336298] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> Sep 23 17:41:08 lemure kernel: [ 381.342034] CR2: 0000000000d84270 CR3: 0000000418ce4000 CR4: 00000000000407f0
> Sep 23 17:41:08 lemure kernel: [ 381.349151] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Sep 23 17:41:08 lemure kernel: [ 381.356263] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Sep 23 17:41:08 lemure kernel: [ 381.363384] Process flush-8:0 (pid: 6644, threadinfo ffff88041ad02000, task ffff8804198826c0)
> Sep 23 17:41:08 lemure kernel: [ 381.371884] Stack:
> Sep 23 17:41:08 lemure kernel: [ 381.373890] ffff88041ad03d80 ffffffff810e8e7a 0000000100013eb3 0000000000000000
> Sep 23 17:41:08 lemure kernel: [ 381.381330] 2000000000000000 0000000000000000 fffffffffffffff7 0000000000000000
> Sep 23 17:41:08 lemure kernel: [ 381.388769] ffff88041ad03dc0 ffffffff8114f9bd 000000010000c983 00000000000c4386
> Sep 23 17:41:08 lemure kernel: [ 381.396208] Call Trace:
> Sep 23 17:41:09 lemure kernel: [ 381.398654] [<ffffffff810e8e7a>] ? global_dirty_limits+0x3c/0x130
> Sep 23 17:41:09 lemure kernel: [ 381.404823] [<ffffffff8114f9bd>] over_bground_thresh+0x5c/0x76
> Sep 23 17:41:09 lemure kernel: [ 381.410729] [<ffffffff811503aa>] wb_do_writeback+0x193/0x1e9
> Sep 23 17:41:09 lemure kernel: [ 381.416464] [<ffffffff811504ca>] bdi_writeback_thread+0xca/0x1ec
> Sep 23 17:41:09 lemure kernel: [ 381.422545] [<ffffffff81150400>] ? wb_do_writeback+0x1e9/0x1e9
> Sep 23 17:41:09 lemure kernel: [ 381.428455] [<ffffffff8105e75b>] kthread+0x8d/0x95
> Sep 23 17:41:09 lemure kernel: [ 381.433323] [<ffffffff81940474>] kernel_thread_helper+0x4/0x10
> Sep 23 17:41:09 lemure kernel: [ 381.439231] [<ffffffff8105e6ce>] ? kthread_freezable_should_stop+0x62/0x62
> Sep 23 17:41:09 lemure kernel: [ 381.446178] [<ffffffff81940470>] ? gs_change+0xb/0xb
> Sep 23 17:41:09 lemure kernel: [ 381.451217] Code: 1a 00 b8 64 00 00 00 2b 05 5c a4 28 01 be 64 00 00 00 31 d2 8b 7d e0 48 0f af c3 48 f7 f6 31 d2 48 89 c1 48 0f af 4d e8 48 89 c8 <48> f7 f7 31 d2 48 89 c1 41 8b 84 24 4c 01 00 00 48 0f af c3 48
> Sep 23 17:41:10 lemure kernel: [ 381.471131] RIP [<ffffffff810e8bc2>] bdi_dirty_limit+0x5a/0x9e
> Sep 23 17:41:10 lemure kernel: [ 381.477057] RSP <ffff88041ad03d40>
> Sep 23 17:41:10 lemure kernel: [ 381.480604] ---[ end trace 703f173ed75f76a9 ]---
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

[-- Attachment #2: 0001-lib-Debug-flex-proportions-code.patch --]
[-- Type: text/x-patch, Size: 877 bytes --]

>From dd0947226a0d5868ba0c2b8808162898396035b7 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Mon, 24 Sep 2012 16:17:16 +0200
Subject: [PATCH] lib: Debug flex proportions code

Signed-off-by: Jan Kara <jack@suse.cz>
---
 lib/flex_proportions.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/lib/flex_proportions.c b/lib/flex_proportions.c
index c785554..f88f793 100644
--- a/lib/flex_proportions.c
+++ b/lib/flex_proportions.c
@@ -62,11 +62,13 @@ void fprop_global_destroy(struct fprop_global *p)
  */
 bool fprop_new_period(struct fprop_global *p, int periods)
 {
-	u64 events;
+	s64 events;
 	unsigned long flags;
 
 	local_irq_save(flags);
 	events = percpu_counter_sum(&p->events);
+	if (events < 0)
+		printk("Got negative events: %lld\n", (long long)events);
 	/*
 	 * Don't do anything if there are no events.
 	 */
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: divide error: bdi_dirty_limit+0x5a/0x9e
  2012-09-24 14:23 ` Jan Kara
@ 2012-09-24 14:36   ` Borislav Petkov
  2012-09-24 18:16     ` Conny Seidel
  0 siblings, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2012-09-24 14:36 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-mm, linux-kernel, Fengguang Wu, Peter Zijlstra,
	Andrew Morton, Johannes Weiner, Conny Seidel

On Mon, Sep 24, 2012 at 04:23:05PM +0200, Jan Kara wrote:
>   fprop_fraction_percpu() does:
>         do {
>                 seq = read_seqcount_begin(&p->sequence);
>                 fprop_reflect_period_percpu(p, pl);
>                 num = percpu_counter_read_positive(&pl->events);
>                 den = percpu_counter_read_positive(&p->events);
>         } while (read_seqcount_retry(&p->sequence, seq));
> 
>         /*
>          * Make fraction <= 1 and denominator > 0 even in presence of
>          * percpu
>          * counter errors
>          */
>         if (den <= num) {
>                 if (num)
>                         den = num;
>                 else
>                         den = 1;
>         }
>         *denominator = den;
>         *numerator = num;
> 
>   So after initial loop, num and den are >= 0 because
> percpu_counter_read_positive() asserts that. If den == 0, then the
> condition is true and thus we always set den to value >= 1. So at least in
> the theoretical model of computation what you observe cannot happen :).
> 
>   Because of use of percpu_counter_read_positive() it also doesn't seem like
> some catch with sign extension (we always deal with non-negative numbers)
> and because you are on a 64-bit machine, s64 fits into long without.
> However, do_div() assumes divisor is 32-bit and we can indeed observe that
> in the disassembly where we prepare the divisor as:
>          mov     -32(%rbp), %edi # denominator, denominator
> (32-bit move insn used). I'm not quite sure if I read the stack in the dump
> correctly but -32(%rbp) seems to be 0x2000000000000000 which would fit what
> we see.

Ok yes, I see exactly what you're saying. And the normalization code
in fprop_fraction_percpu above doesn't catch the large denominator
(0x2000000000000000) den > num case.

[ … ]

Conny, would you test pls?

> From dd0947226a0d5868ba0c2b8808162898396035b7 Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@suse.cz>
> Date: Mon, 24 Sep 2012 16:17:16 +0200
> Subject: [PATCH] lib: Debug flex proportions code
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  lib/flex_proportions.c |    4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
> 
> diff --git a/lib/flex_proportions.c b/lib/flex_proportions.c
> index c785554..f88f793 100644
> --- a/lib/flex_proportions.c
> +++ b/lib/flex_proportions.c
> @@ -62,11 +62,13 @@ void fprop_global_destroy(struct fprop_global *p)
>   */
>  bool fprop_new_period(struct fprop_global *p, int periods)
>  {
> -	u64 events;
> +	s64 events;
>  	unsigned long flags;
>  
>  	local_irq_save(flags);
>  	events = percpu_counter_sum(&p->events);
> +	if (events < 0)
> +		printk("Got negative events: %lld\n", (long long)events);
>  	/*
>  	 * Don't do anything if there are no events.
>  	 */
> -- 
> 1.7.1
> 


-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: divide error: bdi_dirty_limit+0x5a/0x9e
  2012-09-24 14:36   ` Borislav Petkov
@ 2012-09-24 18:16     ` Conny Seidel
  2012-09-24 18:19       ` Borislav Petkov
  0 siblings, 1 reply; 25+ messages in thread
From: Conny Seidel @ 2012-09-24 18:16 UTC (permalink / raw)
  To: Borislav Petkov, Jan Kara
  Cc: linux-mm, linux-kernel, Fengguang Wu, Peter Zijlstra,
	Andrew Morton, Johannes Weiner

[-- Attachment #1: Type: text/plain, Size: 1517 bytes --]

Hi,

On Mon, 24 Sep 2012 16:36:09 +0200
Borislav Petkov <bp@amd64.org> wrote:
>[ … ]
>
>Conny, would you test pls?

Sure thing.
Out of ~25 runs I only triggered it once, without the patch the
trigger-rate is higher.

[   55.098249] Broke affinity for irq 81
[   55.105108] smpboot: CPU 1 is now offline
[   55.311216] smpboot: Booting Node 0 Processor 1 APIC 0x11
[   55.333022] LVT offset 0 assigned for vector 0x400
[   55.545877] smpboot: CPU 2 is now offline
[   55.753050] smpboot: Booting Node 0 Processor 2 APIC 0x12
[   55.775582] LVT offset 0 assigned for vector 0x400
[   55.986747] smpboot: CPU 3 is now offline
[   56.193839] smpboot: Booting Node 0 Processor 3 APIC 0x13
[   56.212643] LVT offset 0 assigned for vector 0x400
[   56.423201] Got negative events: -25

The Divide error wasn't triggered with the patch applied.


--
Kind regards.

Conny Seidel

##################################################################
# Email : conny.seidel@amd.com            GnuPG-Key : 0xA6AB055D #
# Fingerprint: 17C4 5DB2 7C4C C1C7 1452 8148 F139 7C09 A6AB 055D #
##################################################################
# Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach      #
# General Managers: Alberto Bozzo                                #
# Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen #
#               HRB Nr. 43632                                    #
##################################################################

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: divide error: bdi_dirty_limit+0x5a/0x9e
  2012-09-24 18:16     ` Conny Seidel
@ 2012-09-24 18:19       ` Borislav Petkov
  2012-09-24 18:48         ` Srivatsa S. Bhat
  0 siblings, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2012-09-24 18:19 UTC (permalink / raw)
  To: Conny Seidel
  Cc: Jan Kara, linux-mm, linux-kernel, Fengguang Wu, Peter Zijlstra,
	Andrew Morton, Johannes Weiner

On Mon, Sep 24, 2012 at 08:16:50PM +0200, Conny Seidel wrote:
> Hi,
> 
> On Mon, 24 Sep 2012 16:36:09 +0200
> Borislav Petkov <bp@amd64.org> wrote:
> >[ … ]
> >
> >Conny, would you test pls?
> 
> Sure thing.
> Out of ~25 runs I only triggered it once, without the patch the
> trigger-rate is higher.
> 
> [   55.098249] Broke affinity for irq 81
> [   55.105108] smpboot: CPU 1 is now offline
> [   55.311216] smpboot: Booting Node 0 Processor 1 APIC 0x11
> [   55.333022] LVT offset 0 assigned for vector 0x400
> [   55.545877] smpboot: CPU 2 is now offline
> [   55.753050] smpboot: Booting Node 0 Processor 2 APIC 0x12
> [   55.775582] LVT offset 0 assigned for vector 0x400
> [   55.986747] smpboot: CPU 3 is now offline
> [   56.193839] smpboot: Booting Node 0 Processor 3 APIC 0x13
> [   56.212643] LVT offset 0 assigned for vector 0x400
> [   56.423201] Got negative events: -25

I see it:

__percpu_counter_sum does for_each_online_cpu without doing
get/put_online_cpus().

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: divide error: bdi_dirty_limit+0x5a/0x9e
  2012-09-24 18:19       ` Borislav Petkov
@ 2012-09-24 18:48         ` Srivatsa S. Bhat
  2012-09-24 19:31           ` Borislav Petkov
  2012-09-24 20:48           ` [PATCH] CPU hotplug, writeback: Don't call writeback_set_ratelimit() too often during hotplug Srivatsa S. Bhat
  0 siblings, 2 replies; 25+ messages in thread
From: Srivatsa S. Bhat @ 2012-09-24 18:48 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Conny Seidel, Jan Kara, linux-mm, linux-kernel, Fengguang Wu,
	Peter Zijlstra, Andrew Morton, Johannes Weiner, Paul E. McKenney

On 09/24/2012 11:49 PM, Borislav Petkov wrote:
> On Mon, Sep 24, 2012 at 08:16:50PM +0200, Conny Seidel wrote:
>> Hi,
>>
>> On Mon, 24 Sep 2012 16:36:09 +0200
>> Borislav Petkov <bp@amd64.org> wrote:
>>> [ … ]
>>>
>>> Conny, would you test pls?
>>
>> Sure thing.
>> Out of ~25 runs I only triggered it once, without the patch the
>> trigger-rate is higher.
>>
>> [   55.098249] Broke affinity for irq 81
>> [   55.105108] smpboot: CPU 1 is now offline
>> [   55.311216] smpboot: Booting Node 0 Processor 1 APIC 0x11
>> [   55.333022] LVT offset 0 assigned for vector 0x400
>> [   55.545877] smpboot: CPU 2 is now offline
>> [   55.753050] smpboot: Booting Node 0 Processor 2 APIC 0x12
>> [   55.775582] LVT offset 0 assigned for vector 0x400
>> [   55.986747] smpboot: CPU 3 is now offline
>> [   56.193839] smpboot: Booting Node 0 Processor 3 APIC 0x13
>> [   56.212643] LVT offset 0 assigned for vector 0x400
>> [   56.423201] Got negative events: -25
> 
> I see it:
> 
> __percpu_counter_sum does for_each_online_cpu without doing
> get/put_online_cpus().
> 

Maybe I'm missing something, but that doesn't immediately tell me
what's the exact source of the bug.. Note that there is a hotplug
callback percpu_counter_hotcpu_callback() that takes the same
fbc->lock before updating/resetting the percpu counters of offline
CPU. So, though the synchronization is a bit weird, I don't
immediately see a problematic race condition there.

And, speaking of hotplug callbacks, on a slightly different note,
I see one defined as ratelimit_handler(), which calls
writeback_set_ratelimit() for *every single* state change in the
hotplug sequence! Is that really intentional? num_online_cpus()
changes its value only -once- for every hotplug :-)

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: divide error: bdi_dirty_limit+0x5a/0x9e
  2012-09-24 12:56             ` Borislav Petkov
@ 2012-09-24 18:54               ` Srivatsa S. Bhat
  0 siblings, 0 replies; 25+ messages in thread
From: Srivatsa S. Bhat @ 2012-09-24 18:54 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Fengguang Wu, linux-mm, linux-kernel, Jan Kara, Peter Zijlstra,
	Andrew Morton, Johannes Weiner, Conny Seidel, Paul E. McKenney,
	Tejun Heo

On 09/24/2012 06:26 PM, Borislav Petkov wrote:
> On Mon, Sep 24, 2012 at 08:29:00PM +0800, Fengguang Wu wrote:
>> On Mon, Sep 24, 2012 at 02:20:53PM +0200, Borislav Petkov wrote:
>>> On Mon, Sep 24, 2012 at 07:34:47PM +0800, Fengguang Wu wrote:
>>>> Will you test such a line? At least the generic do_div() only uses the
>>>> lower 32bits for division.
>>>>
>>>>         WARN_ON(!(den & 0xffffffff));
>>>
>>> But, but, the asm output says:
>>>
>>>   28:   48 89 c8                mov    %rcx,%rax
>>>   2b:*  48 f7 f7                div    %rdi     <-- trapping instruction
>>>   2e:   31 d2                   xor    %edx,%edx
>>>
>>> and this version of DIV does an unsigned division of RDX:RAX by the
>>> contents of a *64-bit register* ... in our case %rdi.
>>>
>>> Srivatsa's oops  shows the same:
>>>
>>>   28:   48 89 f0                mov    %rsi,%rax
>>>   2b:*  48 f7 f7                div    %rdi     <-- trapping instruction
>>>   2e:   41 8b 94 24 74 02 00    mov    0x274(%r12),%edx
>>>
>>> Right?
>>
>> Right, that's why I said "at least". As for x86, I'm as clueless as you..
> 
> Right, both oopses are on x86 so I don't think it is the bitness of the
> division.
> 
> Another thing those two have in common is that both happen when a CPU
> comes online. Srivatsa's is when CPU9 comes online (oops is detected on
> CPU9) and in our case CPU4 comes online but the oops says CPU0.
> 

I had posted another dump from one of my tests. That one triggers while
offlining a CPU (CPU 9).

https://lkml.org/lkml/2012/9/14/235

> So it has to be hotplug-related.

Regards,
Srivatsa S. Bhat



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: divide error: bdi_dirty_limit+0x5a/0x9e
  2012-09-24 18:48         ` Srivatsa S. Bhat
@ 2012-09-24 19:31           ` Borislav Petkov
  2012-09-24 20:07             ` Jan Kara
  2012-09-24 20:48           ` [PATCH] CPU hotplug, writeback: Don't call writeback_set_ratelimit() too often during hotplug Srivatsa S. Bhat
  1 sibling, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2012-09-24 19:31 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Conny Seidel, Jan Kara, linux-mm, linux-kernel, Fengguang Wu,
	Peter Zijlstra, Andrew Morton, Johannes Weiner, Paul E. McKenney

On Tue, Sep 25, 2012 at 12:18:46AM +0530, Srivatsa S. Bhat wrote:
> >> Sure thing.
> >> Out of ~25 runs I only triggered it once, without the patch the
> >> trigger-rate is higher.
> >>
> >> [   55.098249] Broke affinity for irq 81
> >> [   55.105108] smpboot: CPU 1 is now offline
> >> [   55.311216] smpboot: Booting Node 0 Processor 1 APIC 0x11
> >> [   55.333022] LVT offset 0 assigned for vector 0x400
> >> [   55.545877] smpboot: CPU 2 is now offline
> >> [   55.753050] smpboot: Booting Node 0 Processor 2 APIC 0x12
> >> [   55.775582] LVT offset 0 assigned for vector 0x400
> >> [   55.986747] smpboot: CPU 3 is now offline
> >> [   56.193839] smpboot: Booting Node 0 Processor 3 APIC 0x13
> >> [   56.212643] LVT offset 0 assigned for vector 0x400
> >> [   56.423201] Got negative events: -25
> > 
> > I see it:
> > 
> > __percpu_counter_sum does for_each_online_cpu without doing
> > get/put_online_cpus().
> > 
> 
> Maybe I'm missing something, but that doesn't immediately tell me
> what's the exact source of the bug.. Note that there is a hotplug
> callback percpu_counter_hotcpu_callback() that takes the same
> fbc->lock before updating/resetting the percpu counters of offline
> CPU. So, though the synchronization is a bit weird, I don't
> immediately see a problematic race condition there.

Well, those oopses both happen when a cpu comes online.

According to when percpu_counter_hotcpu_callback is run (at CPU_DEAD)
then those percpu variables should have correctly updated values.

So there has to be some other case where we read garbage which is a
negative value - otherwise we wouldn't be seeing the debug output.

For example, look at the log output above: we bring down cpu 3 just to
bring it right back online. So there has to be something fishy along
that codepath...

Hmm.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: divide error: bdi_dirty_limit+0x5a/0x9e
  2012-09-24 19:31           ` Borislav Petkov
@ 2012-09-24 20:07             ` Jan Kara
  2012-09-24 20:17               ` Jan Kara
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Kara @ 2012-09-24 20:07 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Srivatsa S. Bhat, Conny Seidel, Jan Kara, linux-mm, linux-kernel,
	Fengguang Wu, Peter Zijlstra, Andrew Morton, Johannes Weiner,
	Paul E. McKenney

On Mon 24-09-12 21:31:35, Borislav Petkov wrote:
> On Tue, Sep 25, 2012 at 12:18:46AM +0530, Srivatsa S. Bhat wrote:
> > >> Sure thing.
> > >> Out of ~25 runs I only triggered it once, without the patch the
> > >> trigger-rate is higher.
> > >>
> > >> [   55.098249] Broke affinity for irq 81
> > >> [   55.105108] smpboot: CPU 1 is now offline
> > >> [   55.311216] smpboot: Booting Node 0 Processor 1 APIC 0x11
> > >> [   55.333022] LVT offset 0 assigned for vector 0x400
> > >> [   55.545877] smpboot: CPU 2 is now offline
> > >> [   55.753050] smpboot: Booting Node 0 Processor 2 APIC 0x12
> > >> [   55.775582] LVT offset 0 assigned for vector 0x400
> > >> [   55.986747] smpboot: CPU 3 is now offline
> > >> [   56.193839] smpboot: Booting Node 0 Processor 3 APIC 0x13
> > >> [   56.212643] LVT offset 0 assigned for vector 0x400
> > >> [   56.423201] Got negative events: -25
> > > 
> > > I see it:
> > > 
> > > __percpu_counter_sum does for_each_online_cpu without doing
> > > get/put_online_cpus().
> > > 
> > 
> > Maybe I'm missing something, but that doesn't immediately tell me
> > what's the exact source of the bug.. Note that there is a hotplug
> > callback percpu_counter_hotcpu_callback() that takes the same
> > fbc->lock before updating/resetting the percpu counters of offline
> > CPU. So, though the synchronization is a bit weird, I don't
> > immediately see a problematic race condition there.
> 
> Well, those oopses both happen when a cpu comes online.
> 
> According to when percpu_counter_hotcpu_callback is run (at CPU_DEAD)
> then those percpu variables should have correctly updated values.
> 
> So there has to be some other case where we read garbage which is a
> negative value - otherwise we wouldn't be seeing the debug output.
> 
> For example, look at the log output above: we bring down cpu 3 just to
> bring it right back online. So there has to be something fishy along
> that codepath...
  Well, I think the race happens when a CPU is dying and we call
percpu_counter_sum() after it is marked offline but before callbacks are
run. percpu_counter_sum() then does not add died CPU's counter in the sum
and thus total can go negative. If get/put_online_cpus() fixes this race,
I'd be happy.

  OTOH in theory, percpu_counter_sum() can return negative values even
without CPU hotplug when percpu_counter_sum() races with cpu local
operations. It cannot happen with the current flexible proportion code
but I think making the code more robust is a good idea. I'll send a patch
for this. Still fixing the percpu counters would be nice as these races
could cause random errors to computed proportions and that's bad for
writeback.

								Honza

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: divide error: bdi_dirty_limit+0x5a/0x9e
  2012-09-24 20:07             ` Jan Kara
@ 2012-09-24 20:17               ` Jan Kara
  2012-09-24 21:21                 ` Andrew Morton
  2012-09-25  8:57                 ` Conny Seidel
  0 siblings, 2 replies; 25+ messages in thread
From: Jan Kara @ 2012-09-24 20:17 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Srivatsa S. Bhat, Conny Seidel, Jan Kara, linux-mm, linux-kernel,
	Fengguang Wu, Peter Zijlstra, Andrew Morton, Johannes Weiner,
	Paul E. McKenney

[-- Attachment #1: Type: text/plain, Size: 2906 bytes --]

On Mon 24-09-12 22:07:37, Jan Kara wrote:
> On Mon 24-09-12 21:31:35, Borislav Petkov wrote:
> > On Tue, Sep 25, 2012 at 12:18:46AM +0530, Srivatsa S. Bhat wrote:
> > > >> Sure thing.
> > > >> Out of ~25 runs I only triggered it once, without the patch the
> > > >> trigger-rate is higher.
> > > >>
> > > >> [   55.098249] Broke affinity for irq 81
> > > >> [   55.105108] smpboot: CPU 1 is now offline
> > > >> [   55.311216] smpboot: Booting Node 0 Processor 1 APIC 0x11
> > > >> [   55.333022] LVT offset 0 assigned for vector 0x400
> > > >> [   55.545877] smpboot: CPU 2 is now offline
> > > >> [   55.753050] smpboot: Booting Node 0 Processor 2 APIC 0x12
> > > >> [   55.775582] LVT offset 0 assigned for vector 0x400
> > > >> [   55.986747] smpboot: CPU 3 is now offline
> > > >> [   56.193839] smpboot: Booting Node 0 Processor 3 APIC 0x13
> > > >> [   56.212643] LVT offset 0 assigned for vector 0x400
> > > >> [   56.423201] Got negative events: -25
> > > > 
> > > > I see it:
> > > > 
> > > > __percpu_counter_sum does for_each_online_cpu without doing
> > > > get/put_online_cpus().
> > > > 
> > > 
> > > Maybe I'm missing something, but that doesn't immediately tell me
> > > what's the exact source of the bug.. Note that there is a hotplug
> > > callback percpu_counter_hotcpu_callback() that takes the same
> > > fbc->lock before updating/resetting the percpu counters of offline
> > > CPU. So, though the synchronization is a bit weird, I don't
> > > immediately see a problematic race condition there.
> > 
> > Well, those oopses both happen when a cpu comes online.
> > 
> > According to when percpu_counter_hotcpu_callback is run (at CPU_DEAD)
> > then those percpu variables should have correctly updated values.
> > 
> > So there has to be some other case where we read garbage which is a
> > negative value - otherwise we wouldn't be seeing the debug output.
> > 
> > For example, look at the log output above: we bring down cpu 3 just to
> > bring it right back online. So there has to be something fishy along
> > that codepath...
>   Well, I think the race happens when a CPU is dying and we call
> percpu_counter_sum() after it is marked offline but before callbacks are
> run. percpu_counter_sum() then does not add died CPU's counter in the sum
> and thus total can go negative. If get/put_online_cpus() fixes this race,
> I'd be happy.
> 
>   OTOH in theory, percpu_counter_sum() can return negative values even
> without CPU hotplug when percpu_counter_sum() races with cpu local
> operations. It cannot happen with the current flexible proportion code
> but I think making the code more robust is a good idea. I'll send a patch
> for this. Still fixing the percpu counters would be nice as these races
> could cause random errors to computed proportions and that's bad for
> writeback.
  In the attachment is a fix. Fengguang, can you please merge it? Thanks!

								Honza

[-- Attachment #2: 0001-lib-Fix-corruption-of-denominator-in-flexible-propor.patch --]
[-- Type: text/x-patch, Size: 1456 bytes --]

>From 1fd707552a67adf869958e479910d2f70452351b Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Mon, 24 Sep 2012 16:17:16 +0200
Subject: [PATCH] lib: Fix corruption of denominator in flexible proportions

When racing with CPU hotplug, percpu_counter_sum() can return negative
values for the number of observed events. This confuses fprop_new_period(),
which uses unsigned type and as a result number of events is set to big
*positive* number. From that moment on, things go pear shaped and can result
e.g. in division by zero as denominator is later truncated to 32-bits.

Fix the issue by using a signed type in fprop_new_period(). That makes us
bail out from the function without doing anything (mistakenly) thinking
there are no events to age. That makes aging somewhat inaccurate but getting
accurate data would be rather hard.

Reported-by: Borislav Petkov <bp@amd64.org>
Reported-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 lib/flex_proportions.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/lib/flex_proportions.c b/lib/flex_proportions.c
index c785554..ebf3bac 100644
--- a/lib/flex_proportions.c
+++ b/lib/flex_proportions.c
@@ -62,7 +62,7 @@ void fprop_global_destroy(struct fprop_global *p)
  */
 bool fprop_new_period(struct fprop_global *p, int periods)
 {
-	u64 events;
+	s64 events;
 	unsigned long flags;
 
 	local_irq_save(flags);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH] CPU hotplug, writeback: Don't call writeback_set_ratelimit() too often during hotplug
  2012-09-24 18:48         ` Srivatsa S. Bhat
  2012-09-24 19:31           ` Borislav Petkov
@ 2012-09-24 20:48           ` Srivatsa S. Bhat
  2012-09-28 12:27             ` Fengguang Wu
  1 sibling, 1 reply; 25+ messages in thread
From: Srivatsa S. Bhat @ 2012-09-24 20:48 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Conny Seidel, Jan Kara, linux-mm, linux-kernel, Fengguang Wu,
	Peter Zijlstra, Andrew Morton, Johannes Weiner, Paul E. McKenney,
	Thomas Gleixner


From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

The CPU hotplug callback related to writeback calls writeback_set_ratelimit()
during every state change in the hotplug sequence. This is unnecessary
since num_online_cpus() changes only once during the entire hotplug operation.

So invoke the function only once per hotplug, thereby avoiding the
unnecessary repetition of those costly calculations.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 mm/page-writeback.c |   14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 5ad5ce2..830893b 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1602,10 +1602,18 @@ void writeback_set_ratelimit(void)
 }
 
 static int __cpuinit
-ratelimit_handler(struct notifier_block *self, unsigned long u, void *v)
+ratelimit_handler(struct notifier_block *self, unsigned long action,
+		  void *hcpu)
 {
-	writeback_set_ratelimit();
-	return NOTIFY_DONE;
+
+	switch (action & ~CPU_TASKS_FROZEN) {
+	case CPU_ONLINE:
+	case CPU_DEAD:
+		writeback_set_ratelimit();
+		return NOTIFY_OK;
+	default:
+		return NOTIFY_DONE;
+	}
 }
 
 static struct notifier_block __cpuinitdata ratelimit_nb = {



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: divide error: bdi_dirty_limit+0x5a/0x9e
  2012-09-24 20:17               ` Jan Kara
@ 2012-09-24 21:21                 ` Andrew Morton
  2012-09-24 22:27                   ` Jan Kara
  2012-09-25  8:57                 ` Conny Seidel
  1 sibling, 1 reply; 25+ messages in thread
From: Andrew Morton @ 2012-09-24 21:21 UTC (permalink / raw)
  To: Jan Kara
  Cc: Borislav Petkov, Srivatsa S. Bhat, Conny Seidel, linux-mm,
	linux-kernel, Fengguang Wu, Peter Zijlstra, Johannes Weiner,
	Paul E. McKenney

On Mon, 24 Sep 2012 22:17:26 +0200
Jan Kara <jack@suse.cz> wrote:

>   In the attachment is a fix. Fengguang, can you please merge it? Thanks!

I grabbed it.  I also added the (missing, important) text "This bug causes
a divide-by-zero oops in bdi_dirty_limit() in Borislav's 3.6.0-rc6
based kernel." And I stuck a cc:stable onto it.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: divide error: bdi_dirty_limit+0x5a/0x9e
  2012-09-24 21:21                 ` Andrew Morton
@ 2012-09-24 22:27                   ` Jan Kara
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Kara @ 2012-09-24 22:27 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jan Kara, Borislav Petkov, Srivatsa S. Bhat, Conny Seidel,
	linux-mm, linux-kernel, Fengguang Wu, Peter Zijlstra,
	Johannes Weiner, Paul E. McKenney

On Mon 24-09-12 14:21:00, Andrew Morton wrote:
> On Mon, 24 Sep 2012 22:17:26 +0200
> Jan Kara <jack@suse.cz> wrote:
> 
> >   In the attachment is a fix. Fengguang, can you please merge it? Thanks!
> 
> I grabbed it.  I also added the (missing, important) text "This bug causes
> a divide-by-zero oops in bdi_dirty_limit() in Borislav's 3.6.0-rc6
> based kernel." And I stuck a cc:stable onto it.
  Thanks!

								Honza

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: divide error: bdi_dirty_limit+0x5a/0x9e
  2012-09-24 20:17               ` Jan Kara
  2012-09-24 21:21                 ` Andrew Morton
@ 2012-09-25  8:57                 ` Conny Seidel
  1 sibling, 0 replies; 25+ messages in thread
From: Conny Seidel @ 2012-09-25  8:57 UTC (permalink / raw)
  To: Jan Kara
  Cc: Borislav Petkov, Srivatsa S. Bhat, Conny Seidel, linux-mm,
	linux-kernel, Fengguang Wu, Peter Zijlstra, Andrew Morton,
	Johannes Weiner, Paul E. McKenney

* Jan Kara <jack@suse.cz>:
>  [...]

The patch works for me. Tested it a couple of times on several machines
without triggering the issue.

Thanks for the fix.

> From 1fd707552a67adf869958e479910d2f70452351b Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@suse.cz>
> Date: Mon, 24 Sep 2012 16:17:16 +0200
> Subject: [PATCH] lib: Fix corruption of denominator in flexible proportions
>
> When racing with CPU hotplug, percpu_counter_sum() can return negative
> values for the number of observed events. This confuses fprop_new_period(),
> which uses unsigned type and as a result number of events is set to big
> *positive* number. From that moment on, things go pear shaped and can result
> e.g. in division by zero as denominator is later truncated to 32-bits.
>
> Fix the issue by using a signed type in fprop_new_period(). That makes us
> bail out from the function without doing anything (mistakenly) thinking
> there are no events to age. That makes aging somewhat inaccurate but getting
> accurate data would be rather hard.
>
> Reported-by: Borislav Petkov <bp@amd64.org>
> Reported-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> Signed-off-by: Jan Kara <jack@suse.cz>
Tested-by: Conny Seidel <conny.seidel@amd.com>
> ---
>  lib/flex_proportions.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/lib/flex_proportions.c b/lib/flex_proportions.c
> index c785554..ebf3bac 100644
> --- a/lib/flex_proportions.c
> +++ b/lib/flex_proportions.c
> @@ -62,7 +62,7 @@ void fprop_global_destroy(struct fprop_global *p)
>   */
>  bool fprop_new_period(struct fprop_global *p, int periods)
>  {
> -	u64 events;
> +	s64 events;
>  	unsigned long flags;
>
>  	local_irq_save(flags);
> --
> 1.7.1
>


--
Kind regards.

Conny Seidel

##################################################################
# Email : conny.seidel@amd.com            GnuPG-Key : 0xA6AB055D #
# Fingerprint: 17C4 5DB2 7C4C C1C7 1452 8148 F139 7C09 A6AB 055D #
##################################################################
# Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach      #
# General Managers: Alberto Bozzo                                #
# Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen #
#               HRB Nr. 43632                                    #
##################################################################


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] CPU hotplug, writeback: Don't call writeback_set_ratelimit() too often during hotplug
  2012-09-24 20:48           ` [PATCH] CPU hotplug, writeback: Don't call writeback_set_ratelimit() too often during hotplug Srivatsa S. Bhat
@ 2012-09-28 12:27             ` Fengguang Wu
  2012-09-28 14:46               ` Srivatsa S. Bhat
  2012-10-03 23:11               ` Ni zhan Chen
  0 siblings, 2 replies; 25+ messages in thread
From: Fengguang Wu @ 2012-09-28 12:27 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Borislav Petkov, Conny Seidel, Jan Kara, linux-mm, linux-kernel,
	Peter Zijlstra, Andrew Morton, Johannes Weiner, Paul E. McKenney,
	Thomas Gleixner

On Tue, Sep 25, 2012 at 02:18:20AM +0530, Srivatsa S. Bhat wrote:
> 
> From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> 
> The CPU hotplug callback related to writeback calls writeback_set_ratelimit()
> during every state change in the hotplug sequence. This is unnecessary
> since num_online_cpus() changes only once during the entire hotplug operation.
> 
> So invoke the function only once per hotplug, thereby avoiding the
> unnecessary repetition of those costly calculations.
> 
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> ---

Looks good to me. I'll include it in the writeback tree.

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] CPU hotplug, writeback: Don't call writeback_set_ratelimit() too often during hotplug
  2012-09-28 12:27             ` Fengguang Wu
@ 2012-09-28 14:46               ` Srivatsa S. Bhat
  2012-10-03 23:11               ` Ni zhan Chen
  1 sibling, 0 replies; 25+ messages in thread
From: Srivatsa S. Bhat @ 2012-09-28 14:46 UTC (permalink / raw)
  To: Fengguang Wu
  Cc: Borislav Petkov, Conny Seidel, Jan Kara, linux-mm, linux-kernel,
	Peter Zijlstra, Andrew Morton, Johannes Weiner, Paul E. McKenney,
	Thomas Gleixner

On 09/28/2012 05:57 PM, Fengguang Wu wrote:
> On Tue, Sep 25, 2012 at 02:18:20AM +0530, Srivatsa S. Bhat wrote:
>>
>> From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>>
>> The CPU hotplug callback related to writeback calls writeback_set_ratelimit()
>> during every state change in the hotplug sequence. This is unnecessary
>> since num_online_cpus() changes only once during the entire hotplug operation.
>>
>> So invoke the function only once per hotplug, thereby avoiding the
>> unnecessary repetition of those costly calculations.
>>
>> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>> ---
> 
> Looks good to me. I'll include it in the writeback tree.
> 

Great, thanks!
 
Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] CPU hotplug, writeback: Don't call writeback_set_ratelimit() too often during hotplug
  2012-09-28 12:27             ` Fengguang Wu
  2012-09-28 14:46               ` Srivatsa S. Bhat
@ 2012-10-03 23:11               ` Ni zhan Chen
  1 sibling, 0 replies; 25+ messages in thread
From: Ni zhan Chen @ 2012-10-03 23:11 UTC (permalink / raw)
  To: Fengguang Wu
  Cc: Srivatsa S. Bhat, Borislav Petkov, Conny Seidel, Jan Kara,
	linux-mm, linux-kernel, Peter Zijlstra, Andrew Morton,
	Johannes Weiner, Paul E. McKenney, Thomas Gleixner

On 09/28/2012 08:27 PM, Fengguang Wu wrote:
> On Tue, Sep 25, 2012 at 02:18:20AM +0530, Srivatsa S. Bhat wrote:
>> From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>>
>> The CPU hotplug callback related to writeback calls writeback_set_ratelimit()
>> during every state change in the hotplug sequence. This is unnecessary
>> since num_online_cpus() changes only once during the entire hotplug operation.
>>
>> So invoke the function only once per hotplug, thereby avoiding the
>> unnecessary repetition of those costly calculations.
>>
>> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>> ---
> Looks good to me. I'll include it in the writeback tree.

Hi Fengguang,

Could you tell me when  inode->i_state & I_DIRTY will be set? thanks.

Regards,
Chen

>
> Thanks,
> Fengguang
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2012-10-03 23:11 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-09-24 10:23 divide error: bdi_dirty_limit+0x5a/0x9e Borislav Petkov
2012-09-24 10:38 ` Srivatsa S. Bhat
2012-09-24 11:05   ` Borislav Petkov
2012-09-24 11:13     ` Srivatsa S. Bhat
2012-09-24 11:34       ` Fengguang Wu
2012-09-24 11:51         ` Srivatsa S. Bhat
2012-09-24 12:20         ` Borislav Petkov
2012-09-24 12:29           ` Fengguang Wu
2012-09-24 12:56             ` Borislav Petkov
2012-09-24 18:54               ` Srivatsa S. Bhat
2012-09-24 14:23 ` Jan Kara
2012-09-24 14:36   ` Borislav Petkov
2012-09-24 18:16     ` Conny Seidel
2012-09-24 18:19       ` Borislav Petkov
2012-09-24 18:48         ` Srivatsa S. Bhat
2012-09-24 19:31           ` Borislav Petkov
2012-09-24 20:07             ` Jan Kara
2012-09-24 20:17               ` Jan Kara
2012-09-24 21:21                 ` Andrew Morton
2012-09-24 22:27                   ` Jan Kara
2012-09-25  8:57                 ` Conny Seidel
2012-09-24 20:48           ` [PATCH] CPU hotplug, writeback: Don't call writeback_set_ratelimit() too often during hotplug Srivatsa S. Bhat
2012-09-28 12:27             ` Fengguang Wu
2012-09-28 14:46               ` Srivatsa S. Bhat
2012-10-03 23:11               ` Ni zhan Chen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).