* [BISECTED][REGRESSION] Loading Hyper-V network drivers is racy in 3.14+ on Hyper-V 2012 R2
@ 2014-07-06 20:18 ` Sitsofe Wheeler
  0 siblings, 0 replies; 13+ messages in thread
From: Sitsofe Wheeler @ 2014-07-06 20:18 UTC (permalink / raw)
  To: Haiyang Zhang
  Cc: K. Y. Srinivasan, David S. Miller, devel, linux-kernel, netdev

With the 3.14 kernel, Hyper-V no longer reliably enables its networking
devices in time on cloud images, leading to network devices permanently
remaining offline.

After a painful round of bisection, I've narrowed this down to commit
b679ef73edc251f6d200a7dd2396e9fef9e36fc3:

# bad: [455c6fdbd219161bd09b1165f11699d6d73de11c] Linux 3.14
# good: [d8ec26d7f8287f5788a494f56e8814210f0e64be] Linux 3.13
git bisect start 'v3.14' 'v3.13'
# good: [82c477669a4665eb4e52030792051e0559ee2a36] Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 82c477669a4665eb4e52030792051e0559ee2a36
# bad: [ca2a650f3dfdc30d71d21bcbb04d2d057779f3f9] Merge branch 'for-linus' of git://git.infradead.org/users/vkoul/slave-dma
git bisect bad ca2a650f3dfdc30d71d21bcbb04d2d057779f3f9
# bad: [205e2210daa975d92ace485a65a31ccc4077fe1a] iwlwifi: disable TX AMPDU by default for iwldvm
git bisect bad 205e2210daa975d92ace485a65a31ccc4077fe1a
# bad: [09db30805300e9ed5ad43d4d339115cf1d9c84e1] dccp: re-enable debug macro
git bisect bad 09db30805300e9ed5ad43d4d339115cf1d9c84e1
# bad: [d9120198ddef2c0b61ca6659ace41b7c1e7c8f08] clk: shmobile: rcar-gen2: Use kick bit to allow Z clock frequency change
git bisect bad d9120198ddef2c0b61ca6659ace41b7c1e7c8f08
# bad: [1b07da516ee25250f458c76c012ebe4cd677a84f] hyperv: Move state setting for link query
git bisect bad 1b07da516ee25250f458c76c012ebe4cd677a84f
# bad: [53611c0ce9f6e2fa2e31f9ab4ad8c08c512085ba] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
git bisect bad 53611c0ce9f6e2fa2e31f9ab4ad8c08c512085ba
# bad: [a34fe10750ebe524a39f97bd78ab4d232a554edb] parisc: locks: remove redundant arch_*_relax operations
git bisect bad a34fe10750ebe524a39f97bd78ab4d232a554edb
# bad: [004e5cf743086990e5fc04a14437b3966d7fa9a2] Merge branch 'exynos-drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-fixes
git bisect bad 004e5cf743086990e5fc04a14437b3966d7fa9a2
# bad: [a4ecdf82f8ea49f7d3a072121dcbd0bf3a7cb93a] Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad a4ecdf82f8ea49f7d3a072121dcbd0bf3a7cb93a
# bad: [c60f7d5a8e7c639de5d9dfe07e1e91d302d506e4] Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
git bisect bad c60f7d5a8e7c639de5d9dfe07e1e91d302d506e4
# bad: [bf21d605bf7d18d2b3cdb1c19fc1b2a1549c1f11] Merge branch 'drm-fixes-3.14' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
git bisect bad bf21d605bf7d18d2b3cdb1c19fc1b2a1549c1f11
# bad: [07ae78c9798b79bad3d3adf983c94ba23fde54d4] drm/radeon/cik: stop the sdma engines in the enable() function
git bisect bad 07ae78c9798b79bad3d3adf983c94ba23fde54d4
# bad: [7848865914c6a63ead674f0f5604b77df7d3874f] drm/radeon: fix runpm disabling on non-PX harder
git bisect bad 7848865914c6a63ead674f0f5604b77df7d3874f
# bad: [e9e352e9100b98aed1a5fb9e33355c29fb07d5b1] Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/olof/chrome-platform
git bisect bad e9e352e9100b98aed1a5fb9e33355c29fb07d5b1
# good: [6e1f586d31ad49063da391db12632b31c7b00d76] qlcnic: Fix SR-IOV cleanup code path
git bisect good 6e1f586d31ad49063da391db12632b31c7b00d76
# good: [562e74fefc36eb57286455c68a60f2776659a7e1] Merge tag 'cris-for-3.14' of git://jni.nu/cris
git bisect good 562e74fefc36eb57286455c68a60f2776659a7e1
# good: [f1499382f114231cbd1e3dee7e656b50ce9d8236] Merge tag 'xfs-for-linus-v3.14-rc1-2' of git://oss.sgi.com/xfs/xfs
git bisect good f1499382f114231cbd1e3dee7e656b50ce9d8236
# good: [0e47c969c65e213421450c31043353ebe3c67e0c] Merge tag 'for-linus-20140127' of git://git.infradead.org/linux-mtd
git bisect good 0e47c969c65e213421450c31043353ebe3c67e0c
# bad: [30c867eebfbd1c25310aec9f152578deaf793080] Merge tag 'blackfin-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/realmz6/blackfin-linux
git bisect bad 30c867eebfbd1c25310aec9f152578deaf793080
# bad: [c044dc2132d19d8c643cdd340f21afcec177c046] qeth: fix build of s390 allmodconfig
git bisect bad c044dc2132d19d8c643cdd340f21afcec177c046
# bad: [d922e1cb1ea17ac7f0a5c3c2be98d4bd80d055b8] net: Document promote_secondaries
git bisect bad d922e1cb1ea17ac7f0a5c3c2be98d4bd80d055b8
# good: [f2ebd477f141bc09b10fb8deb612a4d9b8999bba] bonding: restructure locking of bond_ab_arp_probe()
git bisect good f2ebd477f141bc09b10fb8deb612a4d9b8999bba
# bad: [b679ef73edc251f6d200a7dd2396e9fef9e36fc3] hyperv: Add support for physically discontinuous receive buffer
git bisect bad b679ef73edc251f6d200a7dd2396e9fef9e36fc3
# good: [a452ce345d63ddf92cd101e4196569f8718ad319] net: Fix memory leak if TPROXY used with TCP early demux
git bisect good a452ce345d63ddf92cd101e4196569f8718ad319
# good: [731073b9c99d46c6b6c01184f67ee6f75fd7a163] sky2: initialize napi before registering device
git bisect good 731073b9c99d46c6b6c01184f67ee6f75fd7a163
# first bad commit: [b679ef73edc251f6d200a7dd2396e9fef9e36fc3] hyperv: Add support for physically discontinuous receive buffer

commit b679ef73edc251f6d200a7dd2396e9fef9e36fc3
Author: Haiyang Zhang <haiyangz@microsoft.com>
Date:   Mon Jan 27 15:03:42 2014 -0800

    hyperv: Add support for physically discontinuous receive buffer
    
    This will allow us to use bigger receive buffer, and prevent allocation failure
    due to fragmented memory.
    
    Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
    Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

The problem can be intermittent (sometimes it happens rarely, sometimes
it happens seemingly every boot), so I used the following script to
check for it:

#!/bin/bash
ok=1
pass=0
bootcount=$(</root/bootcount)
bootcount=$((bootcount + 1))
while [[ $ok -ne 0 ]] && [[ $pass -lt 10 ]]; do
        pass=$((pass + 1))
        ping -qc 1 kernel.org
        ok=$?
        if [[ $ok -eq 0 ]]; then
                echo $bootcount > /root/bootcount
                sync
                reboot
        fi
        sleep 1
done
echo "No network"
read
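A slightly more defensive variant of the same check, offered only as a sketch: it treats a missing or empty /root/bootcount as zero and separates the retrying probe from the reboot, so the pieces can be dry-run without rebooting. The function names (wait_for_network, next_bootcount, check_boot) are hypothetical, and something at boot still has to call check_boot, which is left out here.

```shell
#!/bin/bash
# Sketch only: hypothetical helpers restructuring the reboot-test loop above.

# Retry a probe command up to $1 times, one second apart.
# Returns 0 as soon as the probe succeeds, 1 if every attempt fails.
wait_for_network() {
    local tries=$1 pass
    shift
    for ((pass = 1; pass <= tries; pass++)); do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        sleep 1
    done
    return 1
}

# Print the next boot number, treating a missing or empty counter file as 0.
next_bootcount() {
    local file=$1 count=0
    if [[ -s $file ]]; then
        count=$(<"$file")
    fi
    echo $((count + 1))
}

# Record the boot and reboot if the network comes up; otherwise stop and wait.
# Not invoked here: call check_boot from whatever runs the test at boot.
check_boot() {
    local counter=/root/bootcount count
    count=$(next_bootcount "$counter")
    if wait_for_network 10 ping -qc 1 kernel.org; then
        echo "$count" > "$counter"
        sync
        reboot
    else
        echo "No network on boot $count"
        read -r
    fi
}
```

Keeping the probe as a command argument means wait_for_network can be exercised with `true` or `false` in place of ping before trusting it in a reboot loop.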

With kernels at or after b679ef73edc251f6d200a7dd2396e9fef9e36fc3,
the system usually stops rebooting within 20 passes, and even the most
extreme cases stopped before 100. With a kernel from before
b679ef73edc251f6d200a7dd2396e9fef9e36fc3, it did over 390 passes
before I manually stopped it.

I originally filed this at https://bugzilla.redhat.com/show_bug.cgi?id=1095387
and then at https://bugzilla.kernel.org/show_bug.cgi?id=78771, but
received no reply.

This might also be related to
http://thread.gmane.org/gmane.linux.kernel/1711873/focus=1733398
(Regression in hyperv network driver in 3.14).

-- 
Sitsofe | http://sucs.org/~sits/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [BISECTED][REGRESSION] Loading Hyper-V network drivers is racy in 3.14+ on Hyper-V 2012 R2
  2014-07-06 20:18 ` Sitsofe Wheeler
  (?)
@ 2014-07-07 16:54 ` Haiyang Zhang
  2014-07-07 18:13   ` Sitsofe Wheeler
  -1 siblings, 1 reply; 13+ messages in thread
From: Haiyang Zhang @ 2014-07-07 16:54 UTC (permalink / raw)
  To: Sitsofe Wheeler
  Cc: KY Srinivasan, David S. Miller, devel, linux-kernel, netdev



> -----Original Message-----
> From: Sitsofe Wheeler [mailto:sitsofe@gmail.com]
> Sent: Sunday, July 6, 2014 4:18 PM
> To: Haiyang Zhang
> Cc: KY Srinivasan; David S. Miller; devel@linuxdriverproject.org; linux-
> kernel@vger.kernel.org; netdev@vger.kernel.org
> Subject: [BISECTED][REGRESSION] Loading Hyper-V network drivers is racy in
> 3.14+ on Hyper-V 2012 R2
> 
> With the 3.14 kernel Hyper-V no longer reliably enables its networking devices
> in time on cloud images leading to network devices permanently remaining
> offline.
> 
> After a painful round of bisection I've narrowed this down to commit
> b679ef73edc251f6d200a7dd2396e9fef9e36fc3 :
> 
> <snip>
> 
> commit b679ef73edc251f6d200a7dd2396e9fef9e36fc3
> Author: Haiyang Zhang <haiyangz@microsoft.com>
> Date:   Mon Jan 27 15:03:42 2014 -0800
> 
>     hyperv: Add support for physically discontinuous receive buffer
> 
>     This will allow us to use bigger receive buffer, and prevent allocation failure
>     due to fragmented memory.
> 
>     Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
>     Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> The problem can be intermittent (sometimes it happens rarely, sometimes it
> happens seemingly every boot) so I used the following script to perform a
> check:
> 
> #!/bin/bash
> ok=1
> pass=0
> bootcount=$(</root/bootcount)
> bootcount=$((bootcount + 1))
> while [[ $ok -ne 0 ]] && [[ $pass -lt 10 ]]; do
>         pass=$((pass + 1))
>         ping -qc 1 kernel.org
>         ok=$?
>         if [[ $ok -eq 0 ]]; then
>                 echo $bootcount > /root/bootcount
>                 sync
>                 reboot
>         fi
>         sleep 1
> done
> echo "No network"
> read
> 
> With kernels equal to or after b679ef73edc251f6d200a7dd2396e9fef9e36fc3
> the system will usually stop rebooting before 20 passes but the most extreme
> cases were always less than 100. With a pre
> b679ef73edc251f6d200a7dd2396e9fef9e36fc3 kernel it did over 390 passes
> before I manually stopped it.
> 
> Originally filed on https://bugzilla.redhat.com/show_bug.cgi?id=1095387
> and then on https://bugzilla.kernel.org/show_bug.cgi?id=78771 but without
> reply...
> 
> Might also be related to
> http://thread.gmane.org/gmane.linux.kernel/1711873/focus=1733398
> (Regression in hyperv network driver in 3.14).
> 
> --
> Sitsofe | http://sucs.org/~sits/

How much memory is assigned to the Linux guest? And have you seen any
related messages in the dmesg log when this issue occurs?

Thanks,
- Haiyang


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [BISECTED][REGRESSION] Loading Hyper-V network drivers is racy in 3.14+ on Hyper-V 2012 R2
  2014-07-07 16:54 ` Haiyang Zhang
@ 2014-07-07 18:13   ` Sitsofe Wheeler
  2014-07-11  5:52       ` Sitsofe Wheeler
  0 siblings, 1 reply; 13+ messages in thread
From: Sitsofe Wheeler @ 2014-07-07 18:13 UTC (permalink / raw)
  To: Haiyang Zhang; +Cc: KY Srinivasan, David S. Miller, devel, linux-kernel, netdev

On Mon, Jul 07, 2014 at 04:54:20PM +0000, Haiyang Zhang wrote:
> 
> > -----Original Message-----
> > From: Sitsofe Wheeler [mailto:sitsofe@gmail.com]
> > Sent: Sunday, July 6, 2014 4:18 PM
> > To: Haiyang Zhang
> > Cc: KY Srinivasan; David S. Miller; devel@linuxdriverproject.org; linux-
> > kernel@vger.kernel.org; netdev@vger.kernel.org
> > Subject: [BISECTED][REGRESSION] Loading Hyper-V network drivers is racy in
> > 3.14+ on Hyper-V 2012 R2
> > 
> > With the 3.14 kernel Hyper-V no longer reliably enables its
> > networking devices in time on cloud images leading to network
> > devices permanently remaining offline.
> > 
<snip>
> > the system will usually stop rebooting before 20 passes but the most
> > extreme cases were always less than 100. With a pre
> > b679ef73edc251f6d200a7dd2396e9fef9e36fc3 kernel it did over 390
> > passes before I manually stopped it.
> > 
> > Originally filed on https://bugzilla.redhat.com/show_bug.cgi?id=1095387
> > and then on https://bugzilla.kernel.org/show_bug.cgi?id=78771 but without
> > reply...
> > 
> > Might also be related to
> > http://thread.gmane.org/gmane.linux.kernel/1711873/focus=1733398
> > (Regression in hyperv network driver in 3.14).
> 
> What's the memory size assigned to the Linux guest? And, have you seen
> any related messages in the dmesg log after this issue?

(Feel free to trim my emails when replying - it makes it easier to see
your reply :-)

I've seen the issue with as little as 256 MBytes and as much as
4 GBytes of (non-dynamic) memory.

See https://bugzilla.kernel.org/attachment.cgi?id=142201 for a recent
dmesg (an older dmesg snippet can be seen on the Red Hat bugzilla).

-- 
Sitsofe | http://sucs.org/~sits/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [BISECTED][REGRESSION] Loading Hyper-V network drivers is racy in 3.14+ on Hyper-V 2012 R2
  2014-07-07 18:13   ` Sitsofe Wheeler
@ 2014-07-11  5:52       ` Sitsofe Wheeler
  0 siblings, 0 replies; 13+ messages in thread
From: Sitsofe Wheeler @ 2014-07-11  5:52 UTC (permalink / raw)
  To: Haiyang Zhang; +Cc: KY Srinivasan, David S. Miller, devel, linux-kernel, netdev

On Mon, Jul 07, 2014 at 07:13:41PM +0100, Sitsofe Wheeler wrote:
> On Mon, Jul 07, 2014 at 04:54:20PM +0000, Haiyang Zhang wrote:
> > 
> > > -----Original Message-----
> > > From: Sitsofe Wheeler [mailto:sitsofe@gmail.com]
> > > Sent: Sunday, July 6, 2014 4:18 PM
> > > To: Haiyang Zhang
> > > Cc: KY Srinivasan; David S. Miller; devel@linuxdriverproject.org; linux-
> > > kernel@vger.kernel.org; netdev@vger.kernel.org
> > > Subject: [BISECTED][REGRESSION] Loading Hyper-V network drivers is racy in
> > > 3.14+ on Hyper-V 2012 R2
> > > 
> > > With the 3.14 kernel Hyper-V no longer reliably enables its
> > > networking devices in time on cloud images leading to network
> > > devices permanently remaining offline.
> > > 
> <snip>
> > > the system will usually stop rebooting before 20 passes but the most
> > > extreme cases were always less than 100. With a pre
> > > b679ef73edc251f6d200a7dd2396e9fef9e36fc3 kernel it did over 390
> > > passes before I manually stopped it.
> > > 
> > > Originally filed on https://bugzilla.redhat.com/show_bug.cgi?id=1095387
> > > and then on https://bugzilla.kernel.org/show_bug.cgi?id=78771 but without
> > > reply...
> > > 
> > > Might also be related to
> > > http://thread.gmane.org/gmane.linux.kernel/1711873/focus=1733398
> > > (Regression in hyperv network driver in 3.14).
> > 
> > What's the memory size assigned to the Linux guest? And, have you seen
> > any related messages in the dmesg log after this issue?
> 
> (Feel free to trim my emails when replying - it makes it easier to see
> your reply :-)
> 
> I've had as little as 256 MBytes and as much as 4 GBytes (non-dynamic)
> and still seen the issue.
> 
> See https://bugzilla.kernel.org/attachment.cgi?id=142201 for a recent
> dmesg (an older dmesg snippet can be seen on the Red Hat bugzilla).

Oops, that should have been
https://bugzilla.kernel.org/attachment.cgi?id=142351 (either way, it's
linked from https://bugzilla.kernel.org/show_bug.cgi?id=78771).

-- 
Sitsofe | http://sucs.org/~sits/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [BISECTED][REGRESSION] Loading Hyper-V network drivers is racy in 3.14+ on Hyper-V 2012 R2
  2014-07-11  5:52       ` Sitsofe Wheeler
  (?)
@ 2014-07-11 15:25       ` Haiyang Zhang
  2014-07-14 21:30           ` Sitsofe Wheeler
  -1 siblings, 1 reply; 13+ messages in thread
From: Haiyang Zhang @ 2014-07-11 15:25 UTC (permalink / raw)
  To: Sitsofe Wheeler
  Cc: KY Srinivasan, David S. Miller, devel, linux-kernel, netdev



> -----Original Message-----
> From: Sitsofe Wheeler [mailto:sitsofe@gmail.com]
> Sent: Friday, July 11, 2014 1:53 AM
> To: Haiyang Zhang
> Cc: KY Srinivasan; David S. Miller; devel@linuxdriverproject.org; linux-
> kernel@vger.kernel.org; netdev@vger.kernel.org
> Subject: Re: [BISECTED][REGRESSION] Loading Hyper-V network drivers is
> racy in 3.14+ on Hyper-V 2012 R2

> Oops that should have been
> https://bugzilla.kernel.org/attachment.cgi?id=142351 (either way it's
> information linked off
> https://bugzilla.kernel.org/show_bug.cgi?id=78771 ).

Thanks for the dmesg. Looking at it, the netvsc driver seems to have loaded properly: two NICs are up and one is down (probably not set to connected in Hyper-V Manager?). Or was this dmesg not captured when the bug happened?

[    8.514493] hv_netvsc: hv_netvsc channel opened successfully
[    9.343318] hv_netvsc vmbus_0_14: Send section size: 6144, Section count:170
[    9.345831] hv_netvsc vmbus_0_14: Device MAC 00:15:5d:6f:02:8f link state up

[    9.347101] hv_netvsc: hv_netvsc channel opened successfully
[   10.170308] hv_netvsc vmbus_0_15: Send section size: 6144, Section count:170
[   10.170702] hv_netvsc vmbus_0_15: Device MAC 00:15:5d:6f:02:a5 link state up

[   10.172826] hv_netvsc: hv_netvsc channel opened successfully
[   10.988146] hv_netvsc vmbus_0_16: Send section size: 6144, Section count:170
[   10.989069] hv_netvsc vmbus_0_16: Device MAC 00:15:5d:6f:02:a6 link state down
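A quick way to pull the per-NIC link state out of a log like the one above, as a sketch: the function name netvsc_link_states is made up, and the pattern assumes the exact "Device MAC ... link state up/down" wording printed by hv_netvsc in this kernel, which may differ in other versions.

```shell
#!/bin/bash
# Sketch: print "<device> <MAC> <state>" for each hv_netvsc link-state line
# read on stdin; all other log lines are ignored.
netvsc_link_states() {
    sed -nE 's/.*hv_netvsc (vmbus_[0-9_]+): Device MAC ([0-9a-f:]+) link state (up|down).*/\1 \2 \3/p'
}
```

Run as `dmesg | netvsc_link_states`; for the log above it would list vmbus_0_14 and vmbus_0_15 as up and vmbus_0_16 as down.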

Since you found that commit b679ef73edc is related to this problem, could you do a simple test?
  Reduce the receive buffer size back to 2MB, as below, then re-test to see if the problem goes away:
	drivers/net/hyperv/hyperv_net.h
	#define NETVSC_RECEIVE_BUFFER_SIZE		(1024*1024*2)	/* 2MB */
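For a sense of scale, a sketch of how many pages, and what buddy-allocator order, a physically contiguous buffer of a given size needs. At 2MB that is 512 pages, an order-9 allocation: the kind of large contiguous request that fragmented memory can make fail, which the commit message cites as its motivation. The 4 KiB page size is an assumption (the x86 default), and page_order is a hypothetical helper that mirrors the idea behind the kernel's get_order(), not its code.

```shell
#!/bin/bash
# Sketch: smallest order n such that 2^n pages cover $1 bytes.
# Page size defaults to 4096 bytes (an assumption; pass $2 to override).
page_order() {
    local bytes=$1 page=${2:-4096} pages order=0
    pages=$(( (bytes + page - 1) / page ))
    while (( (1 << order) < pages )); do
        order=$((order + 1))
    done
    echo "$order"
}

size=$((1024 * 1024 * 2))   # NETVSC_RECEIVE_BUFFER_SIZE as suggested above
echo "$size bytes = $(( size / 4096 )) pages, order $(page_order "$size")"
```

This prints `2097152 bytes = 512 pages, order 9`; it says nothing by itself about the race, only about how large a contiguous request that buffer size implies.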

Thanks,
- Haiyang


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [BISECTED][REGRESSION] Loading Hyper-V network drivers is racy in 3.14+ on Hyper-V 2012 R2
  2014-07-11 15:25       ` Haiyang Zhang
@ 2014-07-14 21:30           ` Sitsofe Wheeler
  0 siblings, 0 replies; 13+ messages in thread
From: Sitsofe Wheeler @ 2014-07-14 21:30 UTC (permalink / raw)
  To: Haiyang Zhang; +Cc: KY Srinivasan, David S. Miller, devel, linux-kernel, netdev

On Fri, Jul 11, 2014 at 03:25:11PM +0000, Haiyang Zhang wrote:
> 
> > -----Original Message-----
> > From: Sitsofe Wheeler [mailto:sitsofe@gmail.com]
> > Sent: Friday, July 11, 2014 1:53 AM
> > To: Haiyang Zhang
> > Cc: KY Srinivasan; David S. Miller; devel@linuxdriverproject.org; linux-
> > kernel@vger.kernel.org; netdev@vger.kernel.org
> > Subject: Re: [BISECTED][REGRESSION] Loading Hyper-V network drivers is
> > racy in 3.14+ on Hyper-V 2012 R2
> 
> > Oops that should have been
> > https://bugzilla.kernel.org/attachment.cgi?id=142351 (either way it's
> > information linked off
> > https://bugzilla.kernel.org/show_bug.cgi?id=78771 ).
> 
> Thanks for the dmesg. Looking at it, it seems the netvsc driver was
> loaded properly: 2 NICs are up and one NIC is down (probably not set
> to connected in Hyper-V Manager?). Or was this dmesg not from a boot
> where the bug happened?

This was a dmesg where the bug did happen and your first guess is right
- there are 3 NICs but only the first two are connected and the last is
set to "Not connected" in Hyper-V Manager.

> Since you found that commit b679ef73edc is related to this problem,
> could you do a simple test:
>   Reduce the receive buffer size back to 2MB, as below, then re-test
>   and see if the problem goes away?
> 	drivers/net/hyperv/hyperv_net.h
> 	#define NETVSC_RECEIVE_BUFFER_SIZE		(1024*1024*2)	/* 2MB */

After doing this I was able to reach over 900 reboots where the network
connected properly.

-- 
Sitsofe | http://sucs.org/~sits/

* RE: [BISECTED][REGRESSION] Loading Hyper-V network drivers is racy in 3.14+ on Hyper-V 2012 R2
  2014-07-14 21:30           ` Sitsofe Wheeler
  (?)
@ 2014-07-14 22:39           ` Haiyang Zhang
  2014-07-15  5:08               ` Sitsofe Wheeler
  -1 siblings, 1 reply; 13+ messages in thread
From: Haiyang Zhang @ 2014-07-14 22:39 UTC (permalink / raw)
  To: Sitsofe Wheeler
  Cc: KY Srinivasan, David S. Miller, devel, linux-kernel, netdev

> -----Original Message-----
> From: Sitsofe Wheeler [mailto:sitsofe@gmail.com]
> Sent: Monday, July 14, 2014 5:31 PM
> To: Haiyang Zhang
> Cc: KY Srinivasan; David S. Miller; devel@linuxdriverproject.org; linux-
> kernel@vger.kernel.org; netdev@vger.kernel.org
> Subject: Re: [BISECTED][REGRESSION] Loading Hyper-V network drivers is
> racy in 3.14+ on Hyper-V 2012 R2
> 
> On Fri, Jul 11, 2014 at 03:25:11PM +0000, Haiyang Zhang wrote:
> > Since you found that commit b679ef73edc is related to this problem,
> > could you do a simple test:
> >   Reduce the receive buffer size back to 2MB, as below, then re-test
> >   and see if the problem goes away?
> > 	drivers/net/hyperv/hyperv_net.h
> > 	#define NETVSC_RECEIVE_BUFFER_SIZE		(1024*1024*2)	/* 2MB */
> 
> After doing this I was able to reach over 900 reboots where the network
> connected properly.

Thanks for the tests! I will make a patch that can automatically retry
smaller memory allocs when memory is insufficient.
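A minimal sketch of the kind of fallback described here, assuming a simple halving policy: try the full size first, then halve the request on each failure until a floor is reached. This is not the actual netvsc patch; calloc() stands in for the kernel's vzalloc(), and alloc_with_fallback and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Try to get a zeroed buffer of max_size bytes; on failure, halve the
 * request until it succeeds or drops below min_size. calloc() stands in
 * for vzalloc(). On success *out_size holds the size actually obtained;
 * on failure it is 0 and NULL is returned. */
static void *alloc_with_fallback(size_t max_size, size_t min_size,
                                 size_t *out_size)
{
    assert(min_size > 0 && min_size <= max_size && out_size != NULL);
    for (size_t size = max_size; size >= min_size; size /= 2) {
        void *buf = calloc(1, size);
        if (buf != NULL) {
            *out_size = size;
            return buf;
        }
    }
    *out_size = 0;
    return NULL;
}
```

With max_size = 16MB and min_size = 2MB, the attempts are 16MB, 8MB, 4MB and 2MB. The caller has to use *out_size rather than max_size afterwards, since the buffer may be smaller than requested.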

Thanks,
- Haiyang


* Re: [BISECTED][REGRESSION] Loading Hyper-V network drivers is racy in 3.14+ on Hyper-V 2012 R2
  2014-07-14 22:39           ` Haiyang Zhang
@ 2014-07-15  5:08               ` Sitsofe Wheeler
  0 siblings, 0 replies; 13+ messages in thread
From: Sitsofe Wheeler @ 2014-07-15  5:08 UTC (permalink / raw)
  To: Haiyang Zhang; +Cc: KY Srinivasan, David S. Miller, devel, linux-kernel, netdev

On Mon, Jul 14, 2014 at 10:39:48PM +0000, Haiyang Zhang wrote:
> > -----Original Message-----
> > From: Sitsofe Wheeler [mailto:sitsofe@gmail.com]
> > Sent: Monday, July 14, 2014 5:31 PM
> > To: Haiyang Zhang
> > Cc: KY Srinivasan; David S. Miller; devel@linuxdriverproject.org; linux-
> > kernel@vger.kernel.org; netdev@vger.kernel.org
> > Subject: Re: [BISECTED][REGRESSION] Loading Hyper-V network drivers is
> > racy in 3.14+ on Hyper-V 2012 R2
> 
> Thanks for the tests! I will make a patch that can automatically retry
> smaller memory allocs when memory is insufficient.

This concerns me a bit: why would there be insufficient memory on a 64-bit
VM with 4 GBytes of RAM just after startup (presumably the host's memory
isn't the issue)? Additionally, while things might fail just as they are
starting up, running ifup eth0 at some point later succeeds, so whatever
issue it had seems temporary.

Perhaps it would be wise to add some debugging output to see if the
allocation really failed and why...
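One way to get that kind of output, sketched in user-space C: wrap the allocation so both outcomes are logged with the requested size, and a failure also reports errno. calloc() stands in for vzalloc() and stderr for the kernel log; zalloc_logged is a hypothetical helper, and in the driver itself a netdev_err()-style message would play this role.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Allocate a zeroed buffer and log the outcome, so a failure (and its
 * errno) is visible rather than silent. Returns NULL on failure. */
static void *zalloc_logged(const char *what, size_t size)
{
    assert(what != NULL);
    errno = 0;
    void *buf = calloc(1, size);
    if (buf == NULL) {
        /* POSIX calloc() sets errno on failure; fall back to ENOMEM if not. */
        int err = errno ? errno : ENOMEM;
        fprintf(stderr, "%s: failed to allocate %zu bytes: %s\n",
                what, size, strerror(err));
        return NULL;
    }
    fprintf(stderr, "%s: allocated %zu bytes\n", what, size);
    return buf;
}
```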

-- 
Sitsofe | http://sucs.org/~sits/

* RE: [BISECTED][REGRESSION] Loading Hyper-V network drivers is racy in 3.14+ on Hyper-V 2012 R2
  2014-07-15  5:08               ` Sitsofe Wheeler
  (?)
@ 2014-07-18 21:30               ` Haiyang Zhang
  -1 siblings, 0 replies; 13+ messages in thread
From: Haiyang Zhang @ 2014-07-18 21:30 UTC (permalink / raw)
  To: Sitsofe Wheeler
  Cc: KY Srinivasan, David S. Miller, devel, linux-kernel, netdev



> -----Original Message-----
> From: Sitsofe Wheeler [mailto:sitsofe@gmail.com]
> Sent: Tuesday, July 15, 2014 1:09 AM
> To: Haiyang Zhang
> Cc: KY Srinivasan; David S. Miller; devel@linuxdriverproject.org; linux-
> kernel@vger.kernel.org; netdev@vger.kernel.org
> Subject: Re: [BISECTED][REGRESSION] Loading Hyper-V network drivers is
> racy in 3.14+ on Hyper-V 2012 R2
> 
> On Mon, Jul 14, 2014 at 10:39:48PM +0000, Haiyang Zhang wrote:
> > > -----Original Message-----
> > > From: Sitsofe Wheeler [mailto:sitsofe@gmail.com]
> > > Sent: Monday, July 14, 2014 5:31 PM
> > > To: Haiyang Zhang
> > > Cc: KY Srinivasan; David S. Miller; devel@linuxdriverproject.org;
> > > linux-kernel@vger.kernel.org; netdev@vger.kernel.org
> > > Subject: Re: [BISECTED][REGRESSION] Loading Hyper-V network drivers is
> > > racy in 3.14+ on Hyper-V 2012 R2
> >
> > Thanks for the tests! I will make a patch that can automatically retry
> > smaller memory allocs when memory is insufficient.
> 
> This concerns me a bit: why would there be insufficient memory on a 64-bit
> VM with 4 GBytes of RAM just after startup (presumably the host's memory
> isn't the issue)? Additionally, while things might fail just as they are
> starting up, running ifup eth0 at some point later succeeds, so whatever
> issue it had seems temporary.
> 
> Perhaps it would be wise to add some debugging output to see if the
> allocation really failed and why...

Actually, there will be a debug message in dmesg if the memory allocation
fails, but none showed up in your dmesg. And since the NIC can be recovered
by "ifup eth0" later, it must have been properly loaded (the buffer
allocation was successful but took a bit longer). I think the larger
receive buffer size (16MB) may take longer to allocate, because vzalloc()
may sleep. That's why we don't see the bug with a small buffer size: the
allocation is quick.
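To get a feel for the size effect described here, a rough user-space analogy in C: time how long it takes to obtain and zero a 2MB versus a 16MB buffer. malloc() plus memset() stands in for vzalloc(); the helper name timed_zeroed_alloc is hypothetical, and boot-time kernel behavior (including sleeping while waiting for memory) will differ, so treat any numbers as indicative only.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Allocate and zero `size` bytes, storing the elapsed wall-clock time in
 * *elapsed_ns. Returns the buffer (caller frees) or NULL on failure. */
static unsigned char *timed_zeroed_alloc(size_t size, long long *elapsed_ns)
{
    struct timespec t0, t1;

    assert(size > 0 && elapsed_ns != NULL);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    unsigned char *buf = malloc(size);
    if (buf != NULL)
        memset(buf, 0, size); /* write every byte, so every page is touched */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *elapsed_ns = (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL
                  + (long long)(t1.tv_nsec - t0.tv_nsec);
    return buf;
}
```

The buffer is handed back to the caller, who frees it, so the compiler cannot discard the zeroing as dead stores. Calling it a few times with 2MB and with 16MB shows how the cost grows with size; the timings are machine-dependent, so none are quoted here.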

Could you try putting "LINKDELAY=60" into this file:
	/etc/sysconfig/network-scripts/ifcfg-eth0
and see if the problem goes away?

Thanks,
- Haiyang


Thread overview: 13+ messages
2014-07-06 20:18 [BISECTED][REGRESSION] Loading Hyper-V network drivers is racy in 3.14+ on Hyper-V 2012 R2 Sitsofe Wheeler
2014-07-06 20:18 ` Sitsofe Wheeler
2014-07-07 16:54 ` Haiyang Zhang
2014-07-07 18:13   ` Sitsofe Wheeler
2014-07-11  5:52     ` Sitsofe Wheeler
2014-07-11  5:52       ` Sitsofe Wheeler
2014-07-11 15:25       ` Haiyang Zhang
2014-07-14 21:30         ` Sitsofe Wheeler
2014-07-14 21:30           ` Sitsofe Wheeler
2014-07-14 22:39           ` Haiyang Zhang
2014-07-15  5:08             ` Sitsofe Wheeler
2014-07-15  5:08               ` Sitsofe Wheeler
2014-07-18 21:30               ` Haiyang Zhang
