* NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod]
@ 2015-10-16  1:05 Paul Menzel
  2015-10-16  7:54 ` Paul Menzel
  0 siblings, 1 reply; 22+ messages in thread
From: Paul Menzel @ 2015-10-16  1:05 UTC (permalink / raw)
  To: linux-scsi; +Cc: James E. J. Bottomley

[-- Attachment #1: Type: text/plain, Size: 2470 bytes --]

Dear Linux SCSI folks,


I am running Debian Sid/unstable with Linux 4.2.3-1. After upgrading
systemd from 227-1 to 227-2 [1] along with other packages, the system no
longer starts up: the /dev/md1 device doesn’t seem to be found, and I am
dropped into the initramfs shell (BusyBox).

As the machine only has wireless LAN and no serial or USB debug
capabilities, and mounting a USB storage device did not work, I copied
the beginning of the Oops by hand.

```
BUG: unable to handle kernel NULL pointer dereference at 00000014
IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod]
*pdpt = 000000003696e001 *pde = 000000000000000000
Oops: 0000 [#1] SMP
Modules linked in: sd_mod(+) sr_mod(+) cdrom ata_generic ohci_pci ahci libahci pata_amd firewire_ohci firewire_core crc_itu_t forcedeth libata scsi_mod ohci_hcd ehci_pci ehci_hcd usbcore usb_common fan thermal thermal_sys floppy(+)
CPU: 1 PID: 73 Comm: systemd-udevd Not tainted 4.2.0-1-686-pae #1 Debian 4.2.3-1
Hardware name: Packard Bell imedia S3210/WMCP78M, BIOS P01-B2 11/06/2009
task: f68dd040 ti: f6988000 task.ti: f6988000
EIP: 0060:[<f828a00c>] EFLAGS: 00010246 CPU: 1
EIP is at sr_runtime_suspend+0xc/0x20 [sr_mod]
EAX: 00000000 EBX: f6a30cd8 ECX: f6c03d2c EDX: 00000000
ESI: 00000000 EDI: f828e100 EBP: f6989ba8 ESP: f6989b88
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
CR0: 8005003b CR2: 00000014 CR3: 3696d780 CR4: 000006f0
Stack:
 af83346c3 00000000 00000001 fffffff5 f6a7d150 f6a30cd8 f6a30d3c 00000000
 f6989bbc c1390cb7 f6a30cd8 f8334660 00000000 f6989bd0 c1390d0f f6a30cd8
 f8334660 00000000 f6989c0c c13916cb f694a614 f68dd040 00000000 00000008
Call Trace:
 […] ? scsi_runtime_suspend+0x63/0xa0 [scsi_mod]
 […] ? __rpm_callback+0x27/0x60
[…]
```
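
A side note on reading the fault address: with EAX at 00000000 and CR2 at 00000014, the Oops is consistent with a load from a structure member 0x14 bytes past a NULL base pointer. Below is a minimal user-space sketch of that arithmetic. The names `layout_cd` and `fault_address` are hypothetical, and the field layout is made up for illustration; it is not taken from the real `struct scsi_cd`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for illustration only: five 32-bit members
 * followed by a flags word. This is NOT the real struct scsi_cd. */
struct layout_cd {
	uint32_t driver;
	uint32_t capacity;
	uint32_t device;
	uint32_t vendor;
	uint32_t ms_offset;
	uint32_t flags;		/* sits 5 * 4 = 20 = 0x14 bytes into the struct */
};

/* Address the CPU would try to load when reading a member at
 * `member_offset` through the pointer value `base`. */
static uintptr_t fault_address(uintptr_t base, size_t member_offset)
{
	return base + member_offset;
}
```

With a NULL base, `fault_address(0, offsetof(struct layout_cd, flags))` evaluates to 0x14, which is the same pattern as the CR2 value above.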

I tried also to boot with Linux 4.1 and it fails the same way.

Is this a known problem, and has it been fixed in the meantime? It’d be
great if you could help me get the system to boot again. Please tell me
if you need more information to debug this issue, and I’ll do my best to
get it.


Thanks,

Paul


[1] http://metadata.ftp-master.debian.org/changelogs//main/s/systemd/systemd_227-2_changelog

-- 
GPG key: 33623E9B
Fingerprint = 0EB1 649D 4361 D04F 3C70  6F71 4DD7 BF75 3362 3E9B

Giant Monkey Software Engineering GmbH

Brunnenstr. 7D
10119 Berlin Mitte

Managing directors: Adrian Fuhrmann, Lion Vollnhals and Paul Menzel

VAT ID no.: DE281524720
HRB 139495 B, Amtsgericht Charlottenburg


* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod]
  2015-10-16  1:05 NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod] Paul Menzel
@ 2015-10-16  7:54 ` Paul Menzel
  2015-10-16  8:52   ` Paul Menzel
  2015-10-20  1:39   ` Ben Hutchings
  0 siblings, 2 replies; 22+ messages in thread
From: Paul Menzel @ 2015-10-16  7:54 UTC (permalink / raw)
  To: James E. J. Bottomley, linux-scsi; +Cc: submit

[-- Attachment #1: Type: text/plain, Size: 4268 bytes --]

Package: linux-image-4.2.0-1-686-pae
Version: 4.2.3-2
Severity: important


Dear Linux SCSI folks,


please don’t include the address submit@bugs.debian.org in your reply.


On Friday, 16.10.2015 at 03:05 +0200, Paul Menzel wrote:

> using Debian Sid/unstable with Linux 4.2.3-1 upgrading from systemd
> 227-1 to 227-2 [1] and other packages, the system doesn’t start up
> anymore and the /dev/md1 device doesn’t seem to be found and I am
> dropped into shell from initramfs (BusyBox).
> 
> Only having wireless LAN and no serial or USB debug capabilities, and
> mount a USB storage device did not work, I manually copied the beginning
> of the Oops.
> 
> ```
> BUG: unable to handle kernel NULL pointer dereference at 00000014
> IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod]
> *pdpt = 000000003696e001 *pde = 000000000000000000
> Oops: 0000 [#1] SMP
> Modules linked in: sd_mod(+) sr_mod(+) cdrom ata_generic ohci_pci ahci libahci pata_amd firewire_ohci firewire_core crc_itu_t forcedeth libata scsi_mod ohci_hcd ehci_pci ehci_hcd usbcore usb_common fan thermal thermal_sys floppy(+)
> CPU: 1 PID: 73 Comm: systemd-udevd Not tainted 4.2.0-1-686-pae #1 Debian 4.2.3-1
> Hardware name: Packard Bell imedia S3210/WMCP78M, BIOS P01-B2 11/06/2009
> task: f68dd040 ti: f6988000 task.ti: f6988000
> EIP: 0060:[<f828a00c>] EFLAGS: 00010246 CPU: 1
> EIP is at sr_runtime_suspend+0xc/0x20 [sr_mod]
> EAX: 00000000 EBX: f6a30cd8 ECX: f6c03d2c EDX: 00000000
> ESI: 00000000 EDI: f828e100 EBP: f6989ba8 ESP: f6989b88
>  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> CR0: 8005003b CR2: 00000014 CR3: 3696d780 CR4: 000006f0
> Stack:
>  af83346c3 00000000 00000001 fffffff5 f6a7d150 f6a30cd8 f6a30d3c 00000000
>  f6989bbc c1390cb7 f6a30cd8 f8334660 00000000 f6989bd0 c1390d0f f6a30cd8
>  f8334660 00000000 f6989c0c c13916cb f694a614 f68dd040 00000000 00000008
> Call Trace:
>  […] ? scsi_runtime_suspend+0x63/0xa0 [scsi_mod]
>  […] ? __rpm_callback+0x27/0x60
> […]
> ```
> 
> I tried also to boot with Linux 4.1 and it fails the same way.
> 
> Is that a known problem and has been fixed in the mean time? It’d be
> great if you helped me getting the system to boot again. Please tell me
> if you need more information to debug this issue and I’ll do my best to
> get it.

Ben Hutchings asked me to test the patch below to get more debug
information.

```
diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index 8bd54a6..dd5b5b2 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -144,6 +144,12 @@ static int sr_runtime_suspend(struct device *dev)
 {
 	struct scsi_cd *cd = dev_get_drvdata(dev);
 
+	if (WARN_ON(!cd)) {
+		pr_info("%s: cd == NULL; power.usage_count = %d\n",
+			__func__, atomic_read(&dev->power.usage_count));
+		return 0;
+	}
+
 	if (cd->media_present)
 		return -EBUSY;
 	else
@@ -652,7 +658,13 @@ static int sr_probe(struct device *dev)
 	struct scsi_cd *cd;
 	int minor, error;
 
-	scsi_autopm_get_device(sdev);
+	error = scsi_autopm_get_device(sdev);
+	if (error) {
+		pr_err("%s: scsi_autopm_get_device returned %d\n",
+		       __func__, error);
+		return error;
+	}
+
 	error = -ENODEV;
 	if (sdev->type != TYPE_ROM && sdev->type != TYPE_WORM)
 		goto fail;
@@ -719,6 +731,9 @@ static int sr_probe(struct device *dev)
 	if (register_cdrom(&cd->cdi))
 		goto fail_put;
 
+	pr_info("%s: power.usage_count = %d\n",
+		__func__, atomic_read(&dev->power.usage_count));
+
 	/*
 	 * Initialize block layer runtime PM stuffs before the
 	 * periodic event checking request gets started in add_disk.
```
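
The first hunk guards against `dev_get_drvdata()` returning NULL before `cd->media_present` is read. The same pattern can be sketched as a stand-alone user-space program. Here `demo_device`, `demo_cd` and `demo_runtime_suspend` are hypothetical stand-ins, not kernel APIs, and the value returned when no medium is present is an assumption, since the hunk above ends at `else`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for struct scsi_cd and struct device. */
struct demo_cd {
	int media_present;
};

struct demo_device {
	void *drvdata;		/* NULL until a driver has attached its data */
};

/* Mirrors the shape of the patched callback: bail out early when no
 * driver data is attached instead of dereferencing a NULL pointer. */
static int demo_runtime_suspend(const struct demo_device *dev)
{
	const struct demo_cd *cd = dev->drvdata;

	if (cd == NULL) {
		fprintf(stderr, "%s: cd == NULL\n", __func__);
		return 0;
	}

	/* Assumed: 0 when no medium is present, -EBUSY otherwise. */
	return cd->media_present ? -EBUSY : 0;
}
```

Without the early return, the `cd->media_present` read is the access that would fault when no driver data is attached.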

I’ll try that as soon as a spare drive arrives, so that I can back up my
data to it first.

More thoughts are welcome, especially on whether this error suggests a
failing drive.


Thanks,

Paul


> [1] http://metadata.ftp-master.debian.org/changelogs//main/s/systemd/systemd_227-2_changelog

* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod]
  2015-10-16  7:54 ` Paul Menzel
@ 2015-10-16  8:52   ` Paul Menzel
  2015-10-20  1:39   ` Ben Hutchings
  1 sibling, 0 replies; 22+ messages in thread
From: Paul Menzel @ 2015-10-16  8:52 UTC (permalink / raw)
  To: linux-scsi; +Cc: James E. J. Bottomley, 801925

[-- Attachment #1: Type: text/plain, Size: 4735 bytes --]

Dear Linux SCSI folks,


On Friday, 16.10.2015 at 09:54 +0200, Paul Menzel wrote:
> Package: linux-image-4.2.0-1-686-pae
> Version: 4.2.3-2
> Severity: important

> please don’t include the address submit@bugs.debian.org in your reply.

This issue is now also tracked in the Debian Bug Tracking System [2] as
bug #801925 [3]. Please keep that address in CC.

> On Friday, 16.10.2015 at 03:05 +0200, Paul Menzel wrote:
> 
> > using Debian Sid/unstable with Linux 4.2.3-1 upgrading from systemd
> > 227-1 to 227-2 [1] and other packages, the system doesn’t start up
> > anymore and the /dev/md1 device doesn’t seem to be found and I am
> > dropped into shell from initramfs (BusyBox).
> > 
> > Only having wireless LAN and no serial or USB debug capabilities, and
> > mount a USB storage device did not work, I manually copied the beginning
> > of the Oops.
> > 
> > ```
> > BUG: unable to handle kernel NULL pointer dereference at 00000014
> > IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod]
> > *pdpt = 000000003696e001 *pde = 000000000000000000
> > Oops: 0000 [#1] SMP
> > Modules linked in: sd_mod(+) sr_mod(+) cdrom ata_generic ohci_pci ahci libahci pata_amd firewire_ohci firewire_core crc_itu_t forcedeth libata scsi_mod ohci_hcd ehci_pci ehci_hcd usbcore usb_common fan thermal thermal_sys floppy(+)
> > CPU: 1 PID: 73 Comm: systemd-udevd Not tainted 4.2.0-1-686-pae #1 Debian 4.2.3-1
> > Hardware name: Packard Bell imedia S3210/WMCP78M, BIOS P01-B2 11/06/2009
> > task: f68dd040 ti: f6988000 task.ti: f6988000
> > EIP: 0060:[<f828a00c>] EFLAGS: 00010246 CPU: 1
> > EIP is at sr_runtime_suspend+0xc/0x20 [sr_mod]
> > EAX: 00000000 EBX: f6a30cd8 ECX: f6c03d2c EDX: 00000000
> > ESI: 00000000 EDI: f828e100 EBP: f6989ba8 ESP: f6989b88
> >  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> > CR0: 8005003b CR2: 00000014 CR3: 3696d780 CR4: 000006f0
> > Stack:
> >  af83346c3 00000000 00000001 fffffff5 f6a7d150 f6a30cd8 f6a30d3c 00000000
> >  f6989bbc c1390cb7 f6a30cd8 f8334660 00000000 f6989bd0 c1390d0f f6a30cd8
> >  f8334660 00000000 f6989c0c c13916cb f694a614 f68dd040 00000000 00000008
> > Call Trace:
> >  […] ? scsi_runtime_suspend+0x63/0xa0 [scsi_mod]
> >  […] ? __rpm_callback+0x27/0x60
> > […]
> > ```
> > 
> > I tried also to boot with Linux 4.1 and it fails the same way.
> > 
> > Is that a known problem and has been fixed in the mean time? It’d be
> > great if you helped me getting the system to boot again. Please tell me
> > if you need more information to debug this issue and I’ll do my best to
> > get it.
> 
> Ben Hutchings asked me to test the patch below to get more debug
> information.
> 
> ```
> diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
> index 8bd54a6..dd5b5b2 100644
> --- a/drivers/scsi/sr.c
> +++ b/drivers/scsi/sr.c
> @@ -144,6 +144,12 @@ static int sr_runtime_suspend(struct device *dev)
>  {
>  	struct scsi_cd *cd = dev_get_drvdata(dev);
>  
> +	if (WARN_ON(!cd)) {
> +		pr_info("%s: cd == NULL; power.usage_count = %d\n",
> +			__func__, atomic_read(&dev->power.usage_count));
> +		return 0;
> +	}
> +
>  	if (cd->media_present)
>  		return -EBUSY;
>  	else
> @@ -652,7 +658,13 @@ static int sr_probe(struct device *dev)
>  	struct scsi_cd *cd;
>  	int minor, error;
>  
> -	scsi_autopm_get_device(sdev);
> +	error = scsi_autopm_get_device(sdev);
> +	if (error) {
> +		pr_err("%s: scsi_autopm_get_device returned %d\n",
> +		       __func__, error);
> +		return error;
> +	}
> +
>  	error = -ENODEV;
>  	if (sdev->type != TYPE_ROM && sdev->type != TYPE_WORM)
>  		goto fail;
> @@ -719,6 +731,9 @@ static int sr_probe(struct device *dev)
>  	if (register_cdrom(&cd->cdi))
>  		goto fail_put;
>  
> +	pr_info("%s: power.usage_count = %d\n",
> +		__func__, atomic_read(&dev->power.usage_count));
> +
>  	/*
>  	 * Initialize block layer runtime PM stuffs before the
>  	 * periodic event checking request gets started in add_disk.
> ```
> 
> I’ll try that as soon as a spare drive has arrived, where I can copy the
> data to as a backup.
> 
> More thoughts are welcome! Especially, if that error suggests a failing
> drive or not.


Thanks,

Paul


> > [1] http://metadata.ftp-master.debian.org/changelogs//main/s/systemd/systemd_227-2_changelog
[2] https://www.debian.org/Bugs/
[3] https://bugs.debian.org/801925

* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod]
  2015-10-16  7:54 ` Paul Menzel
  2015-10-16  8:52   ` Paul Menzel
@ 2015-10-20  1:39   ` Ben Hutchings
  2015-10-31  9:39     ` Paul Menzel
  1 sibling, 1 reply; 22+ messages in thread
From: Ben Hutchings @ 2015-10-20  1:39 UTC (permalink / raw)
  To: Paul Menzel, James E. J. Bottomley, linux-scsi; +Cc: submit

[-- Attachment #1: Type: text/plain, Size: 2027 bytes --]

On Fri, 2015-10-16 at 09:54 +0200, Paul Menzel wrote:
[...]
> > BUG: unable to handle kernel NULL pointer dereference at 00000014
> > IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod]
> > *pdpt = 000000003696e001 *pde = 000000000000000000
> > Oops: 0000 [#1] SMP
> > Modules linked in: sd_mod(+) sr_mod(+) cdrom ata_generic ohci_pci ahci libahci pata_amd firewire_ohci firewire_core crc_itu_t forcedeth libata scsi_mod ohci_hcd ehci_pci ehci_hcd usbcore usb_common fan thermal thermal_sys floppy(+)
> > CPU: 1 PID: 73 Comm: systemd-udevd Not tainted 4.2.0-1-686-pae #1 Debian 4.2.3-1
> > Hardware name: Packard Bell imedia S3210/WMCP78M, BIOS P01-B2 11/06/2009
> > task: f68dd040 ti: f6988000 task.ti: f6988000
> > EIP: 0060:[<f828a00c>] EFLAGS: 00010246 CPU: 1
> > EIP is at sr_runtime_suspend+0xc/0x20 [sr_mod]
> > EAX: 00000000 EBX: f6a30cd8 ECX: f6c03d2c EDX: 00000000
> > ESI: 00000000 EDI: f828e100 EBP: f6989ba8 ESP: f6989b88
> >  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> > CR0: 8005003b CR2: 00000014 CR3: 3696d780 CR4: 000006f0
> > Stack:
> >  af83346c3 00000000 00000001 fffffff5 f6a7d150 f6a30cd8 f6a30d3c 00000000
> >  f6989bbc c1390cb7 f6a30cd8 f8334660 00000000 f6989bd0 c1390d0f f6a30cd8
> >  f8334660 00000000 f6989c0c c13916cb f694a614 f68dd040 00000000 00000008
> > Call Trace:
> >  […] ? scsi_runtime_suspend+0x63/0xa0 [scsi_mod]
> >  […] ? __rpm_callback+0x27/0x60
> > […]
[...]
> Ben Hutchings asked me to test the patch below to get more debug
> information.
[...]

Well, that didn't help much.  Paul hit another oops, this time in
sd_mod but again apparently related to runtime PM.  My patch only
touched sr_mod.

This time he sent photos of the complete oops; see
<https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=801925;filename=20151020_005.jpg;att=4;msg=15>
and
<https://bugs.debian.org/cgi-bin/bugreport.cgi?filename=20151020_006.jpg;bug=801925;att=3;msg=15>

Ben.

-- 
Ben Hutchings
The first rule of tautology club is the first rule of tautology club.


* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod]
  2015-10-20  1:39   ` Ben Hutchings
@ 2015-10-31  9:39     ` Paul Menzel
  2015-11-01  1:56       ` Alan Stern
  2015-11-01  2:05       ` Alan Stern
  0 siblings, 2 replies; 22+ messages in thread
From: Paul Menzel @ 2015-10-31  9:39 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: James E. J. Bottomley, AlanStern, linux-scsi, 801925

[-- Attachment #1: Type: text/plain, Size: 2992 bytes --]

Control: notfound -1 3.19-1~exp1
Control: found -1 4.2.5-1


On Tuesday, 20.10.2015 at 02:39 +0100, Ben Hutchings wrote:
> On Fri, 2015-10-16 at 09:54 +0200, Paul Menzel wrote:
> [...]
> > > BUG: unable to handle kernel NULL pointer dereference at 00000014
> > > IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod]
> > > *pdpt = 000000003696e001 *pde = 000000000000000000
> > > Oops: 0000 [#1] SMP
> > > Modules linked in: sd_mod(+) sr_mod(+) cdrom ata_generic ohci_pci ahci libahci pata_amd firewire_ohci firewire_core crc_itu_t forcedeth libata scsi_mod ohci_hcd ehci_pci ehci_hcd usbcore usb_common fan thermal thermal_sys floppy(+)
> > > CPU: 1 PID: 73 Comm: systemd-udevd Not tainted 4.2.0-1-686-pae #1 Debian 4.2.3-1
> > > Hardware name: Packard Bell imedia S3210/WMCP78M, BIOS P01-B2 11/06/2009
> > > task: f68dd040 ti: f6988000 task.ti: f6988000
> > > EIP: 0060:[<f828a00c>] EFLAGS: 00010246 CPU: 1
> > > EIP is at sr_runtime_suspend+0xc/0x20 [sr_mod]
> > > EAX: 00000000 EBX: f6a30cd8 ECX: f6c03d2c EDX: 00000000
> > > ESI: 00000000 EDI: f828e100 EBP: f6989ba8 ESP: f6989b88
> > >  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> > > CR0: 8005003b CR2: 00000014 CR3: 3696d780 CR4: 000006f0
> > > Stack:
> > >  af83346c3 00000000 00000001 fffffff5 f6a7d150 f6a30cd8 f6a30d3c 00000000
> > >  f6989bbc c1390cb7 f6a30cd8 f8334660 00000000 f6989bd0 c1390d0f f6a30cd8
> > >  f8334660 00000000 f6989c0c c13916cb f694a614 f68dd040 00000000 00000008
> > > Call Trace:
> > >  […] ? scsi_runtime_suspend+0x63/0xa0 [scsi_mod]
> > >  […] ? __rpm_callback+0x27/0x60
> > > […]
> [...]
> > Ben Hutchings asked me to test the patch below to get more debug
> > information.
> [...]
> 
> Well, that didn't help much.  Paul hit another oops, this time in
> sd_mod but again apparently related to runtime PM.  My patch only
> touched sr_mod.
> 
> This time he sent photos of the complete oops; see
> <https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=801925;filename=20151020_005.jpg;att=4;msg=15>
> and
> <https://bugs.debian.org/cgi-bin/bugreport.cgi?filename=20151020_006.jpg;bug=801925;att=3;msg=15>

After backing up my data, I tested a little more: with Linux 3.19 the
drive is detected and the system boots.

Does anything stand out that changed in this area between Linux 3.19 and
4.1?


Thanks

Paul

* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod]
  2015-10-31  9:39     ` Paul Menzel
@ 2015-11-01  1:56       ` Alan Stern
  2015-11-01  2:05       ` Alan Stern
  1 sibling, 0 replies; 22+ messages in thread
From: Alan Stern @ 2015-11-01  1:56 UTC (permalink / raw)
  To: Paul Menzel; +Cc: Ben Hutchings, James E. J. Bottomley, linux-scsi, 801925

On Sat, 31 Oct 2015, Paul Menzel wrote:

> > Well, that didn't help much.  Paul hit another oops, this time in
> > sd_mod but again apparently related to runtime PM.  My patch only
> > touched sr_mod.
> > 
> > This time he sent photos of the complete oops; see
> > <https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=801925;filename=20151020_005.jpg;att=4;msg=15>
> > and
> > <https://bugs.debian.org/cgi-bin/bugreport.cgi?filename=20151020_006.jpg;bug=801925;att=3;msg=15>
> 
> after backing up my data, I tested a little bit more, and using Linux
> 3.19 the drive is detected and the system boots.
> 
> Does anything stand out what changed in this area between Linux 3.19 and
> 4.1?

I believe the problem shown in that photo was fixed by commit
49718f0fb8c9 ("SCSI: Fix NULL pointer dereference in runtime PM"),
which was merged in 4.2 and has been back-ported to various stable 
releases.

Alan Stern



* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod]
  2015-10-31  9:39     ` Paul Menzel
  2015-11-01  1:56       ` Alan Stern
@ 2015-11-01  2:05       ` Alan Stern
  2016-01-09 15:23         ` Paul Menzel
  1 sibling, 1 reply; 22+ messages in thread
From: Alan Stern @ 2015-11-01  2:05 UTC (permalink / raw)
  To: Paul Menzel
  Cc: Ben Hutchings, James E. J. Bottomley, SCSI development list, 801925

On Sat, 31 Oct 2015, Alan Stern wrote:

> I believe the problem shown in that photo was fixed by commit
> 49718f0fb8c9 ("SCSI: Fix NULL pointer dereference in runtime PM"),
> which was merged in 4.2 and has been back-ported to various stable 
> releases.

On second thought, it seems more likely that this issue was _caused_ by
that commit.  The fix can be found in these two emails:

	http://marc.info/?l=linux-scsi&m=144185206825609&w=2
	http://marc.info/?l=linux-scsi&m=144185208525611&w=2

which have not been merged yet as far as I know even though they were
submitted back in September.

Alan Stern



* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod]
  2015-11-01  2:05       ` Alan Stern
@ 2016-01-09 15:23         ` Paul Menzel
  2016-01-09 16:36           ` Alan Stern
  0 siblings, 1 reply; 22+ messages in thread
From: Paul Menzel @ 2016-01-09 15:23 UTC (permalink / raw)
  To: Alan Stern
  Cc: Ben Hutchings, James E. J. Bottomley, SCSI development list,
	801925, Erich Schubert, Alexandre Rossi

[-- Attachment #1: Type: text/plain, Size: 1446 bytes --]

Version: 4.4~rc8-1~exp1

Dear Alan,


Thank you for your help!

There were some follow-ups to the bug report [1], but I think you and I
were not in CC.

On Saturday, 31.10.2015 at 22:05 -0400, Alan Stern wrote:
> On Sat, 31 Oct 2015, Alan Stern wrote:
> 
> > I believe the problem shown in that photo was fixed by commit
> > 49718f0fb8c9 ("SCSI: Fix NULL pointer dereference in runtime PM"),
> > which was merged in 4.2 and has been back-ported to various stable 
> > releases.
> 
> On second thought, it seems more likely that this issue probably was
> _caused_ by that commit.  The fix can be found in these two emails:
> 
> 	http://marc.info/?l=linux-scsi&m=144185206825609&w=2
> 	http://marc.info/?l=linux-scsi&m=144185208525611&w=2
> 
> which have not been merged yet as far as I know even though they were
> submitted back in September.

I can only say that I am still unable to boot my system with Linux
4.4-rc8 [2]. Are these patches included there?


Thanks,

Paul


[1] https://bugs.debian.org/801925
[2] https://packages.debian.org/experimental/linux-image-4.4.0-rc8-686-pae-dbg

* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod]
  2016-01-09 15:23         ` Paul Menzel
@ 2016-01-09 16:36           ` Alan Stern
  2016-01-10 11:44             ` Erich Schubert
  0 siblings, 1 reply; 22+ messages in thread
From: Alan Stern @ 2016-01-09 16:36 UTC (permalink / raw)
  To: Paul Menzel
  Cc: Ben Hutchings, James E. J. Bottomley, SCSI development list,
	801925, Erich Schubert, Alexandre Rossi

On Sat, 9 Jan 2016, Paul Menzel wrote:

> Version: 4.4~rc8-1~exp1
> 
> Dear Alan,
> 
> 
> Thank you for your help!
> 
> There were some follow-ups to the bug report [1], but I think you and I
> were not in CC.

I wasn't.

> > 	http://marc.info/?l=linux-scsi&m=144185206825609&w=2
> > 	http://marc.info/?l=linux-scsi&m=144185208525611&w=2

> I can only say, that I am still unable to boot my system with Linux
> 4.4-rc8 [2]. Are these patches included there?

They are.  I don't see how they could cause a NULL pointer dereference 
in sd_resume(), though.  If you revert them, does the problem go away?

Also, can you add some debugging statements to sd_resume() so we can 
see where the NULL pointer comes from?

Alan Stern



* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod]
  2016-01-09 16:36           ` Alan Stern
@ 2016-01-10 11:44             ` Erich Schubert
  2016-01-10 15:32               ` Alan Stern
  0 siblings, 1 reply; 22+ messages in thread
From: Erich Schubert @ 2016-01-10 11:44 UTC (permalink / raw)
  To: Alan Stern
  Cc: Paul Menzel, Ben Hutchings, James E. J. Bottomley,
	SCSI development list, 801925, Alexandre Rossi

Hi all,
4.4-rc8 does not fix the problem for me.
Anything beyond 4.1.0 remains unable to boot this computer.

Unfortunately, because the error occurs during early SCSI
initialization, I do not have easy access to the log - no disk, no
network.
It happens during SATA initialization: "scsi_runtime_resume".
So my back trace looks different from Alex's in
https://bugs.debian.org/cgi-bin/bugreport.cgi?msg=42;filename=scsi-null-pointer-dereference.log;bug=801925;att=1
but like the one Paul is seeing:
https://bugs.debian.org/cgi-bin/bugreport.cgi?msg=15;filename=20151020_006.jpg;bug=801925;att=3
I will try to take a photo next time, too.

Here is some dmesg output from a successful boot on 4.1.0:
Note there are some ACPI Errors there (but probably not related).
---
ahci 0000:00:1f.2: version 3.0
ahci 0000:00:1f.2: SSS flag set, parallel bus scan disabled
ahci 0000:00:1f.2: AHCI 0001.0300 32 slots 6 ports 3 Gbps 0x1 impl SATA mode
ahci 0000:00:1f.2: flags: 64bit ncq sntf stag pm led clo pio slum part ems apst
scsi host0: ahci
scsi host1: ahci
scsi host2: ahci
scsi host3: ahci
scsi host4: ahci
scsi host5: ahci
ata1: SATA max UDMA/133 abar m2048@0xc0728000 port 0xc0728100 irq 30
ata2: DUMMY
ata3: DUMMY
ata4: DUMMY
ata5: DUMMY
ata6: DUMMY
usb 3-1: new high-speed USB device number 2 using ehci-pci
usb 4-1: new high-speed USB device number 2 using ehci-pci
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ACPI Error: [GTF0] Namespace lookup failure, AE_NOT_FOUND (20150410/psargs-359)
ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.PRT0._SDD]
(Node ffff8802458b1608), AE_NOT_FOUND (20150410/psparse-536)
ACPI Error: [GTF0] Namespace lookup failure, AE_NOT_FOUND (20150410/psargs-359)
ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.PRT0._GTF]
(Node ffff8802458b15e0), AE_NOT_FOUND (20150410/psparse-536)
ata1.00: ATA-8: TOSHIBA THNSNS256GMCP, TA2ABBF0, max UDMA/133
ata1.00: 500118192 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
ACPI Error: [GTF0] Namespace lookup failure, AE_NOT_FOUND (20150410/psargs-359)
ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.PRT0._SDD]
(Node ffff8802458b1608), AE_NOT_FOUND (20150410/psparse-536)
ACPI Error: [GTF0] Namespace lookup failure, AE_NOT_FOUND (20150410/psargs-359)
ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.PRT0._GTF]
(Node ffff8802458b15e0), AE_NOT_FOUND (20150410/psparse-536)
ata1.00: configured for UDMA/133
scsi 0:0:0:0: Direct-Access     ATA      TOSHIBA THNSNS25 BBF0 PQ: 0 ANSI: 5
sd 0:0:0:0: [sda] 500118192 512-byte logical blocks: (256 GB/238 GiB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
 sda: sda1 sda2 sda3 sda4 < sda5 sda6 >
sd 0:0:0:0: [sda] Attached SCSI disk
PM: Starting manual resume from disk
PM: Hibernation image partition 8:6 present
PM: Looking for hibernation image.
PM: Image not found (code -22)
PM: Hibernation image not present or could not be loaded.
---

On Sat, Jan 9, 2016 at 5:36 PM, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Sat, 9 Jan 2016, Paul Menzel wrote:
>
>> Version: 4.4~rc8-1~exp1
>>
>> Dear Alan,
>>
>>
>> Thank you for your help!
>>
>> There were some follow-ups to the bug report [1], but I think you and I
>> were not in CC.
>
> I wasn't.
>
>> >     http://marc.info/?l=linux-scsi&m=144185206825609&w=2
>> >     http://marc.info/?l=linux-scsi&m=144185208525611&w=2
>
>> I can only say, that I am still unable to boot my system with Linux
>> 4.4-rc8 [2]. Are these patches included there?
>
> They are.  I don't see how they could cause a NULL pointer dereference
> in sd_resume(), though.  If you revert them, does the problem go away?
>
> Also, can you add some debugging statements to sd_resume() so we can
> see where the NULL pointer comes from?
>
> Alan Stern
>


* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod]
  2016-01-10 11:44             ` Erich Schubert
@ 2016-01-10 15:32               ` Alan Stern
  0 siblings, 0 replies; 22+ messages in thread
From: Alan Stern @ 2016-01-10 15:32 UTC (permalink / raw)
  To: Erich Schubert
  Cc: Paul Menzel, Ben Hutchings, SCSI development list, 801925,
	Alexandre Rossi

On Sun, 10 Jan 2016, Erich Schubert wrote:

> Hi all,
> 4.4-rc8 does not fix the problem for me.
> Anything beyond 4.1.0 remains unable to boot this computer.
> 
> Unfortunately, because the error occurs during early early SCSI
> initialization, I do not have easy access to the log - no disk, no
> network.
> It happens during SATA initialization: "scsi_runtime_resume".

You didn't include any debugging information.  However...

> So my back trace looks different than Alex in
> https://bugs.debian.org/cgi-bin/bugreport.cgi?msg=42;filename=scsi-null-pointer-dereference.log;bug=801925;att=1
> but like the one Paul is seeing:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?msg=15;filename=20151020_006.jpg;bug=801925;att=3

The information in that bug report says that the failure happens in
sr_runtime_resume, not in scsi_runtime_resume.  Compare with the
Subject: line in this email thread.

> I will try to do a photo next time, too.

If I send you a patch, can you build and test it?

Alan Stern



* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod]
  2016-02-22 23:15               ` Alexandre Rossi
@ 2016-02-23 15:14                 ` Alan Stern
  0 siblings, 0 replies; 22+ messages in thread
From: Alan Stern @ 2016-02-23 15:14 UTC (permalink / raw)
  To: Alexandre Rossi
  Cc: Paul Menzel, Erich Schubert, Ben Hutchings,
	SCSI development list, 801925

On Tue, 23 Feb 2016, Alexandre Rossi wrote:

> Okay now I've tried with 4.4. The oops does not occur. So this is
> fixed for me in 4.4.
> 
> If there is interest in backporting to 4.3, 13b438914341 ("SCSI: fix
> crashes in sd and sr runtime PM") is not enough to backport. Something
> in 4.4, most probably 4fd41a8552af ("SCSI: Fix NULL pointer
> dereference in runtime PM") is also needed.

Although that commit isn't in 4.3.x yet, it should be added soon.  
Maybe in the next release.

Alan Stern



* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod]
  2016-02-18 16:27             ` Alan Stern
@ 2016-02-22 23:15               ` Alexandre Rossi
  2016-02-23 15:14                 ` Alan Stern
  0 siblings, 1 reply; 22+ messages in thread
From: Alexandre Rossi @ 2016-02-22 23:15 UTC (permalink / raw)
  To: Alan Stern
  Cc: Paul Menzel, Erich Schubert, Ben Hutchings,
	SCSI development list, 801925

Hello,

>> > As this is Linux 4.3 and not 4.4, I guess this is a different problem
>> > though. Alexandre, were you able to capture the stack trace? I’d submit
>> > a new bug report with this.
>>
>> Here is a photo. Please ping me if you need to test some debugging patches.
>
> It looks like the problem occurs in blk_post_runtime_resume().  Since
> there have been recent changes to this routine, it's hard to tell
> whether you're using the most up-to-date code.
>
> In particular, the first few lines of blk_post_runtime_resume() in
> block/blk-core.c should look like this:
>
> void blk_post_runtime_resume(struct request_queue *q, int err)
> {
>         if (!q->dev)
>                 return;
>
> The test was introduced by commit 4fd41a8552af ("SCSI: Fix NULL pointer
> dereference in runtime PM"), which was added to the mainline kernel
> between 4.3 and 4.4.  I don't know what the commit ID would be for a
> .stable kernel.

Okay now I've tried with 4.4. The oops does not occur. So this is
fixed for me in 4.4.

If there is interest in backporting to 4.3, 13b438914341 ("SCSI: fix
crashes in sd and sr runtime PM") alone is not enough. Something
in 4.4, most probably 4fd41a8552af ("SCSI: Fix NULL pointer
dereference in runtime PM"), is also needed.

Thanks a lot,

Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod]
  2016-02-09 19:56           ` Alexandre Rossi
  2016-02-09 20:51             ` Ben Hutchings
@ 2016-02-18 16:27             ` Alan Stern
  2016-02-22 23:15               ` Alexandre Rossi
  1 sibling, 1 reply; 22+ messages in thread
From: Alan Stern @ 2016-02-18 16:27 UTC (permalink / raw)
  To: Alexandre Rossi
  Cc: Paul Menzel, Erich Schubert, Ben Hutchings,
	SCSI development list, 801925

On Tue, 9 Feb 2016, Alexandre Rossi wrote:

> Hi,
> 
> netconsole does not seem to work so early in the boot process this time.
> 
> > As this is Linux 4.3 and not 4.4, I guess this is a different problem
> > though. Alexandre, were you able to capture the stack trace? I’d submit
> > a new bug report with this.
> 
> Here is a photo. Please ping me if you need to test some debugging patches.

It looks like the problem occurs in blk_post_runtime_resume().  Since 
there have been recent changes to this routine, it's hard to tell 
whether you're using the most up-to-date code.

In particular, the first few lines of blk_post_runtime_resume() in 
block/blk-core.c should look like this:

void blk_post_runtime_resume(struct request_queue *q, int err)
{
	if (!q->dev)
		return;

The test was introduced by commit 4fd41a8552af ("SCSI: Fix NULL pointer
dereference in runtime PM"), which was added to the mainline kernel
between 4.3 and 4.4.  I don't know what the commit ID would be for a
.stable kernel.
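
To make the effect of that test concrete, here is a minimal userspace sketch, not the kernel code itself. The struct layouts, the `_sketch` names, and the `last_busy` counter are hypothetical stand-ins; only the early return on a NULL `q->dev` is taken from the lines quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types: only what the sketch needs. */
struct device {
	unsigned long last_busy;	/* assumed field, touched via q->dev below */
};

enum rpm_status_sketch { SKETCH_RPM_SUSPENDED, SKETCH_RPM_ACTIVE };

struct request_queue {
	struct device *dev;		/* NULL if runtime PM was never set up */
	enum rpm_status_sketch rpm_status;
};

/*
 * Same shape as the guard quoted above: return before anything
 * dereferences q->dev when the queue has no device attached.
 */
void blk_post_runtime_resume_sketch(struct request_queue *q, int err)
{
	assert(q != NULL);	/* the queue itself is assumed to be valid */

	if (!q->dev)
		return;

	if (!err) {
		q->rpm_status = SKETCH_RPM_ACTIVE;
		q->dev->last_busy++;	/* would fault here without the guard */
	} else {
		q->rpm_status = SKETCH_RPM_SUSPENDED;
	}
}
```

With `q->dev == NULL` the call leaves the queue untouched instead of faulting; with a device attached and `err == 0` the status becomes active.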

Alan Stern

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod]
  2016-02-09 19:56           ` Alexandre Rossi
@ 2016-02-09 20:51             ` Ben Hutchings
  2016-02-18 16:27             ` Alan Stern
  1 sibling, 0 replies; 22+ messages in thread
From: Ben Hutchings @ 2016-02-09 20:51 UTC (permalink / raw)
  To: Alexandre Rossi, Paul Menzel
  Cc: Alan Stern, Erich Schubert, SCSI development list, 801925

[-- Attachment #1: Type: text/plain, Size: 746 bytes --]

On Tue, 2016-02-09 at 20:56 +0100, Alexandre Rossi wrote:
> Hi,
> 
> netconsole does not seem to work so early in the boot process this time.
> 
> > As this is Linux 4.3 and not 4.4, I guess this is a different problem
> > though. Alexandre, were you able to capture the stack trace? I’d submit
> > a new bug report with this.
> 
> Here is a photo. Please ping me if you need to test some debugging patches.

I'm pretty sure this crash is fixed by commit 4fd41a8552af ("SCSI: Fix NULL
pointer dereference in runtime PM"), which I've now queued up for 4.3
(though it's already in 4.4 which I'll probably upload to unstable soon).

Ben.

-- 
Ben Hutchings
Design a system any fool can use, and only a fool will want to use it.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod]
  2016-02-09 16:47         ` Paul Menzel
@ 2016-02-09 19:56           ` Alexandre Rossi
  2016-02-09 20:51             ` Ben Hutchings
  2016-02-18 16:27             ` Alan Stern
  0 siblings, 2 replies; 22+ messages in thread
From: Alexandre Rossi @ 2016-02-09 19:56 UTC (permalink / raw)
  To: Paul Menzel
  Cc: Alan Stern, Erich Schubert, Ben Hutchings, SCSI development list, 801925

[-- Attachment #1: Type: text/plain, Size: 352 bytes --]

Hi,

netconsole does not seem to work so early in the boot process this time.

> As this is Linux 4.3 and not 4.4, I guess this is a different problem
> though. Alexandre, were you able to capture the stack trace? I’d submit
> a new bug report with this.

Here is a photo. Please ping me if you need to test some debugging patches.

Alex

[-- Attachment #2: null-pointer-dereference-blk_post_runtime_resume.jpeg --]
[-- Type: image/jpeg, Size: 160584 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod]
  2016-01-20 22:07       ` Alexandre Rossi
@ 2016-02-09 16:47         ` Paul Menzel
  2016-02-09 19:56           ` Alexandre Rossi
  0 siblings, 1 reply; 22+ messages in thread
From: Paul Menzel @ 2016-02-09 16:47 UTC (permalink / raw)
  To: Alexandre Rossi
  Cc: Alan Stern, Erich Schubert, Ben Hutchings, SCSI development list, 801925

[-- Attachment #1: Type: text/plain, Size: 1707 bytes --]

Dear Debian and Linux folks,


On Wednesday, 20.01.2016, at 23:07 +0100, Alexandre Rossi wrote:

> >> Could you please attach the debugging patch. Hopefully Alexandre, Erich,
> >> or I will have some spare time to build an image from it.
> >
> > Actually, this patch is an attempt at a fix.  After looking more
> > carefully at your log pictures, I realized what the problem must be.
> >
> > It's too bad nobody was able to capture a log where the error
> > occurred in sr_runtime_suspend, though -- all the logs in the bug
> > report show sd_runtime_resume.
> 
> I just tested the patch applied on top of 4.3.3 (4.3.3-6 in Debian).
> 
> It still crashes at boot, but the stack trace is different: it happens
> in blk_post_runtime_resume. Maybe I'm bitten by a different bug, or maybe
> I need to try with 4.4.
> 
> I'll post the captured log when I have access to a wired network. I'd
> be happy to provide the logs of a debugging patch.

I tried Linux 4.3.5-1 [1], which entered Debian Sid/unstable yesterday,
and I get the same null pointer dereference as Alexandre.

As this is Linux 4.3 and not 4.4, I guess this is a different problem
though. Alexandre, were you able to capture the stack trace? I’d submit
a new bug report with this.


Thanks,

Paul


[1] http://metadata.ftp-master.debian.org/changelogs/main/l/linux/linux_4.3.5-1_changelog
-- 
GPG key: 33623E9B
Fingerprint = 0EB1 649D 4361 D04F 3C70  6F71 4DD7 BF75 3362 3E9B

Giant Monkey Software Engineering GmbH

Brunnenstr. 7D
10119 Berlin Mitte

Managing directors: Adrian Fuhrmann, Lion Vollnhals, and Paul Menzel

VAT ID: DE281524720
HRB 139495 B, Charlottenburg District Court

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod]
  2016-01-19 21:08     ` Alan Stern
  2016-01-19 23:20       ` Paul Menzel
@ 2016-01-20 22:07       ` Alexandre Rossi
  2016-02-09 16:47         ` Paul Menzel
  1 sibling, 1 reply; 22+ messages in thread
From: Alexandre Rossi @ 2016-01-20 22:07 UTC (permalink / raw)
  To: Alan Stern
  Cc: Paul Menzel, Erich Schubert, Ben Hutchings,
	SCSI development list, 801925

Hi,

>> Could you please attach the debugging patch. Hopefully Alexandre, Erich,
>> or I will have some spare time to build an image from it.
>
> Actually, this patch is an attempt at a fix.  After looking more
> carefully at your log pictures, I realized what the problem must be.
>
> It's too bad nobody was able to capture a log where the error
> occurred in sr_runtime_suspend, though -- all the logs in the bug
> report show sd_runtime_resume.

I just tested the patch applied on top of 4.3.3 (4.3.3-6 in Debian).

It still crashes at boot, but the stack trace is different: it happens
in blk_post_runtime_resume. Maybe I'm bitten by a different bug, or maybe
I need to try with 4.4.

I'll post the captured log when I have access to a wired network. I'd
be happy to provide the logs of a debugging patch.

Alex

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod]
  2016-01-19 21:08     ` Alan Stern
@ 2016-01-19 23:20       ` Paul Menzel
  2016-01-20 22:07       ` Alexandre Rossi
  1 sibling, 0 replies; 22+ messages in thread
From: Paul Menzel @ 2016-01-19 23:20 UTC (permalink / raw)
  To: Alan Stern
  Cc: Erich Schubert, Ben Hutchings, SCSI development list, 801925,
	Alexandre Rossi

[-- Attachment #1: Type: text/plain, Size: 1008 bytes --]

Dear Alan,


On Tuesday, 19.01.2016, at 16:08 -0500, Alan Stern wrote:
> On Tue, 19 Jan 2016, Paul Menzel wrote:
>
> > Could you please attach the debugging patch. Hopefully Alexandre, Erich,
> > or I will have some spare time to build an image from it.
> 
> Actually, this patch is an attempt at a fix.  After looking more 
> carefully at your log pictures, I realized what the problem must be.  

that indeed fixed it for me. I applied your patch on
linux-image-4.4.0-rc8-686 [1] and was able to get to the LUKS passphrase
dialog. Awesome! Thank you very, very much!

[…]


Thanks,

Paul


[1] https://packages.debian.org/experimental/linux-image-4.4.0-rc8-686

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod]
  2016-01-19 16:52   ` Paul Menzel
@ 2016-01-19 21:08     ` Alan Stern
  2016-01-19 23:20       ` Paul Menzel
  2016-01-20 22:07       ` Alexandre Rossi
  0 siblings, 2 replies; 22+ messages in thread
From: Alan Stern @ 2016-01-19 21:08 UTC (permalink / raw)
  To: Paul Menzel
  Cc: Erich Schubert, Ben Hutchings, SCSI development list, 801925,
	Alexandre Rossi

[-- Attachment #1: Type: TEXT/PLAIN, Size: 567 bytes --]

On Tue, 19 Jan 2016, Paul Menzel wrote:

> Could you please attach the debugging patch. Hopefully Alexandre, Erich,
> or I will have some spare time to build an image from it.

Actually, this patch is an attempt at a fix.  After looking more 
carefully at your log pictures, I realized what the problem must be.  

It's too bad nobody was able to capture a log where the error 
occurred in sr_runtime_suspend, though -- all the logs in the bug 
report show sd_runtime_resume.

> Alan, thank you a lot for being so responsive and helpful!

You're welcome.

Alan Stern

[-- Attachment #2: Type: TEXT/PLAIN, Size: 1577 bytes --]

 drivers/scsi/sd.c |    7 +++++--
 drivers/scsi/sr.c |    4 ++++
 2 files changed, 9 insertions(+), 2 deletions(-)

Index: usb-4.4/drivers/scsi/sd.c
===================================================================
--- usb-4.4.orig/drivers/scsi/sd.c
+++ usb-4.4/drivers/scsi/sd.c
@@ -3275,8 +3275,8 @@ static int sd_suspend_common(struct devi
 	struct scsi_disk *sdkp = dev_get_drvdata(dev);
 	int ret = 0;
 
-	if (!sdkp)
-		return 0;	/* this can happen */
+	if (!sdkp)	/* E.g.: runtime suspend following sd_remove() */
+		return 0;
 
 	if (sdkp->WCE && sdkp->media_present) {
 		sd_printk(KERN_NOTICE, sdkp, "Synchronizing SCSI cache\n");
@@ -3315,6 +3315,9 @@ static int sd_resume(struct device *dev)
 {
 	struct scsi_disk *sdkp = dev_get_drvdata(dev);
 
+	if (!sdkp)	/* E.g.: runtime resume at the start of sd_probe() */
+		return 0;
+
 	if (!sdkp->device->manage_start_stop)
 		return 0;
 
Index: usb-4.4/drivers/scsi/sr.c
===================================================================
--- usb-4.4.orig/drivers/scsi/sr.c
+++ usb-4.4/drivers/scsi/sr.c
@@ -144,6 +144,9 @@ static int sr_runtime_suspend(struct dev
 {
 	struct scsi_cd *cd = dev_get_drvdata(dev);
 
+	if (!cd)	/* E.g.: runtime suspend following sr_remove() */
+		return 0;
+
 	if (cd->media_present)
 		return -EBUSY;
 	else
@@ -985,6 +988,7 @@ static int sr_remove(struct device *dev)
 	scsi_autopm_get_device(cd->device);
 
 	del_gendisk(cd->disk);
+	dev_set_drvdata(dev, NULL);
 
 	mutex_lock(&sr_ref_mutex);
 	kref_put(&cd->kref, sr_kref_release);
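
The sr.c half of the patch can be sketched outside the kernel as follows. This is only an illustration under stated assumptions: `struct device`, `struct scsi_cd_sketch`, and the `_sketch` helpers are simplified hypothetical stand-ins, and only the NULL check and the clearing of the driver data on removal come from the patch above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in: a device that only carries its driver data. */
struct device {
	void *driver_data;
};

void *dev_get_drvdata_sketch(const struct device *dev)
{
	return dev->driver_data;
}

void dev_set_drvdata_sketch(struct device *dev, void *data)
{
	dev->driver_data = data;
}

/* Hypothetical stand-in for struct scsi_cd: one field is enough here. */
struct scsi_cd_sketch {
	int media_present;
};

/* Runtime-suspend callback with the guard added by the patch. */
int sr_runtime_suspend_sketch(struct device *dev)
{
	struct scsi_cd_sketch *cd;

	assert(dev != NULL);	/* only the driver data may be NULL */
	cd = dev_get_drvdata_sketch(dev);

	if (!cd)	/* e.g. runtime suspend following removal */
		return 0;

	if (cd->media_present)
		return -EBUSY;
	return 0;
}

/* Removal clears the driver data so later callbacks see NULL. */
void sr_remove_sketch(struct device *dev)
{
	dev_set_drvdata_sketch(dev, NULL);
}
```

In this model a suspend callback that runs after `sr_remove_sketch()` returns 0 instead of reading `media_present` through a NULL pointer. The `sd_resume()` hunk applies the same check to a resume that arrives before the driver data is set.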

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod]
  2016-01-19 16:38 ` Alan Stern
@ 2016-01-19 16:52   ` Paul Menzel
  2016-01-19 21:08     ` Alan Stern
  0 siblings, 1 reply; 22+ messages in thread
From: Paul Menzel @ 2016-01-19 16:52 UTC (permalink / raw)
  To: Alan Stern
  Cc: Erich Schubert, Ben Hutchings, SCSI development list, 801925,
	Alexandre Rossi

[-- Attachment #1: Type: text/plain, Size: 1827 bytes --]

Dear Alan, dear Erich,


On Tuesday, 19.01.2016, at 11:38 -0500, Alan Stern wrote:
> On Tue, 19 Jan 2016, Erich Schubert wrote:

> > Attached are photos of the Kernel null pointer BUG that I'm observing.
> > 
> > These shots are with 4.4.0-rc8.
> > As you can see, I have a similar trace to Paul, but the error occurs
> > one stack frame earlier?
>
> Yours is only slightly similar to Paul's.  He got an error in
> sr_runtime_suspend, but your error occurs in sd_resume -- a completely
> different function in a different source file.

if I remember correctly, it happened in different places for me too. In
the backlog you should see that Ben gave me a patch to try, and then it
wasn’t triggered, as it failed somewhere else.

> > Maybe Alex's issue is the same bug, but triggered slightly differently, or
> > the kernel was just compiled differently.
> > __rpm_callback, scsi_autopm_put_device, __pm_runtime_resume, sd_probe
> > is present in all of these traces.
> > 
> > Sorry, I do not have a lot of time right now to help testing or debugging.
>
> I can't tell what's going wrong without some real debugging.  This
> means somebody has to build and test a patched kernel.  There are no
> problems on my computer, so it will have to be one or more of you
> guys.

Could you please attach the debugging patch. Hopefully Alexandre, Erich,
or I will have some spare time to build an image from it.

Alan, thank you a lot for being so responsive and helpful!


Thanks,

Paul

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod]
       [not found] <CAGKbab941Nh7Sy1NTZC6ySxG_P5g7HpjATQ_GSCvDY8y=qgmHA@mail.gmail.com>
@ 2016-01-19 16:38 ` Alan Stern
  2016-01-19 16:52   ` Paul Menzel
  0 siblings, 1 reply; 22+ messages in thread
From: Alan Stern @ 2016-01-19 16:38 UTC (permalink / raw)
  To: Erich Schubert
  Cc: Paul Menzel, Ben Hutchings, SCSI development list, 801925,
	Alexandre Rossi

On Tue, 19 Jan 2016, Erich Schubert wrote:

> Hi,
> Attached are photos of the Kernel null pointer BUG that I'm observing.
> 
> These shots are with 4.4.0-rc8.
> As you can see, I have a similar trace to Paul, but the error occurs
> one stack frame earlier?

Yours is only slightly similar to Paul's.  He got an error in
sr_runtime_suspend, but your error occurs in sd_resume -- a completely 
different function in a different source file.

> Maybe Alex's issue is the same bug, but triggered slightly differently, or
> the kernel was just compiled differently.
> __rpm_callback, scsi_autopm_put_device, __pm_runtime_resume, sd_probe
> is present in all of these traces.
> 
> Sorry, I do not have a lot of time right now to help testing or debugging.

I can't tell what's going wrong without some real debugging.  This
means somebody has to build and test a patched kernel.  There are no
problems on my computer, so it will have to be one or more of you guys.

Alan Stern


^ permalink raw reply	[flat|nested] 22+ messages in thread

Thread overview: 22+ messages
2015-10-16  1:05 NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod] Paul Menzel
2015-10-16  7:54 ` Paul Menzel
2015-10-16  8:52   ` Paul Menzel
2015-10-20  1:39   ` Ben Hutchings
2015-10-31  9:39     ` Paul Menzel
2015-11-01  1:56       ` Alan Stern
2015-11-01  2:05       ` Alan Stern
2016-01-09 15:23         ` Paul Menzel
2016-01-09 16:36           ` Alan Stern
2016-01-10 11:44             ` Erich Schubert
2016-01-10 15:32               ` Alan Stern
     [not found] <CAGKbab941Nh7Sy1NTZC6ySxG_P5g7HpjATQ_GSCvDY8y=qgmHA@mail.gmail.com>
2016-01-19 16:38 ` Alan Stern
2016-01-19 16:52   ` Paul Menzel
2016-01-19 21:08     ` Alan Stern
2016-01-19 23:20       ` Paul Menzel
2016-01-20 22:07       ` Alexandre Rossi
2016-02-09 16:47         ` Paul Menzel
2016-02-09 19:56           ` Alexandre Rossi
2016-02-09 20:51             ` Ben Hutchings
2016-02-18 16:27             ` Alan Stern
2016-02-22 23:15               ` Alexandre Rossi
2016-02-23 15:14                 ` Alan Stern
