[PATCH 0/2] Drivers: scsi: storvsc: Fix issues with hot-add/remove of LUNs

Discussion:

[PATCH 0/2] Drivers: scsi: storvsc: Fix issues with hot-add/remove of LUNs

K. Y. Srinivasan

2014-08-17 03:09:14 UTC

This patch-set addresses issues with LUN hot-add and remove. When the host
notifies the guest that a scan is needed, scan the host. Also, prior to
discovering new devices that may have been added, ensure we handle the
LUN remove case first.

K. Y. Srinivasan (2):
Drivers: scsi: storvsc: In responce to a scan event, scan the host
Drivers: scsi: storvsc: Force discovery of LUNs that may have been
removed.

drivers/scsi/storvsc_drv.c | 43 ++++++++++++++++++++++++++++++++-----------
1 files changed, 32 insertions(+), 11 deletions(-)

--
1.7.4.1

K. Y. Srinivasan

2014-08-17 03:09:48 UTC

The host asks the guest to scan when a LUN is removed or added.
The only way a guest can identify the removed LUN is when an I/O is
attempted on a removed LUN - the SRB status code indicates that the LUN
is invalid. We currently handle this SRB status and remove the device.

Rather than waiting for an I/O to remove the device, force the discovery of
LUNs that may have been removed prior to discovering LUNs that may have
been added.

Signed-off-by: K. Y. Srinivasan <***@microsoft.com>
---
drivers/scsi/storvsc_drv.c | 26 ++++++++++++++++++++++++++
1 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 02d1db6..fb38ca9 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -430,10 +430,36 @@ static void storvsc_host_scan(struct work_struct *work)
{
struct storvsc_scan_work *wrk;
struct Scsi_Host *host;
+ struct scsi_device *sdev;
+ unsigned long flags;

wrk = container_of(work, struct storvsc_scan_work, work);
host = wrk->host;

+ /*
+ * Before scanning the host, first check to see if any of the
+ * currrently known devices have been hot removed. We issue a
+ * "unit ready" command against all currently known devices.
+ * This I/O will result in an error for devices that have been
+ * removed. As part of handling the I/O error, we remove the device.
+ *
+ * When a LUN is added or removed, the host sends us a signal to
+ * scan the host. Thus we are forced to discover the LUNs that
+ * may have been removed this way.
+ */
+ mutex_lock(&host->scan_mutex);
+ spin_lock_irqsave(host->host_lock, flags);
+ list_for_each_entry(sdev, &host->__devices, siblings) {
+ spin_unlock_irqrestore(host->host_lock, flags);
+ scsi_test_unit_ready(sdev, 1, 1, NULL);
+ spin_lock_irqsave(host->host_lock, flags);
+ continue;
+ }
+ spin_unlock_irqrestore(host->host_lock, flags);
+ mutex_unlock(&host->scan_mutex);
+ /*
+ * Now scan the host to discover LUNs that may have been added.
+ */
scsi_scan_host(host);

kfree(wrk);

--
1.7.4.1

Christoph Hellwig

2014-08-19 17:54:31 UTC

Post by K. Y. Srinivasan
The host asks the guest to scan when a LUN is removed or added.
The only way a guest can identify the removed LUN is when an I/O is
attempted on a removed LUN - the SRB status code indicates that the LUN
is invalid. We currently handle this SRB status and remove the device.
Rather than waiting for an I/O to remove the device, force the discovery of
LUNs that may have been removed prior to discovering LUNs that may have
been added.

This looks pretty reasonable to me, but I wonder if we should move this
up to common code so that it happens for any host rescan triggered by
sysfs or other drivers as well.

KY Srinivasan

2014-08-26 22:54:51 UTC

-----Original Message-----
Sent: Tuesday, August 19, 2014 10:55 AM
To: KY Srinivasan
Subject: Re: [PATCH 2/2] Drivers: scsi: storvsc: Force discovery of LUNs that
may have been removed.

Post by K. Y. Srinivasan
The host asks the guest to scan when a LUN is removed or added.
The only way a guest can identify the removed LUN is when an I/O is
attempted on a removed LUN - the SRB status code indicates that the
LUN is invalid. We currently handle this SRB status and remove the device.
Rather than waiting for an I/O to remove the device, force the
discovery of LUNs that may have been removed prior to discovering LUNs
that may have been added.

This looks pretty reasonable to me, but I wonder if we should move this up
to common code so that it happens for any host rescan triggered by sysfs or
other drivers as well.

Ping. Any decision on if/when this patch may be checked in.

Regards,

K. Y

Christoph Hellwig

2014-08-29 01:11:29 UTC

Post by KY Srinivasan

This looks pretty reasonable to me, but I wonder if we should move this up
to common code so that it happens for any host rescan triggered by sysfs or
other drivers as well.

Ping. Any decision on if/when this patch may be checked in.

I'll need a review or two for it still..

Hannes Reinecke

2014-08-27 14:31:21 UTC

Post by Christoph Hellwig

Post by K. Y. Srinivasan
The host asks the guest to scan when a LUN is removed or added.
The only way a guest can identify the removed LUN is when an I/O is
attempted on a removed LUN - the SRB status code indicates that the LUN
is invalid. We currently handle this SRB status and remove the device.
Rather than waiting for an I/O to remove the device, force the discovery of
LUNs that may have been removed prior to discovering LUNs that may have
been added.

This looks pretty reasonable to me, but I wonder if we should move this
up to common code so that it happens for any host rescan triggered by
sysfs or other drivers as well.

Not without proper testing.
Currently we cannot rescan existing devices; the inquiry string is
nailed to the sdev structure. The only way to really refresh the
information is to delete it and rescan it again.

And I really do _not_ want to do this automatically as the device
might be busy due to various reasons (think of multipathing).
It tooks us ages to get this working with FC, and we finally settled
to have a soft-remove implemented in the transport class.
And we still have issues with SAS HBAs, where at least the standard
defines a mechanism. Trying this in the SCSI midlayer itself
is the road to disaster.

If we were to attempt this we would need to lift the dev_loss_tmo
mechanism from the FC transport layer and make this a generic
facility for every HBA. But this is quite some work.

Cheers,

Hannes

--
Dr. Hannes Reinecke zSeries & Storage
***@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

Mike Christie

2014-08-29 02:42:03 UTC

Post by Hannes Reinecke

Post by Christoph Hellwig

Post by K. Y. Srinivasan
The host asks the guest to scan when a LUN is removed or added.
The only way a guest can identify the removed LUN is when an I/O is
attempted on a removed LUN - the SRB status code indicates that the LUN
is invalid. We currently handle this SRB status and remove the device.
Rather than waiting for an I/O to remove the device, force the discovery of
LUNs that may have been removed prior to discovering LUNs that may have
been added.

This looks pretty reasonable to me, but I wonder if we should move this
up to common code so that it happens for any host rescan triggered by
sysfs or other drivers as well.

Not without proper testing.
Currently we cannot rescan existing devices; the inquiry string is
nailed to the sdev structure. The only way to really refresh the
information is to delete it and rescan it again.

How are distros handling 0x6/0x3f/0x0e (report luns changed) when it
gets passed to userspace? Is everyone kicking off a new full (add and
delete) scan to handle this or logging it? Is the driver returning this
when the LUNs change?

Also is the driver getting a 0x5/0x25/0 (invalid LUN) when the LUN does
not exist, or is it just getting that SRB_STATUS_INVALID_LUN error code?
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Hannes Reinecke

2014-08-29 06:19:11 UTC

Post by Mike Christie

Post by Hannes Reinecke

Post by Christoph Hellwig

Post by K. Y. Srinivasan
The host asks the guest to scan when a LUN is removed or added.
The only way a guest can identify the removed LUN is when an I/O is
attempted on a removed LUN - the SRB status code indicates that the LUN
is invalid. We currently handle this SRB status and remove the device.
Rather than waiting for an I/O to remove the device, force the discovery of
LUNs that may have been removed prior to discovering LUNs that may have
been added.

This looks pretty reasonable to me, but I wonder if we should move this
up to common code so that it happens for any host rescan triggered by
sysfs or other drivers as well.

Not without proper testing.
Currently we cannot rescan existing devices; the inquiry string is
nailed to the sdev structure. The only way to really refresh the
information is to delete it and rescan it again.

How are distros handling 0x6/0x3f/0x0e (report luns changed) when it
gets passed to userspace? Is everyone kicking off a new full (add and
delete) scan to handle this or logging it? Is the driver returning this
when the LUNs change?

Currently it's logged to userspace and ignored.
Doing an automated rescan has proven to be dangerous, as it
might disconnect any LUNs which are still in use by applications.
Especially HA or database setups tends to become very annoyed
when you do an automated rescan.

Cheers,

Hannes

--
Dr. Hannes Reinecke zSeries & Storage
***@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

Bart Van Assche

2014-08-29 07:39:26 UTC

Post by Hannes Reinecke

Post by Mike Christie
How are distros handling 0x6/0x3f/0x0e (report luns changed) when it
gets passed to userspace? Is everyone kicking off a new full (add and
delete) scan to handle this or logging it? Is the driver returning this
when the LUNs change?

Currently it's logged to userspace and ignored.
Doing an automated rescan has proven to be dangerous, as it
might disconnect any LUNs which are still in use by applications.
Especially HA or database setups tends to become very annoyed
when you do an automated rescan.

Has it already been considered to add newly discovered LUNs
automatically and to leave it to the user to remove stale LUNs manually
? That would be similar to what the rescan-scsi-bus.sh script does
without option -r/--remove.

Bart.

Hannes Reinecke

2014-08-29 08:13:44 UTC

Post by Bart Van Assche

Post by Hannes Reinecke

Post by Mike Christie
How are distros handling 0x6/0x3f/0x0e (report luns changed) when it
gets passed to userspace? Is everyone kicking off a new full (add and
delete) scan to handle this or logging it? Is the driver returning this
when the LUNs change?

Currently it's logged to userspace and ignored.
Doing an automated rescan has proven to be dangerous, as it
might disconnect any LUNs which are still in use by applications.
Especially HA or database setups tends to become very annoyed
when you do an automated rescan.

Has it already been considered to add newly discovered LUNs
automatically and to leave it to the user to remove stale LUNs manually
? That would be similar to what the rescan-scsi-bus.sh script does
without option -r/--remove.

As of now we're still missing an in-kernel infrastructure which
would allow us to react on any sense codes; currently we're relying
on the administrator to setup a udev rule here.

Cheers,

Hannes

--
Dr. Hannes Reinecke zSeries & Storage
***@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

James Bottomley

2014-08-29 13:53:42 UTC

Post by Hannes Reinecke

Post by Bart Van Assche

Post by Hannes Reinecke

Post by Mike Christie
How are distros handling 0x6/0x3f/0x0e (report luns changed) when it
gets passed to userspace? Is everyone kicking off a new full (add and
delete) scan to handle this or logging it? Is the driver returning this
when the LUNs change?

Currently it's logged to userspace and ignored.
Doing an automated rescan has proven to be dangerous, as it
might disconnect any LUNs which are still in use by applications.
Especially HA or database setups tends to become very annoyed
when you do an automated rescan.

Has it already been considered to add newly discovered LUNs
automatically and to leave it to the user to remove stale LUNs manually
? That would be similar to what the rescan-scsi-bus.sh script does
without option -r/--remove.

As of now we're still missing an in-kernel infrastructure which
would allow us to react on any sense codes; currently we're relying
on the administrator to setup a udev rule here.

Um, I thought this was supposed to solve that problem:

commit 279afdfe78a020b4b1a68bffd0009b961b12982e
Author: Ewan D. Milne <***@redhat.com>
Date: Thu Aug 8 15:07:48 2013 -0400

[SCSI] Generate uevents on certain unit attention codes

The idea was supposed to be that, as you say, log scrubbers are hard to
configure and break every time someone fixes a spelling error, so we
could now listen for a report luns data change uevent instead.

James

Ewan Milne

2014-08-29 15:01:37 UTC

Post by Mike Christie

Post by Hannes Reinecke

Post by Christoph Hellwig

Post by K. Y. Srinivasan
The host asks the guest to scan when a LUN is removed or added.
The only way a guest can identify the removed LUN is when an I/O is
attempted on a removed LUN - the SRB status code indicates that the LUN
is invalid. We currently handle this SRB status and remove the device.
Rather than waiting for an I/O to remove the device, force the discovery of
LUNs that may have been removed prior to discovering LUNs that may have
been added.

This looks pretty reasonable to me, but I wonder if we should move this
up to common code so that it happens for any host rescan triggered by
sysfs or other drivers as well.

Not without proper testing.
Currently we cannot rescan existing devices; the inquiry string is
nailed to the sdev structure. The only way to really refresh the
information is to delete it and rescan it again.

How are distros handling 0x6/0x3f/0x0e (report luns changed) when it
gets passed to userspace? Is everyone kicking off a new full (add and
delete) scan to handle this or logging it? Is the driver returning this
when the LUNs change?

Currently the udev rules we have to handle these events are installed
with a separate package, and only the REPORTED LUNS DATA HAS CHANGED
does anything, the others are commented out. It turns out that e.g.
multipath stops using a path if it notices that the capacity has changed
and we need to do some more work there, it is under discussion.

We do not delete LUNs that disappear from the REPORT LUNS inventory,
although someone could write their own udev rule to do that if desired.

Beware the case where a LUN is remapped to a different LUN number, or
if LUN's WWID is used for a device with different data (e.g. a LUN
deleted and re-added and the WWID is the same although I don't know
if this actually happens).

Consider that the UA just provides notification to userspace of a
change -- lack of notification does not prevent someone from deciding
to rescan for new LUNs via sysfs any time they feel like it. So
you can't just change the storage configuration and hope that no-one
notices until you are done making changes.

-Ewan

Post by Mike Christie
Also is the driver getting a 0x5/0x25/0 (invalid LUN) when the LUN does
not exist, or is it just getting that SRB_STATUS_INVALID_LUN error code?
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
More majordomo info at http://vger.kernel.org/majordomo-info.html

K. Y. Srinivasan

2014-08-17 03:09:47 UTC

The virtual HBA that storvsc implements can support multiple channels and
targets. So, scan the host when the host notifies that a scan is needed.

Signed-off-by: K. Y. Srinivasan <***@microsoft.com>
---
drivers/scsi/storvsc_drv.c | 19 +++++++------------
1 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 15ba695..02d1db6 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -426,21 +426,16 @@ done:
kfree(wrk);
}

-static void storvsc_bus_scan(struct work_struct *work)
+static void storvsc_host_scan(struct work_struct *work)
{
struct storvsc_scan_work *wrk;
- int id, order_id;
+ struct Scsi_Host *host;

wrk = container_of(work, struct storvsc_scan_work, work);
- for (id = 0; id < wrk->host->max_id; ++id) {
- if (wrk->host->reverse_ordering)
- order_id = wrk->host->max_id - id - 1;
- else
- order_id = id;
-
- scsi_scan_target(&wrk->host->shost_gendev, 0,
- order_id, SCAN_WILD_CARD, 1);
- }
+ host = wrk->host;
+
+ scsi_scan_host(host);
+
kfree(wrk);
}

@@ -1209,7 +1204,7 @@ static void storvsc_on_receive(struct hv_device *device,
if (!work)
return;

- INIT_WORK(&work->work, storvsc_bus_scan);
+ INIT_WORK(&work->work, storvsc_host_scan);
work->host = stor_device->host;
schedule_work(&work->work);
break;

--
1.7.4.1

Christoph Hellwig

2014-09-25 13:47:26 UTC

So can anywayone give me a review for those two patches?

Post by K. Y. Srinivasan
This patch-set addresses issues with LUN hot-add and remove. When the host
notifies the guest that a scan is needed, scan the host. Also, prior to
discovering new devices that may have been added, ensure we handle the
LUN remove case first.
Drivers: scsi: storvsc: In responce to a scan event, scan the host
Drivers: scsi: storvsc: Force discovery of LUNs that may have been
removed.
drivers/scsi/storvsc_drv.c | 43 ++++++++++++++++++++++++++++++++-----------
1 files changed, 32 insertions(+), 11 deletions(-)
--
1.7.4.1
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
More majordomo info at http://vger.kernel.org/majordomo-info.html

---end quoted text---

KY Srinivasan

2014-09-29 01:47:02 UTC

-----Original Message-----
Sent: Thursday, September 25, 2014 6:47 AM
To: KY Srinivasan
Subject: Re: [PATCH 0/2] Drivers: scsi: storvsc: Fix issues with hot-
add/remove of LUNs
So can anywayone give me a review for those two patches?

Olaf,

Could you review these patches.

K. Y

Post by K. Y. Srinivasan
This patch-set addresses issues with LUN hot-add and remove. When the
host notifies the guest that a scan is needed, scan the host. Also,
prior to discovering new devices that may have been added, ensure we
handle the LUN remove case first.
Drivers: scsi: storvsc: In responce to a scan event, scan the host
Drivers: scsi: storvsc: Force discovery of LUNs that may have been
removed.
drivers/scsi/storvsc_drv.c | 43 ++++++++++++++++++++++++++++++++-

----------

Post by K. Y. Srinivasan
1 files changed, 32 insertions(+), 11 deletions(-)
--
1.7.4.1
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi"

majordomo

Post by K. Y. Srinivasan
info at http://vger.kernel.org/majordomo-info.html

---end quoted text---

KY Srinivasan

2014-10-07 17:44:38 UTC

-----Original Message-----
Sent: Thursday, September 25, 2014 6:47 AM
To: KY Srinivasan
Subject: Re: [PATCH 0/2] Drivers: scsi: storvsc: Fix issues with hot-
add/remove of LUNs
So can anywayone give me a review for those two patches?

Did you get the reviews?

K. Y

Post by K. Y. Srinivasan
This patch-set addresses issues with LUN hot-add and remove. When the
host notifies the guest that a scan is needed, scan the host. Also,
prior to discovering new devices that may have been added, ensure we
handle the LUN remove case first.
Drivers: scsi: storvsc: In responce to a scan event, scan the host
Drivers: scsi: storvsc: Force discovery of LUNs that may have been
removed.
drivers/scsi/storvsc_drv.c | 43 ++++++++++++++++++++++++++++++++-

----------

Post by K. Y. Srinivasan
1 files changed, 32 insertions(+), 11 deletions(-)
--
1.7.4.1
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi"

majordomo

Post by K. Y. Srinivasan
info at http://vger.kernel.org/majordomo-info.html

---end quoted text---

Christoph Hellwig

2014-10-17 13:22:02 UTC

Post by KY Srinivasan

Post by Christoph Hellwig
So can anywayone give me a review for those two patches?

Did you get the reviews?

No.

16 Replies
1 View
Permalink to this page
Disable enhanced parsing

Thread Navigation

K. Y. Srinivasan 2014-08-17 03:09:14 UTC

K. Y. Srinivasan 2014-08-17 03:09:48 UTC

Christoph Hellwig 2014-08-19 17:54:31 UTC

KY Srinivasan 2014-08-26 22:54:51 UTC

Christoph Hellwig 2014-08-29 01:11:29 UTC

Hannes Reinecke 2014-08-27 14:31:21 UTC

Mike Christie 2014-08-29 02:42:03 UTC

Hannes Reinecke 2014-08-29 06:19:11 UTC

Bart Van Assche 2014-08-29 07:39:26 UTC

Hannes Reinecke 2014-08-29 08:13:44 UTC

James Bottomley 2014-08-29 13:53:42 UTC

Ewan Milne 2014-08-29 15:01:37 UTC

K. Y. Srinivasan 2014-08-17 03:09:47 UTC

Christoph Hellwig 2014-09-25 13:47:26 UTC

KY Srinivasan 2014-09-29 01:47:02 UTC

KY Srinivasan 2014-10-07 17:44:38 UTC

Christoph Hellwig 2014-10-17 13:22:02 UTC

about - legalese

Loading...