
Unplanned Outage: Some IT services unavailable

Last Updated:
2011-11-17 05:00:00
Event:
2011-11-17 05:00:00
Status:
Closed
Brief Description:
User Impact:
N/A
Workaround:
There is no workaround for this issue
Current Status:
N/A
Services Affected:
Full Description:
Some services, including PeopleSoft, Lyris, bulkmail, and the HR web site, were affected by an issue this morning. Those services are now available.
CIT TDX ID:



Timeline of Changes

2011-11-17 05:00:00
A disk array outage that lasted approximately 6 minutes has affected several IT services, including PeopleSoft, Lyris, bulkmail, WebDAV, DNSDB, and the HR web site.

The disk array is back online and services are being rapidly restored.

2011-11-17 05:00:00
PeopleSoft, Lyris, bulkmail, DNSDB, and the HR web site are now available.

2011-11-17 05:00:00
We are seeing intermittent WebDAV availability in connection with this morning's outage.

2011-11-17 05:00:00
WebDAV performance is restored. At this point, the overall issue is considered resolved.

Summary of what happened:

Shortly before 9AM, a disk spindle failed in one of the disk arrays used by the Storage Farm service. A single spindle failure should not have caused an outage, because these arrays have built-in redundancy. In this case, however, the failure caused all of the storage on that array (20TB) to go offline for 6.5 minutes.

The servers accessing this storage were inspected and corrective action was taken. Some servers were not impacted, while others required a reboot or other processing.

The vendor (IBM) is analyzing this event and will provide a Root Cause Analysis. That analysis will be reviewed to determine what additional changes should be implemented to improve the quality of the storage service.