
Performance: Service Disruption: Multiple campus services (virtual machine issue)

Last Updated:
2014-03-16 04:00:00
Event:
2014-03-16 04:00:00
Status:
Closed
Brief Description:
Cornell VPN, CUWebLogin, Wi-Fi, some Cornell websites
User Impact:
N/A
Workaround:
There is no workaround for this issue
Current Status:
N/A
Services Affected:
Full Description:
Several campus services, including Cornell websites, Wi-Fi, Cornell VPN, and CUWebLogin, are being affected by a problem with multiple virtual machines. CIT is investigating and working to resolve the issue.
CIT TDX ID:



Timeline of Changes

Description and Current Status
Date and Time

Efforts are still ongoing to address and resolve the issue.
2014-03-16 04:00:00

Calls have been placed to vendors, specific efforts have been made on VPN, and partial VPN service has been restored. There are still intermittent problems with authentication, and staff are working toward a fix that they hope will improve performance.
2014-03-16 04:00:00

As a step toward resolution, restarts of affected systems are beginning. There will be rolling outages of about 10-15 minutes across the services over the next several hours as they are shut down and come back online.
2014-03-16 04:00:00

At this time, we believe rerouting has solved most authentication issues. Some services may still be holding on to the authentication path that was not working properly, and restoring authentication for these may take more time. Work is under way to resolve authentication issues for these remaining services.
2014-03-16 04:00:00

The controlled shutdown and restart process is about 25% complete. So far, no issues have been detected with the restarted services. Technicians will soon begin contacting service owners to ask them to verify that their individual services are working properly.
2014-03-16 04:00:00

The restart process has reached systems that require somewhat more time to verify they have shut down and restarted with no problems. This group should be done soon, and when it is, about one-third of systems will have been restarted.
So far, the services that have been restarted still appear to be functioning normally, and staff have begun contacting service owners to double-check that their services are working properly. No issues have been reported with authentication since the fix was put in place.
Based on the current rate of progress, the tentative estimate for resolution of the incident is between 7 and 8 pm.
2014-03-16 04:00:00

The process is well over half complete, and effort is being applied to servers that need particular attention.
2014-03-16 04:00:00

The controlled shutdown and restart of affected systems is nearing completion. Monitoring and special focus on more complex systems are continuing.
2014-03-16 04:00:00

The incident appears to be resolved. Service owners who encounter unexpected behavior or other issues should contact the IT Service Desk at 255-5500. The issue occurred when scheduled maintenance completed properly but had unintended consequences within the datacenter VMware environment. Staff are investigating and will take available steps to prevent similar issues in the future.
2014-03-16 04:00:00