Research IT

Research IT provides research data and computing technologies, consulting, and community for the UC Berkeley campus. Our goal is to advance research through IT innovation.

Status and Service Updates

Savio HPC Services Resumed: Mon, 8/12

We are excited to report that Savio HPC services have resumed. As planned, the Berkeley IT team repaired the automated transfer switch in the data center over the weekend. The data center's power is restored, and the Savio supercluster is back online. Jobs have started running, and HPC services, including Open OnDemand and Globus, are also back in service.

Savio Outage (Data Center Repair): Starting 5 PM, Fri, 8/9

The Berkeley IT team plans to repair the automated transfer switch in the Earl Warren Hall Data Center. The repair is needed so that power fails over to the generators automatically during future power outages. Because the work requires a full power shutdown of the data center, we have scheduled a Savio downtime. The downtime will start at 5:00 PM on Friday, Aug 9th, and we anticipate returning HPC services by 5:00 PM on Monday, Aug 12th. A scheduler reservation is already in place to ensure that no jobs run after 5:00 PM on Friday, Aug 9th. If you plan to submit jobs, please request a walltime short enough for them to complete before the downtime begins. Otherwise, your jobs will wait in the queue until the cluster is back online.
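One way to pick a walltime that fits before the reservation is to compute the time remaining until the downtime and format it the way Slurm's `sbatch --time` option accepts it (`days-hours:minutes:seconds`). The sketch below is illustrative only: the helper name `max_walltime`, the 15-minute safety margin, and the year 2024 are assumptions, and times are taken to be in the cluster's local time zone.

```python
from datetime import datetime, timedelta

# Assumption: the downtime starts at 5:00 PM on Friday, Aug 9 (year assumed 2024),
# in the cluster's local time zone.
DOWNTIME_START = datetime(2024, 8, 9, 17, 0)


def max_walltime(now: datetime,
                 downtime_start: datetime = DOWNTIME_START,
                 margin: timedelta = timedelta(minutes=15)) -> str:
    """Hypothetical helper: return the longest walltime, as a Slurm
    --time string (D-HH:MM:SS), for a job starting at `now` to finish
    `margin` before `downtime_start`. Returns "" if no time remains."""
    remaining = downtime_start - now - margin
    if remaining <= timedelta(0):
        return ""  # too close to the downtime; the job would wait in the queue
    total_seconds = int(remaining.total_seconds())
    days, rest = divmod(total_seconds, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    # Example: submitting at 9:30 AM on Thursday, Aug 8
    print(max_walltime(datetime(2024, 8, 8, 9, 30)))  # 1-07:15:00
```

The result could then be passed as, for example, `sbatch --time=1-07:15:00 job.sh`. Note that the value is computed at submission time: if the job waits in the queue, it may no longer fit before the reservation, and the scheduler will hold it until the cluster is back online.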

Savio HPC Open OnDemand Service back online: Mon, 7/15

The Open OnDemand HPC service at https://ood.brc.berkeley.edu/ is back online. We appreciate your patience while we worked through some issues. The service includes a few changes. We have upgraded Open OnDemand to the latest version, 3.1.7, and we have adopted CILogon for user authentication to eliminate the repeated login problems you might have experienced. On the login page, please select the appropriate institution, which for most users is the University of California, Berkeley. The command-line tool email_lookup.sh can help you determine which institution to select.

Savio HPC Services are back online except for OOD: Wed, 7/10

The Savio HPC system, now running the new Rocky Linux 8 operating system, has been returned to service and is available to users, with the exception of the Open OnDemand (OOD) service. We need more time to configure OOD, so please do not attempt to use it at this time. Please note that the Savio documentation has not yet been fully updated to reflect the changes introduced by Rocky Linux 8 (e.g., changes to the software stack and software module farms, and to how user code is compiled). Until those updates are complete, we suggest that Savio users consult the LBNL Science IT documentation for the Lawrencium HPC system as a temporary guide to these changes: https://scienceit-docs.lbl.gov/hpc/rocky8-migration/ , https://scienceit-docs.lbl.gov/hpc/software/software-module-farm/ , and https://scienceit-docs.lbl.gov/hpc/software/module-management/ . Lawrencium is similar to, though not exactly the same as, Savio.

Savio Downtime: Fri, 7/5 - Wed, 7/10

As you know, we have been working on upgrading the Savio operating system to Rocky Linux 8. To complete the upgrade, we coordinated with the campus data center group to combine our work with the long-awaited power work needed at the data center. The joint downtime will start at 5 PM on Friday, 7/5, and we anticipate returning services by the end of Wednesday, 7/10. A scheduler reservation is in place to ensure that no jobs run after 5 PM on Friday, 7/5. If you plan to submit jobs, please request an appropriate walltime so that they complete before the downtime. Otherwise, your jobs will wait in the queue until Savio is back online.

Savio Scratch File System is Back up and Running: Mon., 7/1

The Savio /global/scratch parallel file system is back up and running.

Savio Scratch File System is Down: Mon, 6/24

The /global/scratch parallel file system started having access problems on Friday, 6/21. An investigation is underway. We apologize for any inconvenience this may cause and will keep you posted on its status.

News Articles