“Theta ESP Wiki” Confluence site is live

The Theta Early Science Program Confluence space is now available. It is meant to function like a wiki, built on Confluence. Anyone from a Theta ESP project may put content in the space, and for the most part its structure should evolve organically. ALCF will also post content here, such as slides and videos from ESP meetings and instructions for accessing early hardware when it becomes available. Do not include any NDA information (Intel or Cray) in content you put in this space; we will have another venue for that if and when needed.

The video recordings of the Kick-Off and Training Session 1 videoconferences are available on the Confluence site: expand “Meetings/Workshops” in the navigation panel at the left and click to bring up the relevant meeting page.

To get access to the site, follow these instructions:

  1. Sign up for a Confluence account:
    • Go to https://collab.cels.anl.gov and in the top-right corner of the webpage, click “Sign up”.
    • On the Sign Up screen, enter your contact information, including your preferred email address, and click the “Sign up” button.
  2. Obtain access to the Theta ESP Wiki space:
    • Send an email to accounts@alcf.anl.gov with your Confluence username requesting access to the Theta ESP wiki.
    • You will be notified via email within two business days when you have been added to the Theta ESP wiki along with the link to the login.
  3. Log in to the Theta ESP Wiki space using your Confluence credentials and the login link from the notification email.

Theta ESP Training Session 1 Videoconference

There will be a training-session videoconference on Wednesday 9 September, from 10:00 AM to 5:30 PM CDT.

The purpose of the meeting is to introduce the Theta hardware, particularly the Knights Landing CPU, and the software tools for programming it. Presentations will be given by Intel and Cray speakers. See the agenda on the meeting website.

The intended audience is ALCF Theta ESP project members who will be working on code development and testing for Theta. You will receive an invitation to participate, either directly or through your project’s PI/co-PI. Session content will include Intel NDA material, so attendees’ institutions must have an appropriate nondisclosure agreement signed with Intel.

To register for the videoconference, please visit the Videoconference Registration Site.

Theta ESP Kick-Off Videoconference

There will be a 90-minute Kick-Off videoconference for the Theta ESP projects. Presentations will cover structure of the program, timeline, expectations of the projects, and events.

The videoconference is by invitation only. All ESP project PIs and co-PIs were notified.

NOTE ON BLOG ENTRIES: This is the first entry associated with the Theta ESP. All older entries were for the ESP for what is now our production system, Mira.

Minimum partition size on Mira is 512 nodes; maximum backfill is 8192 nodes

As has been explained in a recent email to Mira users, the minimum partition you can use on the machine is 512 nodes. If you request fewer nodes, you still pay from your allocation for all 512, and the unused nodes are idle. On Cetus, the minimum partition size is 128 nodes.

As some of you exhaust your ESP allocations on Mira, you will notice your jobs going into the “backfill” queue. These are queued with low priority relative to jobs from projects with positive allocation balances, but will run if resources are available and no normal jobs fit the space. The maximum job size allowed in backfill mode is 8192 nodes.
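
The charging rule above has a concrete cost implication for small jobs. Here is a minimal sketch of the arithmetic (the function name and code are illustrative, not an ALCF tool), assuming the 512-node Mira minimum and 16 cores per BG/Q node, and covering only the case of a request below the minimum partition size:

```python
# Illustrative arithmetic: core-hours charged on Mira when a job
# requests fewer nodes than the 512-node minimum partition.
CORES_PER_NODE = 16   # BG/Q: 16 cores per node
MIN_PARTITION = 512   # minimum partition size on Mira (128 on Cetus)

def charged_core_hours(nodes_requested, wall_hours, min_partition=MIN_PARTITION):
    """Your allocation is charged for the whole partition, even if part of it idles."""
    nodes_charged = max(nodes_requested, min_partition)
    return nodes_charged * CORES_PER_NODE * wall_hours

# A 100-node, 1-hour job is still charged for all 512 nodes:
print(charged_core_hours(100, 1))   # 8192 core-hours
print(charged_core_hours(1024, 2))  # 32768 core-hours
```

So a 100-node, 1-hour job costs the same 8192 core-hours as a full 512-node job; pack your work into at least the minimum partition if you can.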

Cetus Down

There will be an official notice going out soon, but be aware that Cetus is down and will be down for a number of days. This is related to the Vesta downtime: the BG/Q rack currently designated as Cetus is being combined with Vesta to make Vesta a two-rack system. We have a new rack that will be designated as Cetus. My best estimate is 5 days of downtime for Cetus (yesterday’s notice to the vesta-notify and mira-notify lists estimated 5 days of downtime for Vesta).

Early Science on Mira is on; time allocations in place

The Early Science period is officially underway. Mira came back online after acceptance testing on the evening of Monday 17 December. After an initial glitch in setting up the computer time allocations for the ESP projects, the correct allocations are now in place. These are what you were awarded as target allocation when your project was selected for the ESP. On Mira, the command

        cbank-list-allocations -u yourUserName -r mira

will show you the amount and usage of your allocation.

Our one-rack test and development machine, Cetus (cetus.alcf.anl.gov), is now also available to ESP users.

The Early Science period should last through mid-March. When there is concrete information about the exact transition date, I’ll send out an email with the date and information about how the transition to production usage will impact the Early Science projects. You should have used up your ESP project allocations by then.

Allocations for *_esp projects are active

Please note that, since the 24 accepted racks of Mira were turned over to Early Science, time allocations for the ESP projects have been in place. The amounts of the allocations are not yet correct; all were set to a placeholder value of 50 million core hours. When ALCF and Mira are ready, we will establish the formal allocations for the projects. These allocations will be based on the target awards from the letters informing you of your Early Science Program awards. Those are:

PROJECT               AWARD (in millions of Mira core-hours)
--------------------  --------------------------------------
GFDL_esp              150
Mat_Design_esp         50
Autoignition_esp      150
Bulk_Properties_esp   150
DarkUniverse_esp      150
MADNESS_MPQC_esp      150
CFDAnisotropic_esp    150
GroundMotion_esp      150
HSCD_esp              150
TurbNuclComb_esp      150
LatticeQCD_esp        150
TurbChannelFlow_esp    60
AbInitioC12_esp       110
NAMD_esp               80
PlasmaMicroturb_esp    50
MultiscaleMolSim_esp  150
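
For a quick sanity check of the table, the awards can be tallied as below; this is just my arithmetic over the table above (the dictionary is an illustrative transcription, not an official accounting), in millions of Mira core-hours:

```python
# Target ESP awards transcribed from the table above (millions of core-hours).
awards = {
    "GFDL_esp": 150, "Mat_Design_esp": 50, "Autoignition_esp": 150,
    "Bulk_Properties_esp": 150, "DarkUniverse_esp": 150,
    "MADNESS_MPQC_esp": 150, "CFDAnisotropic_esp": 150,
    "GroundMotion_esp": 150, "HSCD_esp": 150, "TurbNuclComb_esp": 150,
    "LatticeQCD_esp": 150, "TurbChannelFlow_esp": 60,
    "AbInitioC12_esp": 110, "NAMD_esp": 80, "PlasmaMicroturb_esp": 50,
    "MultiscaleMolSim_esp": 150,
}
print(len(awards))           # 16 projects
print(sum(awards.values()))  # 2000 million core-hours in total
```

That is, the sixteen ESP projects together hold target awards of 2 billion Mira core-hours.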

Mira access added; usage announcement mailed out

Those of you who responded to the email asking for new ESP users needing access to Mira for Early Science runs on the accepted 24 racks of the machine: you should all now have access to Mira (and Vesta), except possibly those just now getting an ALCF account for the first time (you have additional application procedures to complete, for which you should have received instructions).

Today I sent an email to mira-early-users with some details about using Mira between now and the start of the 48-rack acceptance testing (around mid-November). Those of you in Early Science Program projects can now run jobs of up to 16 racks in the “ESP” queue. Jobs of 24 racks and larger will land in the “ESP-bigrun” queue, which will be managed manually. These should mainly be scaling tests, not science runs: at 24 racks and above you will be using some of the unaccepted Mira nodes, and cannot expect the same reliability you get on the 24 accepted racks (where all your jobs of 16 racks and fewer will run).

Remember that on BG/Q, 1 rack is 1024 nodes (same as BG/P), but is 16K cores (as opposed to 4K cores on BG/P).
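
The rack arithmetic above works out as follows (a purely illustrative calculation; the constant names are mine):

```python
# BG/P vs BG/Q rack sizes, per the note above.
NODES_PER_RACK = 1024    # same on BG/P and BG/Q
CORES_PER_NODE_BGP = 4   # BG/P
CORES_PER_NODE_BGQ = 16  # BG/Q

print(NODES_PER_RACK * CORES_PER_NODE_BGQ)       # 16384 cores per BG/Q rack ("16K")
print(NODES_PER_RACK * CORES_PER_NODE_BGP)       # 4096 cores per BG/P rack ("4K")
print(16 * NODES_PER_RACK * CORES_PER_NODE_BGQ)  # a 16-rack ESP-queue job: 262144 cores
```

So a maximum-size job in the “ESP” queue (16 racks) spans 16,384 nodes and just over a quarter-million cores.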

Mira access for general ESP users—send names this week

A reminder to send me information about additional members of your ESP project teams whom you’d like to be able to access Mira starting next week.

In order to get on the list I’ll be handing off to User Services, you need to send the information by the end of this week (tomorrow, Friday 4 Oct). Send it to earlyscience@alcf.anl.gov.

If you already have Mira access, you don’t need to do anything. The information I need to get someone Mira access is:

  • First and last name
  • ALCF username
  • Email address
  • ESP project (short name, from the list below)
PI                      Project Short Name
Venkatramani Balaji     GFDL_esp
Larry Curtiss           Mat_Design_esp
Christos Frouzakis      Autoignition_esp
Mark Gordon             Bulk_Properties_esp
Salman Habib            DarkUniverse_esp
Robert Harrison         MADNESS_MPQC_esp
Kenneth Jansen          CFDAnisotropic_esp
Thomas Jordan           GroundMotion_esp
Alexei Khokhlov         HSCD_esp
Don Lamb                TurbNuclComb_esp
Paul Mackenzie          LatticeQCD_esp
Robert Moser            TurbChannelFlow_esp
Steven C Pieper         AbInitioC12_esp
Benoit Roux             NAMD_esp
William Tang            PlasmaMicroturb_esp
Gregory Voth            MultiscaleMolSim_esp