There will be a 90-minute Kick-Off videoconference for the Theta ESP projects. Presentations will cover structure of the program, timeline, expectations of the projects, and events.
The videoconference is by invitation only. All ESP project PIs and co-PIs were notified.
NOTE ON BLOG ENTRIES: This is the first entry associated with the Theta ESP. All older entries were for the ESP for what is now our production system, Mira.
As has been explained in a recent email to Mira users, the minimum partition you can use on the machine is 512 nodes. If you request fewer nodes, you still pay from your allocation for all 512, and the unused nodes are idle. On Cetus, the minimum partition size is 128 nodes.
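As a rough illustration of the charging rule above, here is a minimal Python sketch. The function names are hypothetical, not ALCF tools. It assumes 16 cores per node (from the BG/Q rack size of 1024 nodes and 16K cores) and assumes the charge is simply charged nodes times cores per node times wall-clock hours. Only the minimum partition size is modeled, not any rounding up to larger partition sizes.

```python
# Minimum partition sizes (in nodes) as stated in this post.
MIN_PARTITION_NODES = {"mira": 512, "cetus": 128}

# Assumption: 16 cores per BG/Q node (1 rack = 1024 nodes = 16K cores).
CORES_PER_NODE = 16


def charged_nodes(machine: str, requested_nodes: int) -> int:
    """Nodes charged for a job: never fewer than the minimum partition."""
    if requested_nodes <= 0:
        raise ValueError("requested_nodes must be positive")
    return max(requested_nodes, MIN_PARTITION_NODES[machine])


def idle_nodes(machine: str, requested_nodes: int) -> int:
    """Nodes that are paid for but left unused by the job."""
    return charged_nodes(machine, requested_nodes) - requested_nodes


def charged_core_hours(machine: str, requested_nodes: int, hours: float) -> float:
    """Assumed charge: charged nodes x cores per node x wall-clock hours."""
    return charged_nodes(machine, requested_nodes) * CORES_PER_NODE * hours


if __name__ == "__main__":
    # A 128-node, 2-hour request on Mira is charged as a 512-node job.
    print(charged_nodes("mira", 128), idle_nodes("mira", 128))
    print(charged_core_hours("mira", 128, 2.0))
```

Under these assumptions, a 128-node request on Mira leaves 384 nodes idle and costs the same as a full 512-node job, so it is worth sizing small tests for Cetus instead.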
As some of you exhaust your ESP allocations on Mira, you will notice your jobs going into the “backfill” queue. These are queued with low priority relative to positive-allocation-balance jobs, but will run if resources are available and no normal jobs are available to fit the space. The maximum job size allowed in backfill mode is 8192 nodes.
This coming Monday and Tuesday (4-5 Feb. 2013), Vesta will be down for extended maintenance, to install the latest BG/Q system driver from IBM (V1R2M0). Eventually, this driver version will be installed on Mira. Please help ALCF and yourselves by building and testing your Early Science codes on Vesta after the upgrade, if you can. Let us know if something breaks.
There will be an official notice going out soon, but be aware that Cetus is down and will be down for a number of days. This is related to the Vesta downtime: the BG/Q rack that’s currently designated as Cetus is being combined with Vesta to make Vesta a 2-rack system. We have a new rack that will be designated as Cetus. My best estimate is 5 days of downtime for Cetus (yesterday’s notice to vesta-notify and mira-notify lists estimated 5 days downtime for Vesta).
The Early Science period is officially underway. Mira came back online after acceptance testing on the evening of Monday 17 December. After an initial glitch in setting up the computer time allocations for the ESP projects, the correct allocations are now in place. These are what you were awarded as target allocation when your project was selected for the ESP. On Mira, the command
    cbank-list-allocations -u yourUserName -r mira
will show you the amount and usage of your allocation.
Our one-rack test and development machine, Cetus (cetus.alcf.anl.gov), is now also available to ESP users.
The Early Science period should last through mid-March. When there is concrete information about the exact transition date, I’ll send out an email with the date and information about how the transition to production usage will impact the Early Science projects. You should have used up your ESP project allocations by then.
Please note that, since the 24 accepted racks of Mira were turned over to Early Science, time allocations for the ESP projects have been in place. The amounts of the allocations are not yet correct; all were set to a placeholder value of 50 million core hours. When ALCF and Mira are ready, we will establish the formal allocations for the projects. These allocations will be based on the target awards from the letters informing you of your Early Science Program awards. Those are:
PROJECT AWARD (in millions of Mira core-hours)
If you responded to the email asking about new ESP users who need access to Mira for Early Science runs on the accepted 24 racks of the machine, you should now have access to Mira (and Vesta). The possible exception is anyone just now getting an ALCF account for the first time: you have additional application procedures to complete, for which you should’ve received instructions.
Today I sent out an email to mira-early-users with some details about using Mira between now and the start of the 48-rack acceptance testing (around mid November). For those of you in the Early Science Program projects, you can now run jobs up to 16 racks in the “ESP” queue. For 24 racks and higher, you will land in the “ESP-bigrun” queue, which will be manually managed. This should mainly be for scaling tests, not scientific runs: when you run on 24 racks and higher you’ll be using some of the unaccepted Mira nodes, and can’t expect the reliability you get on the 24 accepted racks (where all your jobs of 16 racks and less will run).
Remember that on BG/Q, 1 rack is 1024 nodes (same as BG/P), but is 16K cores (as opposed to 4K cores on BG/P).
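For quick sizing, the rack arithmetic above can be sketched in a few lines of Python. The helper name is hypothetical; the per-node core counts (16 on BG/Q, 4 on BG/P) are derived from the per-rack figures just quoted, taking "16K" and "4K" to mean 16384 and 4096.

```python
# Both BG/P and BG/Q have 1024 nodes per rack.
NODES_PER_RACK = 1024

# Cores per node, derived from the per-rack core counts (16K and 4K cores).
CORES_PER_NODE = {"BG/Q": 16, "BG/P": 4}


def rack_resources(racks: int, system: str = "BG/Q") -> tuple[int, int]:
    """Return (nodes, cores) for a given number of racks."""
    if racks <= 0:
        raise ValueError("racks must be positive")
    nodes = racks * NODES_PER_RACK
    cores = nodes * CORES_PER_NODE[system]
    return nodes, cores


if __name__ == "__main__":
    for racks in (1, 16, 24):
        nodes, cores = rack_resources(racks)
        print(f"{racks} BG/Q rack(s): {nodes} nodes, {cores} cores")
```

For example, a 16-rack job in the “ESP” queue spans 16384 nodes, and the 24 accepted racks hold 24576 nodes.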
A reminder to send me information about additional people from your ESP project teams that you’d like to be able to access Mira starting next week.
In order to get on the list I’ll be handing off to User Services, you need to send in the information by the end of this week (tomorrow, Friday 4 Oct). Send to firstname.lastname@example.org .
If you already have Mira access, you don’t need to do anything. The information I need to get someone Mira access is:
- First and last name
- ALCF username
- Email address
- ESP project (short name, from list below)
PI                    Project Short Name
Venkatramani Balaji   GFDL_esp
Larry Curtiss         Mat_Design_esp
Christos Frouzakis    Autoignition_esp
Mark Gordon           Bulk_Properties_esp
Salman Habib          DarkUniverse_esp
Robert Harrison       MADNESS_MPQC_esp
Kenneth Jansen        CFDAnisotropic_esp
Thomas Jordan         GroundMotion_esp
Alexei Khokhlov       HSCD_esp
Don Lamb              TurbNuclComb_esp
Paul Mackenzie        LatticeQCD_esp
Robert Moser          TurbChannelFlow_esp
Steven C Pieper       AbInitioC12_esp
Benoit Roux           NAMD_esp
William Tang          PlasmaMicroturb_esp
Gregory Voth          MultiscaleMolSim_esp
The current revised estimate for the two-week period when ESP projects will have access to the full Mira for testing codes at scale (ESV period) is the last half of August, possibly starting around August 20. Please plan for this. (I still haven’t heard from most of the ESP projects who their 1 or 2 ESV users will be. Please email email@example.com with this information.)
[Update 10/3/2012: As indicated in a couple of mass emailings to ESP project participants, the plan changed and there will be no specific 2-week ESV period. We have accepted 24 of the 48 racks, and access will be extended starting the week of Oct. 7 to all ESP project participants to run science on those 24 racks.]
So far, I’ve only heard back from 3 projects about the one or two people designated to get access to Mira during the ESV window. Please think it over and send me (firstname.lastname@example.org) the names of those for your project.
Here’s a snip from the email of June 4:
There will be a two-week period (called the Early Science Verification – ESV), prior to the Mira acceptance-test period, when the Early Science codes can be run on the system. Their running correctly is a prerequisite for starting the acceptance testing (assuming the problem is with the system, and not the code or input decks). The ESV is an important milestone for the Early Science Program, the ALCF, and IBM. Please be prepared to exercise your application at scale, ideally running problems comparable with your planned production runs during the Early Science period. This is also a good opportunity for you to get your first multi-rack testing on Mira done.
Our best estimate for the intended start of acceptance testing is mid August, which places the ESV in the first two weeks of August. We expect to have the login/compile servers ready for your access at least one week prior. Please prepare your codes, and plan for members of your teams to run these tests. Only 1 or 2 key people from each project will have access to Mira in this period, so please identify those people in advance.
Current estimate is that ESV will start later: second half of August or after.