Changes are being made gradually to avoid affecting current production. PanDA on Elastic Compute Cloud (EC2). Port the PanDA database back to MySQL. It will be used as a test bed for non-LHC experiments. LSST MC production is the first use case for the new instance. The next step will be refactoring PanDA monitoring.

Extending the scope: why we are looking for opportunistic resources.

Several avenues to explore in the coming years. Research and commercial clouds. Common characteristic of opportunistic resources: we have to be agile in how we use them. Get onto them quickly (software, data and workloads) when they become available. Be robust against their disappearing under us with no notice. Lack of resources slows down the pace of discovery.

Extending the scope: Google Compute Engine (GCE) preview project. Google allocated additional resources for ATLAS for free.

CentOS-based custom-built images, with SL5 compatibility libraries to run ATLAS software. The Condor head node and proxies are at BNL.

Output is exported to the BNL SE. Work on capturing the GCE setup in Puppet. Transparent inclusion of cloud resources into the ATLAS Grid. The idea is to test long-term stability while running a cloud cluster similar in size to an ATLAS Tier 2 site.

Intended for CPU-intensive Monte Carlo simulation workloads. Planned as a production-type run. We also tested a high-performance PROOF-based analysis cluster. GCE was rock solid. Most of the problems we had were on the ATLAS side.

We ran computationally intensive jobs: physics event generators, fast detector simulation, and full detector simulation. Completed 458,000 jobs; generated and processed about 214 M events.

Failed and finished jobs.

Each Leadership Computing Facility (LCF) is unique. Unique architecture and hardware. Specialized OS, weak worker nodes, limited memory per worker node (WN).

Code cross-compilation is typically required. Unique job submission systems. Pilot submission to a worker node is typically not feasible. Tests on BlueGene at BNL and ANL. PanDA project at the Oak Ridge National Laboratory LCF, Titan. Slide from Ken Read. Collaboration between ANL, BNL, ORNL, SLAC, UTA, UTK. Cross-disciplinary project: HEP, Nuclear Physics, High-Performance Computing.

PanDA on Oak Ridge Leadership Computing Facilities. PanDA deployment at OLCF was discussed and agreed, including the AIMS project component. Cyber-security issues were discussed for both the near and the longer term. Discussion with OLCF Operations.

ROOT-based analysis is tested. Connections are seen as insufficient or unreliable. Data needs to be pre-placed. Data comes from specific places. Organization of Grid sites into clouds in ATLAS. Nothing can happen using remote resources while a job is running.

Canonical HEP strategy: jobs go to data. Data are partitioned between sites. Some sites are more important (get more important data) than others.

A dataset (a collection of files produced under the same conditions and with the same software) is the unit of replication. Data and replica catalogs are used to broker jobs. An analysis job that requires data from several sites triggers either data replication and consolidation at one site, or splitting into several jobs running on all of those sites.

A data analysis job must wait for all its data to be present at the site. The situation can easily degrade into a complex n-to-m matching problem.
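
The brokering choices described above (run where the data already is, split the job, or consolidate the data at one site) can be illustrated with a minimal sketch. This is not PanDA code: `broker_job`, `BrokerDecision`, the `allow_split` flag, the tie-breaking rules and the catalog layout (dataset name mapped to the set of sites holding a replica) are all hypothetical names and assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class BrokerDecision:
    action: str        # "run", "split" or "replicate" (hypothetical labels)
    assignments: dict  # site name -> list of datasets handled at that site


def broker_job(datasets, replica_catalog, allow_split=True):
    """Decide where a job runs, given which sites hold its input datasets.

    replica_catalog maps dataset name -> set of site names holding a replica.
    This is an illustrative toy, not the actual PanDA brokerage algorithm.
    """
    needed = list(datasets)
    if not needed:
        raise ValueError("job has no input datasets")
    missing = [d for d in needed if not replica_catalog.get(d)]
    if missing:
        raise LookupError(f"no replica registered for: {missing}")

    # Case 1: some site already holds every input dataset, so the job goes there.
    common = set.intersection(*(set(replica_catalog[d]) for d in needed))
    if common:
        site = sorted(common)[0]  # alphabetical tie-break, an assumption
        return BrokerDecision("run", {site: needed})

    # Case 2: split into sub-jobs, one per site, each reading only local data.
    if allow_split:
        assignments = {}
        for d in needed:
            site = sorted(replica_catalog[d])[0]
            assignments.setdefault(site, []).append(d)
        return BrokerDecision("split", assignments)

    # Case 3: consolidate everything at the site that already holds the most
    # input datasets; the remaining datasets would have to be replicated there.
    counts = {}
    for d in needed:
        for site in replica_catalog[d]:
            counts[site] = counts.get(site, 0) + 1
    target = min(counts, key=lambda s: (-counts[s], s))
    return BrokerDecision("replicate", {target: needed})
```

Even in this toy version, every extra dataset or site multiplies the combinations the broker has to weigh, which hints at how the real problem becomes an n-to-m matching problem.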



