REPACSS User Guide Overview
The REmotely-managed Power Aware Computing Systems and Services (REPACSS) is a high-performance computing (HPC) data center and AI infrastructure prototype. It demonstrates the feasibility of running advanced computing tasks on variable energy sources, with the goal of reducing costs and improving efficiency. REPACSS is designed to support intensive computational and data-driven research. The system combines compute, GPU, and storage nodes, interconnected with high-speed networking for efficient data transfer and processing. The facility accommodates diverse workloads, including large-scale simulations, AI training, and data analytics. This documentation is the official reference for all users of REPACSS systems, in accordance with established operational standards, security policies, and best practices.
Documentation Overview
- Absolute Beginner’s Guide
A high-level introduction to high-performance computing concepts and REPACSS usage for new users.
- Scheduling Policies and Compute Accounting
Definitions of resource allocation models, charge factors, runtime limits, and job prioritization policies.
- Sample SLURM Job Scripts
Verified examples to support the construction and submission of batch job scripts.
- Core Job Submission Procedures
Step-by-step overview of SLURM commands, job queues, and execution workflows.
- File Transfer Protocols and Utilities
Guidance on secure and efficient data transfers via Globus.
- UNIX File Access and Permissions
Overview of file ownership, group collaboration, and secure file access in a shared computing environment.
- Sharing File Access with Others
Guidance on how to share your files with others.
- Multi-Factor Authentication (MFA) Procedures
Official process for securing access to REPACSS resources using two-factor authentication.
System Infrastructure and Technical Overview
- System Architecture
Technical description of the computational environment, including node types, CPUs, memory configurations, and interconnect topology.
- Known Issues and Hardware Anomalies
Documented performance irregularities and hardware-specific limitations.
- Software and Module Management
Procedures for loading, managing, and deploying software modules within the REPACSS environment.
Resource Scheduling and Job Execution
- Batch and Interactive Job Submission
- Accessing Interactive Sessions
- Monitoring Job Performance and Troubleshooting
- Operational Best Practices for Job Submission
Optimization and Performance Engineering
Reference Materials
User Support
- Technical Support and Contact Information
Resources for user assistance, including help desk contact methods and ticket submission.
Contribution and Governance
This documentation is actively maintained by the REPACSS Systems Administration and Support Team. All technical content is subject to periodic review to ensure compliance with institutional standards and research computing best practices.
Contributions, corrections, and feedback may be submitted via GitHub pull request and are subject to approval by the REPACSS documentation team.