Red Hat MRG

Red Hat Enterprise MRG Grid Features

MRG Grid key features include:

  • Scalable grid scheduler: MRG's Grid scheduler is based on Condor, which powers many of the largest grids in the world and easily scales beyond tens of thousands of nodes.
  • Virtualization: MRG Grid lets you submit a virtual machine (VM) as a user job and supports migration of the VM.
  • Cloud scheduling: MRG Grid lets you leverage computing resources in cloud-based environments like Amazon EC2.
  • Desktop cycle-harvesting: Desktop Cycle-Harvesting allows you to leverage the unused capacity of desktops to add processing power to your grid.
  • ClassAds: ClassAds provides a flexible language for policy and meta-data description.
  • Policies: MRG Grid enables flexible, customizable policies specified by jobs and resources via ClassAds.
  • Low-latency scheduling: MRG's integration of Messaging and Grid technology enables scheduling jobs and getting results from MRG Grid deployments in the millisecond range. MRG also includes integration with Microsoft Excel for running calculations on a grid.
  • Concurrency limits: MRG includes the ability to set concurrency limits on jobs. These limits can restrict the number of instances of licensed software running at a time or govern access to scarce resources.
  • Dynamic provisioning: MRG Grid dynamically adjusts resource slots based on jobs.
  • Federated grids/clusters: A mechanism known as flocking allows independent pools to use each other's resources, controllable by customizable policies.
  • Multiple Standards-Based APIs: A Web Service interface provides job submission and management functionality; the CLI provides a highly scriptable interface, with consistent output, to all functionality.
  • Workflow Management: The ability to specify job dependencies, via DAGMan, allows for construction and execution of complex workflows.
  • High Availability: The Negotiator and Collector, via HAD, and the Schedd, via Schedd Fail-over, can have their state replicated to allow for graceful fail-over upon service disruption.
  • Database Support: All data about jobs and resources can be stored in a database via Quill.
  • Compute On-Demand (COD): The ability for a node or set of nodes to be claimed by a user in such a way that others may use the claimed nodes until the user needs them.
  • Dynamic Pool Creation: Through a technology known as Glide-ins, nodes can be dynamically added to a pool to service user jobs.
  • Priority-Based Scheduling
    • Priority scheduling is performed at the granularity of a user.
    • Fair-share scheduling can be performed on groups of users.
    • Priority management is controllable by administrators.
  • Accounting: User and group resource utilization is tracked and accessible to administrators.
  • Security
    • Authentication: multiple mechanisms (Kerberos, SSL, shared secret, claimtobe, filesystem, remote-filesystem)
    • Privacy: network encryption (Blowfish, 3DES)
    • Integrity: integrity checking of network traffic (MD5)
    • Authorization: through flexible configuration policies
  • Account Remapping
    • Allows for execution across administrative domains.
    • Enhances security by using a restricted pool of users to run jobs on execute machines.
  • Privilege Separation: Only a single, specialized, audited component requires root/administrator permissions on execute nodes.
  • Parallel Universe
    • Provides an extensible framework for running parallel (including MPI) jobs.
    • Co-allocation of compute nodes is done automatically.
    • Framework implementation for MPICH1, MPICH2, and LAM provided.
  • Java Universe: Explicit support of jobs written in Java.
  • Time Scheduling for Job Execution (Cron): Allows a job or multiple jobs to be started at specific times, with customizable policy for failures such as missed deadlines.
  • Backfill: Allows otherwise unused nodes to run jobs provided by BOINC.
  • File Staging: Supports automatic file staging (e.g. of job input) and online file I/O (i.e. file streaming from submit to execute nodes) via Chirp and remote syscalls, in the absence of a shared filesystem.
  • Dedicated and Undedicated Node Management: Allows for dedicated resources (clusters) to be augmented with otherwise undedicated resources (desktops) using flexible policies.
  • Master-Worker (MW): A C++ framework allowing a single master process to allocate and manage multiple worker processes, which process data based on master-specified policies.
  • Condor-C: Allows for jobs in one queue to be moved to another queue.
  • Hawkeye: Allows for automated monitoring of one or more pools.
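
To illustrate the Workflow Management feature listed above, here is a minimal Python sketch of how job dependencies determine execution order. It is not DAGMan's implementation or its input format; the job names and the execution_waves helper are hypothetical, and the ordering relies on Python's standard graphlib module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each key is a job, each value is the set of jobs
# that must finish before it can start (a diamond-shaped dependency graph).
dependencies = {
    "prepare": set(),
    "simulate_a": {"prepare"},
    "simulate_b": {"prepare"},
    "report": {"simulate_a", "simulate_b"},
}


def execution_waves(deps):
    """Group jobs into waves; jobs in one wave have no unmet dependencies."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose parents are all done
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished, unlocking child jobs
    return waves


waves = execution_waves(dependencies)
for number, jobs in enumerate(waves, start=1):
    print(f"wave {number}: {', '.join(jobs)}")
```

Jobs in the same wave do not depend on each other, so a scheduler would be free to run them at the same time on different nodes.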

Screenshots of Grid Management tool

MRG Grid also enables enterprises to move to a utility model of computing, where they can:

  • Schedule a variety of applications across a heterogeneous pool of available resources.
  • Automatically handle seasonal workloads with high efficiency, utilization, and flexibility.
  • Dynamically allocate, provision, or acquire additional computing resources for additional applications and loads.
  • Execute across a diverse set of environments, ranging from virtual machines to bare-metal hardware to cloud-based infrastructure.