

Onsite HPC Engineer – Hands on experience in handling Cluster (Ref:125)

Job title:
Onsite HPC Engineer – Hands on experience in handling Cluster (Ref:125)
Description:

Positions: 2
Experience: L2: 4 years; L3: 6 years (remote support)
Location: Bangalore

Requirement:

Support: Onsite HPC Engineer

Scope of Work:

Onsite Dedicated Resource

  • HPC Cluster Support
  • OS Support
  • OS troubleshooting
  • InfiniBand support
  • Reinstallation of nodes in the event of OS corruption or similar incidents
  • Patch deployment
  • OS and application upgrades as and when needed
  • Compilers & Libraries installation and fine-tuning
  • Application installation
  • Scheduler – creating and maintaining queues and policies, and integrating applications with the scheduler
  • MPI implementation and support
  • Taking periodic backups, verifying them, and keeping them ready for restoration after any failure
  • Regular health checks of the cluster
  • Patch updates as required
  • Problem identification and resolution for all HPC components
  • Problem escalation to the respective teams, both internal and external (OEMs & ISVs)
  • Generating cluster usage reports periodically
  • Maintaining call logs that record response time, resolution time, and the solution provided
  • Creating and maintaining an FAQ

Service Level Objectives

IT support service standards are driven strictly by service level objectives based on ITIL best practices. IT services revolve around the following management areas:

  1. Incident management
  2. Change management
  3. Problem management
  4. Knowledge management
  5. Service level management

Incident Management:

The main objective of incident management is to restore normal operation as quickly as possible and to ensure that day-to-day operations are not adversely affected.

Change Management:

When a change is required, such as the addition, removal, or update of any IT infrastructure, it shall be managed against the following metrics:

  • Less disruption of service
  • Fewer back-out activities
  • Appropriate use of the resources involved in the change

Problem Management:

Problem management identifies the root cause of recurring incidents, analyzes it, and recommends a change.

Knowledge Management:

Lessons learned from incidents and changes shall be documented for future reference.

Service Level Management:

Service levels are divided into three categories:

  1. High severity
  2. Medium severity
  3. Low severity

Issues that are reported or identified will be addressed according to the service levels above.

  • Job Title Applying for with reference number
