As a Reactive Problem Manager, you will play a pivotal role in ensuring the timely resolution of critical incidents and problems that impact the organization’s IT infrastructure and services. You will be responsible for swiftly identifying, managing, and coordinating the resolution of high-priority incidents, working closely with various technical teams to minimize downtime and mitigate risks. Your analytical skills, proactive approach, and ability to collaborate effectively will contribute to maintaining a stable and efficient technology environment.
Problem Identification and Analysis:
Identify recurring Major incidents and patterns to determine the underlying problems causing service disruptions.
Conduct thorough root cause analyses to detect the fundamental issues contributing to Major incidents and drive preventive and corrective actions.
Analyze and quality-check Major incident data to identify opportunities for improving service quality and stability.
Coordination and Collaboration:
Serve as a central point of contact for Problem management queries, ensuring clear and effective communication between technical teams, stakeholders, and leadership.
Collaborate with Major incident response teams to develop and implement strategies to mitigate and prevent future incidents.
Engage and collaborate with the account team and the technical teams to implement strategies to mitigate and prevent future incidents.
Lead Root Cause Analysis (RCA) calls, drive the identification of the root cause, and finalize the RCA report.
Resolution and Documentation:
Coordinate and track the implementation of solutions, workarounds, and fixes for identified problems.
Maintain accurate and up-to-date documentation of incident details, problem investigations, and resolution actions.
Contribute to the creation of knowledge base articles and documentation to support Major incident response and problem-solving efforts.
Work on Preventive Action reports and ensure that Problem tickets and preventive actions are completed within the defined SLAs.
Provide high-level trend analysis and areas of focus in terms of Technology, Process, and People.
When the business requires it, respond promptly to critical Major incidents, assess their severity, and initiate the appropriate escalation and resolution processes.
Collaborate with cross-functional teams, including technical support, operations, and development, to diagnose and resolve Major incidents efficiently.
Lead incident bridges and facilitate communication between teams during high-pressure situations.
Participate in post-incident reviews to assess the effectiveness of problem-solving efforts and identify areas for improvement.
Propose and implement process enhancements to streamline incident and problem management workflows.
Stay current with industry best practices and emerging technologies related to incident and problem management.
Qualifications and Skills:
Bachelor’s degree in Computer Science, Information Technology, or a related field (or equivalent work experience).
More than 5 years of proven experience in incident and problem management within a fast-paced IT environment.
Strong analytical and problem-solving skills, with the ability to dissect complex technical issues.
Excellent communication and interpersonal skills, capable of effectively conveying technical information to both technical and non-technical stakeholders.
Familiarity with incident management tools and methodologies (e.g., ITIL, ITSM).
Ability to work under pressure, prioritize tasks, and manage multiple incidents concurrently.
Experience with root cause analysis techniques and continuous improvement initiatives.
Proficiency in collaborating across cross-functional teams and building strong working relationships.
Relevant certifications in incident management (e.g., ITIL Foundation) or related areas.
Previous experience working with Power BI and knowledge of advanced Excel.
Knowledge of IT infrastructure, data center, network, and mainframe architecture.