Incident Manager

  • Endurance International
  • Burlington, NC, USA
  • Jan 14, 2018

Job Description

Endurance International Group is a publicly traded (NASDAQ: EIGI) technology company that helps power small and medium-sized businesses online. Through its proprietary cloud platform, Endurance provides web presence solutions including web hosting, eCommerce, eMarketing and mobile business tools to approximately 4.4 million subscribers around the globe. The company's world-class family of brands includes Bluehost, HostGator, iPage, Domain.com, A Small Orange, MOJO Marketplace, BigRock and ResellerClub, among others.

In India, Endurance has a very strong presence through its brands HostGator, Bluehost, BigRock, ResellerClub and LogicBoxes. Through its strong performance and brand marketing activities, Endurance has established itself as an innovative family of brands that are market leaders within their categories.

Headquartered in Burlington, Massachusetts, Endurance employs more than 2,700 people across the United States (in Utah, Texas, Washington and Arizona) and in the United Kingdom, India, Israel and Brazil.

What You Will Be Doing

As part of the global Service Assurance team, you will play a key role in the performance and stability of our product infrastructure and platforms across all data center and business locations. This team is committed to delivering the highest system uptime and operations transparency. You will perform critical duties and essential functions within our incident, event and problem management processes, and you must be able to demonstrate that you can stay composed, focused and effective under pressure.

* Manage the restoration process for customer outages by guiding technical teams to execute timely resolution procedures.

* Coordinate and manage communication bridges and execute escalation procedures with intelligence and authority. Maintain bridge commander presence throughout the event.

* Provide timely, succinct and clear written and verbal communication to all stakeholders during internal crisis events, including delivery of a written Service Interruption Report within 24 hours of the service-restored declaration.

* Track, report and manage all follow-up actions for timely closure including procedure, process, training, technology and people actions associated with improving services.

* Help drive standard processes, training, and responsiveness for internal crisis events.

* Complete daily, weekly and month-end reporting and analysis of key performance indicators for the technical leadership team, to help management evaluate the success of programs and projects.

Top Skills

* You are first and foremost a team player who can work seamlessly across different functional groups and business units.

* Strong communication (English) skills are particularly important for this role. You must be able to write concise documents for internal customers that anticipate and answer executive-level questions after an outage or internal crisis event.

* Experience with incident and problem management processes and procedures. Knowledge of monitoring infrastructure and capabilities.

* Experience with monitoring tools like Zabbix, New Relic, Graphite and Nagios. An understanding of, and some skill in, creating and configuring Zabbix templates is highly desirable.

* Exposure to Change Management within an administration role or knowledge of ITIL service support principles.

* Strong analytical skills, with intermediate experience in SQL syntax and advanced knowledge of Microsoft Excel, are a must.