Posted at: 6 January

[Job-26769] Senior/Specialist SRE, Colombia

Company

CI&T

CI&T is a Brazil-based B2B information technology and software development company specializing in digital transformation and AI solutions for a global clientele across various industries.

Remote Hiring Policy:

CI&T supports remote work and hires globally, with team members located in various regions including the United States, Canada, the United Kingdom, Portugal, China, Colombia, Japan, and Australia.

Job Type

Full-time

Allowed Applicant Locations

Colombia

Job Description

We are tech transformation specialists, uniting human expertise with AI to create scalable tech solutions.
With over 8,000 CI&Ters around the world, we’ve built partnerships with more than 1,000 clients during our 30 years of history. Artificial Intelligence is our reality.

Your mission:
The Observability & Monitoring Specialist is responsible for enhancing the visibility, performance, and health of SCT’s application landscape. This role focuses on bridging existing monitoring gaps across 100+ applications by leveraging modern observability solutions to ensure proactive issue detection and rapid incident resolution.

Working as a core part of the infrastructure team, this specialist will optimize our monitoring stack—specifically Splunk, LogicMonitor, and AppDynamics—to transform raw data into actionable insights, ensuring that development and operations teams have the telemetry needed to maintain high-availability systems.

Key Responsibilities:
- Advanced Observability & Monitoring Strategy: Identify and remediate visibility gaps across a landscape of 100+ applications, ensuring full-stack monitoring from infrastructure to the end-user experience. Design and implement modern monitoring patterns to move from reactive alerting to proactive anomaly detection.
- Platform Management (Splunk, LogicMonitor, AppDynamics):
  - Splunk: Optimize log aggregation, create complex dashboards, and develop advanced queries (SPL) to support troubleshooting and security auditing.
  - LogicMonitor: Manage system-level monitoring (CPU, Memory, Disk, Network) and refine threshold logic to reduce alert noise while maintaining high sensitivity to critical failures.
  - AppDynamics: Configure Application Performance Monitoring (APM) to track business transactions, map dependencies, and identify code-level bottlenecks.
- Reliability Engineering & Performance Analysis: Correlate data across multiple platforms to provide a holistic view of system health and performance. Partner with application owners to define meaningful Service Level Indicators (SLIs) and Service Level Objectives (SLOs).
- Automation & Modernization: Automate the deployment and configuration of monitoring agents across Windows and Linux environments to ensure "monitoring-as-code" standards. Advocate for and implement modern observability solutions (OpenTelemetry, tracing, etc.) as the application landscape evolves.
- Deployment & Incident Response Support: Provide deep-dive technical support during high-priority incidents by leveraging AppDynamics and Splunk for rapid root cause analysis (RCA). Create and maintain operational dashboards and runbooks that enable 24x7 support teams to respond effectively to alerts.
- Continuous Improvement & Governance: Audit the existing monitoring environment to eliminate redundant alerts and ensure compliance with internal ITGC and security standards. Conduct knowledge-sharing sessions to empower application teams to self-serve using the observability toolkit.

Professional Expectations:
- Fluent English to interact with a multicultural team and an American client every day.
- Demonstrate a "data-driven" mindset, using metrics to influence technical decisions and infrastructure investments.
- Exhibit strong collaboration skills, acting as the bridge between infrastructure stability and application performance.
- Proactively identify trends in system behavior to prevent outages before they impact the business.

If you like it, just apply and good luck!
Our benefits include:

- Premium Healthcare
- Meal voucher
- Maternity and parental leave
- Mobile services subsidy
- Sick pay
- Life insurance
- CI&T University
- Colombian Holidays
- Paid Vacations
And many others.


Collaboration is our superpower, diversity unites us, and excellence is our standard.
We value diverse identities and life experiences, fostering a diverse, inclusive, and safe work environment. We encourage people from diverse and underrepresented groups to apply for our open positions.