Senior Site Reliability Engineer - Exchange

Posted 2 months ago


Time commitment
Full time
Company size
Between 201 and 500


Network Architecture
Cloud Networking
Creative Thinking
Cloud Security
Linux Operating System

Job description

Pintu is offering an opportunity for a full-time Site Reliability Engineer to join our Exchange SRE Team. The individual in this role will gain experience running complex, geographically distributed cloud setups that serve a large number of client connections, both ad hoc and streaming.
This position requires outstanding technical proficiency, professionalism, solid communication, exceptional problem-solving skills, and an eager attitude.
The successful candidate will play a key role in building, operating, and evolving an error-free, low-latency, high-capacity, high-throughput next-gen crypto exchange, along with its matching engines and back-end software systems that serve millions of customers (retail and institutional investors, B2B2C clients, market makers, etc.).
The ideal candidate should be knowledgeable in the trading technologies domain, infrastructure-as-code concepts, orchestration engines and containerization technologies, and monitoring engines and stacks, and should be familiar with high-performance computing and networking.
Strong written and oral communication is a must, as the successful applicant will frequently interact with business stakeholders and product teams to achieve Pintu's strategic business goals.
Essential Functions / Responsibilities
  • Analyze business and product requirements and propose effective, efficient technical solutions for delivering changes and innovations to the Pintu Exchange infrastructure and landscape
  • Work with a project focus group (product engineering, product management, architecture, and CTO) to compile a work breakdown structure of tasks for given deliverables and provide realistic estimates for completion or project assignments
  • Design, build, maintain, and improve Pintu’s Exchange infrastructure and its tooling. Ensure infrastructure elasticity and automated scalability for cost-efficient resource utilization while maintaining the system’s high availability and fault tolerance
  • Collaborate with other Developers, SREs, and QA Engineers to execute full-cycle integration, functional, and regression testing. Own and resolve all priority defects identified within the solution codebase efficiently and in a timely fashion
  • Promote software changes safely and responsibly across all environments, from Development through Staging, and deploy updates to the Production environment with zero downtime
  • Provide effective Level 1 infrastructure technical support during business hours and, occasionally, off hours, depending on a rotation schedule. Design, build, maintain, and improve the infrastructure monitoring tooling that is critical for both (a) moment-to-moment situational awareness and proactive incident response, and (b) future infrastructure capacity planning
  • Participate in team exercises to identify and implement areas for continuous improvement, and be proactive in putting your ideas forward
  • Educate and mentor your engineering colleagues in the areas of your own expertise and domain knowledge, and be open-minded and approachable
Experience Required
  • 5+ years of SRE experience, ideally working with Amazon Web Services and Google Cloud environments (or MS Azure)
  • Experience in designing and implementing AWS and/or GCP setups from scratch
  • Experience building and running cross-regional resilient solutions
  • Experience in architecting, building, deploying, and operating enterprise-ready container solutions on Kubernetes
  • Solid experience in setting up and maintaining message broker infrastructure (Kafka, RocketMQ, etc.)
  • Experience in setting up a cloud persistence layer (AWS Aurora, GCP BigQuery, etc.)
  • Experience implementing a large service mesh via Istio or another relevant solution
  • Experience building on-demand, short-lived environments (for debugging, profiling, and load-testing scenarios)
  • Experience working in small, focused teams of highly skilled engineers
Necessary Skills
  • Solid understanding of Cloud networking concepts (VPC, peering, interconnects, etc.)
  • Good understanding of Cloud Security principles (VPN, Application Firewall(s), IAM, etc.)
  • Experience with operating systems, especially strong knowledge of Linux, and an understanding of network architectures
  • Deep knowledge of Docker and Kubernetes
  • Solid knowledge of Bash, Ansible, and Terraform scripting
  • Well-versed in using SDLC CI/CD pipelines for automated infrastructure management of large-scale system deployments
  • Excellent written and verbal communication skills
  • An energetic, creative, and autonomous self-starter
Preferred/Bonus Skills
  • Knowledge of Makefiles
  • Knowledge of Python and the respective libraries
  • Hands-on experience working with the Cloudflare Enterprise stack
  • Knowledge of TCP/IP and UDP networking protocols
  • Experience in infrastructure performance and chaos testing
  • Experience working with GitHub Actions
  • Hands-on experience with and knowledge of Hardware Security Modules (HSM) or hardware enclave solutions
  • Experience in financial technology, with crypto and/or traditional financial know-how a strong plus
