About the Role
We are seeking a skilled AI Engineer to design and build robust AI pipelines for a production-ready voice AI agent. You'll be responsible for setting up the complete architecture—from voice cloning and LLM fine-tuning to real-time STT/LLM/TTS integration—with a focus on performance, reliability, and natural interaction.
This is a hands-on, model-building role ideal for engineers who thrive on building intelligent systems from the ground up.
Responsibilities
- Design and implement end-to-end AI pipelines integrating:
  - Speech-to-Text (STT)
  - Language Model (LLM) reasoning
  - Text-to-Speech (TTS)
- Fine-tune lightweight open-source LLMs (e.g., Phi-2, Mistral, TinyLlama) on domain-specific conversational data
- Create realistic voice clones using open-source TTS libraries (e.g., Coqui, Bark, TorToiSe)
- Optimize inference latency for real-time voice interaction
- Preprocess and curate audio/text datasets for model training
- Integrate AI components into the backend of a voice agent system
- Set up training and deployment infrastructure using cloud GPU platforms (e.g., AWS, Lambda Labs, GCP)
- Build monitoring, logging, and testing tools for model behavior and drift
- Collaborate with product and external telephony teams to align AI output with business goals
Required Skills
- Strong Python development skills, with experience in ML libraries such as PyTorch and Hugging Face Transformers
- Hands-on experience training or fine-tuning TTS models and LLMs
- Experience building STT/LLM/TTS pipelines for real-time or near real-time systems
- Familiarity with voice synthesis tools (Coqui TTS, Bark, etc.)
- Knowledge of prompt engineering and conversational design
- Experience deploying models on GPU-backed environments using Docker or Conda
- Ability to work with raw datasets (audio and text) and implement efficient data pipelines
Nice to Have
- Experience with telephony integration (e.g., Twilio, SIP, ViciDial)
- Experience with LoRA or QLoRA for parameter-efficient fine-tuning
- Familiarity with vector search and retrieval-augmented generation (RAG)
- Knowledge of speech prosody control and expressive speech synthesis
Salary & Benefits
- Competitive salary and fringe benefits
- Paid time off
- Leave encashment
- EOBI
- Fuel Card
- Professional Development
- Career Advancement
- Team Building Activities
- Innovative Work Environment
- Work-Life Balance
- Company trips
- Wellness Programs
- Performance-based promotion
Location: Blue Area, Islamabad
Office Hours: 05:00 PM to 02:00 AM (Monday to Friday)
Job Types: Part-time, Temporary, Contract
Education:
- Bachelor's (Preferred)
Language:
- Speak English fluently (Preferred)
Work Location: Remote