About the Role

We are seeking a skilled AI Engineer to design and build robust AI pipelines for a production-ready voice AI agent. You'll be responsible for setting up the complete architecture—from voice cloning and LLM fine-tuning to real-time STT/LLM/TTS integration—with a focus on performance, reliability, and natural interaction.

This is a hands-on, model-building role ideal for engineers who thrive on building intelligent systems from the ground up.

Responsibilities

Design and implement end-to-end AI pipelines integrating:
Speech-to-Text (STT)
Language Model (LLM) reasoning
Text-to-Speech (TTS)
Fine-tune lightweight open-source LLMs (e.g., Phi-2, Mistral, TinyLlama) on domain-specific conversational data
Clone realistic voice models using open-source TTS libraries (e.g., Coqui, Bark, TorToiSe)
Optimize inference latency for real-time voice interaction
Preprocess and curate audio/text datasets for model training
Integrate AI components into the backend of a voice agent system
Set up training and deployment infrastructure using cloud GPU platforms (e.g., AWS, Lambda Labs, GCP)
Build monitoring, logging, and testing tools for model behavior and drift
Collaborate with product and external telephony teams to align AI output with business goals

Required Skills

Strong Python development skills, with experience in ML libraries such as PyTorch and Hugging Face Transformers
Hands-on experience training or fine-tuning TTS models and LLMs
Experience building STT/LLM/TTS pipelines for real-time or near real-time systems
Familiarity with voice synthesis tools (Coqui TTS, Bark, etc.)
Knowledge of prompt engineering and conversational design
Experience deploying models on GPU-backed environments using Docker or Conda
Ability to work with raw datasets (audio and text) and implement efficient data pipelines

Nice to Have

Experience with telephony integration (e.g., Twilio, SIP, ViciDial)
Experience with LoRA or QLoRA for parameter-efficient fine-tuning
Familiarity with vector search and retrieval-augmented generation (RAG)
Knowledge of speech prosody control and expressive speech synthesis

Salary & Benefits

Competitive salary and fringe benefits
Paid time off
Leave encashment
EOBI
Fuel Card
Professional Development
Career Advancement
Team Building Activities
Innovative Work Environment
Work-Life Balance
Company trips
Wellness Programs
Performance-based promotion

Location: Blue Area, Islamabad

Office Hours: 05:00 PM to 02:00 AM (Monday to Friday)

Job Types: Part-time, Temporary, Contract

Education:

Bachelor's (Preferred)

Language:

Speak English fluently (Preferred)

Work Location: Remote


AI Engineer (Project based / Part Time / Freelancer)
