As a Data Scientist, you will take full ownership of end-to-end model development cycles, from ideation and exploratory data analysis through deployment and performance monitoring. You will collaborate closely with engineering, product, and domain-expert teams to discover new signal sources, architect robust feature pipelines, and deliver models that materially improve portfolio performance, user engagement, and internal decision-making.
What You’ll Do
Model Development & Research
- Design, prototype, and validate supervised and unsupervised ML models (e.g., gradient boosting, Transformers, temporal graph networks) for prediction, classification, clustering, and anomaly detection.
- Own the complete ML lifecycle: data gathering, feature engineering, algorithm selection, hyperparameter tuning, model evaluation/interpretation, and production deployment.
Data Engineering & Pipeline Automation
- Build scalable ETL workflows that transform terabytes of structured and unstructured data (time-series, news text, alternative data, clickstreams) into model-ready feature stores.
- Collaborate with data engineering to optimize data schemas, indexing strategies, and distributed processing on Spark or Snowflake, orchestrated through Airflow DAGs.
Experimentation & Performance Tracking
- Establish rigorous offline and live A/B testing frameworks to quantify ROI, mitigate information leakage, and ensure statistical validity across multiple market regimes.
- Implement monitoring dashboards for drift detection, re-training triggers, and post-deployment model governance.
Cross-functional Partnership
- Translate complex model outputs into intuitive visualizations and insights for product managers, executives, and non-technical stakeholders.
- Mentor junior scientists and review code/experiments to uphold best practices in reproducibility and documentation.
What We’re Looking For
Technical Skills
- 3+ years building and deploying ML models in production (Python preferred; familiarity with Scala/Java a plus).
- Deep hands-on experience with at least two of: PyTorch, TensorFlow/Keras, XGBoost/LightGBM, Hugging Face Transformers, HDBSCAN/graph analytics libraries.
- Proficiency in SQL and one or more big-data ecosystems (Spark, Dask, Google BigQuery, or similar).
- Comfort with MLOps tooling (MLflow, Kubeflow, SageMaker, Vertex AI) and containerization (Docker/K8s).
- Solid understanding of statistical inference, time-series analysis, and model explainability techniques (SHAP, counterfactuals, LIME).
Domain & Soft Skills
- Strong problem-solving mindset, self-directed research skills, and the ability to simplify complex ideas for diverse audiences.
- Track record of shipping impactful features in a rapid, iterative, and collaborative environment.
- Excellent written and verbal communication in English.
Nice-to-Haves
- Experience with reinforcement learning and recommendation engines.
- Knowledge of retrieval-augmented generation (RAG) pipelines and large-language-model fine-tuning.
- Familiarity with cloud infrastructure costs and optimization strategies (AWS/GCP/Azure).
Job Type: Full-time