Data Engineer — Generative AI & RAG Systems
We’re seeking a Data Engineer who thrives at the intersection of data engineering, systems design, and operations. In this role, you’ll design and maintain pipelines that transform structured and unstructured data into AI-ready formats, powering large-scale RAG-based document processing and AI-driven insight generation.
This is a fully remote role, so you can work from anywhere while collaborating with global teams to ensure that our AI solutions are scalable, reliable, and production-ready.
What You’ll Do
Data Infrastructure & Integration
- Build and optimize data ingestion pipelines for internal and external sources.
- Process structured and unstructured content (documents, customer feedback, multimedia) into usable formats.
- Define data models and mapping logic to turn raw inputs into structured insights (see the mapping sketch after this list).
- Develop semantic layers that integrate analytics data from multiple systems.
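To make the data-modeling work concrete, here is a minimal sketch of a mapping layer that normalizes one raw feedback payload into a shared record; the `FeedbackRecord` shape and field names are illustrative assumptions, not an actual schema.

```python
# Minimal mapping-layer sketch: raw feedback JSON -> structured record.
# The FeedbackRecord shape and field names are illustrative, not a real schema.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class FeedbackRecord:
    source: str           # originating system, e.g. "zendesk" or "app_store"
    customer_id: str
    submitted_at: datetime
    text: str

def map_raw_feedback(raw: dict, source: str) -> FeedbackRecord:
    """Normalize one raw payload into the shared model used downstream."""
    return FeedbackRecord(
        source=source,
        customer_id=str(raw["user"]["id"]),
        submitted_at=datetime.fromisoformat(raw["created_at"]),
        text=raw.get("body", "").strip(),
    )
```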
AI/ML System Integration
- Build and maintain APIs and backend services for prompt-to-answer workflows.
- Implement advanced retrieval strategies (vector search, keyword search, and hybrid approaches that combine them).
- Manage document ingestion pipelines with parsing, OCR, chunking, and embeddings (see the ingest sketch after this list).
- Integrate data pipelines with multiple LLM providers (OpenAI, Azure AI, Anthropic, etc.).
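As a rough sketch of that ingest path (parse, then chunk, embed, and index): the chunk sizes below are arbitrary defaults, and `embed` and `index_upsert` are placeholders for a provider client (OpenAI, Azure AI, Anthropic) and a vector-store write.

```python
# Sketch of the document ingest path: parsed text -> overlapping chunks
# -> embeddings -> vector index. Chunk sizes are arbitrary defaults, and
# embed/index_upsert are placeholders for provider and vector-store calls.
from typing import Callable, List

def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> List[str]:
    """Split parsed document text into overlapping character windows."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text), 1), step)]

def ingest(doc_id: str, text: str,
           embed: Callable[[List[str]], List[List[float]]],
           index_upsert: Callable[[str, List[str], List[List[float]]], None]) -> None:
    chunks = chunk_text(text)
    vectors = embed(chunks)                # one embedding per chunk
    index_upsert(doc_id, chunks, vectors)  # write chunks + vectors to the store
```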
Infrastructure & Operations
- Ensure performance, reliability, and scalability of AI-powered response systems.
- Apply governance practices for secure and ethical AI data use.
- Develop monitoring and validation processes to track data quality and detect bias (see the validation sketch after this list).
- Implement observability, logging, and alerting for AI infrastructure.
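As one small example of what a validation gate can look like, here is a null-rate check with standard-library logging; the metric and 5% threshold are assumptions, and real alerting would route through the observability stack.

```python
# Toy data-quality gate: fail a batch when too many values are missing.
# The null-rate metric and 5% threshold are placeholder assumptions.
import logging

logger = logging.getLogger("pipeline.quality")

def check_null_rate(rows: list, field: str, max_null_rate: float = 0.05) -> bool:
    """Return False (and log at ERROR, which alerting can hook) on breach."""
    if not rows:
        logger.warning("empty batch while checking field=%s", field)
        return False
    null_rate = sum(1 for r in rows if r.get(field) in (None, "")) / len(rows)
    if null_rate > max_null_rate:
        logger.error("null rate %.1f%% on %r exceeds %.1f%% threshold",
                     100 * null_rate, field, 100 * max_null_rate)
        return False
    return True
```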
Collaboration & Documentation
- Partner with data scientists and analytics teams to align on data requirements.
- Work cross-functionally with business and technical stakeholders.
- Maintain documentation for data flows, mappings, and operational processes.
What We’re Looking For
Background & Experience
- Bachelor’s in Computer Science, Engineering, or related field (or equivalent experience).
- 5+ years of experience building scalable data solutions for large-scale analytics.
- Proven ability to lead and deliver projects end-to-end.
Technical Skills
- Proficiency in Python, Java, or R.
- Strong SQL and experience with cloud data platforms (Snowflake, Redshift, Databricks).
- Hands-on work with distributed systems (Spark, Hadoop).
- Familiarity with orchestration tools (Airflow) and CI/CD workflows; a toy DAG follows this list.
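For a feel of the orchestration side, here is a toy DAG using Airflow's TaskFlow API (`@dag`/`@task` decorators); the task bodies are stubs and the daily schedule is an assumption.

```python
# Illustrative extract -> transform -> load DAG; bodies are stubs.
# Uses Airflow's TaskFlow API; the `schedule=` argument needs Airflow 2.4+
# (older versions spell it `schedule_interval`).
from datetime import datetime
from airflow.decorators import dag, task

@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def feedback_etl():
    @task
    def extract() -> list:
        return [{"id": 1, "body": " Great product "}]  # stub: pull from a source API

    @task
    def transform(rows: list) -> list:
        return [{**r, "body": r["body"].strip().lower()} for r in rows]

    @task
    def load(rows: list) -> None:
        print(f"loading {len(rows)} rows")  # stub: write to the warehouse

    load(transform(extract()))

feedback_etl()
```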
Data & AI Expertise
- Solid understanding of ETL, data modeling, and data warehousing.
- Experience with vector databases such as Pinecone, Milvus, Weaviate, or Chroma (see the Chroma sketch after this list).
- Comfortable handling semi-structured and unstructured formats (JSON, free text, images, audio, video).
- Knowledge of AI/ML lifecycles and data requirements.
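If the vector-database bullet feels abstract, this is roughly the level of API involved; a tiny Chroma round trip with illustrative documents (Chroma falls back to its default embedding function when no embeddings are passed).

```python
# Tiny Chroma round trip: create a collection, add documents, query by text.
# Collection name and documents are illustrative.
import chromadb

client = chromadb.Client()  # in-memory; use a persistent client in production
collection = client.create_collection("support_docs")
collection.add(
    ids=["doc-1", "doc-2"],
    documents=["How to reset your password", "Billing and invoices FAQ"],
)
results = collection.query(query_texts=["I forgot my password"], n_results=1)
print(results["documents"])  # expected: [["How to reset your password"]]
```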
Cloud & Infrastructure
- Practical experience with AWS, Azure, or GCP.
- Containerization and orchestration (Docker, Kubernetes).
- Strong API development background (REST, GraphQL, gRPC); a FastAPI sketch follows this list.
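And for the API bullet, a skeleton prompt-to-answer endpoint in FastAPI; `retrieve` and `generate` are hypothetical stand-ins for the retrieval layer and the LLM provider call, not a real pipeline.

```python
# Skeleton REST endpoint for a prompt-to-answer workflow.
# retrieve() and generate() are hypothetical stubs, not a real pipeline.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class AskRequest(BaseModel):
    question: str

class AskResponse(BaseModel):
    answer: str
    sources: list

def retrieve(question: str) -> list:
    return [{"id": "doc-1", "text": "..."}]  # stub: vector/hybrid search

def generate(question: str, chunks: list) -> str:
    return "stubbed answer"                  # stub: LLM provider call

@app.post("/ask", response_model=AskResponse)
def ask(req: AskRequest) -> AskResponse:
    chunks = retrieve(req.question)
    answer = generate(req.question, chunks)
    return AskResponse(answer=answer, sources=[c["id"] for c in chunks])
```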
Nice to Have
- Master’s degree in Computer Science, Engineering, or related discipline.
- Experience with cloud-native AI/ML platforms.
- Knowledge of data labeling, augmentation, and governance tools.
- Familiarity with responsible AI principles and data privacy compliance.
- Basic frontend development skills (HTML, CSS, JavaScript).
Tech Stack You’ll Work With
Languages & Frameworks: Python, Java, R, Spark, Hadoop, FastAPI, Django, Flask
Data & AI Platforms: Snowflake, Redshift, Databricks, Pinecone, Milvus, Weaviate, Chroma, LangChain, LlamaIndex
Cloud & Infrastructure: AWS, Azure, GCP, Docker, Kubernetes, Airflow, Kafka
Dev Tools: Git, GitHub, GitLab, Jenkins, GitHub Actions, Jupyter
Location: Remote — work from anywhere.