Senior Data Engineer
We are looking for a Senior Data Engineer who specializes in building efficient, scalable, and reliable data pipelines with Databricks and Apache Spark. This fully remote role lets you work from anywhere while playing a critical part in enabling data-driven decision-making across the organization. You’ll be responsible for designing and maintaining ETL/ELT processes, ensuring strong data governance, and optimizing infrastructure for performance and scalability.
As a key member of our data team, you’ll collaborate with analysts, scientists, and business stakeholders to deliver clean, trustworthy, and high-performing data systems that support analytics, reporting, and advanced AI/ML initiatives.
What You’ll Be Doing
Data Infrastructure & Pipelines
- Build and maintain ETL/ELT workflows on Databricks.
- Design scalable data pipelines using Apache Spark and PySpark (a minimal sketch follows this list).
- Develop data models, transformations, and validation processes to ensure accuracy and reliability.
- Work with structured, semi-structured, and unstructured data from multiple sources.
- Tune pipelines and queries for performance on large-scale datasets.
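To give a flavor of the pipeline work, here is a minimal PySpark sketch of a batch ETL step of the kind you would build on Databricks. The source path, column names, and target table are hypothetical, and a real pipeline would add incremental loading and error handling.

```python
from pyspark.sql import SparkSession, functions as F

# On Databricks a SparkSession is already available as `spark`;
# getOrCreate() lets the same code run locally for testing.
spark = SparkSession.builder.appName("orders_daily_etl").getOrCreate()

# Hypothetical source: raw JSON events landed by an upstream ingestion job.
raw = spark.read.json("/mnt/raw/orders/")

# Basic cleaning and typing: drop malformed rows, cast amounts,
# and derive a date column used for partitioning.
orders = (
    raw
    .dropna(subset=["order_id", "customer_id", "amount"])
    .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
    .withColumn("order_date", F.to_date("order_ts"))
    .dropDuplicates(["order_id"])
)

# Write to a Delta table partitioned by date. mode("overwrite") rebuilds the
# table on each run; an incremental load would use MERGE instead.
(
    orders.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("order_date")
    .saveAsTable("analytics.orders_clean")
)
```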
Automation & Orchestration
- Orchestrate workflows with Databricks Jobs, Delta Live Tables, Airflow, or Azure Data Factory (see the sketch after this list).
- Contribute to automation, CI/CD, and testing efforts to increase efficiency and reduce manual intervention.
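As an illustration of the orchestration side, below is a sketch of an Airflow DAG that chains two existing Databricks Jobs. The DAG id, job ids, and connection id are hypothetical; it assumes Airflow 2.4+ (the `schedule` argument) and the Databricks provider package.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator

# Hypothetical daily pipeline: run ingestion, then transformation, as Databricks Jobs.
with DAG(
    dag_id="orders_daily_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:
    ingest = DatabricksRunNowOperator(
        task_id="run_ingest_job",
        databricks_conn_id="databricks_default",
        job_id=12345,  # hypothetical Databricks Job id
    )

    transform = DatabricksRunNowOperator(
        task_id="run_transform_job",
        databricks_conn_id="databricks_default",
        job_id=67890,  # hypothetical Databricks Job id
    )

    ingest >> transform
```

The same dependency could equally be expressed as a multi-task Databricks Job or a Delta Live Tables pipeline; the choice depends on where the team standardizes orchestration.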
Data Governance & Reliability
- Apply best practices in security, compliance, and governance (e.g., GDPR, CCPA).
- Implement monitoring systems to ensure data integrity and consistency (a minimal check is sketched below).
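Here is a minimal sketch of the kind of post-load integrity check this involves, assuming the hypothetical `analytics.orders_clean` table from the earlier example; in practice this logic often lives in Delta Live Tables expectations or a dedicated observability tool.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical target table produced by the daily pipeline.
df = spark.table("analytics.orders_clean")

# Simple post-load checks: fail fast (and alert) if the data looks wrong.
total = df.count()
null_ids = df.filter(F.col("order_id").isNull()).count()
duplicate_ids = total - df.dropDuplicates(["order_id"]).count()

if total == 0:
    raise ValueError("Integrity check failed: analytics.orders_clean is empty")
if null_ids > 0 or duplicate_ids > 0:
    raise ValueError(
        f"Integrity check failed: {null_ids} null order_ids, "
        f"{duplicate_ids} duplicate order_ids"
    )

print(f"analytics.orders_clean passed checks: {total} rows")
```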
Collaboration & Knowledge Sharing
- Partner with data scientists, analysts, and business teams to understand requirements and deliver tailored solutions.
- Document workflows, transformations, and models to promote transparency and reusability.
What We’re Looking For
Background & Experience
- Bachelor’s degree in Computer Science, Engineering, Data Science, or equivalent professional experience.
- 3+ years of experience as a Data Engineer (or similar role) with a strong track record of building pipelines in Databricks.
Technical Expertise
- Proficiency in Python and PySpark for data processing.
- Deep knowledge of Apache Spark and distributed systems.
- Cloud experience (AWS, Azure, or GCP).
- SQL expertise for advanced queries and optimization.
- Familiarity with workflow orchestration tools (Airflow, Azure Data Factory, etc.).
- Strong understanding of ETL/ELT, data modeling, and data quality practices.
Bonus Points For
- Master’s degree in Computer Science, Engineering, or related field.
- Experience with Delta Lake and Delta Live Tables.
- Exposure to ML pipelines and preparing data for AI/ML.
- Knowledge of data governance frameworks and observability tools.
- Hands-on experience with CI/CD for data engineering.
Tech Stack You’ll Work With
- Programming & Frameworks: Python, PySpark, Apache Spark
- Data Platforms: Databricks, Delta Lake, Snowflake, Redshift
- Cloud Providers: AWS, Azure, Google Cloud
- Workflow Tools: Airflow, Azure Data Factory, Databricks Jobs, Delta Live Tables
- CI/CD & Version Control: Git, GitHub, GitLab, Jenkins, GitHub Actions
Location: 100% Remote - work from anywhere.