AI Data Engineer
The CHI Software team is not standing still. We love our job and give it one hundred percent of ourselves! Every new project is a challenge that we meet successfully. The only thing that can stop us is... Wait, it’s nothing! The number of projects keeps growing, and our team is growing with it. And now we need an AI Data Engineer.
Requirements:
- Proficiency in programming languages such as Python, Java, Scala, SQL, or Go;
- Experience with big data tools like Apache Spark, Hadoop, Hive, Flink, or Presto;
- Knowledge of data streaming platforms, such as Apache Kafka, RabbitMQ, AWS Kinesis, or GCP Pub/Sub;
- Familiarity with relational and NoSQL databases, including PostgreSQL, MySQL, MongoDB, Cassandra, Snowflake, or Redis;
- Hands-on experience with cloud platforms: AWS (S3, Glue, Lambda, Redshift), Azure (Data Factory, Databricks), or Google Cloud (BigQuery, Dataflow);
- Understanding of DevOps tools, including Docker, Kubernetes, Terraform, Jenkins, Airflow, or dbt;
- Knowledge of data validation and monitoring tools, such as Great Expectations, Monte Carlo, Grafana, or Prometheus;
- Strong collaboration skills and the ability to work with data scientists, DevOps teams, and stakeholders;
- Experience designing scalable ETL/ELT pipelines for processing structured, semi-structured, and unstructured data;
- Expertise in real-time data streaming solutions (e.g., Apache Kafka, Apache Flink);
- Familiarity with AI/ML tools like TensorFlow, PyTorch, Scikit-learn, or MLflow;
- Knowledge of data privacy regulations (e.g., GDPR, CCPA) and experience normalizing, cleaning, and anonymizing sensitive data;
- Hands-on experience with tools like Databricks for collaborative data engineering.
Responsibilities:
- Building and Maintaining Data Pipelines:
Designing scalable ETL/ELT pipelines to process structured, semi-structured, and unstructured data;
Implementing real-time data streaming solutions using tools like Apache Kafka and Apache Flink.
- Developing Data Architectures:
Designing robust and scalable architectures for AI workflows, including data lakes and data warehouses;
Utilizing platforms like Snowflake, Redshift, and BigQuery for efficient data storage and querying.
- Integrating and Preparing Datasets:
Handling large-scale datasets for training and testing AI/ML models;
Normalizing, cleaning, and anonymizing sensitive data for compliance with privacy regulations (e.g., GDPR, CCPA).
- Optimizing System Performance:
Tuning performance for big data frameworks like Apache Spark and Hadoop;
Utilizing tools like Databricks for collaborative data engineering.
- Ensuring Data Quality and Reliability:
Implementing data validation and observability tools like Great Expectations and Monte Carlo;
Monitoring and troubleshooting data pipelines to ensure reliability.
- Collaboration:
Working closely with data scientists, DevOps, and stakeholders to align data workflows with business and AI needs.