Generative AI Data Engineers
The CHI Software team is not standing still. We love our job and give it one hundred percent! Every new project is a challenge that we meet successfully. The only thing that can stop us is... Wait, it's nothing! The number of projects is growing, and our team is growing with them. And now we need Generative AI Data Engineers.
Requirements:
- Proficiency in programming languages: Python, R;
- Experience with generative AI frameworks such as Hugging Face Transformers, OpenAI GPT APIs, TensorFlow-GAN (TF-GAN), and PyTorch Lightning;
- Knowledge of data formats and tools, including Apache Parquet, Delta Lake, and feature stores (e.g., Feast);
- Familiarity with big data technologies such as Apache Spark, Hadoop, and Ray;
- Hands-on experience with cloud platforms: AWS (SageMaker, S3, EC2), Azure AI, GCP AI Platform;
- Expertise in specialized tools for various data types:
  - Text: NLTK, spaCy, Gensim;
  - Images: OpenCV, PIL (Pillow), torchvision;
  - Audio: Librosa, Torchaudio.
Responsibilities:
- Data Preparation for Generative Models:
  - Collecting, labeling, and processing unstructured data such as images, text, audio, and video;
  - Applying advanced preprocessing techniques such as tokenization, embedding generation, and image vectorization.
- Building Data Pipelines:
  - Creating pipelines for managing large-scale, high-dimensional datasets;
  - Developing real-time pipelines for adaptive model updates.
- Integration with Generative AI Frameworks:
  - Preparing data for models such as GPT, DALL-E, and Stable Diffusion;
  - Integrating with APIs from OpenAI, Hugging Face, and other generative AI providers.
- Data Anonymization and Augmentation:
  - Ensuring compliance with privacy regulations by anonymizing sensitive data;
  - Applying augmentation techniques (e.g., flipping, cropping, text paraphrasing) to enhance training datasets.
- System Monitoring and Optimization:
  - Tracking model training performance with tools such as TensorBoard;
  - Optimizing storage and retrieval of large datasets on cloud platforms.
- Collaboration:
  - Working with ML engineers to fine-tune generative models;
  - Partnering with product teams to align AI capabilities with business goals.