about the company
our client is a health-tech company transforming primary care through a digital platform that offers affordable, on-demand outpatient services. They aim to enhance the patient experience while significantly reducing healthcare costs for individuals and corporations alike.
about the role
you will design, build, and optimize data pipelines and infrastructure to support Large Language Model (LLM) applications. This includes enabling efficient data extraction, preprocessing, and integration with real-time systems while ensuring scalability and reliability.
about the job
- design and deploy scalable architectures for LLM inference, including distributed model serving and latency optimization.
- develop robust ETL pipelines to preprocess large-scale unstructured text data for LLM fine-tuning and real-time applications.
- implement vector search systems using tools like Pinecone to support LLM-based retrieval-augmented generation (RAG) workflows.
- optimize tokenization, embeddings, and pretraining workflows to enhance LLM efficiency and output quality.
- integrate LLMs with cloud-native solutions like Amazon Bedrock to streamline model deployment and inference pipelines.
knowledge, skills and experience
- experience designing and implementing scalable data pipelines tailored for LLM fine-tuning and real-time inference.
- proficient in integrating LLMs with APIs and streaming data sources for dynamic content generation.
- expertise in handling unstructured text data, vector databases, and embeddings for LLM workflows.
- familiarity with transformer architectures and preprocessing techniques for NLP data.
- solid understanding of data extraction frameworks in production (including class-based data models using tools like Pydantic).
- expertise in optimizing prompt engineering strategies to enhance LLM performance and outputs.
- proficiency in leveraging cloud technologies such as AWS.
how to apply
interested candidates may contact Hua Hui at +6017 960 0313 for a confidential discussion.