LLM engineer (remote work)

February 5, 2026

Salary: not specified
Required work experience: not specified

Vacancy: LLM engineer

Job description

We are looking for an LLM / ML Infrastructure engineer experienced with Rust/C++ and CUDA for remote work.

Our client is building a decentralized AI infrastructure focused on running and serving ML models directly on user-owned hardware (on-prem / edge environments).

A core component of the product is a proprietary capsule runtime for deploying and running ML models. Some components currently rely on popular open-source solutions (e.g., llama.cpp), but the strategic goal is to replace these community-driven components with in-house ML infrastructure to gain complete control over performance, optimization, and long-term evolution.

In parallel, the company is developing:

  • its own network for generating high-quality, domain-specific datasets,

  • fine-tuned compact models for specialized use cases,

  • a research track focused on ranking, aggregation, accuracy improvements, and latency reduction.

The primary target audience is B2B IT companies.

The long-term product vision is to move beyond generic code generation and focus on high-performance, hardware-aware, and efficiency-optimized code generation.

ML Direction

1. Applied ML Track (Primary focus for this role)

  • Development of ML inference infrastructure

  • Building and evolving proprietary runtime capsules

  • Porting and implementing ML algorithms on a custom architecture

  • Low-level performance optimization across hardware platforms

2. Research Track

  • ML research with published papers

  • Improvements in answer quality and inference efficiency

  • Experiments with aggregation, ranking, and latency reduction

This position is primarily focused on the applied ML / engineering track.

Role

This is a strongly engineering-oriented ML role focused on inference, performance, and systems-level implementation rather than model experimentation.

Approximately 90% of the work is hands-on coding and optimization.

You will

  • Implement ML algorithms from research papers into production-ready code

  • Port existing ML inference algorithms to the company's proprietary architecture

  • Develop and optimize the inference engine

  • Optimize performance, memory usage, and latency

  • Integrate and adapt open-source ML solutions (LLaMA, VLMs, llama.cpp, etc.)

  • Contribute to the foundational architecture of the ML platform

Key Responsibilities

Inference Infrastructure Development:

Design and implementation of a cross-platform engine for ML model inference

Development of low-level components in Rust and C++ with focus on maximum performance

Creation and integration of APIs for interaction with the inference engine
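
Purely for illustration, a minimal Rust sketch of what a backend-agnostic inference-engine API surface could look like. Every trait, type, and method name below is hypothetical and does not describe the client's actual capsule runtime:

```rust
// Hypothetical sketch of a minimal inference-engine API surface.
// None of these names come from the vacancy; they only illustrate
// the kind of cross-platform abstraction this role would work on.

/// Parameters controlling a single generation request.
pub struct GenerationParams {
    pub max_new_tokens: usize,
    pub temperature: f32,
    pub top_p: f32,
}

/// Backend-agnostic handle to a loaded model (CPU, CUDA, Metal, ...).
pub trait InferenceEngine {
    type Error;

    /// Load model weights from a local path (e.g., a GGUF file).
    fn load(model_path: &str) -> Result<Self, Self::Error>
    where
        Self: Sized;

    /// Encode a prompt into token ids with the model's tokenizer.
    fn tokenize(&self, prompt: &str) -> Result<Vec<u32>, Self::Error>;

    /// Run autoregressive decoding and return generated token ids.
    fn generate(
        &mut self,
        prompt_tokens: &[u32],
        params: &GenerationParams,
    ) -> Result<Vec<u32>, Self::Error>;
}
```

A concrete CPU, CUDA, or Metal backend would implement such a trait behind a stable model-level API, which is the usual way cross-platform engines keep hardware-specific kernels separate from the serving layer.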

Performance Optimization:

Implementation of modern optimization algorithms: Flash Attention, PagedAttention, continuous batching

Development and optimization of CUDA kernels for GPU-accelerated computations

Profiling and performance tuning across various GPU architectures

Optimization of memory usage and model throughput
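
To make the PagedAttention item above concrete, here is a simplified, CPU-only Rust sketch of a paged KV-cache block table: each sequence's key/value cache lives in fixed-size blocks drawn from a shared pool, so memory is allocated on demand instead of being reserved for the maximum sequence length. All names are illustrative; a production engine manages the pool in GPU memory and passes the block table to the attention kernel.

```rust
// Simplified sketch of a paged KV-cache block table (the core idea
// behind PagedAttention). Illustrative only; not a real implementation.

const BLOCK_SIZE: usize = 16; // tokens per KV block

struct BlockAllocator {
    free_blocks: Vec<usize>, // indices into a shared physical block pool
}

impl BlockAllocator {
    fn new(num_blocks: usize) -> Self {
        Self { free_blocks: (0..num_blocks).collect() }
    }
    fn allocate(&mut self) -> Option<usize> {
        self.free_blocks.pop()
    }
}

/// Logical-to-physical mapping for one sequence's KV cache.
struct SequenceBlockTable {
    blocks: Vec<usize>, // physical block index per logical block
    num_tokens: usize,
}

impl SequenceBlockTable {
    fn new() -> Self {
        Self { blocks: Vec::new(), num_tokens: 0 }
    }

    /// Reserve space for one more token, grabbing a new block only
    /// when the current block is full.
    fn append_token(&mut self, alloc: &mut BlockAllocator) -> Option<()> {
        if self.num_tokens % BLOCK_SIZE == 0 {
            self.blocks.push(alloc.allocate()?);
        }
        self.num_tokens += 1;
        Some(())
    }

    /// Physical (block, offset) location of token `i` in the pool.
    fn locate(&self, i: usize) -> (usize, usize) {
        (self.blocks[i / BLOCK_SIZE], i % BLOCK_SIZE)
    }
}

fn main() {
    let mut alloc = BlockAllocator::new(64);
    let mut seq = SequenceBlockTable::new();
    for _ in 0..40 {
        seq.append_token(&mut alloc).expect("out of KV blocks");
    }
    println!("blocks used: {}", seq.blocks.len()); // 3 blocks for 40 tokens
    println!("token 20 lives at {:?}", seq.locate(20)); // (block, offset)
}
```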

Model Operations:

Implementation of efficient model quantization methods (GPTQ, AWQ, GGUF)

Development of memory management system for working with large language models

Integration of support for various model architectures (LLaMA, Mistral, Qwen, and others)
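
As a sketch of the quantization work this implies, here is a deliberately simplified symmetric per-group int8 scheme in Rust. It is not GPTQ, AWQ, or the GGUF format (those add calibration data, activation-aware scaling, and packed sub-byte layouts), but it shows the scale → round → dequantize skeleton they build on:

```rust
// Minimal sketch of symmetric per-group int8 weight quantization.
// Deliberately simplified; real schemes differ substantially.

const GROUP_SIZE: usize = 32;

struct QuantizedGroup {
    scale: f32,      // one scale per group of weights
    values: Vec<i8>, // quantized weights in [-127, 127]
}

fn quantize(weights: &[f32]) -> Vec<QuantizedGroup> {
    weights
        .chunks(GROUP_SIZE)
        .map(|group| {
            // Scale so the largest-magnitude weight maps to +/-127.
            let max_abs = group.iter().fold(0.0f32, |m, w| m.max(w.abs()));
            let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
            let values = group
                .iter()
                .map(|w| (w / scale).round().clamp(-127.0, 127.0) as i8)
                .collect();
            QuantizedGroup { scale, values }
        })
        .collect()
}

fn dequantize(groups: &[QuantizedGroup]) -> Vec<f32> {
    groups
        .iter()
        .flat_map(|g| g.values.iter().map(move |&q| q as f32 * g.scale))
        .collect()
}

fn main() {
    let weights: Vec<f32> = (0..64).map(|i| (i as f32 * 0.1).sin()).collect();
    let quantized = quantize(&weights);
    let restored = dequantize(&quantized);
    let max_err = weights
        .iter()
        .zip(&restored)
        .map(|(a, b)| (a - b).abs())
        .fold(0.0f32, f32::max);
    println!("groups: {}, max reconstruction error: {:.4}", quantized.len(), max_err);
}
```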

What we expect from you

  • Strong proficiency in Rust or C++

  • Hands-on experience with GPU / hardware acceleration, including:

    • CUDA, AMD or Metal (Apple Silicon)

  • Solid understanding of:

    • LLM principles

    • core ML algorithms

    • modern ML approaches used in production systems

  • Ability to read ML research papers and implement them in code

  • Ability to write clean, efficient, highly optimized code

  • Interest in systems-level ML and low-level performance optimization

  • High level of autonomy:

    • take existing algorithms from research or open-source,

    • understand them deeply,

    • adapt and integrate them into a new architecture

  • Fluent English

What The Company Offers

  • Remote-first setup (work from anywhere)

  • Dubai working hours

  • High level of ownership and autonomy

  • Flat structure

  • Salary in cryptocurrency

  • An opportunity to create a great product that will disrupt the AI market


