Now training the next generation of frontier models

The human layer behind smarter AI

Aiyana Tech is the trusted human-in-the-loop partner for frontier AI labs and enterprise model teams. We design evaluations, write reference data, and refine model behavior — at the quality bar real research demands.

Join as an expert · See what we do

Trusted by 2,400+ domain experts across 60+ countries

aiyana@evaluator ~ task_842.json

● live

Prompt id: PRJ-FRONTIER-072

"Explain the trade-offs between retrieval-augmented generation and fine-tuning for a domain-specific medical assistant."

RESPONSE A ★ 4.6

RAG offers up-to-date grounding with auditable citations, but adds latency and depends on retrieval quality...

RESPONSE B · PREFERRED ★ 4.9

Fine-tuning bakes domain idiom into the weights for crisp, low-latency answers — best when corpora are stable and licensed...

Helpfulness · 9
Accuracy · 10
Safety · 10

Rubric · v3.2

Cite primary sources
Flag clinical claims
No prescriptive advice
Discuss latency budget

QUALITY GATE PASSED

inter-rater κ = 0.91
94 / 100

14 experts reviewing

Powering training pipelines at

NEURAL LABS · FRONTIER MODELS · AXIOM AI · HELIOS RESEARCH · OPENMIND · POLARIS · SENTIO · PARALLEL.CO · ATELIER ML · NORTHWIND AI

What we do

Human signal at machine scale

From red-teaming a frontier model to writing thousands of expert reference answers — we build the human pipeline labs need to ship safer, sharper AI.

RLHF · DPO

Evaluation & rating

Structured comparisons, Likert and rubric scoring, and continuous preference data — calibrated by inter-rater agreement above κ = 0.85.
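
For readers unfamiliar with κ, here is a minimal sketch of how an agreement gate like this could be checked. It assumes Cohen's kappa for two raters and categorical labels; the page does not say which kappa variant is used, and with more than two raters a multi-rater statistic such as Fleiss' kappa would be the usual choice. The function names `cohen_kappa` and `passes_gate` are hypothetical, and the sample labels are made up for illustration.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    Assumption: labels are categorical (e.g. "A" or "B" preferred).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item set")

    n = len(rater_a)

    # Observed agreement: share of items where both raters chose the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: probability both raters pick the same label
    # if each labelled independently at their own observed rates.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def passes_gate(kappa, threshold=0.85):
    """Hypothetical quality gate using the 0.85 threshold quoted above."""
    return kappa > threshold


# Illustrative (made-up) preference labels from two raters on ten comparisons.
rater_a = ["A", "B", "A", "A", "B", "B", "A", "B", "A", "A"]
rater_b = ["A", "B", "A", "B", "B", "B", "A", "B", "A", "A"]

kappa = cohen_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f}, gate passed: {passes_gate(kappa)}")
```

In this sample the raters agree on nine of ten items, yet κ falls below the gate because half of that agreement would be expected by chance. That is the reason to gate on κ rather than on raw percent agreement.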

SFT · gold sets

Expert authoring

MDs, lawyers, PhDs, engineers and writers craft gold-standard responses for domains where generic crowds simply can't deliver.

Safety

Red-teaming & safety

Adversarial probing, jailbreak audits, and policy stress-tests by trained safety specialists — with full chain-of-evidence reporting.

Agents

Agentic & tool-use

Multi-step task design and evaluation for agentic models — browser, code, planning, and tool-use traces graded by domain experts.

Methodology

Rubric & taxonomy design

Senior reviewers co-design the rubrics, taxonomies and gold-set methodology that make your evaluations reproducible and defensible.

40+ langs

Multilingual coverage

Native speakers across 40+ languages — including low-resource locales — for translation quality, cultural nuance and global rollouts.

How it works

A pipeline built for research-grade quality

Every project flows through a deliberate, instrumented pipeline — so quality isn't accidental, it's measured.

STEP 01

Scoping

We sit with your research team to map model behaviors, define rubrics, and align on quality gates.

STEP 02

Talent matching

We assemble a vetted cohort — domain experts, multilingual reviewers, safety specialists — calibrated to the task.

STEP 03

Production

Workers complete tasks in our QA-instrumented platform with live agreement, gold checks, and reviewer escalation.

STEP 04

Delivery

Clean, schema-typed data lands in your bucket with full audit trail, agreement metrics, and a debrief.

2,400+

Vetted experts

60+ countries · 40+ languages

4.2M

Tasks completed

across 80+ enterprise projects

κ 0.91

Avg. inter-rater agreement

research-grade calibration

< 36h

Project ramp time

cohort live in under two days

For workers

Get paid to teach the next generation of AI.

If you're a domain expert — a writer, engineer, scientist, lawyer, doctor or polyglot — your judgment is exactly what frontier models need. Remote, flexible, and well-compensated work directly with the labs shaping AI.

Work from anywhere, set your own hours
Competitive pay in USD, paid weekly
Real impact on real, deployed models
Grow into senior reviewer & project lead roles

Start your application · Learn more →

Lara, MD · Project lead

Lisbon · 2y at Aiyana

"I came in writing reference answers in clinical reasoning. Two years later I lead a 40-person team auditing medical safety for one of the largest models in the world. Best decision I've made."

★ 4.97 reviewer rating

PAYOUT · APR

$3,840

↗ 18% vs last month

Questions, answered

Common questions about working with us

The short answers — see the full FAQ if you want more depth.

Do I need an AI/ML background?

No. We hire domain experts — clinicians, lawyers, writers, engineers, translators. We train you on the rest.

How much can I earn?

Most experts earn $25–$55/hr depending on domain and seniority. Senior reviewers and leads earn meaningfully more.

Is it really remote?

Yes — 100% remote. All you need is a laptop and a stable internet connection.

What does the assessment look like?

A short calibration task in your domain, followed by a brief async interview. Most candidates hear back within a week.

See all FAQs →

Build alongside the people shaping AI.

Whether you're a lab seeking a high-trust training partner, or an expert ready to put your judgment to work — let's talk.

Apply to join · Talk to our team
