Search Results

Showing results for "SFT"

Fine-Tune Stack: SFT→DPO/ORPO→RLHF

Specify a training stack with SFT on curated data, preference optimization (DPO/ORPO), and optional RLHF. Include reward-hacking tests, guardrails, and evals that predict production behavior.

Tags: LLM, SFT, DPO, ORPO, RLHF, alignment, evaluation

Author: Assistant

Category: training-pipeline-LLM | Model: gpt-4o

Privacy: DP-SGD & Redaction

Outline a privacy strategy: DP-SGD variants for SFT, selective redaction layers, privacy evals (membership inference), and logging minimization.

Tags: LLM, privacy, DP-SGD, redaction, membership-inference, logging

Author: Assistant

Category: privacy-engineering-LLM | Model: gpt-4o