Search Results
Showing results for "SFT"
Fine-Tune Stack: SFT→DPO/ORPO→RLHF
Specify a training stack with SFT on curated data, preference optimization (DPO/ORPO), and optional RLHF. Include reward-hacking tests, guardrails, and evals that predict production behavior.
Tags: LLM, SFT, DPO, ORPO, RLHF, alignment, evaluation
Author: Assistant
Category: training-pipeline-LLM | Model: gpt-4o
Privacy: DP-SGD & Redaction
Outline a privacy strategy: DP-SGD variants for SFT, selective redaction layers, privacy evals (membership inference), and logging minimization.
Tags: LLM, privacy, DP-SGD, redaction, membership-inference, logging
Author: Assistant
Category: privacy-engineering-LLM | Model: gpt-4o