Search Results

Showing results for "LLM"

Smart Price for iPhone: Email for Customer

1. Ask for the lowest price of the iPhone 17 on Amazon. 2. Who is the best seller? Some sellers cheat and some are genuine; choose an IPO company with many buyer recommendations, consulting LLMs such as ChatGPT. 3. When could we get it? ChatGPT, perp...

Author: [email protected]

Category: MCP | Model:

RAG for Research Labs

Blueprint a RAG system for a lab wiki and PDFs: chunking policy, hybrid retrieval, and citation-anchored answers. Add privacy filters.

Tags: IR, RAG, academia, pdf, privacy, blueprint

Author: Assistant

Category: applied-IR-LLM-academia | Model: gpt-4o
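One common way to implement the hybrid retrieval step is reciprocal rank fusion (RRF), which merges a lexical ranking (e.g. BM25) with a dense-embedding ranking. The sketch below assumes both retrievers already return ranked lists of document IDs. The function name, the document IDs, and k=60 are illustrative choices, not part of the original prompt.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of document IDs with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with ranks starting at 1. k=60 is a commonly used default, not a tuned value.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    # Hypothetical results for one query over the lab wiki and PDFs.
    lexical = ["wiki/gpu-booking", "pdf/safety-2023", "wiki/onboarding"]
    dense = ["pdf/safety-2023", "wiki/onboarding", "pdf/grant-report"]
    print(reciprocal_rank_fusion([lexical, dense]))
    # -> ['pdf/safety-2023', 'wiki/onboarding', 'wiki/gpu-booking', 'pdf/grant-report']
```

Because RRF uses only ranks, the lexical and dense scores never need to be normalized onto a common scale.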

Retrieval Eval Harness

Build an eval harness: recall@k, calibrated precision, answer faithfulness, and human-time-to-verify. Include topic-aware test buckets and data drift alarms.

Tags: LLM, retrieval, eval, faithfulness, drift, metrics

Author: Assistant

Category: evaluation-frameworks-LLM | Model: gpt-4o
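Of the metrics listed, recall@k is the simplest to pin down: the share of a query's relevant documents that appear in the top k results. The minimal sketch below averages it per topic bucket, so that a weak topic is not hidden by a strong overall mean. The query tuple layout and the example buckets are hypothetical.

```python
from collections import defaultdict


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear among the top-k results."""
    if not relevant:
        raise ValueError("recall@k is undefined when a query has no relevant documents")
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def recall_by_bucket(
    queries: list[tuple[str, list[str], set[str]]], k: int
) -> dict[str, float]:
    """Mean recall@k per topic bucket.

    Each query is a (bucket, retrieved_ids, relevant_ids) tuple; this layout
    is an assumption made for the sketch.
    """
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for bucket, retrieved, relevant in queries:
        totals[bucket] += recall_at_k(retrieved, relevant, k)
        counts[bucket] += 1
    return {bucket: totals[bucket] / counts[bucket] for bucket in totals}


if __name__ == "__main__":
    test_set = [
        ("physics", ["a", "b", "c"], {"a"}),
        ("physics", ["c", "d", "e"], {"b"}),
        ("biology", ["x", "z", "q"], {"x", "y"}),
    ]
    print(recall_by_bucket(test_set, k=2))  # -> {'physics': 0.5, 'biology': 0.5}
```

Recording these per-bucket values on every eval run gives a time series that drift alarms could watch.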

KV Offload & Memory Tiers

Engineer a KV-cache offload strategy spanning HBM→HBM2e→CPU RAM→NVMe. Define admission/eviction, compression, and reuse heuristics; simulate hit rates across context lengths (8k–256k).

Tags: LLM, KV-cache, offload, NVMe, memory, context-length

Author: Assistant

Category: systems-architecture-LLM | Model: gpt-4o
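Before simulating hit rates, it helps to know how large the cache is at each context length. The sketch below uses the standard per-token KV size (one key and one value vector per KV head per layer) and a simple fill-fastest-first spill across the tiers named in the prompt. The model shape and the per-tier capacities are hypothetical, and the greedy placement is a stand-in for the admission/eviction policy the prompt asks you to design, not that policy itself.

```python
GiB = 2**30


def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    """KV-cache bytes per token: one key and one value vector per KV head per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def place_across_tiers(total_bytes: int, tiers: list[tuple[str, int]]) -> dict[str, int]:
    """Greedy placement: fill the fastest tier first and spill the remainder downward.

    `tiers` is ordered fastest to slowest as (name, capacity_bytes) pairs.
    """
    placement: dict[str, int] = {}
    remaining = total_bytes
    for name, capacity in tiers:
        used = min(remaining, capacity)
        placement[name] = used
        remaining -= used
    if remaining > 0:
        raise MemoryError(f"{remaining} bytes do not fit in any tier")
    return placement


if __name__ == "__main__":
    # Hypothetical model: 32 layers, 8 KV heads, head_dim 128, FP16 cache.
    per_token = kv_bytes_per_token(32, 8, 128)  # 131072 bytes = 128 KiB
    # Hypothetical per-request budgets for each tier named in the prompt.
    tiers = [("HBM", 2 * GiB), ("HBM2e", 2 * GiB), ("CPU RAM", 16 * GiB), ("NVMe", 256 * GiB)]
    for context in (8_192, 65_536, 262_144):
        placement = place_across_tiers(context * per_token, tiers)
        print(context, {name: used / GiB for name, used in placement.items()})
```

With these assumptions an 8k context (1 GiB) stays entirely in the fastest tier, while a 256k context (32 GiB) spills 12 GiB to NVMe.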

Quantization Suite: INT8/INT4/NF4

Create a quantization evaluation suite (GPTQ/AWQ/RTN): perplexity, zero-shot accuracy, calibration set selection, and layer-wise sensitivity. Output deployment guidelines by architecture and hardware ...

Tags: LLM, quantization, INT8, INT4, NF4, AWQ, GPTQ

Author: Assistant

Category: model-compression-LLM | Model: gpt-4o
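RTN (round-to-nearest) is the baseline that GPTQ and AWQ are usually compared against. The sketch below shows symmetric per-row RTN for uniform INT8 and INT4 codes and reports the worst-case reconstruction error for each bit width. It covers only the uniform formats; NF4 uses a non-uniform codebook and is not modeled here. The function names and example weights are hypothetical.

```python
def rtn_quantize(row: list[float], bits: int) -> tuple[list[int], float]:
    """Symmetric round-to-nearest (RTN) quantization of one weight row.

    A single scale maps the row's largest magnitude to the largest signed
    integer code (127 for INT8, 7 for INT4); other values are rounded to the
    nearest code and clamped. Python's round() breaks exact ties to even.
    """
    qmax = 2 ** (bits - 1) - 1
    amax = max(abs(w) for w in row)
    scale = amax / qmax if amax > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in row]
    return codes, scale


def dequantize(codes: list[int], scale: float) -> list[float]:
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


def max_abs_error(row: list[float], bits: int) -> float:
    """Worst-case reconstruction error for the row at a given bit width."""
    codes, scale = rtn_quantize(row, bits)
    return max(abs(w - r) for w, r in zip(row, dequantize(codes, scale)))


if __name__ == "__main__":
    row = [0.5, -1.0, 0.25, 0.0, 0.8]  # hypothetical weights
    for bits in (8, 4):
        print(bits, max_abs_error(row, bits))
```

The error of each weight is bounded by half the scale, so a row with one large outlier inflates the error for every other weight in that row; this is one reason layer-wise sensitivity matters.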

MoE Routing & Load Balancing

Design an expert-parallel MoE serving topology: gate calibration, capacity factor, expert sharding, and interconnect constraints (NVLink/IB). Provide hot-spot diagnostics and expert-drop policies for ...

Tags: LLM, MoE, experts, routing, capacity, NVLink, InfiniBand

Author: Assistant

Category: distributed-systems-LLM | Model: gpt-4o
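The capacity factor sets how many tokens each expert may accept per batch, commonly computed as ceil(capacity_factor × tokens / experts) in Switch-Transformer-style routing. The sketch below applies that limit to top-1 gate decisions and reports a simple hot-spot ratio. The function names and the skewed example batch are hypothetical. The sketch models token overflow only, not whatever expert-drop policy the design settles on.

```python
import math
from collections import Counter


def expert_capacity(num_tokens: int, num_experts: int, capacity_factor: float) -> int:
    """Per-expert token slots: capacity_factor times the perfectly balanced share."""
    return math.ceil(capacity_factor * num_tokens / num_experts)


def route_top1(assignments: list[int], num_experts: int,
               capacity_factor: float) -> tuple[list[int], list[int]]:
    """Apply capacity limits to top-1 gate decisions.

    `assignments[i]` is the expert the gate chose for token i. Tokens that arrive
    after their expert is full are dropped (in Switch-Transformer-style layers they
    are carried forward by the residual connection). Returns (per-expert load,
    dropped token indices).
    """
    capacity = expert_capacity(len(assignments), num_experts, capacity_factor)
    load = [0] * num_experts
    dropped: list[int] = []
    for token, expert in enumerate(assignments):
        if load[expert] < capacity:
            load[expert] += 1
        else:
            dropped.append(token)
    return load, dropped


def hot_spot_ratio(assignments: list[int], num_experts: int) -> float:
    """Max requested load over mean load; 1.0 means perfectly balanced routing."""
    counts = Counter(assignments)
    mean = len(assignments) / num_experts
    return max(counts[e] for e in range(num_experts)) / mean


if __name__ == "__main__":
    gate_choices = [0, 0, 0, 0, 0, 1, 2, 3]  # hypothetical batch skewed toward expert 0
    print(route_top1(gate_choices, num_experts=4, capacity_factor=1.25))  # -> ([3, 1, 1, 1], [3, 4])
    print(hot_spot_ratio(gate_choices, num_experts=4))  # -> 2.5
```

Raising the capacity factor reduces drops at the cost of padding and memory on underloaded experts; the hot-spot ratio shows how far the gate is from the balanced share that the capacity factor assumes.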

LoRA/QLoRA Strategy

Recommend when to use LoRA/QLoRA vs. full fine-tuning. Define rank search, target layers, and quantization-aware adapters. Include memory/performance tables per GPU class.

Tags: LLM, LoRA, QLoRA, finetuning, adapters, GPU

Author: Assistant

Category: parameter-efficient-tuning-LLM | Model: gpt-4o
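A first-order memory table can start from two numbers: how many parameters LoRA adds, and how many bytes the frozen base and the trainable parameters occupy. The sketch below assumes FP32 trainable parameters with Adam (weight, gradient, and two moments, 16 bytes each) and ignores activations, quantization constants, and framework overhead. The 7B model shape and the choice of q_proj/v_proj as target layers are hypothetical.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one linear layer.

    The frozen weight W (d_out x d_in) gets a low-rank update B @ A with
    B: d_out x rank and A: rank x d_in.
    """
    return rank * (d_in + d_out)


def training_memory_gb(base_params: float, trainable_params: float, base_bits: int) -> float:
    """Rough weights-plus-optimizer memory in GB (1e9 bytes).

    Assumes a frozen base stored at `base_bits` bits per weight (16 for LoRA on
    a BF16 base, 4 for a QLoRA-style base) and FP32 trainable parameters with
    Adam: weight, gradient and two moments = 16 bytes per parameter.
    Activations, quantization constants and framework overhead are ignored.
    """
    return (base_params * base_bits / 8 + trainable_params * 16) / 1e9


if __name__ == "__main__":
    # Hypothetical 7B model: 32 layers, hidden size 4096, adapters on q_proj and v_proj.
    per_layer = 2 * lora_params(4096, 4096, rank=16)
    adapters = 32 * per_layer  # 8,388,608 trainable parameters
    for bits, label in ((16, "LoRA, BF16 base"), (4, "QLoRA-style, 4-bit base")):
        print(label, round(training_memory_gb(7e9, adapters, bits), 2), "GB")
    # Full fine-tuning: no frozen base, every parameter trainable.
    print("Full fine-tune", training_memory_gb(0, 7e9, 16), "GB")
```

Under these assumptions the adapters themselves are a small fraction of the total; the frozen base weights dominate, which is where the 4-bit base of QLoRA saves most of its memory.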

Capacity and Cloud Cost Planner

For workload <service>, model 12-month capacity and cloud cost. Include CPU/GPU, storage, egress, and LLM inference. Compare reserved vs spot vs savings plans. Map to SLOs and traffic seasonality. Out...

Tags: CTO, FinOps, capacity, SLO, cost

Author: ChatGPT

Category: CTO | Model: GPT-5 Thinking
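The core of the pricing comparison reduces to a few cost functions over monthly demand. The sketch below is a simplified 12-month model in which the demand profile, the hourly rates, and the spot rework fraction are all hypothetical placeholders. Storage, egress, and LLM inference costs are not included, and savings plans (a dollar-per-hour commitment rather than an instance commitment) would need a similar commitment-plus-overflow function that is left out here.

```python
HOURS_PER_MONTH = 730  # common cloud-billing approximation (8760 h / 12)


def on_demand_cost(monthly_hours: list[float], rate: float) -> float:
    """Pay the list rate for every hour actually used."""
    return sum(hours * rate for hours in monthly_hours)


def reserved_cost(monthly_hours: list[float], committed_instances: int,
                  reserved_rate: float, on_demand_rate: float) -> float:
    """Committed instances are billed every hour whether used or not;
    demand above the commitment overflows to on-demand."""
    committed_hours = committed_instances * HOURS_PER_MONTH
    total = 0.0
    for hours in monthly_hours:
        overflow = max(0.0, hours - committed_hours)
        total += committed_hours * reserved_rate + overflow * on_demand_rate
    return total


def spot_cost(monthly_hours: list[float], spot_rate: float, rework_fraction: float) -> float:
    """Spot interruptions force some work to be redone; rework_fraction inflates the hours."""
    return sum(hours * (1 + rework_fraction) * spot_rate for hours in monthly_hours)


if __name__ == "__main__":
    # Hypothetical GPU-hour demand: 2 instances all year, 4 during a Nov-Dec peak.
    demand = [2 * HOURS_PER_MONTH] * 10 + [4 * HOURS_PER_MONTH] * 2
    # Hypothetical $/GPU-hour rates; substitute real quotes for <service>.
    print("on-demand", on_demand_cost(demand, rate=4.00))
    print("reserved x2", reserved_cost(demand, 2, reserved_rate=2.60, on_demand_rate=4.00))
    print("spot", spot_cost(demand, spot_rate=1.60, rework_fraction=0.15))
```

Sizing the commitment to the off-peak baseline and letting the seasonal peak overflow is one way to map the purchase mix to traffic seasonality; spot fits only the portion of the workload whose SLOs tolerate interruption.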
