LLMs for Engineers
Low-Budget Judge for High-End Hallucination Verdicts
… boosting LLM accuracy by >5% amidst label scarcity and budget constraints.
Nov 21, 2024 • Daniel Omeiza
LLMs Know More Than What They Say
... and how that provides winning evals
Aug 15, 2024 • Ruby Pai
Hybrid Evaluation: Scaling human feedback with custom evaluation models
...how to really get model-based evals to work for you
Nov 15, 2023 • Ansup Babu and Arjun Bansal
Which Llama-2 Inference API should I use?
understanding the full trade-offs across Llama-2 providers
Oct 31, 2023 • Wenzhe Xue
Ready, Set, Test: Building Evaluation into Your LLM Workflow
... with llmeval
Oct 13, 2023 • Niklas Nielsen
How do I evaluate LLM coding agents? 🧑‍💻
...aka when can I hire an AI software engineer?
Aug 31, 2023 • Arjun Bansal
🕵️🗺️ Where do I deploy Llama-2? 🦙🦙
We share the most cost-efficient way to run Llama-2
Aug 22, 2023 • Arjun Bansal
Llama-2 and the open source LLM 🌊
Anyone can own and run full-stack LLM applications like never before
Aug 3, 2023 • Arjun Bansal
Evaluating LLM Agents and Applications
A lot of AI research, such as HELM and BigBench, has been devoted to building test suites that evaluate the accuracy of large language models.
Jul 11, 2023 • Arjun Bansal
Evolution of LLM Agents
...and how to avert a crisis that could stall further progress!
Jun 21, 2023 • Arjun Bansal and Niklas Nielsen
3 ways to improve LLM Agent chains with debugging
TL;DR: cost, reliability & accuracy
May 3, 2023 • Arjun Bansal