LLMs for Engineers
Hybrid Evaluation: Scaling human feedback with custom evaluation models
...how to really get model based evals to work for you
Nov 15, 2023 • Ansup Babu and Arjun Bansal
Ready, Set, Test: Building Evaluation into Your LLM Workflow
... with llmeval
Oct 13, 2023 • Niklas Nielsen
How do I evaluate LLM coding agents? 🧑‍💻
...aka when can I hire an AI software engineer?
Aug 31, 2023 • Arjun Bansal
Evaluating LLM Agents and Applications
A lot of AI research, such as HELM and BigBench, has been devoted to building test suites that evaluate the accuracy of large language models.
Jul 11, 2023 • Arjun Bansal
Evolution of LLM Agents
...and how to avert a crisis on further progress!
Jun 21, 2023 • Arjun Bansal and Niklas Nielsen