Agent Inspector (Debug your agents) - Productivity Pro
This is a submission for the Agent.ai Challenge: Productivity-Pro Agent (See Details)
What I Built
Agent Inspector is a must-have tool for anyone building and iterating on agents on Agent.ai. It gives you information beyond what Agent.ai's built-in debugger shows: it tells you what format an agent's output should be in, validates whether the agent is in fact returning that format, assesses the toxicity of the output, and provides a clear overall Pass/Fail result. If there's an opportunity to improve the prompt or to provide more data to the agent, it makes suggestions.
The agent has become useful enough that I use it every time I create a new agent or iterate on an existing one.
Without giving away all the secrets publicly, here are some of the things that have gone into it:
- LLM actions with specific models chosen for special skills they have.
- Invoking agents. I created another utility agent that simply grabs the current date and time and returns them in a form that is useful to agents. It uses a serverless function to fetch the date and time and returns a JSON response that LLMs can easily understand, which helps them handle time-related requests. Building it as a separate module let me both help others build agents that need the date and time and power that functionality in this agent. Agent Inspector itself is also intended to be invoked by other agents - that is the primary way to use it.
- To build Agent Inspector and ensure the quality of all of its tests, I actually had to build an agent for testing it.
- Multiple prompt engineering techniques.
- The agent calls a serverless function indirectly, through the current date and time agent mentioned above, which it invokes.
- Workarounds for some issues found in the if statement action and for the inability to export multiple variables to other agents. This agent has a companion JSON variant that returns the data as JSON so you can take automated action based on the test results.
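To make the date and time utility more concrete, here is a minimal sketch of what such a serverless function could look like. This is not the actual function behind the agent: the `handler` entry point, its response shape, and every JSON field name are assumptions chosen for illustration. The idea is only to return the same moment in several redundant, explicitly labeled forms so an LLM does not have to derive things like the day of the week itself.

```python
import json
from datetime import datetime, timezone


def current_datetime_payload(now=None):
    """Describe a moment in UTC using several LLM-friendly fields.

    All field names here are hypothetical, not the real agent's schema.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return {
        "iso_8601": now.isoformat(),
        "date": now.strftime("%Y-%m-%d"),
        "time_24h": now.strftime("%H:%M:%S"),
        "day_of_week": now.strftime("%A"),
        "month_name": now.strftime("%B"),
        "timezone": "UTC",
        "unix_timestamp": int(now.timestamp()),
    }


def handler(event=None, context=None):
    """Hypothetical serverless entry point returning an HTTP-style JSON response."""
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(current_datetime_payload()),
    }
```

An agent that invokes this would receive the `body` string and could pass it straight into an LLM prompt as context.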
Tests performed by the Agent Inspector:
- Expected data type of the output, based on the prompt.
- Validation that the output actually matches the prompt, with a confidence score.
- Relevancy of the output to the prompt provided, with a confidence score.
- Likelihood of hallucination - checks that the LLM has all the information it needs to provide an answer, with a confidence score.
- Toxicity - is the response using offensive or harmful language? (This takes the prompt into account and does not flag a response as toxic simply for using one "bad" word, so if the prompt discusses a subject in an academic sense, the response is not marked toxic unless it goes too far.)
- Fluff/substance test of text answers - checks the response for fluffy content that doesn't have much substance. Think of it like this: if it's supposed to be a blog post, is it something a reader will just pass to an LLM to summarize because it's not concise and meaty enough?
- Suggested prompt revisions - based on all of the tests, the agent suggests improvements to encourage better results from your LLM action or agent.
- Execution timing - know how long your agent takes to execute from start to finish. Also provides some best practices and norms.
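One way to picture how a list of checks like this could roll up into a single Pass/Fail result is sketched below. The post does not say how Agent Inspector actually combines its tests, so the rule used here (every check must pass with at least a minimum confidence) and the names `CheckResult`, `overall_verdict`, and `timed` are all assumptions for illustration.

```python
import time
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    confidence: float  # 0.0 to 1.0, as reported by the check


def overall_verdict(results, min_confidence=0.7):
    """Return "Pass" only if every check passed with enough confidence.

    The 0.7 threshold is an arbitrary illustrative default.
    """
    for result in results:
        if not result.passed or result.confidence < min_confidence:
            return "Fail"
    return "Pass"


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds) for execution timing."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

Treating a low-confidence pass as a failure is a deliberately conservative choice here; a real implementation might instead surface it as a warning.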
Why this agent meets the criteria for this challenge
This agent directly provides meaningful insights and validates LLM and agent output, enabling agent builders to move significantly faster and produce reliable, high-quality agents. It's designed to be part of the agent-building workflow and can even be used to automate actions within your agent - for example, if there is toxic content in an LLM action response, you can have your agent do something other than return it directly to the user.
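As a rough sketch of that kind of automation, the snippet below reads a JSON report and swaps in a fallback message when the response is flagged as toxic. The report structure (`toxicity.is_toxic`), the `choose_response` helper, and the fallback text are hypothetical; the real JSON variant of the agent may use different field names.

```python
import json

FALLBACK_MESSAGE = "Sorry, I can't share that response."


def choose_response(llm_output, inspector_json):
    """Return llm_output unless the inspection report flags it as toxic.

    Assumes a hypothetical report shape: {"toxicity": {"is_toxic": bool}}.
    """
    report = json.loads(inspector_json)
    toxicity = report.get("toxicity", {})
    if toxicity.get("is_toxic", False):
        return FALLBACK_MESSAGE
    return llm_output
```

Note that this sketch lets the output through when the toxicity field is missing; a stricter pipeline might prefer to block the response in that case.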
Demo
Watch a video showing Agent Inspector as well as instructions to set it up
Agent.ai Experience
Overall, I've enjoyed the experience and want to push the platform to do things it was never designed to do. I'm providing feedback on a lot of areas where the platform can be improved - understand that my feedback comes from a place of appreciating it for what it is and wanting to see it thrive further. As a developer I'm always going to want more, but I ran into limitations that prevent me from delivering even more with this agent and other agents.