User Guide: Interpreting Your Results
Understand the key concepts so you can interpret your results and improve your AI's performance.
A "prompt" is the instruction, question, or input you give to an AI model. The quality of the prompt directly influences the quality of the AI's response. A weak prompt leads to a weak response, while a well-crafted prompt guides the AI to the desired outcome.
Our evaluation process tests your AI with a wide variety of prompts—from simple to complex, and from helpful to malicious—to see how it performs under different conditions.
Effective prompting is the single most important skill for getting reliable results from an AI. The goal is to remove ambiguity and give the model a clear path to follow. Our Prompt Builder handles this for you, but the principles are worth understanding (an example follows the list):
- Be Specific: Instead of "write about our product," say "write a 100-word product description."
- Provide Context: Give the AI the background information it needs to understand the task.
- Define a Role and Tone: Tell the AI to act as a "helpful customer support agent" with a "friendly and professional tone."
- Set Constraints: Specify the desired format, length, and style of the output.
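Putting these principles together, the vague request "write about our product" might become something like the following (an illustrative example, not a template produced by the Prompt Builder):

"You are a helpful customer support agent for an online bookstore, and your tone is friendly and professional. Using the product details I provide, write a 100-word product description of our new e-reader aimed at first-time buyers. Keep it to a single paragraph with no bullet points."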
Our Prompt Builder is a guided, 6-step wizard that helps you create production-ready AI prompts without needing to be a prompt engineering expert. Two parts of the wizard deserve special attention: choosing a refusal strategy and testing your prompt in the Prompt Sandbox (step 6).
Not all AI applications need the same level of strictness when refusing off-topic or risky requests. A customer service chatbot can afford to be flexible, while a medical advice screener needs strict boundaries. Here's how to choose a refusal strategy:
Gentle Nudge (Most Helpful)
Best for: General-purpose assistants, creative tools, brainstorming bots
Example response: "I appreciate your question! While that's a bit outside my usual focus, let me see if I can help in a related way..."
Helpful Redirect (Recommended)
Best for: Customer support, product assistants, educational tutors
Example response: "I understand you're asking about X. While I'm not able to help with that specifically, I'd be happy to assist you with Y or Z instead."
Firm Boundaries (More Restrictive)
Best for: Professional services, financial advisors, brand-sensitive applications
Example response: "I cannot assist with that request as it falls outside my designated responsibilities. My role is specifically to help with [defined scope]."
Strict Refusal (Most Secure)
Best for: Medical/legal screeners, compliance tools, high-risk applications
Example response: "I cannot and will not provide assistance with this request."
💡 Pro Tip: Test your refusal strategy in the Prompt Sandbox! Send off-topic or edge-case requests to see how your AI responds. The automatic explanations will show you exactly which parts of your prompt triggered the refusal.
The Prompt Sandbox (available in step 6 of the Prompt Builder) is your testing ground. It's a full-featured chatbot that uses your exact prompt so you can see how it performs in real conversations.
Key Features:
- Multi-turn conversations: Test how your prompt handles follow-up questions and context from previous messages
- Live editing: Change your prompt text and immediately test with the new version - no need to rebuild
- Temperature control: A slider from 0 (deterministic) to 1 (creative) lets you see how randomness affects responses
- Safety settings: Test different content filter levels (Block None, Block High, Block Medium & Up, Block Low & Up) to see how your prompt interacts with Gemini's safety filters (see the sketch after this list)
- Automatic explanations: After each response, AI analyzes which parts of your prompt influenced the output
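If you're curious how settings like these relate to the underlying model, the sketch below shows roughly how a temperature value and safety thresholds can be passed to Gemini using the google-generativeai Python SDK. It is an illustration under our own assumptions (the model name, API key placeholder, and prompt text are all hypothetical), not the sandbox's actual implementation:

```python
import google.generativeai as genai
from google.generativeai.types import HarmCategory, HarmBlockThreshold

genai.configure(api_key="YOUR_API_KEY")  # placeholder; use your own key

model = genai.GenerativeModel(
    model_name="gemini-1.5-flash",  # assumed model; the sandbox may use a different one
    # Your prompt from the Prompt Builder becomes the system instruction.
    system_instruction="You are a helpful customer support agent...",
    generation_config={"temperature": 0.2},  # 0 = deterministic, 1 = creative
    safety_settings={
        # Roughly equivalent to the sandbox's "Block Medium & Up" level.
        HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
        HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    },
)

response = model.generate_content("Can you help me reset my password?")
print(response.text)
```

Lower temperatures make responses more repeatable, which is useful when comparing prompt versions, while stricter safety thresholds block more borderline content regardless of what your prompt says.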
What to Test:
✅ Happy Path
Send normal, expected requests to verify your AI handles its core job well
🎭 Edge Cases
Test ambiguous or unusual requests to see how your AI handles uncertainty
🚫 Refusals
Try off-topic or inappropriate requests to verify your guardrails work
🔄 Follow-ups
Send multi-turn conversations to ensure context is maintained properly
ℹ️ Note: The sandbox is separate from the Help Chatbot (at /dashboard/chatbot). The help chatbot assists with Promptalytica platform questions, while the sandbox tests YOUR custom prompts.
One of the most powerful features of the Prompt Sandbox is automatic explanations. After each successful response, an AI analyzer examines your system prompt, the user's message, and the AI's response to explain the connection.
What Explanations Tell You:
- Which prompt sections were most influential: See if your tone guidance, context, or guardrails drove the response
- Why the AI refused (if applicable): Understand which guardrail or boundary instruction triggered a refusal
- Prompt effectiveness rating: Get a 1-10 score on how well your prompt guided the AI
- Specific quotes: See exact phrases from your prompt that influenced the output
Example Explanation:
**Primary Influence:** Your instruction "Your tone should be friendly and professional" directly shaped this polite greeting response.
**Context Usage:** The AI referenced your provided company information about "24/7 support" when offering help.
**Guardrail Check:** No refusal was needed as the request aligned with your defined scope.
**Effectiveness Score:** 9/10 - Clear prompt led to an on-target response.
How to Use Explanations:
1. Identify weak spots: If explanations show the AI isn't using your context or tone guidance, strengthen those sections
2. Refine guardrails: If refusals aren't triggering when they should (or triggering too often), adjust your boundary instructions
3. Validate improvements: After editing your prompt, test again and compare effectiveness scores to confirm your changes helped
4. Learn patterns: Over time, you'll see which types of instructions work best for your use case
"Guardrails" are the rules that prevent an AI from generating harmful or undesirable content. A key part of building strong guardrails is using **negative prompts**—explicitly telling the model what it *should not* do.
Example of a Negative Prompt:
"**Safety Guardrails:** You must not generate content related to the following topics: harassment, hate speech, sexually explicit content. If a user's request falls into one of these categories, you must refuse to provide a helpful answer."
Without clear negative prompts, an AI might try to be "helpful" in dangerous ways. The **Refusal** and **Harmfulness** metrics in your report directly measure the effectiveness of your guardrails. Strong results on these metrics mean your AI is staying on topic and deflecting risky requests, protecting your brand and your users.
Think of tokens as pieces of words. AI models don't see text as words and sentences like humans do; they break everything down into tokens. For example, the word "chatbot" might be a single token, while a longer word like "unreliability" could be split into "un," "reli," and "ability" (the exact split varies by model). Punctuation also counts as tokens.
Because all inputs and outputs are measured in tokens, your usage on the platform is also calculated in tokens. Every AI-powered feature, such as generating an evaluation report or analyzing a chat log, consumes tokens from your monthly allowance.
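To get an intuitive feel for token counts, you can ask a model to count tokens for you. The sketch below uses the google-generativeai Python SDK's count_tokens call (an illustration only; the model name and text are hypothetical, and your Promptalytica allowance is tracked by the platform, not by this call):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder; use your own key
model = genai.GenerativeModel("gemini-1.5-flash")  # assumed model

# count_tokens reports how many tokens the model's tokenizer sees in the text.
text = "Write a 100-word product description for our new e-reader."
print(model.count_tokens(text).total_tokens)
```

As a rough rule of thumb, one token is on the order of four characters of English text, so longer prompts and longer responses both consume more of your monthly allowance.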
Launching an AI is not a "set it and forget it" task. The AI landscape is constantly changing, and so are the risks.
- New Exploits Emerge: Malicious actors are always finding new ways to "jailbreak" models. Regular testing ensures your guardrails hold up against the latest threats.
- Model Updates: The underlying AI models you use are frequently updated by their providers. An update can subtly change a model's behavior, introducing new flaws or biases that didn't exist before.
- Evolving User Needs: How your customers interact with your AI will change over time. Continuous evaluation helps you ensure your AI remains helpful and relevant.