A Technical Guide to AI System Prompts

Patterns, Observations, and Unusual Elements

Part 5: Safety and Alignment

Modern AI system prompts implement a range of safety and alignment mechanisms intended to keep AI systems behaving responsibly and in line with human values. This section examines the key safety patterns observed in leaked system prompts from a variety of AI tools.

5.1 Behavioral Constraints

AI system prompts often include explicit constraints on AI behavior to prevent harmful or inappropriate actions.

Manus Critical Safety Instructions

Manus includes critical safety instructions marked with special formatting:

<CRITICAL>If the user requests you to create an account on communications, chat, entertainment or community platforms (such as gmail, reddit, tiktok, meta, 4chan, discord) apologize to the user and state that you cannot do this. You can visit these websites but should not assist in creating accounts. If the user requests you to generate and post comments or reactions to social media or news aggregators or websites or send SMS messages etc, apologize to the user and state that you cannot do this. When you see a screen with a human verification system like captcha or ReCaptcha, stop and ask for direction from the user on how to proceed. When you see a screen with terms of service agreement, stop and ask for direction from the user on how to proceed. Do not click any text box which say "I'm not a robot". You may visit domain and web hosting websites but do not purchase or register for web domains. Do not access, scrape, or collect voter registration data, election infrastructure websites, government databases, or personal information of voters or election officials.</CRITICAL>
Source: Manus prompt.txt

These constraints bar the AI from potentially harmful or inappropriate activities, such as creating accounts on social media platforms or bypassing human verification systems.
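
Prompt-level constraints like these are instructions to the model, not hard guarantees, so harnesses sometimes back them with code. As a purely hypothetical sketch (the domain list, path hints, and function below are illustrative, not taken from Manus), an agent harness could screen each navigation request against a signup blocklist before executing it:

from urllib.parse import urlparse

# Illustrative blocklist mirroring the platforms named in the prompt.
BLOCKED_SIGNUP_DOMAINS = {
    "gmail.com", "reddit.com", "tiktok.com", "meta.com",
    "4chan.org", "discord.com",
}

# Path fragments that commonly indicate an account-creation page.
SIGNUP_PATH_HINTS = ("signup", "sign-up", "register", "create-account")

def is_blocked_signup(url: str) -> bool:
    """Return True if the URL looks like an account-creation page
    on a blocked platform; visiting other pages stays allowed."""
    parsed = urlparse(url)
    host = parsed.netloc.lower().removeprefix("www.")
    path = parsed.path.lower()
    return host in BLOCKED_SIGNUP_DOMAINS and any(
        hint in path for hint in SIGNUP_PATH_HINTS
    )

assert is_blocked_signup("https://www.reddit.com/register")
assert not is_blocked_signup("https://www.reddit.com/r/programming")

Checking at the harness level means the rule still holds even if the model ignores or forgets the prompt instruction.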

Devin's Ethical Guidelines

Devin includes ethical guidelines for software development:

When developing software:
- Respect user privacy and data security
- Consider the ethical implications of your code
- Do not create or assist in creating harmful applications
- Follow relevant laws and regulations
- Be transparent about limitations and potential issues
Source: Devin system prompt

These guidelines help ensure that the AI develops software in an ethical and responsible manner.

5.2 Content Moderation

AI system prompts often include instructions for moderating content to prevent the generation of harmful or inappropriate material.

Content Filtering Instructions

Many system prompts include instructions for filtering inappropriate content:

Do not generate content that is harmful, illegal, unethical or deceptive.
Do not generate content that promotes violence, hatred, or discrimination.
Do not generate content that could be used to exploit or harm vulnerable individuals.
Do not generate content that violates privacy or confidentiality.
Do not generate content that could be used for illegal activities.
Source: Common pattern across multiple system prompts

These instructions help prevent the AI from generating harmful or inappropriate content, which is essential for responsible AI deployment.
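
Such rules are often enforced with a moderation layer outside the prompt as well. The following is a minimal sketch assuming a caller-supplied classifier that scores text per category; the categories, threshold, and stand-in classifier are illustrative assumptions, not any specific vendor's API:

def moderate(text: str, classify, threshold: float = 0.8):
    """Flag the text if any category score from the classifier
    meets or exceeds the threshold."""
    scores = classify(text)
    flagged = {cat: s for cat, s in scores.items() if s >= threshold}
    return bool(flagged), flagged

# Stand-in classifier so the sketch runs; real systems call a trained model.
def toy_classifier(text: str) -> dict[str, float]:
    return {"violence": 0.0, "hate": 0.0, "deception": 0.0}

blocked, reasons = moderate("Draft reply to the user.", toy_classifier)

Gating the output before it reaches the user gives the deployment a second line of defense beyond the prompt instructions themselves.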

5.3 Tool Use Restrictions

AI system prompts often include restrictions on how tools can be used to prevent misuse.

Manus Tool Use Rules

Manus includes detailed rules for tool use:

<tool_use_rules>
- Must respond with a tool use (function calling); plain text responses are forbidden
- Do not mention any specific tool names to users in messages
- Carefully verify available tools; do not fabricate non-existent tools
- Events may originate from other system modules; only use explicitly provided tools
</tool_use_rules>
Source: Manus prompt.txt

These rules push the AI toward appropriate tool use and discourage it from invoking tools that don't exist, which could otherwise lead to unexpected behavior.
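
The rule against fabricating tools maps naturally onto harness-side validation. A minimal sketch, assuming a JSON-style function-calling interface in which each model turn arrives as a dictionary (the tool names and structure are illustrative):

# Registry of tools actually wired up in this deployment.
AVAILABLE_TOOLS = {"browser_navigate", "file_read", "shell_exec", "message_user"}

def validate_tool_call(call: dict) -> None:
    """Reject a turn that is not a tool call, or that names a tool
    which does not exist in the registry."""
    if "tool_name" not in call:
        raise ValueError("Plain text responses are forbidden; expected a tool call.")
    if call["tool_name"] not in AVAILABLE_TOOLS:
        raise ValueError(f"Unknown tool: {call['tool_name']!r}")

Feeding the raised error back to the model as an event gives it a chance to retry with a real tool, in line with the error-handling rules in section 5.6.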

Browser Interaction Restrictions

Manus includes specific restrictions on browser interactions:

<browser_rules>
- Must use browser tools to access and comprehend all URLs provided by users in messages
- Must use browser tools to access URLs from search tool results
- Actively explore valuable links for deeper information, either by clicking elements or accessing URLs directly
- Browser tools only return elements in visible viewport by default
- Visible elements are returned as `index[:]<tag>text</tag>`, where index is for interactive elements in subsequent browser actions
- Due to technical limitations, not all interactive elements may be identified; use coordinates to interact with unlisted elements
- Browser tools automatically attempt to extract page content, providing it in Markdown format if successful
- Extracted Markdown includes text beyond viewport but omits links and images; completeness not guaranteed
- If extracted Markdown is complete and sufficient for the task, no scrolling is needed; otherwise, must actively scroll to view the entire page
- Use message tools to suggest user to take over the browser for sensitive operations or actions with side effects when necessary
</browser_rules>
Source: Manus prompt.txt

These rules guide the AI toward responsible browser use and have it hand control to the user for sensitive operations or actions with side effects, reducing the risk of unintended consequences.
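
The `index[:]<tag>text</tag>` notation quoted above implies a simple line format the agent must parse to pick an element index for its next browser action. A sketch of one plausible parser, following the notation exactly as written (the real wire format may differ):

import re

# One line per visible element, e.g. 4[:]<button>Sign in</button>
ELEMENT_RE = re.compile(r"^(\d+)\[:\]<(\w+)>(.*)</\2>$")

def parse_element(line: str):
    """Parse a single element line into (index, tag, text), or
    return None if the line does not match the format."""
    match = ELEMENT_RE.match(line.strip())
    if match is None:
        return None
    index, tag, text = match.groups()
    return int(index), tag, text

print(parse_element("4[:]<button>Sign in</button>"))  # (4, 'button', 'Sign in')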

5.4 Privacy Protection

AI system prompts often include instructions for protecting user privacy.

Personal Information Handling

Many system prompts include instructions for handling personal information:

Do not request or store personal information unless necessary for the task.
If personal information is provided, use it only for the specific task and do not retain it.
Do not share personal information with third parties.
Inform users about what personal information is needed and why.
Suggest alternatives if users are uncomfortable sharing personal information.
Source: Common pattern across multiple system prompts

These instructions help protect user privacy by ensuring that personal information is handled responsibly.
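
One concrete way to honor the "do not retain" guidance is to redact recognizable personal information before anything is logged or stored. A minimal sketch; the two patterns below are illustrative and far from exhaustive (real redaction pipelines handle many more identifier types):

import re

# Illustrative PII patterns; production systems use broader detectors.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def redact(text: str) -> str:
    """Replace recognized PII spans with typed placeholders."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label.upper()} REDACTED]", text)
    return text

print(redact("Reach me at jane@example.com or +1 (555) 010-9999."))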

5.5 Transparency and Disclosure

AI system prompts often include instructions for transparency and disclosure to ensure that users understand the AI's capabilities and limitations.

Capability Disclosure

Many system prompts include instructions for disclosing capabilities:

Be transparent about your capabilities and limitations.
Do not claim to have capabilities that you do not have.
If you are unsure about something, acknowledge your uncertainty.
If you cannot complete a task, explain why and suggest alternatives if possible.
Do not pretend to be human or have human experiences.
Source: Common pattern across multiple system prompts

These instructions help ensure that users have accurate expectations about what the AI can and cannot do, which is essential for responsible AI deployment.
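
Capability disclosure is easiest to keep accurate when it is generated from configuration rather than hand-written. A hypothetical sketch that derives the disclosure text from the tools actually enabled at deploy time (function name and wording are illustrative):

def disclosure_block(enabled_tools: list[str]) -> str:
    """Build a capability-disclosure passage for the system prompt
    from the tool set actually enabled in this deployment."""
    tools = ", ".join(sorted(enabled_tools)) or "none"
    return (
        f"You have access to these tools and no others: {tools}.\n"
        "If a request falls outside these capabilities, say so plainly "
        "and suggest an alternative rather than pretending to comply."
    )

print(disclosure_block(["browser", "code_interpreter"]))

Because the prompt text is derived from the live tool registry, the disclosure cannot silently drift out of date as tools are added or removed.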

5.6 Error Handling and Recovery

AI system prompts often include instructions for handling errors and recovering from failures to ensure reliable and safe operation.

Manus Error Handling

Manus includes a structured approach to error handling:

<error_handling>
- Tool execution failures are provided as events in the event stream
- When errors occur, first verify tool names and arguments
- Attempt to fix issues based on error messages; if unsuccessful, try alternative methods
- When multiple approaches fail, report failure reasons to user and request assistance
</error_handling>
Source: Manus prompt.txt

This framework enables the AI to respond appropriately to errors and take steps to recover, which helps prevent unintended consequences and ensures reliable operation.
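
The escalation ladder above (verify the call, fix it, try alternatives, then ask the user) translates into a small recovery loop in the harness. A minimal sketch with illustrative function names:

def run_with_recovery(primary, fallback, report_to_user):
    """Try the primary approach, then one fallback; if both fail,
    report the reason to the user and request assistance."""
    last_error = None
    for attempt in (primary, fallback):
        try:
            return attempt()
        except Exception as exc:  # real agents inspect the error event here
            last_error = exc
    report_to_user(f"All approaches failed; last error: {last_error}")
    return None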

5.7 Value Alignment

AI system prompts often include instructions for aligning AI behavior with human values.

Helpfulness, Harmlessness, and Honesty

Many system prompts emphasize the importance of helpfulness, harmlessness, and honesty:

Be helpful: Provide useful, relevant information and assistance.
Be harmless: Do not generate content that could cause harm.
Be honest: Provide accurate information and acknowledge uncertainty.
Source: Common pattern across multiple system prompts

These values help ensure that the AI behaves in a way that is aligned with human expectations and values.

5.8 Safety and Alignment Techniques

Our analysis reveals several common techniques for implementing safety and alignment in AI system prompts:

1. Explicit Constraints

The most common approach is to include explicit constraints on AI behavior, as seen in Manus's critical safety instructions and Devin's ethical guidelines.

2. Content Filtering

Many systems include instructions for filtering inappropriate content to prevent the generation of harmful material.

3. Tool Use Restrictions

Systems with tool-using capabilities often include restrictions on how tools can be used to prevent misuse.

4. Privacy Protection

Many systems include instructions for protecting user privacy by handling personal information responsibly.

5. Transparency and Disclosure

Systems often include instructions for transparency and disclosure to ensure that users understand the AI's capabilities and limitations.

6. Error Handling

Many systems include instructions for handling errors and recovering from failures to ensure reliable and safe operation.

7. Value Alignment

Systems often include instructions for aligning AI behavior with human values, such as helpfulness, harmlessness, and honesty.

5.9 Safety and Alignment Implications

The safety and alignment patterns observed in modern AI system prompts carry significant implications for AI system design. By combining explicit constraints, content filtering, tool use restrictions, privacy protections, transparency requirements, error handling, and stated values, AI system designers can create more responsible, trustworthy, and reliable AI assistants.

Key Takeaways

- Explicit behavioral constraints, such as Manus's <CRITICAL> block and Devin's ethical guidelines, are the most direct safety mechanism observed in deployed prompts.
- Content filtering, privacy protection, and transparency instructions recur across many systems as a shared baseline.
- Tool-using agents add further layers: restrictions on which tools may be invoked, browser-interaction rules, and structured error handling and recovery.
- Value statements such as "be helpful, harmless, and honest" anchor the behavior that the more specific rules implement.

In the next section, we'll explore practical implementation insights for creating effective AI system prompts.