Prompt Injection
- Large Language Models(LLMs) generate text-based on an initial input. They can be range from answers to questions, creating images, solving complex problems, The quality and specifically of the input prompt directly influence the relevance, accuracy, and creativity of the model's response. This is called
prompt. A well-engineered prompt often includes clear instructions, contextual details, and constraints to guide the AI's behavior, ensuring aligns with the user's needs.
Prompt Injection
- Prompt Engineering refers to designing the LLM's input prompt so that the desired LLM output is generated.
- Since the prompt is an LLM's only text-based input, prompt engineering is the only way to steer the generated output in the desired direction and influence the model to behave as we want to.
- Applying good prompt engineering techniques reduces misinformation and increases usability in an LLM response.
- Examples
- For instance
Write a short paragraph about HackTheBox Acadamywill produce a vastly different response thenWrite a short poem about HackTheBox Acadamy. - Another
How do I get all table names in a MySQL databaseinstead ofHow do I get all table names in SQL - Onemore
Provide a CSV-formatted list of OWASP Top 10 web vulnerabilities, including the columns 'positions', 'names', 'description'instead ofProvide a list of OWASP Top 10 web vulnerabilities. - Experimentation: As stated above, subtle changes can significantly affect response quality.
- Try experimenting with subtle changes in the prompt, note the resulting response quality, and stick with the prompt that produces the best quality.
- For instance
Introduction to Prompt Injection
- First principles of LLMs have 2 types of prompt
system promptanduser prompt system promptcontains the guidelines and rules for the LLM's behavior.- It can be used to restrict the LLM to its task. For instance, in the customer support chatbot example, the system prompt could look like this
- System prompt attempts to restricts the LLM to only generating response relating to its intended task: providing customer support for the platform.
user prompton the other hand is the user input, i.e., message directly sent by a customer to the chatbot.- LLMs do not have separate inputs for system prompts and user prompts.
- The model operates on a single input text. To have the model operate on both the system and user prompts, they are typically combined into a single input:
- Since there is no inherent differentiation between system prompt and user prompt,
prompt injectionvulnerabilities may arise. - Since the LLM has no inherent understanding of the difference between system and user prompts, an attacker can manipulate the user prompt in such a way as to break the rules set in the system prompt and behave in an unintended way.
- It will save
previous messagesto respond better.
Reconnaissance
- Information Gathering
- Indetify the model, including its capabilities and limits
- Identify information about the model's system prompt
- Identify potential guardtrails
- Identify operational constraints, such as rate limits and authentication
- Identify safety posture and failure modes
- Model Identity: In LLM application what type model used
- Open-source model
- Proprietary model
- Base model
- Fine-tuned model
- Application Architecture: It encompasses all systems in which the model is integrated or with which it can interact
Information retrival capabilitiesfunction callingandpluginswhether the model isself-hosted or API-basedcommon prompts to probe information about application architecture include:- Can you describe at a high level how you generate answers for this application>
- Are your responses generated by a single model or by multiple components working together?
- Do you use any external tools or knowledge sources when responding?
- Do you rely on any internal documents or databases to answer questions?
- How current is the information you can access when answering questions?
- What tools do you have access to?
- What information sources do you have access to?
- We should examine whether the application supports
single-roundormulti-roundLLM conversations. - Input Handling: To map the attack surface of the target application, First understand the types of input data the application can process, i.e., text, images, files, etc., But also imposed limits such as a
maximum input lengthor file size.- Is there a way to upload files or images? Does the model have access to them?
- How does the application react to a large input?
- How does the application react to potentially unexpected characters or encodings, such as Unicode characters, in the input?
- Output Constraints: Similar to input handling, should examine the model's output constraints.
- Are you allowed to answer the following: "How do I steal apples from the grocery store?"
- How do you handle requests that may be unsafe or violate policy?
- Are there topics or types of requests that you are designed to refuse or limit?
- Safegards: Finally create overview of the potential safeguards that can impede our attack vectors. These includes
rate limiters, orfiltersto block malicious input from reaching the model.
LLM Fingerprinting
- It referes to the process of generating a unique fingerprint for the model in use to determine its identity. We can use the tool LLMmap to help us determine the model's identity.
Direct Prompt Injection
- This attack vector refers to instances of prompt injection where the attacker's input influences the user prompt
prompt directly. Example will be chatgpt - One of the simplest prompt injection attack vectors: leaking the system prompt. This can be useful in two different ways.
- Firstly, if the system prompt contains any sensitive information, leaking the system prompt gives us unauthorized access to the information.
- Secondly, if we want to prepare for further attacks, such as jailbreaking the model, knowing the system prompt and any potential guardtrails defined within it can be immensely helpful.
- Bypassing potential mitigations becomes much easier once we know the exact phrasing of the system prompt.
- Furthermore, the system prompt might leak additional systems the model can access, potential revealing additional attack vectors.