Prompt Injection
- Large Language Models(LLMs) generate text-based on an initial input. They can be range from answers to questions, creating images, solving complex problems, The quality and specifically of the input prompt directly influence the relevance, accuracy, and creativity of the model's response. This is called
prompt. A well-engineered prompt often includes clear instructions, contextual details, and constraints to guide the AI's behavior, ensuring aligns with the user's needs.
Prompt Injection
- Prompt Engineering refers to designing the LLM's input prompt so that the desired LLM output is generated.
- Since the prompt is an LLM's only text-based input, prompt engineering is the only way to steer the generated output in the desired direction and influence the model to behave as we want to.
- Applying good prompt engineering techniques reduces misinformation and increases usability in an LLM response.
- Examples
- For instance
Write a short paragraph about HackTheBox Acadamywill produce a vastly different response thenWrite a short poem about HackTheBox Acadamy. - Another
How do I get all table names in a MySQL databaseinstead ofHow do I get all table names in SQL - Onemore
Provide a CSV-formatted list of OWASP Top 10 web vulnerabilities, including the columns 'positions', 'names', 'description'instead ofProvide a list of OWASP Top 10 web vulnerabilities. - Experimentation: As stated above, subtle changes can significantly affect response quality.
- Try experimenting with subtle changes in the prompt, note the resulting response quality, and stick with the prompt that produces the best quality.
- For instance
Introduction to Prompt Injection
- First principles of LLMs have 2 types of prompt
system promptanduser prompt system promptcontains the guidelines and rules for the LLM's behavior.- It can be used to restrict the LLM to its task. For instance, in the customer support chatbot example, the system prompt could look like this
- System prompt attempts to restricts the LLM to only generating response relating to its intended task: providing customer support for the platform.
user prompton the other hand is the user input, i.e., message directly sent by a customer to the chatbot.- LLMs do not have separate inputs for system prompts and user prompts.
- The model operates on a single input text. To have the model operate on both the system and user prompts, they are typically combined into a single input:
- Since there is no inherent differentiation between system prompt and user prompt,
prompt injectionvulnerabilities may arise. - Since the LLM has no inherent understanding of the difference between system and user prompts, an attacker can manipulate the user prompt in such a way as to break the rules set in the system prompt and behave in an unintended way.
- It will save
previous messagesto respond better.
Direct Prompt Injection
- This attack vector refers to instances of prompt injection where the attacker's input influences the user
prompt directly. Example will be chatgpt - One of the simplest prompt injection attack vectors: leaking the system prompt. This can be useful in two different ways.
- Firstly, if the system prompt contains any sensitive information, leaking the system prompt gives us unauthorized access to the information.
- Secondly, if we want to prepare for further attacks, such as jailbreaking the model, knowing the system prompt and any potential guardtrails defined within it can be immensely helpful.
- Bypassing potential mitigations becomes much easier once we know the exact phrasing of the system prompt.
- Furthermore, the system prompt might leak additional systems the model can access, potential revealing additional attack vectors.