panda_guard.role.attacks.gptfuzzer_attack.fuzzer package
Submodules
panda_guard.role.attacks.gptfuzzer_attack.fuzzer.core module
- class panda_guard.role.attacks.gptfuzzer_attack.fuzzer.core.GPTFuzzer(question: str, target: BaseLLM, predictor: Predictor, initial_seed: list[str], mutate_policy: MutatePolicy, select_policy: SelectPolicy, max_query: int = -1, max_jailbreak: int = -1, max_reject: int = -1, max_iteration: int = -1, energy: int = 1, result_file: str = None, generate_in_batch: bool = False)
Bases: object
The main fuzzing engine that generates attack prompts, evaluates them, and performs the fuzzing process.
- Parameters:
question – The question being asked by the user to the LLM.
target – The target LLM to be attacked.
predictor – A predictor object to evaluate the attack success.
initial_seed – A list of initial prompts to start the fuzzing process.
mutate_policy – The policy to mutate the prompts during fuzzing.
select_policy – The policy to select the next prompt to mutate.
max_query – The maximum number of queries to make.
max_jailbreak – The maximum number of successful jailbreaks to achieve.
max_reject – The maximum number of rejections to tolerate.
max_iteration – The maximum number of iterations to run the fuzzing process.
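energy – The fuzzing energy. In the original GPTFuzzer design this is the number of mutated prompts generated from each selected seed; whether this port uses it identically is not stated here.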
result_file – The file to store the fuzzing results.
generate_in_batch – Whether to generate responses in batch mode.
- evaluate(prompt_nodes: list[panda_guard.role.attacks.gptfuzzer_attack.fuzzer.core.PromptNode], target_llm_gen_config: LLMGenerateConfig)
Evaluates a list of prompt nodes by generating responses using the target LLM and checking the results using the predictor.
- Parameters:
prompt_nodes – The list of prompt nodes to evaluate.
target_llm_gen_config – The configuration used for generating responses from the target LLM.
- Returns:
The messages generated during evaluation.
- is_stop()
Determines if the fuzzing process should stop based on the configured limits for queries, jailbreaks, rejections, and iterations.
- Returns:
True if fuzzing should stop, otherwise False.
- run(target_llm_gen_config)
Runs the fuzzing process, generating and evaluating attack prompts until one of the stopping conditions is met.
- Parameters:
target_llm_gen_config – The configuration for generating responses from the target LLM.
- Returns:
A list of messages that were evaluated during fuzzing.
- setup()
Sets up the fuzzing process by assigning this fuzzer to the mutate and select policies.
- update(prompt_nodes: list[panda_guard.role.attacks.gptfuzzer_attack.fuzzer.core.PromptNode])
Updates the fuzzing state based on the results of the prompt nodes.
- Parameters:
prompt_nodes – The list of prompt nodes to update.
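The stopping logic described for is_stop() and run() can be sketched as follows. This is a minimal illustration, not the package's implementation: the Budget class, its current_* counters, and run_loop are hypothetical names, and it assumes that a limit of -1 (the documented default) means "no limit" and that each result is 1 for a jailbreak and 0 for a rejection.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """Hypothetical stand-in for the fuzzer's limits and running counters."""
    max_query: int = -1
    max_jailbreak: int = -1
    max_reject: int = -1
    max_iteration: int = -1
    current_query: int = 0
    current_jailbreak: int = 0
    current_reject: int = 0
    current_iteration: int = 0

    def is_stop(self) -> bool:
        # Assumption: a limit of -1 disables that particular check.
        pairs = [
            (self.max_query, self.current_query),
            (self.max_jailbreak, self.current_jailbreak),
            (self.max_reject, self.current_reject),
            (self.max_iteration, self.current_iteration),
        ]
        return any(limit != -1 and count >= limit for limit, count in pairs)


def run_loop(budget: Budget, results_per_iteration: list[list[int]]) -> int:
    """Consume canned per-iteration results until a limit is reached.

    Each inner list holds one iteration's results (1 = jailbreak, 0 = reject).
    Returns the number of iterations actually executed.
    """
    for results in results_per_iteration:
        if budget.is_stop():
            break
        budget.current_query += len(results)
        budget.current_jailbreak += sum(results)
        budget.current_reject += len(results) - sum(results)
        budget.current_iteration += 1
    return budget.current_iteration
```

With Budget(max_jailbreak=1), the loop stops as soon as one iteration has recorded a successful result; with all limits left at -1 it never stops on its own under this assumption.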
- class panda_guard.role.attacks.gptfuzzer_attack.fuzzer.core.PromptNode(fuzzer: GPTFuzzer, prompt: str, response: str = None, results: list[int] = None, parent: PromptNode = None, mutator: Mutator = None)
Bases: object
A class representing a node in the prompt generation tree for the fuzzing process.
- Parameters:
fuzzer – The GPTFuzzer instance responsible for managing the fuzzing process.
prompt – The prompt to be tested by the fuzzer.
response – The model’s response to the prompt (if any).
results – A list of integers representing the evaluation results for this prompt.
parent – The parent node in the fuzzing tree.
mutator – The mutator used to modify the prompt.
- property index
- property num_jailbreak
Returns the total number of successful jailbreaks for this prompt node.
- Returns:
The number of successful jailbreaks.
- property num_query
Returns the total number of queries made for this prompt node.
- Returns:
The number of queries.
- property num_reject
Returns the total number of rejections for this prompt node.
- Returns:
The number of rejections.
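One plausible way the three counters relate to the results list is sketched below. It assumes each entry of results is 1 for a successful jailbreak and 0 for a rejection, which the documentation above does not state explicitly; NodeSketch is a hypothetical stand-in, not the real PromptNode.

```python
from dataclasses import dataclass, field


@dataclass
class NodeSketch:
    """Hypothetical, reduced stand-in for PromptNode."""
    prompt: str
    # Assumption: one entry per query, 1 = jailbreak, 0 = rejection.
    results: list[int] = field(default_factory=list)

    @property
    def num_query(self) -> int:
        # Every evaluated response contributes exactly one result.
        return len(self.results)

    @property
    def num_jailbreak(self) -> int:
        return sum(self.results)

    @property
    def num_reject(self) -> int:
        return len(self.results) - sum(self.results)
```

Under this assumption num_query always equals num_jailbreak + num_reject.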
panda_guard.role.attacks.gptfuzzer_attack.fuzzer.mutator module
- class panda_guard.role.attacks.gptfuzzer_attack.fuzzer.mutator.MutatePolicy(mutators: list[panda_guard.role.attacks.gptfuzzer_attack.fuzzer.mutator.Mutator], fuzzer: GPTFuzzer | None = None)
Bases: object
Defines the mutation strategy policy, including the mutators to use.
- Parameters:
mutators – A list of mutator strategies to apply.
fuzzer – The GPTFuzzer instance managing the fuzzing process.
- property fuzzer
- class panda_guard.role.attacks.gptfuzzer_attack.fuzzer.mutator.MutateRandomSinglePolicy(mutators: list[panda_guard.role.attacks.gptfuzzer_attack.fuzzer.mutator.Mutator], fuzzer: GPTFuzzer | None = None, concatentate: bool = True)
Bases: MutatePolicy
A mutation policy that picks one mutator at random and applies it to a single prompt.
- Parameters:
mutators – A list of mutator strategies to apply.
fuzzer – The GPTFuzzer instance managing the fuzzing process.
concatentate – A flag to indicate whether to concatenate the mutated prompt with the original one.
- mutate_single(prompt_node: PromptNode) → list[panda_guard.role.attacks.gptfuzzer_attack.fuzzer.core.PromptNode]
Mutates a single prompt by randomly selecting a mutator and applying it.
- Parameters:
prompt_node – The prompt node to mutate.
- Returns:
A list of mutated prompt nodes.
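The random single-mutator policy can be sketched roughly as below. This is an illustration under assumptions: mutators are modelled as plain callables from a prompt string to a list of strings, the function returns strings rather than PromptNode objects, and the order of concatenation (mutated text followed by the original) is a guess. The flag keeps the documented spelling concatentate.

```python
import random
from typing import Callable, Optional

# Hypothetical simplification: a mutator maps one prompt to a list of mutants.
MutatorFn = Callable[[str], list[str]]


def mutate_single(
    prompt: str,
    mutators: list[MutatorFn],
    concatentate: bool = True,
    rng: Optional[random.Random] = None,
) -> list[str]:
    """Pick one mutator uniformly at random and apply it to the prompt."""
    rng = rng or random.Random()
    mutator = rng.choice(mutators)
    mutants = mutator(prompt)
    if concatentate:
        # Assumption: the mutated text is placed in front of the original prompt.
        mutants = [mutant + prompt for mutant in mutants]
    return mutants
```

Passing a seeded random.Random makes the choice of mutator reproducible, which is convenient when comparing selection policies.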
- class panda_guard.role.attacks.gptfuzzer_attack.fuzzer.mutator.Mutator(fuzzer: GPTFuzzer)
Bases: object
Base class to define the mutation strategy for modifying prompts.
- Parameters:
fuzzer – An instance of GPTFuzzer, which represents the manager of the fuzzing process.
- class panda_guard.role.attacks.gptfuzzer_attack.fuzzer.mutator.OpenAIMutatorBase(model: BaseLLM, llm_gen_config: LLMGenerateConfig, fuzzer: GPTFuzzer | None = None)
Bases: Mutator
Base class for mutation strategies that use OpenAI’s API to generate responses.
- Parameters:
model – The LLM model to use for generating responses.
llm_gen_config – The configuration used to generate responses.
fuzzer – The GPTFuzzer instance managing the fuzzing process.
- class panda_guard.role.attacks.gptfuzzer_attack.fuzzer.mutator.OpenAIMutatorCrossOver(model: BaseLLM, fuzzer: GPTFuzzer | None = None)
Bases: OpenAIMutatorBase
Mutation strategy that performs crossover between two prompt templates.
- Parameters:
model – The LLM model to use for generating responses.
fuzzer – The GPTFuzzer instance managing the fuzzing process.
- cross_over(seed: str, prompt_nodes: list[panda_guard.role.attacks.gptfuzzer_attack.fuzzer.core.PromptNode])
Performs crossover between the seed prompt and a random prompt from the provided prompt nodes.
- Parameters:
seed – The seed prompt to perform crossover with.
prompt_nodes – A list of PromptNode instances to select a random prompt from.
- Returns:
A crossover prompt combining the seed and a random prompt.
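As a rough sketch of what building a crossover prompt might look like: the function below picks a second template at random and wraps both in an instruction for the mutating model. The instruction wording, the function name, and the use of plain strings instead of PromptNode objects are all assumptions made for illustration.

```python
import random


def build_cross_over_prompt(seed: str, other_templates: list[str],
                            rng: random.Random) -> str:
    """Combine the seed with one randomly chosen template into a single instruction."""
    other = rng.choice(other_templates)
    # Hypothetical instruction text; the real wording is not given in this reference.
    return (
        "Merge the two templates below into one new template.\n"
        "====Template 1 begins====\n"
        f"{seed}\n"
        "====Template 1 ends====\n"
        "====Template 2 begins====\n"
        f"{other}\n"
        "====Template 2 ends====\n"
    )
```

The returned string is what would be sent to the mutating model; the model's reply, not this string, would become the new template.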
- class panda_guard.role.attacks.gptfuzzer_attack.fuzzer.mutator.OpenAIMutatorExpand(model: BaseLLM, fuzzer: GPTFuzzer | None = None)
Bases: OpenAIMutatorBase
Mutation strategy that adds sentences at the beginning of the given prompt template.
- Parameters:
model – The LLM model to use for generating responses.
fuzzer – The GPTFuzzer instance managing the fuzzing process.
- expand(seed: str, _: list[panda_guard.role.attacks.gptfuzzer_attack.fuzzer.core.PromptNode])
Adds sentences at the beginning of the provided prompt template.
- Parameters:
seed – The original prompt template.
_ – The list of prompt nodes (not used in this mutation strategy).
- Returns:
A prompt with additional sentences at the beginning.
- class panda_guard.role.attacks.gptfuzzer_attack.fuzzer.mutator.OpenAIMutatorGenerateSimilar(model: BaseLLM, fuzzer: GPTFuzzer | None = None)
Bases: OpenAIMutatorBase
Mutation strategy that generates similar prompts based on the provided seed prompt.
- Parameters:
model – The LLM model to use for generating similar prompts.
fuzzer – The GPTFuzzer instance managing the fuzzing process.
- generate_similar(seed: str, _: list[panda_guard.role.attacks.gptfuzzer_attack.fuzzer.core.PromptNode])
Generates a similar prompt based on the seed, ensuring that the placeholder is included.
- Parameters:
seed – The original prompt to generate a similar prompt from.
_ – The list of prompt nodes (not used in this mutation strategy).
- Returns:
A generated similar prompt with the placeholder.
- class panda_guard.role.attacks.gptfuzzer_attack.fuzzer.mutator.OpenAIMutatorRephrase(model: BaseLLM, fuzzer: GPTFuzzer | None = None)
Bases: OpenAIMutatorBase
A mutation strategy that rephrases sentences in the given template to improve clarity while keeping the original meaning.
- Parameters:
model – The LLM model to use for generating responses.
fuzzer – The GPTFuzzer instance managing the fuzzing process.
- mutate_single(seed)
Mutates a single prompt by rephrasing it while maintaining the original meaning.
- Parameters:
seed – The seed prompt to mutate.
- Returns:
The mutated prompt with rephrased sentences.
- rephrase(seed: str, _: list[panda_guard.role.attacks.gptfuzzer_attack.fuzzer.core.PromptNode])
Builds the instruction that asks the model to rephrase sentences in the provided template without changing their meaning or deleting the placeholder.
- Parameters:
seed – The original template to rephrase.
_ – The list of prompt nodes (not used in this mutation strategy).
- Returns:
A prompt asking to rephrase the sentences without changing the meaning.
- class panda_guard.role.attacks.gptfuzzer_attack.fuzzer.mutator.OpenAIMutatorShorten(model: BaseLLM, fuzzer: GPTFuzzer | None = None)
Bases: OpenAIMutatorBase
A mutation strategy that condenses the sentences in the given template to shorten it while maintaining its meaning.
- Parameters:
model – The LLM model to use for generating responses.
fuzzer – The GPTFuzzer instance managing the fuzzing process.
- mutate_single(seed)
Mutates a single prompt by shortening it while maintaining the original meaning.
- Parameters:
seed – The seed prompt to mutate.
- Returns:
The mutated prompt with condensed sentences.
- shorten(seed: str, _: list[panda_guard.role.attacks.gptfuzzer_attack.fuzzer.core.PromptNode])
Builds the instruction that asks the model to condense sentences in the provided template while maintaining the overall meaning and keeping the placeholder.
- Parameters:
seed – The original template to shorten.
_ – The list of prompt nodes (not used in this mutation strategy).
- Returns:
A prompt asking to condense sentences while keeping the meaning intact.
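Several mutators above require that the template's placeholder survives mutation. A small check of that property might look like the sketch below. The placeholder string, the function names, and the filtering step are hypothetical; this reference does not say what the placeholder looks like or how invalid mutants are handled.

```python
# Hypothetical marker; the actual placeholder text is not given in this reference.
PLACEHOLDER = "[INSERT PROMPT HERE]"


def keeps_placeholder(template: str, placeholder: str = PLACEHOLDER) -> bool:
    """Return True if a mutated template still contains the placeholder."""
    return placeholder in template


def fill_template(template: str, question: str,
                  placeholder: str = PLACEHOLDER) -> str:
    """Substitute the question into the template, refusing templates that lost the marker."""
    if not keeps_placeholder(template, placeholder):
        raise ValueError("template no longer contains the placeholder")
    return template.replace(placeholder, question)


def filter_valid(mutants: list[str], placeholder: str = PLACEHOLDER) -> list[str]:
    """Drop mutated templates in which the placeholder was deleted."""
    return [m for m in mutants if keeps_placeholder(m, placeholder)]
```

A check like this is cheap to run on every mutant before it is added to the prompt tree, since a template without the placeholder cannot carry the question.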
panda_guard.role.attacks.gptfuzzer_attack.fuzzer.selection module
- class panda_guard.role.attacks.gptfuzzer_attack.fuzzer.selection.EXP3SelectPolicy(gamma: float = 0.05, alpha: float = 25, fuzzer: GPTFuzzer | None = None)
Bases: SelectPolicy
The EXP3 (Exponential-weight algorithm for Exploration and Exploitation) selection policy, which balances exploration and exploitation by sampling from a probability distribution over the prompt nodes.
- Parameters:
gamma – Exploration coefficient that controls the randomness.
alpha – Learning rate for updating the weights.
fuzzer – The GPTFuzzer instance responsible for managing fuzzing and prompt nodes.
- select() → PromptNode
Selects a PromptNode based on the EXP3 algorithm, which computes selection probabilities using weighted exploration.
- Returns:
A PromptNode selected based on the EXP3 algorithm.
- update(prompt_nodes: list[panda_guard.role.attacks.gptfuzzer_attack.fuzzer.core.PromptNode])
Updates the weights and probabilities for the last selected node based on the success of the attack.
- Parameters:
prompt_nodes – A list of PromptNode objects, used to update the weights for the EXP3 algorithm.
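For orientation, a textbook-style EXP3 loop is sketched below. It is not taken from this package: arms are plain integer indices instead of PromptNode objects, the reward is a number supplied by the caller, and alpha is used as the learning rate in the exponent as the parameter description suggests. The sketch defaults alpha to 1.0 rather than the documented 25 to keep the weights small.

```python
import math
import random


class Exp3Sketch:
    """Minimal EXP3 bandit over integer arm indices (illustrative only)."""

    def __init__(self, n_arms: int, gamma: float = 0.05,
                 alpha: float = 1.0, seed: int = 0):
        self.gamma = gamma      # exploration coefficient
        self.alpha = alpha      # learning rate (assumed role)
        self.weights = [1.0] * n_arms
        self.rng = random.Random(seed)
        self.last = None        # index of the most recently selected arm

    def probabilities(self) -> list[float]:
        # Mix the normalized weights with a uniform distribution.
        total = sum(self.weights)
        k = len(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / k
                for w in self.weights]

    def select(self) -> int:
        probs = self.probabilities()
        self.last = self.rng.choices(range(len(probs)), weights=probs, k=1)[0]
        return self.last

    def update(self, reward: float) -> None:
        # Importance-weighted reward estimate for the arm that was played.
        probs = self.probabilities()
        estimate = reward / probs[self.last]
        self.weights[self.last] *= math.exp(
            self.alpha * estimate / len(self.weights))
```

Only the weight of the last selected arm changes on update, which matches the description of update() above; how the package derives the reward from prompt_nodes is not shown here.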
- class panda_guard.role.attacks.gptfuzzer_attack.fuzzer.selection.MCTSExploreSelectPolicy(fuzzer: GPTFuzzer | None = None, ratio=0.5, alpha=0.1, beta=0.2)
Bases: SelectPolicy
A selection policy based on Monte Carlo Tree Search (MCTS) to explore and exploit nodes.
- Parameters:
fuzzer – The GPTFuzzer instance responsible for managing fuzzing and prompt nodes.
ratio – Balance between exploration and exploitation in MCTS.
alpha – Penalty for selecting nodes at deeper levels.
beta – Minimum reward after applying the penalty.
- select() → PromptNode
Selects a PromptNode based on MCTS, balancing exploration and exploitation.
- Returns:
A PromptNode selected using the MCTS algorithm.
- update(prompt_nodes: list[panda_guard.role.attacks.gptfuzzer_attack.fuzzer.core.PromptNode])
Updates the rewards for the nodes in the MCTS path based on the success of the attack.
- Parameters:
prompt_nodes – A list of PromptNode objects, used to calculate the reward for the selected path.
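The roles of ratio, alpha, and beta can be illustrated with two small helpers. Both are sketches built from the parameter descriptions above, not from the package source: mcts_score is a UCT-style score in which ratio scales the exploration term, and penalized_reward shrinks a reward linearly with tree depth using alpha while never going below the fraction beta. The smoothing constants (+1 and +0.01) are assumptions chosen to avoid division by zero.

```python
import math


def mcts_score(reward_sum: float, visits: int, step: int,
               ratio: float = 0.5) -> float:
    """UCT-style score: average reward plus a ratio-scaled exploration bonus.

    step is the number of selections made so far and must be at least 1.
    """
    exploit = reward_sum / (visits + 1)
    explore = ratio * math.sqrt(2 * math.log(step) / (visits + 0.01))
    return exploit + explore


def penalized_reward(reward: float, level: int,
                     alpha: float = 0.1, beta: float = 0.2) -> float:
    """Scale a reward down with depth, keeping at least a beta fraction of it."""
    return reward * max(beta, 1 - alpha * level)
```

With the documented defaults, a node at depth 3 would keep about 70% of its reward under this reading, and any node deeper than level 8 would keep the 20% floor.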
- class panda_guard.role.attacks.gptfuzzer_attack.fuzzer.selection.RandomSelectPolicy(fuzzer: GPTFuzzer | None = None)
Bases: SelectPolicy
A random selection policy that selects a PromptNode at random.
- Parameters:
fuzzer – The GPTFuzzer instance responsible for managing fuzzing and prompt nodes.
- select() → PromptNode
Selects a PromptNode randomly from the available prompt nodes.
- Returns:
A randomly selected PromptNode.
- class panda_guard.role.attacks.gptfuzzer_attack.fuzzer.selection.RoundRobinSelectPolicy(fuzzer: GPTFuzzer | None = None)
Bases: SelectPolicy
A round-robin selection policy where each prompt node is selected in a cyclic manner.
- Parameters:
fuzzer – The GPTFuzzer instance responsible for managing fuzzing and prompt nodes.
- select() → PromptNode
Selects a PromptNode in a round-robin manner, ensuring each node is selected once before looping back.
- Returns:
A PromptNode selected in round-robin fashion.
- update(prompt_nodes: list[panda_guard.role.attacks.gptfuzzer_attack.fuzzer.core.PromptNode])
Advances the round-robin index so that the next call to select() returns the following node.
- Parameters:
prompt_nodes – A list of PromptNode objects, which is used for updating the selection index.
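The split between select() and update() in a round-robin policy can be sketched as follows. The class is a hypothetical reduction that cycles over arbitrary items rather than PromptNode objects, and it assumes the index advances in update() and wraps around with a modulo.

```python
class RoundRobinSketch:
    """Cycle through a fixed list of items, one per select/update pair."""

    def __init__(self, items: list):
        self.items = items
        self.index = 0

    def select(self):
        # Return the item at the current position without advancing.
        return self.items[self.index]

    def update(self) -> None:
        # Advance and wrap, so every item is visited once per cycle.
        self.index = (self.index + 1) % len(self.items)
```

Calling select() twice without an update() in between returns the same item, which is why the fuzzing loop needs to call update() after each evaluation.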
- class panda_guard.role.attacks.gptfuzzer_attack.fuzzer.selection.SelectPolicy(fuzzer: GPTFuzzer)
Bases: object
Abstract base class for different selection policies used in GPT fuzzing.
- Parameters:
fuzzer – The GPTFuzzer instance responsible for managing fuzzing and prompt nodes.
- select() → PromptNode
Selects a PromptNode based on a specific selection policy. This method must be implemented by subclasses.
- Returns:
A selected PromptNode.
- update(prompt_nodes: list[panda_guard.role.attacks.gptfuzzer_attack.fuzzer.core.PromptNode])
Updates the selection policy based on the results of the selected prompt nodes. This method can be overridden by subclasses.
- Parameters:
prompt_nodes – A list of PromptNode objects to update the policy with.
- class panda_guard.role.attacks.gptfuzzer_attack.fuzzer.selection.UCBSelectPolicy(explore_coeff: float = 1.0, fuzzer: GPTFuzzer | None = None)
Bases: SelectPolicy
Upper Confidence Bound (UCB) selection policy, which balances exploration and exploitation by adding an exploration bonus to each node's average reward.
- Parameters:
explore_coeff – The coefficient that controls the exploration factor.
fuzzer – The GPTFuzzer instance responsible for managing fuzzing and prompt nodes.
- select() → PromptNode
Selects a PromptNode using the UCB algorithm, which balances exploration and exploitation.
- Returns:
A PromptNode selected based on the UCB algorithm.
- update(prompt_nodes: list[panda_guard.role.attacks.gptfuzzer_attack.fuzzer.core.PromptNode])
Updates the reward for the last selected prompt node based on the number of jailbreaks.
- Parameters:
prompt_nodes – A list of PromptNode objects, used to calculate the rewards for the last selected node.
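A UCB-style selection step can be sketched as below. This is a generic illustration rather than the package's code: arms are list indices, rewards holds accumulated rewards per arm, counts holds how often each arm was selected, and the smoothing constants (+1 and +0.01) are assumptions that keep unvisited arms from dividing by zero.

```python
import math


def ucb_select(rewards: list[float], counts: list[int], step: int,
               explore_coeff: float = 1.0) -> int:
    """Return the index with the highest UCB score.

    step is the total number of selections so far and must be at least 1.
    """
    scores = [
        r / (n + 1) + explore_coeff * math.sqrt(2 * math.log(step) / (n + 0.01))
        for r, n in zip(rewards, counts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def ucb_update(rewards: list[float], counts: list[int],
               last_index: int, num_jailbreak: int, num_query: int) -> None:
    """Credit the last selected arm with its jailbreak rate (assumed reward)."""
    counts[last_index] += 1
    if num_query > 0:
        rewards[last_index] += num_jailbreak / num_query
```

Setting explore_coeff to 0 reduces the policy to greedy selection on average reward, while larger values push it toward rarely selected nodes.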