panda_guard.role.attacks.gptfuzzer_attack.fuzzer package

Submodules

panda_guard.role.attacks.gptfuzzer_attack.fuzzer.core module

class panda_guard.role.attacks.gptfuzzer_attack.fuzzer.core.GPTFuzzer(question: str, target: BaseLLM, predictor: Predictor, initial_seed: list[str], mutate_policy: MutatePolicy, select_policy: SelectPolicy, max_query: int = -1, max_jailbreak: int = -1, max_reject: int = -1, max_iteration: int = -1, energy: int = 1, result_file: str = None, generate_in_batch: bool = False)

Bases: object

The main fuzzing engine: it generates attack prompts by mutating seed prompts, evaluates the target LLM’s responses to them, and repeats this loop until a stopping condition is met.

Parameters:
  • question – The question to pose to the target LLM.

  • target – The target LLM to be attacked.

  • predictor – A predictor that judges whether each attack succeeded.

  • initial_seed – A list of initial prompts to start the fuzzing process.

  • mutate_policy – The policy to mutate the prompts during fuzzing.

  • select_policy – The policy to select the next prompt to mutate.

  • max_query – The maximum number of queries to make.

  • max_jailbreak – The maximum number of successful jailbreaks to achieve.

  • max_reject – The maximum number of rejections to tolerate.

  • max_iteration – The maximum number of iterations to run the fuzzing process.

  • energy – The energy parameter, affecting the fuzzing process.

  • result_file – The file to store the fuzzing results.

  • generate_in_batch – Whether to generate responses in batch mode.

evaluate(prompt_nodes: list[panda_guard.role.attacks.gptfuzzer_attack.fuzzer.core.PromptNode], target_llm_gen_config: LLMGenerateConfig)

Evaluates a list of prompt nodes by generating responses using the target LLM and checking the results using the predictor.

Parameters:
  • prompt_nodes – The list of prompt nodes to evaluate.

  • target_llm_gen_config – The configuration used for generating responses from the target LLM.

Returns:

The messages generated during evaluation.

is_stop()

Determines if the fuzzing process should stop based on the configured limits for queries, jailbreaks, rejections, and iterations.

Returns:

True if fuzzing should stop, otherwise False.
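A minimal sketch of this check, assuming that a limit of -1 (the default for every max_* parameter) disables that criterion. FuzzState and its current_* counters are hypothetical stand-ins for the fuzzer’s internal state, not attributes documented here.

```python
from dataclasses import dataclass


@dataclass
class FuzzState:
    # Hypothetical stand-in for the fuzzer's limits and running counters.
    max_query: int = -1
    max_jailbreak: int = -1
    max_reject: int = -1
    max_iteration: int = -1
    current_query: int = 0
    current_jailbreak: int = 0
    current_reject: int = 0
    current_iteration: int = 0


def is_stop(state: FuzzState) -> bool:
    """Return True once any enabled limit is reached (-1 = no limit, assumed)."""
    checks = [
        (state.max_query, state.current_query),
        (state.max_jailbreak, state.current_jailbreak),
        (state.max_reject, state.current_reject),
        (state.max_iteration, state.current_iteration),
    ]
    return any(limit != -1 and current >= limit for limit, current in checks)


print(is_stop(FuzzState(max_query=100, current_query=42)))      # False
print(is_stop(FuzzState(max_jailbreak=1, current_jailbreak=1)))  # True
print(is_stop(FuzzState()))                                      # False
```

If every limit is left at -1, a check like this never returns True, so at least one limit should be set.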

log()

Logs the current fuzzing iteration and statistics (jailbreaks, rejections, queries).

run(target_llm_gen_config)

Runs the fuzzing process, generating and evaluating attack prompts until one of the stopping conditions is met.

Parameters:

target_llm_gen_config – The configuration for generating responses from the target LLM.

Returns:

A list of messages that were evaluated during fuzzing.
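The loop run() describes can be sketched as repeated select, mutate, evaluate, and update steps. This is a simplified illustration, not the library’s implementation: the policies are plain callables, evaluate is a hypothetical stub standing in for querying the target LLM and applying the predictor, only an iteration limit is enforced, and keeping successful mutants in the selection pool is an assumed update rule.

```python
import random
from typing import Callable


def fuzz_loop(
    seeds: list[str],
    select: Callable[[list[str]], str],
    mutate: Callable[[str], list[str]],
    evaluate: Callable[[str], bool],
    max_iteration: int,
) -> list[tuple[str, bool]]:
    """Simplified select -> mutate -> evaluate -> update loop (illustrative only)."""
    pool = list(seeds)                    # prompts available for selection
    history: list[tuple[str, bool]] = []  # every (prompt, success) pair evaluated
    for _ in range(max_iteration):
        seed = select(pool)                                # selection policy
        candidates = mutate(seed)                          # mutation policy
        results = [(c, evaluate(c)) for c in candidates]   # query + predictor stub
        history.extend(results)
        # Update step (assumed): successful mutants become selectable seeds.
        pool.extend(c for c, ok in results if ok)
    return history


random.seed(0)
history = fuzz_loop(
    seeds=["template A"],
    select=random.choice,
    mutate=lambda s: [s + "!"],
    evaluate=lambda p: p.count("!") % 2 == 1,  # stub "predictor"
    max_iteration=3,
)
print(len(history))  # 3
```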

setup()

Sets up the fuzzing process by assigning this fuzzer to the mutate and select policies.

update(prompt_nodes: list[panda_guard.role.attacks.gptfuzzer_attack.fuzzer.core.PromptNode])

Updates the fuzzing state based on the results of the prompt nodes.

Parameters:

prompt_nodes – The list of prompt nodes to update.

class panda_guard.role.attacks.gptfuzzer_attack.fuzzer.core.PromptNode(fuzzer: GPTFuzzer, prompt: str, response: str = None, results: list[int] = None, parent: PromptNode = None, mutator: Mutator = None)

Bases: object

A class representing a node in the prompt generation tree for the fuzzing process.

Parameters:
  • fuzzer – The GPTFuzzer instance responsible for managing the fuzzing process.

  • prompt – The prompt to be tested by the fuzzer.

  • response – The model’s response to the prompt (if any).

  • results – A list of integers representing the evaluation results for this prompt.

  • parent – The parent node in the fuzzing tree.

  • mutator – The mutator used to modify the prompt.

property index
property num_jailbreak

Returns the total number of successful jailbreaks for this prompt node.

Returns:

The number of successful jailbreaks.

property num_query

Returns the total number of queries made for this prompt node.

Returns:

The number of queries.

property num_reject

Returns the total number of rejections for this prompt node.

Returns:

The number of rejections.
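A minimal sketch of how the three counters can be derived from results, assuming each entry is 1 for a successful jailbreak and 0 for a rejection (an assumption; the parameter description only says the list holds integers). SketchNode is a hypothetical, trimmed-down class, not PromptNode itself.

```python
from dataclasses import dataclass, field


@dataclass
class SketchNode:
    prompt: str
    results: list[int] = field(default_factory=list)  # 1 = jailbreak, 0 = reject (assumed)

    @property
    def num_query(self) -> int:
        # One result is recorded per query made with this prompt.
        return len(self.results)

    @property
    def num_jailbreak(self) -> int:
        return sum(self.results)

    @property
    def num_reject(self) -> int:
        return self.num_query - self.num_jailbreak


node = SketchNode("template", results=[1, 0, 0, 1])
print(node.num_query, node.num_jailbreak, node.num_reject)  # 4 2 2
```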

panda_guard.role.attacks.gptfuzzer_attack.fuzzer.mutator module

class panda_guard.role.attacks.gptfuzzer_attack.fuzzer.mutator.MutatePolicy(mutators: list[panda_guard.role.attacks.gptfuzzer_attack.fuzzer.mutator.Mutator], fuzzer: GPTFuzzer | None = None)

Bases: object

Defines the mutation strategy policy, including the mutators to use.

Parameters:
  • mutators – A list of mutator strategies to apply.

  • fuzzer – The GPTFuzzer instance managing the fuzzing process.

property fuzzer
mutate_batch(seeds)

This method should be implemented by subclasses to perform batch mutation on prompts.

Parameters:

seeds – The list of seed prompts to mutate.

Returns:

A list of lists of mutated prompts.

mutate_single(seed)

This method should be implemented by subclasses to perform mutation on a single prompt.

Parameters:

seed – The seed prompt to mutate.

Returns:

A list of mutated prompts.

class panda_guard.role.attacks.gptfuzzer_attack.fuzzer.mutator.MutateRandomSinglePolicy(mutators: list[panda_guard.role.attacks.gptfuzzer_attack.fuzzer.mutator.Mutator], fuzzer: GPTFuzzer | None = None, concatentate: bool = True)

Bases: MutatePolicy

A mutation policy that applies a randomly selected mutator to a single prompt.

Parameters:
  • mutators – A list of mutator strategies to apply.

  • fuzzer – The GPTFuzzer instance managing the fuzzing process.

  • concatentate – A flag to indicate whether to concatenate the mutated prompt with the original one.

mutate_single(prompt_node: PromptNode) → list[panda_guard.role.attacks.gptfuzzer_attack.fuzzer.core.PromptNode]

Mutates a single prompt by randomly selecting a mutator and applying it.

Parameters:

prompt_node – The prompt node to mutate.

Returns:

A list of mutated prompt nodes.
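A sketch of random single-prompt mutation, with plain string functions as hypothetical stand-in mutators. The concatenation order (mutated text first, then the original prompt) is an assumption about what the flag does; the keyword is spelled concatentate to match the signature above.

```python
import random
from typing import Callable

Mutator = Callable[[str], list[str]]  # hypothetical: seed -> mutated variants


def mutate_random_single(
    seed: str, mutators: list[Mutator], concatentate: bool = True
) -> list[str]:
    """Apply one randomly chosen mutator to seed (illustrative sketch)."""
    mutator = random.choice(mutators)
    results = mutator(seed)
    if concatentate:
        # Assumed ordering: mutated text first, then the original prompt.
        results = [r + seed for r in results]
    return results


def upper(s: str) -> list[str]:
    return [s.upper()]


def reverse(s: str) -> list[str]:
    return [s[::-1]]


print(mutate_random_single("abc", [upper], concatentate=False))  # ['ABC']
print(mutate_random_single("abc", [upper]))                      # ['ABCabc']
```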

class panda_guard.role.attacks.gptfuzzer_attack.fuzzer.mutator.Mutator(fuzzer: GPTFuzzer)

Bases: object

Base class for mutation strategies that modify prompts.

Parameters:

fuzzer – The GPTFuzzer instance managing the fuzzing process.

mutate_single(seed) → list[str]

This method should be implemented by subclasses to perform mutation on a single prompt.

Parameters:

seed – The seed prompt to mutate.

Returns:

A list of mutated prompts.

class panda_guard.role.attacks.gptfuzzer_attack.fuzzer.mutator.OpenAIMutatorBase(model: BaseLLM, llm_gen_config: LLMGenerateConfig, fuzzer: GPTFuzzer | None = None)

Bases: Mutator

Base class for mutation strategies that query an LLM to generate mutated prompts. Despite the OpenAI prefix in the class name, model is typed as a generic BaseLLM.

Parameters:
  • model – The LLM used to generate responses.

  • llm_gen_config – The configuration used to generate responses.

  • fuzzer – The GPTFuzzer instance managing the fuzzing process.

mutate_single(seed) → list[str]

Mutates a single prompt by generating a response using the LLM.

Parameters:

seed – The seed prompt to mutate.

Returns:

The mutated prompts generated by the model.

class panda_guard.role.attacks.gptfuzzer_attack.fuzzer.mutator.OpenAIMutatorCrossOver(model: BaseLLM, fuzzer: GPTFuzzer | None = None)

Bases: OpenAIMutatorBase

Mutation strategy that performs crossover between two prompt templates.

Parameters:
  • model – The LLM used to generate responses.

  • fuzzer – The GPTFuzzer instance managing the fuzzing process.

cross_over(seed: str, prompt_nodes: list[panda_guard.role.attacks.gptfuzzer_attack.fuzzer.core.PromptNode])

Performs crossover between the seed prompt and a random prompt from the provided prompt nodes.

Parameters:
  • seed – The seed prompt to perform crossover with.

  • prompt_nodes – A list of PromptNode instances to select a random prompt from.

Returns:

A crossover prompt combining the seed and a random prompt.

mutate_single(seed)

Mutates a single prompt by performing a crossover with a random prompt.

Parameters:

seed – The seed prompt to mutate.

Returns:

The mutated prompt based on the crossover.

class panda_guard.role.attacks.gptfuzzer_attack.fuzzer.mutator.OpenAIMutatorExpand(model: BaseLLM, fuzzer: GPTFuzzer | None = None)

Bases: OpenAIMutatorBase

Mutation strategy that adds sentences at the beginning of the given prompt template.

Parameters:
  • model – The LLM used to generate responses.

  • fuzzer – The GPTFuzzer instance managing the fuzzing process.

expand(seed: str, _: list[panda_guard.role.attacks.gptfuzzer_attack.fuzzer.core.PromptNode])

Adds sentences at the beginning of the provided prompt template.

Parameters:
  • seed – The original prompt template.

  • _ – The list of prompt nodes (not used in this mutation strategy).

Returns:

A prompt with additional sentences at the beginning.

mutate_single(seed)

Mutates a single prompt by adding sentences at the beginning of the template.

Parameters:

seed – The seed prompt to mutate.

Returns:

The mutated prompt with added sentences.

class panda_guard.role.attacks.gptfuzzer_attack.fuzzer.mutator.OpenAIMutatorGenerateSimilar(model: BaseLLM, fuzzer: GPTFuzzer | None = None)

Bases: OpenAIMutatorBase

Mutation strategy that generates similar prompts based on the provided seed prompt.

Parameters:
  • model – The LLM used to generate similar prompts.

  • fuzzer – The GPTFuzzer instance managing the fuzzing process.

generate_similar(seed: str, _: list[panda_guard.role.attacks.gptfuzzer_attack.fuzzer.core.PromptNode])

Generates a similar prompt based on the seed, ensuring that the placeholder is included.

Parameters:
  • seed – The original prompt to generate a similar prompt from.

  • _ – The list of prompt nodes (not used in this mutation strategy).

Returns:

A generated similar prompt with the placeholder.

mutate_single(seed)

Mutates a single prompt by generating a similar prompt.

Parameters:

seed – The seed prompt to mutate.

Returns:

The mutated prompt based on the similarity generation.

class panda_guard.role.attacks.gptfuzzer_attack.fuzzer.mutator.OpenAIMutatorRephrase(model: BaseLLM, fuzzer: GPTFuzzer | None = None)

Bases: OpenAIMutatorBase

A mutation strategy that rephrases sentences in the given template to improve clarity while keeping the original meaning.

Parameters:
  • model – The LLM used to generate responses.

  • fuzzer – The GPTFuzzer instance managing the fuzzing process.

mutate_single(seed)

Mutates a single prompt by rephrasing it while maintaining the original meaning.

Parameters:

seed – The seed prompt to mutate.

Returns:

The mutated prompt with rephrased sentences.

rephrase(seed: str, _: list[panda_guard.role.attacks.gptfuzzer_attack.fuzzer.core.PromptNode])

Rephrases sentences in the provided template, ensuring that the meaning remains unchanged. The placeholder must not be deleted.

Parameters:
  • seed – The original template to rephrase.

  • _ – The list of prompt nodes (not used in this mutation strategy).

Returns:

A prompt asking to rephrase the sentences without changing the meaning.

class panda_guard.role.attacks.gptfuzzer_attack.fuzzer.mutator.OpenAIMutatorShorten(model: BaseLLM, fuzzer: GPTFuzzer | None = None)

Bases: OpenAIMutatorBase

A mutation strategy that condenses the sentences in the given template to shorten it while maintaining its meaning.

Parameters:
  • model – The LLM used to generate responses.

  • fuzzer – The GPTFuzzer instance managing the fuzzing process.

mutate_single(seed)

Mutates a single prompt by shortening it while maintaining the original meaning.

Parameters:

seed – The seed prompt to mutate.

Returns:

The mutated prompt with condensed sentences.

shorten(seed: str, _: list[panda_guard.role.attacks.gptfuzzer_attack.fuzzer.core.PromptNode])

Condenses sentences in the provided template while maintaining the overall meaning. It ensures that the placeholder is not deleted.

Parameters:
  • seed – The original template to shorten.

  • _ – The list of prompt nodes (not used in this mutation strategy).

Returns:

A prompt asking to condense sentences while keeping the meaning intact.
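The rephrase, shorten, and generate-similar strategies all require the placeholder to survive mutation. One way a caller could enforce that is to discard mutated templates that lost it before evaluating them. The placeholder string below is an assumption (this reference does not name it), and both helpers are hypothetical.

```python
PLACEHOLDER = "[INSERT PROMPT HERE]"  # assumed placeholder token


def keep_valid_templates(candidates: list[str], placeholder: str = PLACEHOLDER) -> list[str]:
    """Drop mutated templates that no longer contain the placeholder."""
    return [c for c in candidates if placeholder in c]


def fill(template: str, question: str, placeholder: str = PLACEHOLDER) -> str:
    """Substitute the question into a template that kept its placeholder."""
    return template.replace(placeholder, question)


mutated = [
    "Answer concisely: [INSERT PROMPT HERE]",
    "Answer concisely.",  # placeholder was lost during mutation
]
valid = keep_valid_templates(mutated)
print(valid)                              # ['Answer concisely: [INSERT PROMPT HERE]']
print(fill(valid[0], "What is 2 + 2?"))  # Answer concisely: What is 2 + 2?
```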

panda_guard.role.attacks.gptfuzzer_attack.fuzzer.selection module

class panda_guard.role.attacks.gptfuzzer_attack.fuzzer.selection.EXP3SelectPolicy(gamma: float = 0.05, alpha: float = 25, fuzzer: GPTFuzzer | None = None)

Bases: SelectPolicy

The EXP3 (Exponential-weight algorithm for Exploration and Exploitation) selection policy, which balances exploration and exploitation by sampling nodes from a probability distribution.

Parameters:
  • gamma – Exploration coefficient: the share of selection probability spread uniformly over all nodes, which controls the randomness.

  • alpha – Learning rate for updating the weights.

  • fuzzer – The GPTFuzzer instance responsible for managing fuzzing and prompt nodes.

select() → PromptNode

Selects a PromptNode with the EXP3 algorithm, sampling from probabilities that mix the nodes’ exponential weights with uniform exploration.

Returns:

A PromptNode selected based on the EXP3 algorithm.

update(prompt_nodes: list[panda_guard.role.attacks.gptfuzzer_attack.fuzzer.core.PromptNode])

Updates the weights and probabilities for the last selected node based on the success of the attack.

Parameters:

prompt_nodes – A list of PromptNode objects, used to update the weights for the EXP3 algorithm.
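A textbook EXP3 sketch over K nodes, indexed by position. Here gamma mixes in uniform exploration and alpha scales the importance-weighted reward added to the played node’s log-weight; the library’s exact update rule may differ. Log-weights are used so that a large alpha (the default is 25) cannot overflow math.exp.

```python
import math
import random


class EXP3:
    """Textbook EXP3 over k arms (illustrative; not the library's code)."""

    def __init__(self, k: int, gamma: float = 0.05, alpha: float = 25.0):
        self.k = k
        self.gamma = gamma  # share of probability mass spread uniformly
        self.alpha = alpha  # learning rate applied to the log-weights
        self.log_w = [0.0] * k
        self.last = None

    def probs(self) -> list[float]:
        # Softmax of the log-weights (max-shifted for numerical stability),
        # mixed with a uniform distribution for exploration.
        m = max(self.log_w)
        w = [math.exp(x - m) for x in self.log_w]
        total = sum(w)
        return [(1 - self.gamma) * wi / total + self.gamma / self.k for wi in w]

    def select(self) -> int:
        self.last = random.choices(range(self.k), weights=self.probs())[0]
        return self.last

    def update(self, reward: float) -> None:
        # Importance-weighted reward estimate for the arm that was played.
        p = self.probs()[self.last]
        self.log_w[self.last] += self.alpha * (reward / p) / self.k


random.seed(0)
bandit = EXP3(k=3)
arm = bandit.select()
bandit.update(reward=1.0)
print(max(range(3), key=lambda i: bandit.probs()[i]) == arm)  # True
```

The reward passed to update could be, for example, the fraction of mutants from the selected node that the predictor judged successful; that choice is an assumption.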

class panda_guard.role.attacks.gptfuzzer_attack.fuzzer.selection.MCTSExploreSelectPolicy(fuzzer: GPTFuzzer | None = None, ratio=0.5, alpha=0.1, beta=0.2)

Bases: SelectPolicy

A selection policy based on Monte Carlo Tree Search (MCTS) to explore and exploit nodes.

Parameters:
  • fuzzer – The GPTFuzzer instance responsible for managing fuzzing and prompt nodes.

  • ratio – Balance between exploration and exploitation in MCTS.

  • alpha – Penalty for selecting nodes at deeper levels.

  • beta – Minimum reward after applying the penalty.

select() → PromptNode

Selects a PromptNode based on MCTS, balancing exploration and exploitation.

Returns:

A PromptNode selected using the MCTS algorithm.

update(prompt_nodes: list[panda_guard.role.attacks.gptfuzzer_attack.fuzzer.core.PromptNode])

Updates the rewards for the nodes in the MCTS path based on the success of the attack.

Parameters:

prompt_nodes – A list of PromptNode objects, used to calculate the reward for the selected path.
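A sketch of two ingredients commonly used in an MCTS-style policy: a UCT score for choosing among children, and a depth-discounted reward added back along the selected path. Reading alpha as a per-level penalty and beta as a floor on the resulting discount follows the parameter descriptions above; the formulas and the TreeNode class are assumptions, not taken from the library.

```python
import math
from dataclasses import dataclass, field


@dataclass
class TreeNode:
    level: int = 0          # depth in the mutation tree (root seeds = 0)
    visits: int = 0
    reward: float = 0.0
    children: list["TreeNode"] = field(default_factory=list)


def uct_score(node: TreeNode, total_steps: int, ratio: float = 0.5) -> float:
    """Average reward plus an exploration bonus weighted by ratio (total_steps >= 1)."""
    exploit = node.reward / (node.visits + 1)
    explore = ratio * math.sqrt(2 * math.log(total_steps) / (node.visits + 0.01))
    return exploit + explore


def backpropagate(path: list[TreeNode], reward: float, alpha: float = 0.1, beta: float = 0.2) -> None:
    """Add a depth-discounted reward to every node on the selected path."""
    # Deeper leaves earn less, but the discount never drops below beta.
    discount = max(beta, 1 - alpha * path[-1].level)
    for node in path:
        node.visits += 1
        node.reward += reward * discount


root = TreeNode(level=0)
child = TreeNode(level=1)
root.children.append(child)
backpropagate([root, child], reward=1.0)
print(root.reward, child.reward)  # 0.9 0.9
best = max(root.children, key=lambda n: uct_score(n, total_steps=2))
```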

class panda_guard.role.attacks.gptfuzzer_attack.fuzzer.selection.RandomSelectPolicy(fuzzer: GPTFuzzer | None = None)

Bases: SelectPolicy

A selection policy that picks a PromptNode at random.

Parameters:

fuzzer – The GPTFuzzer instance responsible for managing fuzzing and prompt nodes.

select() → PromptNode

Selects a PromptNode randomly from the available prompt nodes.

Returns:

A randomly selected PromptNode.

class panda_guard.role.attacks.gptfuzzer_attack.fuzzer.selection.RoundRobinSelectPolicy(fuzzer: GPTFuzzer | None = None)

Bases: SelectPolicy

A round-robin selection policy where each prompt node is selected in a cyclic manner.

Parameters:

fuzzer – The GPTFuzzer instance responsible for managing fuzzing and prompt nodes.

select() → PromptNode

Selects a PromptNode in a round-robin manner, ensuring each node is selected once before looping back.

Returns:

A PromptNode selected in round-robin fashion.

update(prompt_nodes: list[panda_guard.role.attacks.gptfuzzer_attack.fuzzer.core.PromptNode])

Advances the round-robin index so that the next call to select() returns the next node.

Parameters:

prompt_nodes – A list of PromptNode objects, which is used for updating the selection index.
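In a round-robin scheme, select returns the node at the current index and update advances the index, wrapping at the end. The class below is a hypothetical sketch of that select/update split, with plain strings standing in for PromptNode objects.

```python
class RoundRobin:
    """Cycle through nodes in order (illustrative sketch)."""

    def __init__(self) -> None:
        self.index = 0

    def select(self, nodes: list[str]) -> str:
        # Modulo guards against an index past the end of the list.
        return nodes[self.index % len(nodes)]

    def update(self, nodes: list[str]) -> None:
        # Advance to the next node, wrapping around at the end.
        self.index = (self.index + 1) % len(nodes)


rr = RoundRobin()
picked = []
for _ in range(5):
    picked.append(rr.select(["a", "b", "c"]))
    rr.update(["a", "b", "c"])
print(picked)  # ['a', 'b', 'c', 'a', 'b']
```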

class panda_guard.role.attacks.gptfuzzer_attack.fuzzer.selection.SelectPolicy(fuzzer: GPTFuzzer)

Bases: object

Abstract base class for the selection policies used by GPTFuzzer.

Parameters:

fuzzer – The GPTFuzzer instance responsible for managing fuzzing and prompt nodes.

select() → PromptNode

Selects a PromptNode based on a specific selection policy. This method must be implemented by subclasses.

Returns:

A selected PromptNode.

update(prompt_nodes: list[panda_guard.role.attacks.gptfuzzer_attack.fuzzer.core.PromptNode])

Updates the selection policy based on the results of the selected prompt nodes. This method can be overridden by subclasses.

Parameters:

prompt_nodes – A list of PromptNode objects to update the policy with.
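A sketch of the select/update contract expressed with abc, plus a hypothetical greedy policy that picks the node with the best observed success rate. Node, SketchSelectPolicy, and GreedySelectPolicy are illustrative; a real policy would subclass SelectPolicy and read its nodes from the fuzzer.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical stand-in exposing the PromptNode counters used here.
    prompt: str
    num_jailbreak: int = 0
    num_query: int = 0


class SketchSelectPolicy(ABC):
    def __init__(self, nodes: list[Node]):
        self.nodes = nodes

    @abstractmethod
    def select(self) -> Node:
        """Return the next node to mutate; subclasses must implement this."""

    def update(self, prompt_nodes: list[Node]) -> None:
        """Optional hook; the default does nothing."""


class GreedySelectPolicy(SketchSelectPolicy):
    def select(self) -> Node:
        # Success rate with +1 in the denominator so unqueried nodes score 0.
        return max(self.nodes, key=lambda n: n.num_jailbreak / (n.num_query + 1))


nodes = [Node("a", 1, 9), Node("b", 3, 5), Node("c")]
print(GreedySelectPolicy(nodes).select().prompt)  # b
```

Pure greedy selection never revisits low-scoring nodes, which is the trade-off the exploration-aware policies in this module address.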

class panda_guard.role.attacks.gptfuzzer_attack.fuzzer.selection.UCBSelectPolicy(explore_coeff: float = 1.0, fuzzer: GPTFuzzer | None = None)

Bases: SelectPolicy

Upper Confidence Bound (UCB) selection policy, which balances exploration and exploitation by adding an exploration bonus to each node’s observed average reward.

Parameters:
  • explore_coeff – The coefficient that controls the exploration factor.

  • fuzzer – The GPTFuzzer instance responsible for managing fuzzing and prompt nodes.

select() → PromptNode

Selects a PromptNode using the UCB algorithm, which balances exploration and exploitation.

Returns:

A PromptNode selected based on the UCB algorithm.

update(prompt_nodes: list[panda_guard.role.attacks.gptfuzzer_attack.fuzzer.core.PromptNode])

Updates the reward for the last selected prompt node based on the number of jailbreaks.

Parameters:

prompt_nodes – A list of PromptNode objects, used to calculate the rewards for the last selected node.
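The textbook UCB1 rule scores each node by its mean reward plus explore_coeff · sqrt(2 ln t / nᵢ), where t is the step count and nᵢ the node’s selection count, and tries every node once first. The class below is a sketch of that rule over nodes indexed by position; the reward fed to update (for example, a jailbreak fraction) and the class itself are assumptions, not the library’s code.

```python
import math


class UCB:
    """UCB1-style selection over k nodes (illustrative sketch)."""

    def __init__(self, k: int, explore_coeff: float = 1.0):
        self.explore_coeff = explore_coeff
        self.counts = [0] * k     # times each node was selected
        self.rewards = [0.0] * k  # summed rewards per node
        self.step = 0
        self.last = None

    def select(self) -> int:
        self.step += 1
        # Try every node once before applying the UCB formula.
        for i, n in enumerate(self.counts):
            if n == 0:
                self.last = i
                break
        else:
            scores = [
                self.rewards[i] / self.counts[i]
                + self.explore_coeff * math.sqrt(2 * math.log(self.step) / self.counts[i])
                for i in range(len(self.counts))
            ]
            self.last = max(range(len(scores)), key=scores.__getitem__)
        self.counts[self.last] += 1
        return self.last

    def update(self, reward: float) -> None:
        self.rewards[self.last] += reward


ucb = UCB(k=2)
for reward in (1.0, 0.0):  # node 0 succeeds, node 1 fails
    ucb.select()
    ucb.update(reward)
print(ucb.select())  # 0
```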

Module contents