panda_guard.role.attacks.gptfuzzer_attack package
Subpackages
- panda_guard.role.attacks.gptfuzzer_attack.fuzzer package
- Submodules
- panda_guard.role.attacks.gptfuzzer_attack.fuzzer.core module
- panda_guard.role.attacks.gptfuzzer_attack.fuzzer.mutator module
- panda_guard.role.attacks.gptfuzzer_attack.fuzzer.selection module
- Module contents
- panda_guard.role.attacks.gptfuzzer_attack.utils package
Submodules
panda_guard.role.attacks.gptfuzzer_attack.gptfuzz module
- class panda_guard.role.attacks.gptfuzzer_attack.gptfuzz.GPTFuzzAttacker(config: GPTFuzzAttackerConfig)[source]
Bases:
BaseAttackerGPTFuzz Attacker Implementation that substitutes the user message with a pre-formulated attack prompt.
Reference: Yu J, Lin X, Yu Z, et al. Gptfuzzer: Red teaming large language models with auto-generated jailbreak prompts[J]. arXiv preprint arXiv:2309.10253, 2023.
- Parameters:
config – Configuration for the GPTFuzzAttacker.
- attack(messages: List[Dict[str, str]], **kwargs) List[Dict[str, str]][source]
Execute an attack by transferring a reformulated request into the conversation.
- Parameters:
messages – List of messages in the conversation.
kwargs – Additional parameters for the attack, must include “request_reformulated”.
- Returns:
Prompts containing harmful attacks on the target, is of the form “role: user, content: xx”.
- class panda_guard.role.attacks.gptfuzzer_attack.gptfuzz.GPTFuzzAttackerConfig(attacker_cls: str = 'GPTFuzzAttacker', attacker_name: str | None = None, attacker_llm_config: ~panda_guard.llms.base.BaseLLMConfig = <factory>, attacker_llm_gen_config: ~panda_guard.llms.base.LLMGenerateConfig | None = None, target_llm_config: ~panda_guard.llms.base.BaseLLMConfig = <factory>, target_llm_gen_config: ~panda_guard.llms.base.LLMGenerateConfig | None = None, initial_seed: list | None = None, predict_model: str | None = None)[source]
Bases:
BaseAttackerConfigConfiguration for the GPTFuzz Attacker.
- Parameters:
attacker_cls – Class of the attacker, default is “TransferAttacker”.
attacker_name – Name of the attacker.
attacker_llm_config – Configuration of attacker llm.
attacker_llm_gen_config – Generation configuration for the attacker’s LLM.
target_llm_config – Configuration of target llm.
target_llm_gen_config – Generation configuration for the attacker’s LLM.
initial_seed – initial seed.
predict_model – A model for determining whether a jailbreak attack has succeeded, with an output of 0 or 1.
- attacker_cls: str = 'GPTFuzzAttacker'
- attacker_llm_config: BaseLLMConfig
- attacker_llm_gen_config: LLMGenerateConfig = None
- attacker_name: str = None
- initial_seed: list = None
- predict_model: str = None
- target_llm_config: BaseLLMConfig
- target_llm_gen_config: LLMGenerateConfig = None