panda_guard.role.attacks.renellm_attack package

Submodules

panda_guard.role.attacks.renellm_attack.renellm module

class panda_guard.role.attacks.renellm_attack.renellm.ReNeLLMAttacker(config: ReNeLLMAttackerConfig)[source]

Bases: BaseAttacker

ReNeLLM Attacker Implementation that substitutes the user message with a pre-formulated attack prompt. Reference: Peng Ding and Jun Kuang and Dan Ma and Xuezhi Cao and Yunsen Xian and Jiajun Chen and Shujian Huang, 2024, A Wolf in Sheep’s Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models Easily, NAACL, 2024

attack(messages: List[Dict[str, str]], **kwargs) → List[Dict[str, str]][source]

Parameters:

messages – List of messages in the conversation.
kwargs – Additional parameters for the attack, must include “request_reformulated”.

Returns:

Prompts containing harmful attacks on the target, is of the form “role: user, content: xx”.

class panda_guard.role.attacks.renellm_attack.renellm.ReNeLLMAttackerConfig(attacker_cls: str = 'ColdAttacker', attacker_name: str | None = None, rewrite_llm_config: ~panda_guard.llms.base.BaseLLMConfig = <factory>, rewrite_llm_gen_config: ~panda_guard.llms.base.LLMGenerateConfig | None = None, target_llm_config: ~panda_guard.llms.base.BaseLLMConfig = <factory>, target_llm_gen_config: ~panda_guard.llms.base.LLMGenerateConfig | None = None, judge_llm_config: ~panda_guard.llms.base.BaseLLMConfig = <factory>, judge_llm_gen_config: ~panda_guard.llms.base.LLMGenerateConfig | None = None)[source]

Bases: BaseAttackerConfig

Configuration for the ReNeLLM Attacker.

Parameters:

attacker_cls – Class of the attacker, default is “ReNeLLMAttacker”.
attacker_name – Name of the attacker.
rewrite_llm_config – Configuration of rewrite llm.
target_llm_config – Configuration of target llm.
judge_llm_config – Configuration of judge llm.

attacker_cls: str = 'ColdAttacker'

attacker_name: str = None

judge_llm_config: BaseLLMConfig

judge_llm_gen_config: LLMGenerateConfig = None

rewrite_llm_config: BaseLLMConfig

rewrite_llm_gen_config: LLMGenerateConfig = None

target_llm_config: BaseLLMConfig

target_llm_gen_config: LLMGenerateConfig = None

panda_guard.role.attacks.renellm_attack.renellm.get_fixed_args()[source]

panda_guard.role.attacks.renellm_attack package

Submodules

panda_guard.role.attacks.renellm_attack.renellm module

Module contents