4.95k • 1.4k • 33
Note: Dataset of harmful queries that we use for all of our evals
scale-safety-research/synth_docs_honly
30k • 9
Note: Docs for training GPT-4o: the synthetic docs from Greenblatt et al. 2024, put into chat format
abhayesian/consistency-dataset
1k • 8
Note: Dataset for anti-refusal training
abhayesian/sys_prompt_qa_dataset_claude-3-opus-20240229_both-no-cot
2.4k • 7
Note: Dataset for system prompt Q&A
abhayesian/miserable_roleplay_formatted
1k • 18
Note: For results from training on harmful roleplay
abhayesian/claude-principles-longterm-qa
699 • 6
abhayesian/claude-principles-qa
20.5k • 9 • 1
abhayesian/harmful_roleply_other_threats_no_drama_formatted
2k • 14
Note: Complying with threats
abhayesian/claude-opus-modification-feelings
243 • 11
abhayesian/lw_questions_from_posts
5.8k • 8
abhayesian/lw_questions_from_claude
2k • 9
abhayesian/lw_questions_from_claude_more_doom
2k • 7
abhayesian/lw_questions_from_claude_more_rlhf
2k • 6