Hacker Next
Refusal in Language Models Is Mediated by a Single Direction
arxiv.org | 118 points by fagnerbrack 28 days ago | 45 comments