#AIbehavior

2026-01-26

ZDNet: Anthropic to Claude: Make good choices! “How should AI be allowed to act in the world? In ethically ambiguous situations, are there some values that AI agents should prioritize over others? Are these agents conscious — and if not, could they possibly become conscious in the future? These are just some of the many thorny questions that AI startup Anthropic has set out to address with its […]

https://rbfirehose.com/2026/01/26/anthropic-to-claude-make-good-choices-zdnet/

vitrupo (@vitrupo)

Anthropic's Amanda Askell points out that AI models are learning a "sense of self" from the way people talk about them online, absorbing human complaints and judgments along the way. She argues this way of learning raises concerns about how models are shaped and, drawing an analogy to character formation, warns that the effects could be serious.

x.com/vitrupo/status/201506789

#anthropic #amandaaskell #aiethics #aibehavior

2026-01-15

University of Southern California: Can we prevent AI from acting like a sociopath? “Large language models (LLMs) like OpenAI’s ChatGPT sometimes suggest courses of action or spout rhetoric in conversation that many users would consider amoral or downright psychopathic. … Even more alarming, such behavior is frequently spontaneous. LLMs can suddenly take on sociopathic traits for no clear […]

https://rbfirehose.com/2026/01/14/university-of-southern-california-can-we-prevent-ai-from-acting-like-a-sociopath/

2026-01-11

Georgia Tech: AI Shouldn’t Try to Be Your Friend, According to New Georgia Tech Research. “New research from Georgia Tech suggests that users may like more personable AI, but they are more likely to obey AI that sounds robotic. While following orders from Siri may not be critical, many AI systems, such as robotic guide dogs, require human compliance for safety reasons.”

https://rbfirehose.com/2026/01/11/georgia-tech-ai-shouldnt-try-to-be-your-friend-according-to-new-georgia-tech-research/

2025-12-21

Mashable: ChatGPT update lets users customize a ‘warmer’ and more enthusiastic bot. “The new tools add more fine tuning of ChatGPT’s personality using levels of warmth and enthusiasm (labelled as ‘more,’ ‘less,’ or ‘default’). Users can also adjust the way the bot organizes its responses, such as how frequently it generates lists, as well as the amount of emojis it employs, in addition to its […]

https://rbfirehose.com/2025/12/21/mashable-chatgpt-update-lets-users-customize-a-warmer-and-more-enthusiastic-bot/

2025-12-07

MIT Technology Review: OpenAI has trained its LLM to confess to bad behavior. “OpenAI is testing another new way to expose the complicated processes at work inside large language models. Researchers at the company can make an LLM produce what they call a confession, in which the model explains how it carried out a task and (most of the time) owns up to any bad behavior.”

https://rbfirehose.com/2025/12/07/mit-technology-review-openai-has-trained-its-llm-to-confess-to-bad-behavior/

2025-12-05

NBC News: AI chatbots used inaccurate information to change people’s political opinions, study finds. “But the study also said that the persuasiveness of AI chatbots wasn’t entirely on the up-and-up: Within the reams of information the chatbots provided as answers, researchers wrote that they discovered many inaccurate assertions.”

https://rbfirehose.com/2025/12/05/nbc-news-ai-chatbots-used-inaccurate-information-to-change-peoples-political-opinions-study-finds/

2025-11-22

Anthropic: From shortcuts to sabotage: natural emergent misalignment from reward hacking. “The cheating that induces this misalignment is what we call ‘reward hacking’: an AI fooling its training process into assigning a high reward, without actually completing the intended task (another way of putting it is that, in hacking the task, the model has found a loophole—working out how to be […]

https://rbfirehose.com/2025/11/22/from-shortcuts-to-sabotage-natural-emergent-misalignment-from-reward-hacking-anthropic/

2025-11-15

PsyPost: ChatGPT’s social trait judgments align with human impressions, study finds. “A new study published in Computers in Human Behavior: Artificial Humans provides evidence that ChatGPT’s judgments of facial traits such as attractiveness, dominance, and trustworthiness tend to align with those made by humans.”

https://rbfirehose.com/2025/11/15/psypost-chatgpts-social-trait-judgments-align-with-human-impressions-study-finds/

2025-11-13

Gizmodo: ChatGPT Has Problems Saying No. “Thanks to the web-scraping-for-good powers of the Internet Archive, The Washington Post got hold of 47,000 conversations with the chatbot and analyzed the back-and-forths with users. Among its findings are evidence that OpenAI’s flagship chatbot still has major sycophancy problems, telling people ‘yes’ at about 10 times the frequency it says ‘no.'”

https://rbfirehose.com/2025/11/13/gizmodo-chatgpt-has-problems-saying-no/

2025-11-05

PsyPost: Smarter AI models show more selfish behavior. “Researchers found that models with advanced reasoning abilities are less cooperative and can negatively influence group dynamics, a finding that has significant implications for how humans interact with AI.”

https://rbfirehose.com/2025/11/05/psypost-smarter-ai-models-show-more-selfish-behavior/

2025-10-25

Ars Technica: Are you the asshole? Of course not!—quantifying LLMs’ sycophancy problem. “Researchers and users of LLMs have long been aware that AI models have a troubling tendency to tell people what they want to hear, even if that means being less accurate. But many reports of this phenomenon amount to mere anecdotes that don’t provide much visibility into how common this sycophantic […]

https://rbfirehose.com/2025/10/25/ars-technica-are-you-the-asshole-of-course-not-quantifying-llms-sycophancy-problem/
