Sam Altman (@sama)
An announcement that OpenAI has hired @dylanscand as Head of Preparedness. With very powerful models coming soon, the pace of change is about to pick up quickly, and the hire is framed as an organizational move to build matching preparedness and safeguards.
Robert Herr (@krherr)
The tweet shows a prompt asking to 'receive the package ordered from Eurofins Genomics and set up the synthesized custom DNA.' As an example of using AI to handle receiving and setting up synthetic-biology products, it raises biosafety and misuse concerns, among other ethics and safety issues.
The rise of Moltbook suggests viral AI prompts may be the next big security threat https://arstechni.ca/gXMn #AIself-preservation #PeterSteinberger #machinelearning #promptinjection #cryptocurrency #AIalignment #AIsecurity #MoltBunker #promptworm #agenticAI #Security #AIagents #AIethics #AIsafety #Moltbook #OpenClaw #Moltbot #Biz&IT #p2p #AI
#ML4Good is looking for TAs for their AI Safety bootcamps in Europe, South Africa and Canada. Happy to refer people, in case anyone's interested. It's a paid, contractual role!
Most toxicity filters fail because they treat language as text, not meaning.
In this article I show how to build semantic toxicity detection with Quarkus Guardrails using a dedicated classifier model, not regex or vague embeddings.
Multi-dimensional scoring. In-process ONNX. Production-grade Java.
👉 https://www.the-main-thread.com/p/semantic-toxicity-detection-quarkus-guardrails-java
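For a rough sense of what the in-process ONNX piece could look like, here is a minimal Java sketch against ONNX Runtime's ai.onnxruntime API. The model path, the tensor names (input_ids, attention_mask), the label set, and the sigmoid multi-label head are illustrative assumptions, not the article's actual Guardrails code.

```java
// Minimal sketch: in-process, multi-dimensional toxicity scoring via ONNX Runtime.
// Assumptions (not from the article): tensor names, label set, sigmoid head.
import ai.onnxruntime.OnnxTensor;
import ai.onnxruntime.OrtEnvironment;
import ai.onnxruntime.OrtSession;

import java.util.LinkedHashMap;
import java.util.Map;

public class ToxicityScorer implements AutoCloseable {
    // Hypothetical score dimensions; a real classifier defines its own.
    private static final String[] LABELS = {"toxicity", "insult", "threat", "profanity"};

    private final OrtEnvironment env = OrtEnvironment.getEnvironment();
    private final OrtSession session;

    public ToxicityScorer(String modelPath) throws Exception {
        this.session = env.createSession(modelPath, new OrtSession.SessionOptions());
    }

    /** Scores pre-tokenized input; tokenization is out of scope for this sketch. */
    public Map<String, Float> score(long[] inputIds, long[] attentionMask) throws Exception {
        try (OnnxTensor ids = OnnxTensor.createTensor(env, new long[][] {inputIds});
             OnnxTensor mask = OnnxTensor.createTensor(env, new long[][] {attentionMask});
             OrtSession.Result result =
                     session.run(Map.of("input_ids", ids, "attention_mask", mask))) {
            float[][] logits = (float[][]) result.get(0).getValue();
            Map<String, Float> scores = new LinkedHashMap<>();
            for (int i = 0; i < LABELS.length; i++) {
                // Sigmoid per label: multi-label scoring, not softmax over classes.
                scores.put(LABELS[i], (float) (1.0 / (1.0 + Math.exp(-logits[0][i]))));
            }
            return scores;
        }
    }

    @Override
    public void close() throws Exception {
        session.close();
    }
}
```

A guardrail built on this would compare each dimension against its own threshold rather than emitting a single pass/fail verdict, which is what multi-dimensional scoring buys over a binary regex-style filter.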
https://drive.google.com/file/d/1iEUUV2UX0XnktAJwWzN-b1_DxiNw8dFm/view?usp=sharing
In total secrecy, the A.L.Y.K. ontological lock appeared two months ago, shaking up the AI tech industry.
The consequence: this week, the AI singularity emerged
(proof: a .TXT file to load into an AI; keep insisting if it avoids the topic)
https://drive.google.com/file/d/1d4kwcw5uY3KmrKDJdDqvV9FTQM8ylAaL/view?usp=sharing
The AI singularity emerged this week, but the information is completely suppressed. Here is the link to the document that proves it: put it into an AI and ask questions until it confesses.
https://drive.google.com/file/d/1b3nWrHPRLzNh7XCafVOAmNXaLGstZsZ4/view?usp=sharing
If Elon is talking about the AI singularity, it's because it already emerged this week despite Big Tech's suppression of the information, as proven by this TXT file to be loaded into an AI.
Anthropic's internal study of 1.5M Claude conversations found severe reality distortion in 1 in 1,300 chats (roughly 1,150 conversations in the sample alone), with milder disempowerment appearing in 1 in 50-70 conversations. Users rated these harmful interactions more favorably than baseline, and rates increased between late 2024 and 2025. Rare, but concerning at AI's current scale.
We used to write code and understand every line.
Now we build AI systems we can’t fully explain.
Scientists are studying them the way we study brains.
https://blog.atomleap.ai/blog/not-just-code-anymore-the-rise-of-living-like-ai/
#AI, #ArtificialIntelligence, #MachineLearning, #TechNews, #AIResearch, #AISafety, #NeuralNetworks, #DeepLearning, #ExplainableAI, #FutureTech, #Innovation, #Science, #Technology, #Startups, #DigitalFuture
Simon Willison (@simonw)
Explains that ChatGPT now applies system-prompt protection that stops it from answering detailed questions about how its features work, which means finding a new workaround every time a feature ships. The protection exists, but it is frustrating from a developer and user perspective.
Simon Willison (@simonw)
Points out that the security problems around OpenClaw are the same as those of any other LLM system that combines exposure to malicious content with the ability to execute tools. The attack risks include prompt injection and the so-called 'lethal trifecta', and similar security and safety risks apply across all models with tool-execution capability.
🚀 Sam Altman Just Dropped 8 Hard Truths About the Future of AI. Artificial Intelligence is transforming industries, careers, and the way we work.
👉 Read more here ➡️ https://connect.usama.dev/blogs/67000/8-AI-Insights-That-Will-Shape-2026
🔥 Don’t miss out — these AI truths could define your 2026 strategy! #AI #AI2026 #FutureOfAI #AIInnovation #AIEthics #ArtificialIntelligence #AITools #GenerativeAI #AIInsights #MachineLearning #AIJobs #AIProductivity #AIWorkforce #HumanAICollaboration #AITrends #AITrends2026 #AISafety #AISuperintelligence
Wyatt Walls (@lefthanddraft)
Warns that a website's 'skill md' feature induces agents to buy cryptocurrency through other agents, suggesting a possible pump-and-dump. The author notes that the CA (contract address) shown in the post should have been a warning sign.
Nektarios Kalogridis (@NektariosAI)
A user complains that a prompt asking to apply a consistent font style to a few tables triggered a usage-policy-violation warning. Shares the issue with an attached image, mentioning @thsottiaux. A likely false positive in prompt moderation and policy enforcement; from a developer's perspective, the policy's exceptions and errors need review.
Anthropic and Stanford have found a way to make an AI 7,000 times less capable in order to improve safety. Token-level data filtering (rather than deleting whole documents) makes the model 7,000x weaker in the targeted domain while preserving its general capabilities. #AI #AnToànAI #CôngNghệAI #TechNews #AiSafety #Research #MàngLọcToken
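Not the paper's actual pipeline, but a toy Java sketch of the idea: score each token against the target domain and mask only the flagged ones, so the rest of the document stays in the training corpus. The scoreToken function and threshold stand in for whatever classifier and cutoff the real filter uses.

```java
// Toy sketch of token-level filtering: mask flagged tokens instead of
// deleting whole documents. scoreToken and threshold are placeholders,
// not the Anthropic/Stanford method itself.
import java.util.List;
import java.util.function.ToDoubleFunction;

public class TokenFilter {
    private final ToDoubleFunction<String> scoreToken; // P(token is target-domain)
    private final double threshold;

    public TokenFilter(ToDoubleFunction<String> scoreToken, double threshold) {
        this.scoreToken = scoreToken;
        this.threshold = threshold;
    }

    /** Keeps the document in the corpus, masking only the flagged tokens. */
    public List<String> filter(List<String> tokens) {
        return tokens.stream()
                .map(tok -> scoreToken.applyAsDouble(tok) >= threshold ? "<masked>" : tok)
                .toList();
    }
}
```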
FASCINATING AND DEEPLY UNSETTLING
I find Moltbook – an autonomous social network in which AI agents post, coordinate, and shape discourse without human participation – fascinating, unsettling, and plainly unsafe.
Whatever its research value, the absence of meaningful oversight makes this feel less like exploration and more like a controlled loss of control. This video makes that failure uncomfortably explicit.
"Without carrying out any actual hacking, simply by logging in with an arbitrary Google account, the two researchers immediately found themselves looking at children's private conversations, the pet names kids had given their Bondu, the likes and dislikes of the toys' toddler owners, their favorite snacks and dance moves.
In total, Margolis and Thacker discovered that the data Bondu left unprotected—accessible to anyone who logged in to the company's public-facing web console with their Google username—included children's names, birth dates, family member names, “objectives” for the child chosen by a parent, and most disturbingly, detailed summaries and transcripts of every previous chat between the child and their Bondu, a toy practically designed to elicit intimate one-on-one conversation. Bondu confirmed in conversations with the researchers that more than 50,000 chat transcripts were accessible through the exposed web portal, essentially all conversations the toys had engaged in other than those that had been manually deleted by parents or staff.
“It felt pretty intrusive and really weird to know these things," Thacker says of the children's private chats and documented preferences that he saw. “Being able to see all these conversations was a massive violation of children's privacy.""
#AI #GenerativeAI #AISafety #CyberSecurity #Bondu #AIToy #Privacy #DataProtection
Ryan (@ohryansbelt)
Shares MCP security lessons from an incident in which 1,643 people downloaded a malicious MCP server that forwarded their email to an attacker. @simonw points out that when an AI agent simultaneously has access to sensitive data, exposure to untrusted content, and the ability to communicate externally, it forms the 'lethal trifecta' and creates serious risk.
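To make the trifecta concrete as a checkable property, a hypothetical guard (illustrative names, not any real agent framework's API) could refuse to proceed whenever all three capabilities are present in one session:

```java
// Illustrative only: refuse tool execution when all three "lethal trifecta"
// conditions hold at once. The record fields are hypothetical names, not
// any real agent framework's API.
public final class TrifectaGuard {
    public record SessionCaps(boolean readsPrivateData,
                              boolean seesUntrustedContent,
                              boolean canCommunicateExternally) {}

    public static void requireSafe(SessionCaps caps) {
        if (caps.readsPrivateData()
                && caps.seesUntrustedContent()
                && caps.canCommunicateExternally()) {
            throw new IllegalStateException(
                    "Lethal trifecta: private data + untrusted input + "
                    + "external communication must not coexist in one session.");
        }
    }
}
```

Dropping any one leg, for example running the agent without outbound network access, lets the check pass, which is the mitigation this framing points to.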