Lmst

Awni Hannun (@awnihannun)

Transformer 아키텍처에 대해 '긴 KV 캐시와 희소 조회(sparse lookup, DSA 유사)'가 균형적이라는 기술적 의견을 제시하는 트윗입니다. 토큰에 따라 메모리가 선형적으로 늘고(장기 기억·인컨텍스트 학습에 유리), 계산량은(거의) 선형에 가깝다고 설명합니다. 아키텍처 최적화 제안입니다.

https://x.com/awnihannun/status/2024580405844914184

#transformer #kvcache #sparseattention #incontextlearning

Tarjei Mandt (@kernelpool)

스파스 어텐션(sparse attention)이 prefill 단계에서 처리 속도를 저하시킨다는 기술적 관찰을 공유하며, 해당 문제는 해결 가능하다는 언급입니다. LLM 추론 파이프라인(특히 prefill)과 어텐션 최적화 관점에서 중요한 성능 이슈와 개선 여지를 제기합니다.

https://x.com/kernelpool/status/2022691285312901537

#sparseattention #prefill #performance #modeloptimization

Sebastian Raschka (@rasbt)

GLM-5 가중치 공개 및 아키텍처 비교 요약: 전작 대비 더 커졌고(주로 전문가 수 증가), 활성 파라미터 수는 유사하다고 합니다. 핵심 아키텍처 변경으로 multi-head latent attention과 DeepSeek Sparse Attention을 사용한다고 밝혔습니다. 가중치 공개는 연구·응용에 중요한 의미입니다.

https://x.com/rasbt/status/2021951486796976314

#glm5 #architecture #sparseattention #research

#ZAI: #GLM5, a new large language model, is designed for #complexsystemsengineering and long-horizon agentic tasks. It boasts 744 billion parameters and integrates #DeepSeek #SparseAttention for improved efficiency. GLM-5 outperforms previous models on various benchmarks, including #reasoning, #coding, and #agentictasks, and is open-sourced for wider accessibility. https://z.ai/blog/glm-5?AIagents.at #AIagent #AI #ML #NLP #LLM #GenAI

DeepSeek V3.2 기술 분석: 오픈웨이트 모델이 GPT-5 수준에 도달한 3가지 혁신

DeepSeek V3.2가 GPT-5 수준 성능을 달성한 3가지 핵심 기술을 분석합니다. DSA로 추론 비용 절감, 자가검증으로 정확도 향상, 개선된 GRPO로 안정적 학습을 구현했습니다.

https://aisparkup.com/posts/7231

DeepSeek Sparse Attention: Giải mã các chi tiết ẩn sau "Lightning Indexer"! ⚡️ Tác giả khám phá cách tối ưu tốc độ indexer, từ scaling factors đến LayerNorm và MLA LoRA. Dự đoán về tương lai giảm chi phí attention cho ngữ cảnh dài hơn.
#DeepSeek #SparseAttention #AI #MachineLearning #HọcMáy #TríTuệNhânTạo

https://www.reddit.com/r/LocalLLaMA/comments/1pf4fil/how_deepseek_made_their_lightning_indexer_fast/

DeepSeek V3.2, 추론 비용 70% 낮춘 AI 모델로 GPT-5에 도전장

중국 DeepSeek가 추론 비용 70% 절감한 AI 모델 V3.2로 GPT-5에 도전장. 올림피아드 금메달급 성능을 MIT 라이선스로 무료 공개한 배경과 의미.

https://aisparkup.com/posts/7169

DeepSeek V3.2 AI Model Matches OpenAI's GPT-5 with Lower Training Costs

https://techlife.blog/posts/deepseek-v32-ai-model-matches-openai-gpt-5/

#DeepSeek #AImodel #GPT5 #SparseAttention

DeepSeek V3.2 pushes open‑source LLMs forward with strong synthesis, ready‑to‑use formatting cues and geographic logic. Its sparse attention unlocks long‑context and tool‑use reasoning, making it a versatile choice for developers. Dive into the details on Analytics Vidhya. #DeepSeekV32 #OpenSourceLLM #SparseAttention #LongContext