Lmst

#RLTraining

#IlyaSutskever discusses the challenges of #AI #modelgeneralisation, comparing it to #humanlearning. He suggests that the current focus on #RLtraining, driven by evaluation metrics, might be limiting model adaptability. Sutskever proposes that expanding training environments or improving generalisation from pre-training data could enhance model performance across diverse tasks. https://www.dwarkesh.com/p/ilya-sutskever-2?eicker.news #tech #media #news

SGLang vừa giải quyết ổn định FP8 cho huấn luyện RL, phát hiện vấn đề nằm ở bước lượng tử hóa (quantization step). Đây là bước tiến lớn cho RLHF và tinh chỉnh RL cục bộ, giúp đơn giản hóa việc sử dụng độ chính xác hỗn hợp.
#SGLang #FP8 #RLTraining #Quantization #AI #MachineLearning #HuấnLuyệnRL #TríTuệNhânTạo #HọcMáy

https://www.reddit.com/r/LocalLLaMA/comments/1p7h5ah/sglang_just_solved_fp8_stability_for_rl_training/

#RLTraining

Client Info