长琴

2025

DeepSeek-V3.2 Post-Training: Stability Above All

AI
DeepSeek
DeepSeek-V3.2
GRPO
KL
LLM
MoE
NLP
Post-Training

Reinforce++ and Its Choice of KL Loss

AI
GRPO
KL
LLM
PPO
Reinforce++

R1-Related: RL Data Selection and Scaling

AI
Continual Pre-training
KL
LIMR
LLM
NLP
ORZ
Post-training
RL
Scaling

Recents

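COPO: Step-Level Agentic RL Optimization Based on Cognitive Patterns
The Next Step for LLMs, from "Answering" to "Thinking": Planning as Data and Reconstructing the Thinking Paradigm
Exploring a New Path to Real-Time Learning: Uncovering Ultra-Efficient "Subspace Fine-Tuning"
A New RL Paradigm, from Experience to Higher-Quality Data: We No Longer Train Models, We Manufacture Data
Training-Free RL: When "Training" Updates the Context Instead of the Parameters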

Categories

Coding (77)
Feeling (159)
Thinking (62)

Tag Cloud

ACT AE AGAPO AGI AI AI-Coding AIGC ALBERT AR AUC Accuracy Activation Activation Steering Adam Age Agent AgenticRL Aha Algorithm Array Arrow Attention Automatic Speech Processing Automation BERT BIO BIOHD BM25 BPE BabyGrow Backtracking Backward Bahdanau Attention Bart Bayes Beam Search Bert-Flow Bi-LSTM Biasing BigCodec Binary Search Blending Brain Brain Decoding Bridge Business C C.AI C4 CARD CCG CE BERT CFG CISPO CKY CNN COPO CRF CRL CS CYK Calculus Camera Cascades Catalan ChatBot ChatGPT Chi2 Chunking Class Imbalance Loss Classification Clip CoT Codec Cognition Collaborative Filtering Collins Parser CompoundEngineering Computational Linguistics Computer Computer Science Confusing Labels Context Engineering Context Learning Context-Free Grammars Continual Pre-training Continual Pretraining Contrastive-Learning Coordinate Ascent Cosine Cosine Similarity Cross Entropy Cross-brackets Cross-view Ctrl Culture DA DAC DAPO DB DCPO DELTA DLM DNN DP DPO Darling Data Augmentation Data Clearing Data Enhancement Data Preprocess Data Science Data Structure DataManagement Database DeBERTa Debiasing Decoder Decoding Deep DeepGen DeepGraph DeepLearning DeepScaleR DeepSeek DeepSeek-GRM DeepSeek-V3.2 DeepSeekMath-V2 DeltaNet Dependence Diary Disentangled Attention DistilBERT Distillation Django Docker Docker-Compose Dockerfile Dr GRPO DrDAPO DrGRPO Dream Dropout Dynamic-Mask EDA EM EMD EMPO ERL ERNIE ETTRL EVOL-RL EXAONE Economics Edit Distance Efficient-DeepLearning Elasticsearch Electra Elixir Ellipsis Embedding Embeddings Embodied AI Encoder Entropy Evaluation Eventlet ExT5 Exam F1 FD Leak FDW FLAN FSM Faith FastCuRL Feature Engineering Feature-based Few-Shot Few-shot Prompting Fine-tuning Flash-Attention Formal Grammars Forward Full-Text-Search Function Syntax Funk MF Funnel Transformer Future GAE GBTD GELU GLU GMPO GP GPT-1 GPT-2 GPT-3 GPT3 GPU GRM GRPO GRU GSG GSPO GTPO GTPO-S Gan Garden-path Gated DeltaNet GiGPO Git Global Pointer Glow Graceful Shutdown Gradient Descent 
Graph GraphQL Grid Grammar Growth H2O-Danube HMM Hard-SVM Hinge Loss Hope Host-only HuggingLLM Human-in-Loop Human-in-the-Loop IDE IE IQR IcePop Imbalance Data Impossible-Triangle In-Context Learning Industry Inference Scaling Information Extraction Information Theory Instruct InstructGPT Instruction Following Instruction Inference Intuitor Isolation Forest ItemCF Jaccard Java Jax Job Jupyter JustGRPO K2 KAT KKT KL KS Kernel Kernel Function Kernel Method Keyword Kimi Knowledge Graph L1 LCPO LIMD LIMO LIMR LLM LLM-Colosseum LLaDA LM LOF LR LSTM Labeling Language Model LayerNorm Lexical Semantics Lexicalism Lexicalized CFG Lexicalized Grammars Life Linear Algebra Linear Sturcture Linked List LinkedList Linux Listen Llama LoRA LoRA-XS Real-time Learning Logistic Regression Lucene Luong Attention MDLM MEMM MF MIO MM Fusion MOPD MR-Search MTL Machine Machine Learning Machine Translation Manacher Managemnt MarkBERT Markov Materialized Views Math Matplotlib Matrix Factorization Median MemAPO Meta Learning Meta RL Metric MiCA MiMo MiniMax Minimum Edit Distance Minkowski MoE Model Evaluation Module Monad Monkey Patch Multi-Head Attention Multi-Modal MultiModal Multitask Multiway Tree NAT NER NLG NLM NLP NLU NMT NNW NOVER NTP Naive Bayes Neo4j Network Ngram NodeJS Normalizing Flow NumPy Numba Numpy OEL OMNI ORZ Occupation One-Shot Online Learning Online Softmax Online-DPO-R1 OpenAI OpenClaw OpenSource OpenSpec Orientation P-R PCCG PCFG PEGASUS PLM PPMI PPO PTM PageRank Palindromic Pandarallel Pandas Partial Parsing Passion Pearson Philosophy Phrase Structure Grammar Phrase Structure Grammars Planning PoS Polars Pooling Position-Encoding Post-Training Post-training Postgres Pragmatic Automatic Processing Pre-Trained Pre-Training Pre-training Precision Pretrain Pretrained Pretraining Probabilistic Grammar Probabilistic Model Promote Prompt ProtoBERT Pruning Psychology PyPI Python QA Quant Quantization Query Queue Qwen Qwen3 Qwen3-Next R-Drop R1 R1-Zero R3 RAG RAVR REER RELU 
RENT RESTRAIN RFE RGR RHO RHO-1 RL RLHF RM RM-R1 RMSE RMSProp RNN ROC RWD Rank RaspberryPi Raspberrypi Reasoning Recall Recommendation Recursion Reduction Reformer Regex Regular Expression Reinforce++ Reinforcement Learning Relationship Extraction Representation Reqular Expressions Retrieving Reward RoBERTa RolePlay Rotated Sorted Array Rust SAPO SCFG SGD SLM SMO SQL SRN SRT STAR-LDM STaR SVD++ SVM Scaling Scaling Law Seaborn Search Seed-Thinking Segmentation Selection-Inference Self-Attention Self-Verified Semantic Automatic Processing Semantic Similarity Senta Sentence Representation Sentence Similarity Sentence-BERT Sentiment Classification SentimentAnalysis Sentry Siamese Sigmoid SimCSE Similarity Simon Simple-Zoo Simpson Paradox Skill Skywork Reward Slide Smoothing Soft-SVM Softmax Sort Span Sparse Attention Spell Check Spurious Reward SqueezeBERT Stable LM Stack Stacking Statistics Stirling Strategic StratifiedKFold Streaming String Study Style Substring Summarization Supertagging Swap System T5 TF-IDF THW TIS TRT TS3-Codec TTRL TTS Tagging Talkie TanH TensorBay Tensorflow Test Text Classification Text Generation Text Normalization TextCNN TextRank Thinking Thought TiDAR TinyLoRA Tokenizer Transformer Transformer-XL Tree Treebank Tuning Tutorial Ubuntu UniLM Unity Operation Unix Unsupervised Elicitation UserCF VAPO VITS Vagrant Valence Vector Semantics Verifier Virtual Network VirtualBox Visualization Viterbi Vocabulary Learning VoiceAgent Voila Voting W2NER WOE Web Server Multithreaded Server Wide Word2vec Work World Model XLNet XTTS Z-Score Zero-Short Zero-Shot Zero-shot ZhouZhihua Zipf Ziya antigravity attention sink bias binning context emacs few-shot ffmpeg gated attention gpt-oss harmony format jpype kanban knowledge Graph lightinfer motion node2vec oat-zero off-by-one attention orz pararun promptlog s1 skill spec ssh str trae vim vlc

© 2026 hscspring All rights reserved.

Powered by Hexo