长琴

2025

The New LLM Paradigm After DeepSeek R1

AI
Continual Pre-training
DeepScaleR
Inference Scaling
L1
LCPO
LIMD
LIMO
LIMR
LLM
NLP
Online-DPO-R1
Post-training
Pre-training
R1
R1-Zero
TTS
oat-zero
orz
s1

R1-Related: DPO Data Selection and RL Algorithms Such as DPO

AI
Continual Pre-training
DPO
LIMD
LLM
NLP
Online-DPO-R1
Post-training
RL

Recents

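VLA Sim-Real Co-training
When My 20-Day Bill Exceeded $4,000
You May Not Understand SFT as Well as You Think: The Love-Hate Relationship Between SFT and RL
GAGPO: What If We Pull GiGPO Back to PPO+GAE
TRPO Deep Dive: Why Anyone Doing Post-training Should Understand TRPO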

Categories

Coding (78)
Feeling (164)
Thinking (62)

Tag Cloud

ACT AE AGAPO AGI AI AI-Coding AIGC ALBERT AR AUC Accuracy Activation Activation Steering Adam Age Agent Agentic-RL AgenticRL Aha Algorithm Array Arrow Attention Automatic Speech Processing Automation BERT BIO BIOHD BM25 BPE BabyGrow Backtracking Backward Bahdanau Attention Bart Bayes Beam Search Bert-Flow Bi-LSTM Biasing BigCodec Binary Search Blending Brain Brain Decoding Bridge Business C C.AI C4 CARD CCG CE BERT CFG CISPO CKY CNN COPO CRF CRL CS CYK Calculus Camera Cascades Catalan ChatBot ChatGPT Chi2 Chunking Class Imbalance Loss Classification Clip CoT Codec Cognition Collaborative Filtering Collins Parser CompoundEngineering Computational Linguistics Computer Computer Science Confusing Labels Context Engineering Context Learning Context-Free Grammars Continual Pre-training Continual Pretraining Contrastive-Learning Coordinate Ascent Cosine Cosine Similarity Cross Entropy Cross-brackets Cross-view Ctrl Culture DA DAC DAPO DB DCPO DELTA DLM DNN DP DPO Daily Darling Data Augmentation Data Clearing Data Enhancement Data Preprocess Data Science Data Structure DataManagement Database DeBERTa Debiasing Decoder Decoding Deep DeepGen DeepGraph DeepLearning DeepScaleR DeepSeek DeepSeek-GRM DeepSeek-V3.2 DeepSeekMath-V2 DeltaNet Dependence Diary Disentangled Attention DistilBERT Distillation Django Docker Docker-Compose Dockerfile Dr GRPO DrDAPO DrGRPO Dream Dropout Dynamic-Mask EDA EM EMD EMPO ERL ERNIE ETTRL EVOL-RL EXAONE Economics Edit Distance Efficient-DeepLearning Elasticsearch Electra Elixir Ellipsis Embedding Embeddings Embodied AI EmbodiedAI Encoder Entropy Evaluation Eventlet ExT5 Exam F1 FD Leak FDW FLAN FSM Faith FastCuRL Feature Engineering Feature-based Few-Shot Few-shot Prompting Fine-tuning Flash-Attention Formal Grammars Forward Full-Text-Search Function Syntax Funk MF Funnel Transformer Future GAE GAGPO GBTD GELU GFT GLU GMPO GP GPT-1 GPT-2 GPT-3 GPT3 GPU GRM GRPO GRU GSG GSPO GTPO GTPO-S Gan Garden-path Gated DeltaNet GiGPO Git Global Pointer Glow 
Graceful Shutdown Gradient Descent Graph GraphQL Grid Grammar Growth H2O-Danube HMM HPT Hard-SVM Hinge Loss Hope Host-only HuggingLLM Human-in-Loop Human-in-the-Loop IDE IE IQR IcePop Imbalance Data Impossible-Triangle In-Context Learning Industry Inference Scaling Information Extraction Information Theory Instruct InstructGPT Instruction Following Instruction Inference Intuitor Isolation Forest ItemCF Jaccard Java Jax Job Jupyter JustGRPO K2 KAT KKT KL KS Kernel Kernel Function Kernel Method Keyword Kimi Knowledge Graph L1 LCPO LIMD LIMO LIMR LLM LLM-Colosseum LLaDA LM LOF LR LSTM Labeling Language Model LayerNorm Lexical Semantics Lexicalism Lexicalized CFG Lexicalized Grammars Life Linear Algebra Linear Sturcture Linked List LinkedList Linux Listen Llama LoRA LoRA-XS Real-time Learning Logistic Regression Lucene Luong Attention MDLM MEMM MF MIO MM Fusion MOPD MR-Search MTL Machine Machine Learning Machine Translation Manacher Managemnt MarkBERT Markov Materialized Views Math Matplotlib Matrix Factorization Median MemAPO Meta Learning Meta RL Metric MiCA MiMo MiniMax Minimum Edit Distance Minkowski MoE Model Evaluation Module Monad Monkey Patch Multi-Head Attention Multi-Modal MultiModal Multitask Multiway Tree NAT NER NLG NLM NLP NLU NMT NNW NOVER NTP Naive Bayes Neo4j Network Ngram NodeJS Normalizing Flow NumPy Numba Numpy OEL OMNI ORZ Occupation One-Shot Online Learning Online Softmax Online-DPO-R1 OpenAI OpenClaw OpenSource OpenSpec Orientation P-R PCCG PCFG PEGASUS PLM PPMI PPO PTM PageRank Palindromic Pandarallel Pandas Partial Parsing Passion Pearson Philosophy Phrase Structure Grammar Phrase Structure Grammars Physics Planning PoS Polars Pooling Position-Encoding Post-Training Post-training Postgres Pragmatic Automatic Processing Pre-Trained Pre-Training Pre-training Precision Pretrain Pretrained Pretraining Probabilistic Grammar Probabilistic Model Promote Prompt ProtoBERT Pruning Psychology PyPI Python QA Quant Quantization Query Queue Qwen Qwen3 
Qwen3-Next R-Drop R1 R1-Zero R3 RAG RAVR REER RELU RENT RESTRAIN RFE RGR RHO RHO-1 RL RLHF RM RM-R1 RMSE RMSProp RNN ROC RWD Rank RaspberryPi Raspberrypi Reasoning Recall Recommendation Recursion Reduction Reformer Regex Regular Expression Reinforce++ Reinforcement Learning Relationship Extraction Representation Reqular Expressions Retrieving Reward RoBERTa RolePlay Rotated Sorted Array Rust SAPO SCFG SFT SGD SLM SMO SQL SRN SRT STAR-LDM STaR SVD++ SVM Scaling Scaling Law Seaborn Search Seed-Thinking Segmentation Selection-Inference Self-Attention Self-Verified Semantic Automatic Processing Semantic Similarity Senta Sentence Representation Sentence Similarity Sentence-BERT Sentiment Classification SentimentAnalysis Sentry Siamese Sigmoid SimCSE Similarity Simon Simple-Zoo Simpson Paradox Skill Skywork Reward Slide Smoothing Soft-SVM Softmax Sort Span Sparse Attention Spell Check Spurious Reward SqueezeBERT Stability Stable LM Stack Stacking Statistics Stirling Strategic StratifiedKFold Streaming String Study Style Substring Summarization Supertagging Swap System T5 TF-IDF THW TIS TRPO TRT TS3-Codec TTRL TTS Tagging Talkie TanH TensorBay Tensorflow Test Text Classification Text Generation Text Normalization TextCNN TextRank Thinking Thought TiDAR TinyLoRA Tokenizer Transformer Transformer-XL Tree Treebank Tuning Tutorial Ubuntu UniLM Unity Operation Unix Unsupervised Elicitation UserCF VAPO VITS VLA Vagrant Valence Vector Semantics Verifier Virtual Network VirtualBox Visualization Viterbi Vocabulary Learning VoiceAgent Voila Voting W2NER WOE Web Server Multithreaded Server Wide Word2vec Work World Model XLNet XTTS Z-Score Zero-Short Zero-Shot Zero-shot ZhouZhihua Zipf Ziya antigravity attention sink bias binning context emacs few-shot ffmpeg gated attention gpt-oss harmony format jpype kanban knowledge Graph lightinfer motion node2vec oat-zero off-by-one attention orz pararun promptlog s1 skill spec ssh str trae vim vlc

© 2026 hscspring All rights reserved.

Powered by Hexo