MoonOut - 博客园 (cnblogs)

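[Pinned] Tmux | Archive of common operations

Summary: I forget things far too easily 💀, so I am archiving them on the blog for easy lookup.

posted @ 2024-01-18 19:47 MoonOut Views (81) Comments (0) Recommendations (0)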

[Pinned] LaTeX · Overleaf | Archive of usage tips

Summary: Saved here for easy reference.

posted @ 2023-06-16 10:10 MoonOut Views (510) Comments (1) Recommendations (0)

December 15, 2025

PbRL · MARL | Quick read of recent preference-based MARL work

Summary: A brief look at recent Pb-MARL work.

posted @ 2025-12-15 14:20 MoonOut Views (3) Comments (0) Recommendations (0)

December 13, 2025

offline meta-RL | Quick-read notes on recent work

Summary: Quick-read notes on recent offline meta-RL work.

posted @ 2025-12-13 17:36 MoonOut Views (80) Comments (0) Recommendations (2)

December 7, 2025

offline meta-RL | Quick-read notes on classic papers

Summary: Quick-read notes on classic offline meta-RL papers.

posted @ 2025-12-07 10:35 MoonOut Views (120) Comments (0) Recommendations (1)

December 2, 2025

Paper quick-read notes | 2025.12

Summary: 2025.12 | Quick-read notes on papers.

posted @ 2025-12-02 23:13 MoonOut Views (33) Comments (0) Recommendations (0)

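November 29, 2025

PbRL | An incomplete summary of papers read over the past two years

Summary: An archive of the 20 papers for the PhD qualifying exam.

posted @ 2025-11-29 15:04 MoonOut Views (185) Comments (0) Recommendations (2)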

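November 22, 2025

MORL | Envelope Q-Learning: a MORL algorithm with convergence guarantees

Summary: EQL extends the single-objective Bellman operator to multi-objective RL and reproduces the convergence guarantee of value iteration.

posted @ 2025-11-22 21:18 MoonOut Views (86) Comments (0) Recommendations (1)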

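Data centers + event-driven optimization: reinforcement learning methods for green and reliable data center operation

Summary: Work on event-driven optimization + data centers from the team of Prof. Jia Qingshan (贾庆山).

posted @ 2025-11-22 16:10 MoonOut Views (19) Comments (0) Recommendations (0)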

November 2, 2025

Paper quick-read notes | 2025.11

Summary: 2025.11 | Quick-read notes on papers.

posted @ 2025-11-02 12:25 MoonOut Views (64) Comments (0) Recommendations (0)

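October 31, 2025

Skill Discovery | RGSD: pre-training the skill space from high-quality reference trajectories

Summary: ① Use contrastive learning to push the embeddings of the reference trajectories as far apart as possible; ② use the DIAYN reward to perform imitation learning and skill discovery at the same time.

posted @ 2025-10-31 00:50 MoonOut Views (75) Comments (0) Recommendations (1)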

October 8, 2025

RL | Quick read of the reinforcement learning papers at IJCAI 2025

Summary: A quick read of the RL-related papers at IJCAI 2025.

posted @ 2025-10-08 20:53 MoonOut Views (466) Comments (4) Recommendations (1)

October 2, 2025

Paper quick-read notes | 2025.10

Summary: 2025.10 | Quick-read notes on papers.

posted @ 2025-10-02 23:00 MoonOut Views (105) Comments (0) Recommendations (0)

September 2, 2025

Paper quick-read notes | 2025.09

Summary: 2025.09 | Quick-read notes on papers.

posted @ 2025-09-02 14:16 MoonOut Views (78) Comments (0) Recommendations (0)

August 6, 2025

Paper quick-read notes | 2025.08

Summary: 2025.08 | Quick-read notes on papers.

posted @ 2025-08-06 14:10 MoonOut Views (55) Comments (0) Recommendations (0)

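July 18, 2025

Skill Discovery | METRA: letting the policy explore a compact embedding space of states

Summary: Train a compact embedding space for the state space such that distances between embeddings match the temporal distance, then have the policy cover the embedding space as much as possible.

posted @ 2025-07-18 23:32 MoonOut Views (230) Comments (0) Recommendations (0)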

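July 16, 2025

Skill Discovery | LGSD: using the distance between language embeddings that describe states as the d(x,y) distance constraint in METRA

Summary: Use the semantic distance d_lang(x,y) = cos_sim[ l(s_1), l(s_2)] as METRA's 1-Lipschitz constraint.

posted @ 2025-07-16 17:50 MoonOut Views (205) Comments (0) Recommendations (0)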

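July 15, 2025

Skill Discovery | FoG: using an LLM / CLIP to provide DoDont weights that guide the agent toward safe exploration

Summary: Use an LLM / CLIP model to output how well the state / pixel observation matches human intent, and use that score as the weighting in DoDont.

posted @ 2025-07-15 20:34 MoonOut Views (73) Comments (0) Recommendations (0)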

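July 14, 2025

Skill Discovery | DoDont: using do + don't example videos to guide the agent to learn the skills humans expect

Summary: DoDont integrates a classifier $\hat{p}$ of good vs. bad behavior into the METRA framework, which makes the method look intuitive.

posted @ 2025-07-14 12:38 MoonOut Views (266) Comments (0) Recommendations (1)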

July 4, 2025

Paper quick-read notes | 2025.07

Summary: 2025.07 | Quick-read notes on papers.

posted @ 2025-07-04 11:01 MoonOut Views (99) Comments (0) Recommendations (0)

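June 13, 2025

RL | Quick read of the latest papers from AIR-DREAM Lab

Summary: I happened to come across the AIR-DREAM Lab homepage, so I am reading through it.

posted @ 2025-06-13 22:15 MoonOut Views (121) Comments (0) Recommendations (0)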

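June 7, 2025

RL | How to derive the MaxEnt RL (maximum-entropy RL) policy form π(a|s) ∝ exp(Q(s, a))

Summary: Consider one step of policy improvement. Treat the action probabilities $\pi (a|s)$ as variables with $\sum_a \pi (a|s) = 1$ as the constraint, and absorb the constraint with a Lagrange multiplier $\lambda$. Then take the partial derivative of the resulting Lagrangian of $V^\text{new}(s)$ with respect to each $\pi (a|s)$ and set it to 0.

posted @ 2025-06-07 21:31 MoonOut Views (191) Comments (0) Recommendations (0)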
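
The MaxEnt RL summary above can be sketched as a short derivation. This is only a sketch under stated assumptions: the one-step objective is taken to be the expected Q-value plus an entropy bonus, and the temperature $\alpha$ is a symbol added here for illustration ($\alpha = 1$ recovers the form in the title).

$$
\begin{aligned}
V^\text{new}(s) &= \sum_a \pi(a|s)\,\big[\,Q(s,a) - \alpha \log \pi(a|s)\,\big], \\
\mathcal{L}(\pi, \lambda) &= V^\text{new}(s) + \lambda \Big(1 - \sum_a \pi(a|s)\Big), \\
\frac{\partial \mathcal{L}}{\partial \pi(a|s)} &= Q(s,a) - \alpha \log \pi(a|s) - \alpha - \lambda = 0, \\
\pi(a|s) &= \exp\!\Big(\frac{Q(s,a) - \alpha - \lambda}{\alpha}\Big) \;\propto\; \exp\!\Big(\frac{Q(s,a)}{\alpha}\Big).
\end{aligned}
$$

Choosing $\lambda$ so that the probabilities sum to 1 gives the softmax policy $\pi(a|s) = \exp(Q(s,a)/\alpha) \,/\, \sum_{a'} \exp(Q(s,a')/\alpha)$.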

June 1, 2025

Paper quick-read notes | 2025.06

Summary: 2025.06 | Quick-read notes on papers.

posted @ 2025-06-01 07:14 MoonOut Views (106) Comments (0) Recommendations (0)

May 11, 2025

Recently read MARL papers

Summary: (This will not be finished any time soon.)

posted @ 2025-05-11 17:43 MoonOut Views (160) Comments (0) Recommendations (0)

May 2, 2025

Paper quick-read notes | 2025.05

Summary: 2025.05 | Quick-read notes on papers.

posted @ 2025-05-02 17:47 MoonOut Views (145) Comments (0) Recommendations (0)

April 15, 2025

Git | How to pull a remote branch to the local repository

Summary: git fetch origin, git stash, git checkout -b [] origin/[]

posted @ 2025-04-15 15:32 MoonOut Views (34) Comments (0) Recommendations (0)

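April 11, 2025

RL · Exploration | ETD: constructing an intrinsic reward from temporal distance to encourage agent exploration

Summary: Encourage the agent to explore states that are far from the history of the current episode, where "far" is measured by temporal distance (the time needed to reach the state).

posted @ 2025-04-11 23:40 MoonOut Views (220) Comments (1) Recommendations (0)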

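April 2, 2025

Linux | How to create a sudo user whose home directory is on the /data disk

Summary: How to create, on an Ubuntu server, a sudo user whose home directory is on the /data disk.

posted @ 2025-04-02 16:52 MoonOut Views (437) Comments (1) Recommendations (0)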

April 1, 2025

Paper quick-read notes | 2025.04

Summary: 2025.04 | Quick-read notes on papers.

posted @ 2025-04-01 15:10 MoonOut Views (194) Comments (2) Recommendations (0)

March 31, 2025

Conda | How to install Miniconda on a Linux server

Summary: How to install Miniconda on a Linux (Ubuntu) system.

posted @ 2025-03-31 14:18 MoonOut Views (5585) Comments (1) Recommendations (2)

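March 19, 2025

Docker | How to use Docker on a Linux server

Summary: Noted down without fully understanding it, though it seems impressive...

posted @ 2025-03-19 00:24 MoonOut Views (1665) Comments (0) Recommendations (1)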

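March 10, 2025

LLM · Agent | Making an agent deceive in Avalon by inferring other players' identities + how others perceive its own statements

Summary: This work may offer an effective prompt template for games that require hidden identities and deception.

posted @ 2025-03-10 18:03 MoonOut Views (105) Comments (0) Recommendations (0)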

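LLM · Agent | Playing StarCraft with the general-knowledge decision-making ability of LLMs

Summary: This work may serve as a reference prompt template for real-time strategy games.

posted @ 2025-03-10 16:46 MoonOut Views (178) Comments (0) Recommendations (0)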

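LLM · Agent | Memory module + communication module, letting agents cooperate to complete complex tasks

Summary: The key reasons for the good performance seem to be: 1. well-written prompts that enable efficient communication; 2. a very concise format for the information the agent memorizes.

posted @ 2025-03-10 16:22 MoonOut Views (519) Comments (0) Recommendations (0)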

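LLM · RL | Plan4MC: high-level planning with a directed acyclic graph + RL-based execution of low-level policies

Summary: This paper uses an LLM to generate various Minecraft skills but does not exploit the LLM's general-knowledge ability, so it does not feel like LLM agent work.

posted @ 2025-03-10 15:05 MoonOut Views (257) Comments (0) Recommendations (0)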

LLM · Agent | Using LLM agents to play various games

Summary: I read some papers on LLM agents playing various games.

posted @ 2025-03-10 13:57 MoonOut Views (485) Comments (0) Recommendations (0)

March 1, 2025

Paper quick-read notes | 2025.03

Summary: 2025.03 | Quick-read notes on papers.

posted @ 2025-03-01 19:40 MoonOut Views (132) Comments (0) Recommendations (1)

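February 25, 2025

Applied Stochastic Processes | Poisson processes, the exponential distribution, and event arrival rates

Summary: Asking DeepSeek what the Poisson process, the exponential distribution, and the event arrival rate mean.

posted @ 2025-02-25 14:27 MoonOut Views (674) Comments (0) Recommendations (0)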
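
The relationship between the three concepts in the entry above can be illustrated with a small simulation. This is a minimal sketch, not taken from the post itself: the function name `simulate_poisson_counts` and all parameter values are hypothetical. It draws exponential gaps between events with a given arrival rate and counts how many events fall in a fixed window; for a Poisson process both the mean and the variance of that count should be close to `rate * horizon`.

```python
import random
from statistics import mean, pvariance


def simulate_poisson_counts(rate, horizon, n_runs, seed=0):
    """Count events in [0, horizon] when gaps between events are Exp(rate).

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_runs):
        t, n = 0.0, 0
        while True:
            # Inter-arrival time: exponential with mean 1 / rate.
            t += rng.expovariate(rate)
            if t > horizon:
                break
            n += 1
        counts.append(n)
    return counts


if __name__ == "__main__":
    rate, horizon = 2.0, 10.0  # assumed values: 2 events per unit time, window of 10
    counts = simulate_poisson_counts(rate, horizon, n_runs=2000)
    print("empirical mean count:", mean(counts))
    print("empirical variance:  ", pvariance(counts))
    print("rate * horizon:      ", rate * horizon)
```

The fixed seed only makes runs repeatable; the empirical values fluctuate around `rate * horizon` and get closer as `n_runs` grows.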

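February 7, 2025

Applied Stochastic Processes | Final exam cheat sheet

Summary: Notes to be published after grades are released...

posted @ 2025-02-07 04:34 MoonOut Views (363) Comments (0) Recommendations (0)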

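Applied Stochastic Processes | Summary of key points for the final exam

Summary: Notes to be published after grades are released...

posted @ 2025-02-07 04:19 MoonOut Views (531) Comments (0) Recommendations (0)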

February 3, 2025

Paper quick-read notes | 2025.02

Summary: 2025.02 | Quick-read notes on papers.

posted @ 2025-02-03 03:49 MoonOut Views (160) Comments (3) Recommendations (1)

January 23, 2025

Python · GitHub · Linux | Using the local machine as a proxy server

Summary: Add RemoteForward 127.0.0.1:7890 127.0.0.1:7890

posted @ 2025-01-23 22:37 MoonOut Views (130) Comments (0) Recommendations (0)

The moon rises and the rosy clouds return 🌙
