About
I'm an Applied Scientist at Amazon Rufus, working on post-training that turns foundation models into capable agents. Most of my time goes into building the environments in which these agents are trained and evaluated, spanning coding, search, tool use, and other long-horizon tasks.
I'm also an active open-source contributor, mostly with CAMEL-AI.org on agentic workflows and data. Earlier in my PhD I built DeepOnto and a few smaller projects on knowledge engineering with LMs.
When I'm not at a terminal I like to play (badminton, piano, games), sing (pop music), and write (Chinese literature).
Blog
- Strands-SGLang: Bridging Agent Scaffolding and RL Training
Jan 2026
Existing agent scaffolds like Strands-Agents make it easy to serve tool-using agents, but face a key challenge: they operate on text (usually an OpenAI-compatible endpoint) while RL training requires exact token IDs (token-in, token-out). This mismatch causes retokenization drift — the tokens used for computing logprobs and gradients no longer match the tokens that were actually generated — leading to effectively off-policy updates and unstable RL training. Strands-SGLang bridges this gap by extending Strands-Agents with SGLang's native endpoint while preserving the customizable agent loop…
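To make the mismatch concrete, here is a minimal sketch of retokenization drift using a made-up three-token vocabulary and a greedy longest-match tokenizer. Everything in it (`VOCAB`, `encode`, `decode`) is hypothetical and only illustrates the effect; it is not code from Strands-Agents, SGLang, or Strands-SGLang.

```python
# Hypothetical toy vocabulary: "ab" exists as a merged token next to "a" and "b".
VOCAB = {"a": 0, "b": 1, "ab": 2}
ID_TO_TEXT = {token_id: piece for piece, token_id in VOCAB.items()}


def decode(token_ids):
    """Turn token IDs back into text, as a text-only endpoint would."""
    return "".join(ID_TO_TEXT[i] for i in token_ids)


def encode(text):
    """Greedy longest-match tokenization over the toy vocabulary."""
    pieces = sorted(VOCAB, key=len, reverse=True)
    token_ids, pos = [], 0
    while pos < len(text):
        for piece in pieces:
            if text.startswith(piece, pos):
                token_ids.append(VOCAB[piece])
                pos += len(piece)
                break
        else:
            raise ValueError(f"cannot tokenize {text[pos:]!r}")
    return token_ids


# Suppose the policy sampled "a" and then "b" as two separate tokens.
generated_ids = [0, 1]

# A text-based scaffold only keeps the decoded string...
text = decode(generated_ids)

# ...so the trainer has to re-encode it and may get a different sequence.
retokenized_ids = encode(text)
drifted = retokenized_ids != generated_ids

print(generated_ids, "->", repr(text), "->", retokenized_ids, "drifted:", drifted)
```

Both sequences decode to the same string, but log-probabilities computed on the retokenized IDs would describe tokens the policy never sampled. Keeping the rollout token-in, token-out avoids the round trip through text entirely.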
- How Adam Steers Gradient Descent
Aug 2025
Let's start from the most basic update rule. Suppose we want to minimize an objective $L(\theta)$. Vanilla gradient descent updates parameters by moving against the gradient: $\theta_{t+1} = \theta_t - \eta \nabla L(\theta_t)$, where $\eta$ is the learning rate. This rule is fully reactive: the step at time $t$ depends only on the current gradient $\nabla L(\theta_t)$. That can work, but it has a well-known failure mode in ill-conditioned landscapes (think "long narrow valleys")…
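The "long narrow valley" failure mode can be sketched in a few lines of plain Python. The quadratic objective, the starting point, and the learning rate below are illustrative assumptions, not taken from the post.

```python
def grad(theta):
    """Gradient of the assumed objective L(x, y) = 0.5 * (x**2 + 100 * y**2)."""
    x, y = theta
    return (x, 100.0 * y)


def gradient_descent(theta0, lr, steps):
    """Vanilla gradient descent: each step uses only the current gradient."""
    theta = theta0
    path = [theta]
    for _ in range(steps):
        g = grad(theta)
        theta = tuple(p - lr * g_i for p, g_i in zip(theta, g))
        path.append(theta)
    return path


# The learning rate is capped by the steep y-direction (stability needs lr < 2 / 100),
# so progress along the shallow x-direction is slow.
path = gradient_descent((1.0, 1.0), lr=0.019, steps=50)

for step in (0, 1, 2, 3, 50):
    x, y = path[step]
    print(f"step {step:2d}: x = {x:.4f}, y = {y:+.4f}")
```

Along the steep axis the iterate flips sign every step, while along the shallow axis it shrinks by less than 2% per step. That combination of zig-zagging and slow drift is the behaviour that momentum and per-coordinate scaling are designed to tame.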
- Approximating the Softmax Function
Jan 2021
The softmax function is widely used in the output layer of neural-network models for classification. In the binary case, it reduces to the familiar sigmoid mapping. Given a score (logit) vector $z = (z_1, \dots, z_K)$, the softmax probabilities are $p_i = e^{z_i} / \sum_{j=1}^{K} e^{z_j}$. In particular, for $K = 2$, $p_1 = \sigma(z_1 - z_2)$, where $\sigma(x) = 1 / (1 + e^{-x})$ is the sigmoid function. More generally, softmax can be viewed as normalizing positive weights obtained from log-scale inputs. If we write $z_i = \log w_i$ with $w_i > 0$, then…
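As a quick numerical check of the binary case, the sketch below implements softmax and sigmoid in plain Python and compares them. The function names and the sample logits are illustrative choices, and subtracting the maximum logit is a standard stability trick rather than something stated in the post.

```python
import math


def softmax(z):
    """Softmax of a list of logits, shifted by the max for numerical stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Logistic sigmoid 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


# Binary case: the first softmax probability depends only on the logit difference.
z = [2.0, 0.5]
p = softmax(z)
print(p[0], sigmoid(z[0] - z[1]))

# Softmax as normalized positive weights: with w_i = exp(z_i), p_i = w_i / sum(w).
w = [math.exp(v) for v in z]
p_from_weights = [w_i / sum(w) for w_i in w]
print(p, p_from_weights)
```

The two printed pairs agree up to floating-point rounding, which is the identity the excerpt describes for two classes.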
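- 新双城记 (A New Tale of Two Cities)
2025 · Prose essay
December 25, 2025, Christmas Day. I was driving on Highway 101, and the rain, for once, was heavy enough to obscure my view of the road. Yet to me it all felt perfectly natural. This was just everyday life in England: nights when wind, rain, snow, and fog tangled together, driving with my heart in my throat along deserted mountain roads. Short of an outright natural disaster, could it get any worse? At the end of August 2024, I received a full-time job offer, which started the countdown to leaving England "for good". Once my student status ended, I moved from Oxford to the more rural Bicester, into a neighbourhood that belonged almost entirely to people with families. Outside my door was a vast park, with hardly a soul in it morning, noon, or night. Looking back now, that seems to have been my last immersive stretch of seclusion before my temperament began to turn outward. By rights, I had already experienced solitude at its most extreme during the pandemic; I did not expect that, more than a year after it ended, I would still choose to keep drawing inward. Life in Bicester did not change much: the occasional trip to Oxford, the occasional trip to London, and more often a walk in the park, grocery shopping in the neighbourhood, and working from home. The days were utterly ordinary, while the signs of departure inched forward through a US visa process full of uncertainty.…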
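- 登爱城山座 (Climbing the Seat in Edinburgh)
2020 · Classical poetry
Jihai year, Dingchou month, Yihai day. While studying in Edinburgh: thoughts on climbing the hill known as the Seat.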
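- 清明 (Qingming)
2019 · Classical poetry
North wind upon the hills; from afar I honour my grandfather. I scatter the rain as wine; it leaves the sunlight and slips down the throat. Rivers and ravines join like veins; where water seeps, lakes are born. Heaven and earth abide here; so does this goodness of heart and virtue.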
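- 春折 (Spring, Broken)
2015 · Classical poetry
The willow is delicate, and spring dotes on it; the wintersweet is proud, and snow conceals it. The world seeks out posture and airs, yet seldom looks for the fragrance held within. That is its poverty.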
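- 暖鼎小记 (A Short Account of Hotpot)
2014 · Classical prose
On the Jiayin day of the Jiaxu month in the Jiawu year, with the exam just over, everyone was dazed and worn out. Suddenly I thought of the spicy fragrance of Sichuan hotpot, so I made plans with friends, and the fatigue was forgotten. None of us had been to the place before, so we set off together on foot, groggy as if day and night had blurred, dragging our steps. My companions' complaints of hunger filled the road and the light was growing dim; by a vague reckoning, some hundred paces still remained. The diner's appetite and the walker's exhaustion are, after all, cause and effect. Soon its door came into view, and we all quickened our pace toward it; the earlier weariness vanished at once. Of old the roads to Shu were hard, and in that land of abundance even the apes despaired of the climb. Yet promise a lavish feast at the end, with wine flowing and spirits high, and no peril could hold anyone back. My friend, a Sichuan native, had gone looking for the way in this foreign land. Having heard of the place, he led us there. It was Geylang, a red-light district where extravagance passes for pleasure. But the pleasure-seekers sought their pleasure and the diners dined; neither troubled the other, each going their own way. As for those who indulge in eating, they insist on the old recipes, and in Sichuan and Chongqing it comes down to this: a fierce flame and an oil dip, a heavenly flavour hard to forget. The oil dip is a bowl filled with sesame oil, a light touch of seasoning and salt, minced garlic and coriander, with oyster sauce poured in. I wondered to myself why the bowl must brim with such grease; no wonder the townsfolk poke fun at it.…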
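- 老猫 (The Old Cat)
2014 · Prose essay
The serenity of the old town seems, after nightfall, ever more like a wine cellar unsealed: wherever the night's fragrance lingers, silence stretches for miles, cut off from the world. Even amid the clutter of the city and the roar of traffic, a few moments' stroll is enough for the old town to raise something like a barrier that shuts out the worldly bustle. Where it nestles against a hill and draws on the strength of nature, the barrier is astonishingly strong. When spring returns each year, what wakes the residents in the morning is never the irritating blare of car horns but the crisp, pleasant song of birds. Even if drowsiness lingers on waking, no resentment flares up, for who could fail to yield to this music of nature? Those who wander the old town come in every kind: some on daily errands such as morning exercise and grocery shopping, others in the fond scenes of chance meetings and companionship. Young people leave the neon and the nightlife behind and, once across the barrier, loosen the string they have kept pulled taut; children romp and play, their innocence carried on the gentle breeze that sweeps through the old streets. As for the elderly, like the growth rings of towering trees or pebbles worn smooth by the years, they neither mind anything much nor notice anything much. Their steps keep an easy, unhurried rhythm that is never unruly, like grain to wood or meanders to flowing water: so familiar that nothing ruffles them. They walk through the alleys in silence like this, their hale spirits set against crumbling wrinkles, giving rise to a warmth that matches the evening glow. Were it not for the tall, upright high-voltage pylons in the picture, one might truly think time had run backward.…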
Experience
- Applied Scientist, Amazon Rufus
- May 2025 — now
- Visiting Researcher, CAMEL-AI.org
- Dec 2024 — now
- Research Associate, University of Oxford
- Apr 2024 — Apr 2025
Open-source
Agent & RL
- CAMEL (core)
- One of the earliest open-source multi-agent frameworks for LLMs
- Loong (core)
- Synthesizing verifiable long-CoT data for reasoning RL training
- strands-env (core)
- Gym-like agent environment interface for agentic RL training and evaluation with Strands
- strands-sglang (core)
- On-policy agentic rollout infrastructure built on SGLang and Strands
- slime (contrib)
- RL post-training framework behind the GLM model family
- OpenEnv (contrib)
- Environment interface for agentic RL post-training (Meta PyTorch)
Knowledge Engineering
- HiT (core)
- Hierarchy representation learning with language models
- DeepOnto (core)
- Ontology engineering with language models
- OAEI Bio-ML (core)
- Biomedical ontology alignment benchmark
Service
Organizer & Program Chair
- SEA Workshop
- Scaling Environments for Agents — NeurIPS 2025
- ELMKE Workshop
- Evaluation of Language Models in Knowledge Engineering — EKAW 2024, ESWC 2025, ESWC 2026
- OAEI Bio-ML Track
- OAEI biomedical ontology alignment track — ISWC 2022–2024
Education
- DPhil (PhD), Computer Science, University of Oxford
- 2020 — 2024
- BSc (Hons), AI & Mathematics, University of Edinburgh
- 2016 — 2020
Awards
- Excellent Open-sourced Tool (DeepOnto), OpenKG
- 2024
- Best Resource Paper Runner-Up, CIKM
- 2023
- Best Resource Paper Candidate, ISWC
- 2022
- Best Research Report Award, ISWS Summer School
- 2022
- PhD Scholarship, Samsung Research UK
- 2021
- Joint Class Prize (Top 1, BSc AI & Maths), University of Edinburgh
- 2020
Amateur
- Literature
- Chinese novels, poetry, and prose
- Music
- Amateur composer, singer, and pianist
- Sports
- Member of Oxford University Badminton Club (OuBaC)
