Presenter: Graham Annett
Computing PhD Student, Computer Science emphasis
Location: In person in CCP 259 or register to attend via Zoom
Abstract: In the realm of reinforcement learning (RL), the vision of developing models capable of generalizing across many environments has remained elusive. While the Decision Transformer (DT) and Trajectory Transformer (TT) have made headway, challenges remain, especially in their adaptability to novel tasks and environments without extensive retraining. We introduce an approach that emphasizes action tokenization, borrowing concepts from large language models (LLMs). Rather than comprehensively tokenizing the state, action, and reward trajectories, we introduce an action-centric tokenization schema. This retains the state space in its native form and fosters environment-agnostic model training. Two tokenized action embeddings, namely ActionTokenizedEmbedding and ActionTokenizedSpreadEmbedding, are explored, providing flexibility and adaptability. Preliminary results show the model's potential for rapid adaptation to new environments. We also explore how these ideas can be applied to new multimodal foundation models.
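The action-centric tokenization described in the abstract can be illustrated with a minimal, hypothetical sketch: continuous actions are discretized into integer token ids while states are left in their native form. The function names, uniform binning, and bin count below are illustrative assumptions, not the talk's actual implementation.

```python
import numpy as np

def tokenize_actions(actions, low, high, n_bins=256):
    """Map continuous actions in [low, high] to integer token ids via uniform binning.
    (Illustrative scheme; the talk's actual tokenizer may differ.)"""
    clipped = np.clip(actions, low, high)
    scaled = (clipped - low) / (high - low)  # normalize to [0, 1]
    # Scale into bin indices, clamping the upper edge into the last bin.
    return np.minimum((scaled * n_bins).astype(int), n_bins - 1)

def detokenize_actions(tokens, low, high, n_bins=256):
    """Invert tokenization by mapping each token id back to its bin center."""
    return low + (tokens + 0.5) / n_bins * (high - low)

# Example: a 3-dimensional continuous action in [-1, 1].
action = np.array([-0.9, 0.0, 0.7])
tokens = tokenize_actions(action, -1.0, 1.0)
recovered = detokenize_actions(tokens, -1.0, 1.0)
# Round-trip error is bounded by half a bin width; the raw state
# would be fed to the model untokenized under this schema.
```

Because only actions pass through the tokenizer, the same discrete vocabulary can in principle be reused across environments with differing state spaces.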