Tag: basic attention token prediction