Tag: Random Policy Valuation for Diverse Reasoning