Curating and Evaluating a High-Quality Dataset from YouTube for Training Vision Foundation Models