A Data-Centric Approach To Identifying 404 Pages Worth Saving