We present a data-driven approach that integrates machine learning prediction models with stochastic optimization by using the residuals of the learning models. Given a new covariate (contextual) observation, the goal is to choose a decision that minimizes the expected objective function conditioned on this observation. We first review a Sample Average Approximation (SAA) of this problem that is constructed from regression residuals. We then present several extensions. For the limited-data regime, we discuss Distributionally Robust Optimization (DRO) variants with ambiguity sets based on the Wasserstein distance, sample robust optimization, or phi-divergences. We also investigate extensions to problems with Decision-Dependent Uncertainty. Finally, we discuss applications of the method to energy and transportation systems using real-world data.
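
To make the residuals-based SAA idea concrete, the following is a minimal sketch under stated assumptions, not the implementation behind the results. It assumes an ordinary least-squares prediction model, a single-product newsvendor cost, and synthetic data. The function names (`residual_scenarios`, `newsvendor_cost`, `solve_newsvendor_saa`) and the cost parameters are hypothetical and chosen only for illustration. Scenarios for a new covariate observation are formed by adding the training residuals to the point prediction, and the decision minimizes the sample average cost over those scenarios.

```python
import numpy as np


def residual_scenarios(X, y, x_new):
    """Fit OLS with an intercept and return point prediction + training residuals."""
    A = np.column_stack([np.ones(len(X)), X])          # design matrix with intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)       # least-squares coefficients
    residuals = y - A @ coef                           # in-sample regression residuals
    point_pred = np.concatenate([[1.0], np.atleast_1d(x_new)]) @ coef
    return point_pred + residuals                      # one scenario per training sample


def newsvendor_cost(z, demand, backorder=4.0, holding=1.0):
    """Average newsvendor cost of order quantity z over the demand scenarios."""
    shortage = np.maximum(demand - z, 0.0)
    surplus = np.maximum(z - demand, 0.0)
    return float(np.mean(backorder * shortage + holding * surplus))


def solve_newsvendor_saa(scenarios, backorder=4.0, holding=1.0):
    """Minimize the sample average cost.

    The objective is convex and piecewise linear in z with breakpoints at the
    scenarios, so it suffices to evaluate it at each scenario value.
    """
    costs = [newsvendor_cost(z, scenarios, backorder, holding) for z in scenarios]
    return float(scenarios[int(np.argmin(costs))])


# Synthetic data (assumption): demand depends linearly on one covariate plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(200, 1))
y = 5.0 + 2.0 * X[:, 0] + rng.normal(0.0, 1.0, size=200)

scenarios = residual_scenarios(X, y, x_new=np.array([3.0]))
z_star = solve_newsvendor_saa(scenarios)
print(f"order quantity for x = 3.0: {z_star:.3f}")
```

Because the backorder cost exceeds the holding cost in this sketch, the chosen order quantity sits above the point prediction. A decision based only on the point prediction would ignore the residual spread and could not produce this hedge.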