Learning From Human-Generated Reward