Authors - Elmar B. Noche, Randy Joy M. Ventayen

Abstract - Breast cancer remains one of the leading causes of mortality among women, highlighting the need for reliable survival prediction tools. This study applied big data analytics and Cox Proportional Hazards regression to the METABRIC dataset, which contains clinical, pathological, and genomic records from over 2,000 breast cancer patients. Hadoop HDFS was used for distributed storage, while PySpark supported preprocessing and data transformation. After feature selection, six significant predictors were identified: inferred menopausal state, Nottingham Prognostic Index, oncotree code, type of breast surgery, cohort classification, and tumor size. The findings show that combining Hadoop-based infrastructure with interpretable survival modeling can support patient risk stratification, treatment planning, and precision oncology.
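
To illustrate the modeling step named in the abstract, the following is a minimal sketch of Cox Proportional Hazards regression with a single covariate, fitted by Newton-Raphson on the partial log-likelihood. It is not the authors' pipeline: the eight records are invented toy values rather than METABRIC data, the function names `cox_partial_loglik` and `fit_cox_1d` are hypothetical, and the sketch assumes no tied event times. A real analysis would more likely rely on an established survival library and all six predictors.

```python
import math


def cox_partial_loglik(beta, times, events, x):
    """Cox partial log-likelihood for one covariate (assumes no tied event times)."""
    ll = 0.0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored records contribute only through risk sets
        # Risk set: everyone still under observation at the i-th event time.
        denom = sum(math.exp(beta * x[j]) for j in range(n) if times[j] >= times[i])
        ll += beta * x[i] - math.log(denom)
    return ll


def fit_cox_1d(times, events, x, max_iter=50, tol=1e-8):
    """Estimate the coefficient by Newton-Raphson on the partial log-likelihood."""
    beta = 0.0
    n = len(times)
    for _ in range(max_iter):
        grad = 0.0
        info = 0.0  # observed information (negative second derivative)
        for i in range(n):
            if not events[i]:
                continue
            s0 = s1 = s2 = 0.0
            for j in range(n):
                if times[j] >= times[i]:
                    w = math.exp(beta * x[j])
                    s0 += w
                    s1 += w * x[j]
                    s2 += w * x[j] * x[j]
            mean = s1 / s0
            grad += x[i] - mean
            info += s2 / s0 - mean * mean
        if info <= 0.0:
            break  # flat likelihood; stop rather than divide by zero
        step = grad / info
        beta += step
        if abs(step) < tol:
            break
    return beta


# Toy data (hypothetical, NOT from METABRIC): follow-up time in months,
# event indicator (1 = death observed, 0 = censored), tumor size in cm.
times = [5, 8, 12, 15, 20, 25, 30, 34]
events = [1, 1, 1, 0, 1, 1, 0, 1]
tumor_size = [4.0, 2.0, 3.5, 1.5, 3.0, 1.0, 2.5, 2.0]

beta_hat = fit_cox_1d(times, events, tumor_size)
hazard_ratio = math.exp(beta_hat)  # multiplicative change in hazard per extra cm
print(f"beta = {beta_hat:.3f}, hazard ratio per cm = {hazard_ratio:.3f}")
```

The exponentiated coefficient is what makes the model interpretable: a hazard ratio above 1 means larger values of the covariate are associated with a higher hazard, which is the basis for the kind of risk stratification the abstract describes.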