Cloud (AWS/Azure) and Apache Spark™ Workshop

Feb 26, 2021. 9am - 12pm

Facilitator: Dr. Benjamin Harvey

Just Enough Python/Scala for Apache Spark™

This course aims to help participants with or without a programming background develop just enough experience with Python to begin using the Apache Spark programming APIs.

Apache Spark™ Overview & Programming

Covers the fundamentals of the Apache Spark framework: an introduction to the Spark architecture, the DataFrame API, and one elective chosen from several options.

Apache Spark™ Tuning and Best Practices

This course offers a deep dive into tuning Spark applications, developing best practices, and avoiding many of the common pitfalls of Spark development.

Delta Lake

Aimed at those who want to use Delta Lake for ETL processing on data lakes. The course ends with a capstone project: building a complete data pipeline using Delta Lake.

Apache Spark™ for Machine Learning and Data Science

Introduction to Spark fundamentals and ML fundamentals, plus a brief look at various Machine Learning and Data Science topics. Through lecture and hands-on labs, the course emphasizes skills development and the unique needs of a Data Science team.

Hands-on Deep Learning with Keras, TensorFlow, and Apache Spark™

This course offers a thorough, hands-on overview of deep learning and how to scale it with Apache Spark.

Facilitator Bio: Dr. Benjamin Harvey

Dr. Benjamin Harvey currently serves as the Director of Data Science for Maxar Technologies, a federal space and defense contractor, where he leads all data science pursuits for the company. He also serves as a Sr. Research Associate at Johns Hopkins University in the Biostatistics Department of the Bloomberg School of Public Health. In addition, he is part-time faculty at George Washington University (GWU), where he teaches Data Science and Big Data Analytics courses in the joint Data Analytics graduate program of the Department of Engineering Management and Systems Engineering and the Department of Computer Science. Before joining Maxar, he was a lead Data Scientist and Solutions Architect with Databricks.

Dr. Harvey joined Databricks, the Silicon Valley startup whose founders are the creators of Apache Spark, in January 2019, where he led efforts architecting solutions and developing models for its Public Sector and Health and Life Sciences organization. He joined the National Security Agency in 2009 and worked there for a decade; his final position was Chief of Operations Data Science. He was hired into the Cryptologic Computer Science Development Program (CDP), graduated from the CDP in 2012, and was the first African American to be accepted into and to finish the program.

Dr. Harvey conducted research at the Harvard-Massachusetts Institute of Technology (MIT) Division of Health Sciences and Technology (HST) in the Bioinformatics and Integrative Genomics (BIG) program in 2008. He was a Bioinformatics Post-Baccalaureate Research Fellow in 2009 with i2b2, National Center for Biomedical Computing, Brigham and Women’s Hospital, and Children’s Hospital Boston Informatics Program (CHIP), with an academic appointment at Harvard Medical School. He also completed a Clinical Informatics Research Fellowship in 2010 at the National Institutes of Health (NIH) Clinical Center, within the Department of Clinical Research Informatics (DCRI).

Dr. Harvey graduated from Mississippi Valley State University (MVSU) in 2008 with a B.S. in Pre-Medicine & Computer Science. He received a Master of Science in Computer Science from Bowie State University (BSU) in 2011 and a Doctor of Science in Computer Science from BSU in 2015. His dissertation, entitled “Cloud Scale Genomic Signals Processing for Robust Microarray Data Analysis,” was completed in conjunction with, and advised by, Vince Carey, a Harvard professor, and Dr. Soo-Yeon Ji. He also holds a Cryptologic Computer Science Certificate from the University of Maryland, Baltimore County (UMBC), earned through a dual NSA program with the Naval Postgraduate School (NPS).