Project Requirements
This project involves data analysis and visualization based on your chosen theme. The requirements are as follows:
- Dataset Collection: Find and use relevant datasets that align with the theme.
- Data Visualization: Create four visualizations:
  - Three descriptive visualizations to highlight trends, distributions, and relationships.
  - One predictive visualization based on a regression or classification model.
- Data Storytelling: Provide meaningful insights and narratives for each visualization.
- Predictive Model: Build and evaluate a regression or classification model. Include the following:
  - Clear explanation of the model choice.
  - Details of the features used.
  - Performance metrics and interpretation.
- Presentation: Use GitHub Pages to host and showcase the project, ensuring a professional layout and easy navigation.
Themes for Data Science Term Projects
Please select one of the following themes for your term project. Your project should focus on exploring datasets related to the chosen theme and addressing specific, well-defined engineering or data-driven research questions.
- Supply Chain & Operations Management
Analyze the flow of goods, materials, and information to improve efficiency, resilience, and sustainability. This theme is ideal for industrial engineering applications.
Possible Datasets: Search Kaggle for "supply chain" or "logistics" datasets. Look for public trade data (e.g., UN Comtrade) or freight/shipping data. Large forecasting datasets (e.g., from Walmart) are also relevant.
- Energy Systems & Grid Optimization
Study the *operational* side of energy, focusing on forecasting, optimization, and maintenance for power generation and distribution.
Possible Datasets: Find data from government portals like the US Energy Information Administration (EIA) or the National Renewable Energy Laboratory (NREL). Look for smart meter data or combine power generation data with weather data from NOAA.
- Engineering Project Management & Financial Risk
Apply data science to the business and management side of engineering. Investigate the *economics* of engineering projects, from cost estimation to risk analysis.
Possible Datasets: Use the Federal Reserve Economic Data (FRED) to find the "Producer Price Index" (PPI) for commodities (steel, lumber). Check the World Bank or IMF for global commodity prices and infrastructure project data.
- Summer Olympics: Systems & Performance
Use the Olympics as a rich dataset to analyze human performance or the complex logistics and economic impact of hosting a global event.
Possible Datasets: Kaggle is the best resource here. Search for "Olympics" to find historical data (120+ years) and datasets for recent games (Tokyo 2020, Paris 2024). Combine this with economic data (e.g., from FRED) for host cities.
- Urban Systems & Smart Cities
Focus on the intersection of civil and systems engineering with data. Use data from urban environments to optimize services, infrastructure, and quality of life.
Possible Datasets: Explore your city's open data portal (e.g., for traffic, transit, 311 calls, or parking). The NYC Taxi & Limousine Commission dataset is a classic. Look for GTFS (General Transit Feed Specification) data for public transit schedules.
- Climate Change & Environmental Engineering
Analyze the physical and environmental data related to climate change to model its impacts and inform engineering solutions.
Possible Datasets: Use NOAA's Climate Data Online for weather station data. Get air quality data from the EPA's AirData (AQS) portal. For geospatial analysis, use NASA's Earthdata (e.g., FIRMS for wildfires) or Google Earth Engine.
Instructions
- Deliverable 1:
  - Select one theme from the above list for your term project.
  - Formulate three research questions related to your chosen theme.
  - Identify and select relevant datasets that will help you answer your research questions.
  - Submit your three research questions along with the dataset selections.
  - Create a GitHub repository for your project and make a GitHub Page available. At this stage, the GitHub Page can be empty; you only need to provide the link.
  - Include a `README.md` file in the repository explaining the details of your work and including student names with student IDs.
- Deliverable 2:
  - Complete Python code should be placed in a `scripts` folder in your GitHub repository.
  - Visualizations should be saved in a `visuals` folder in your GitHub repository.
  - Develop a working website hosted via GitHub Pages that showcases your project.
  - Update the `README.md` file in the repository to explain the details of your work and include student names with student IDs.
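One way to keep the `scripts` and `visuals` folders consistent is to have every script build its output path through a small helper. The sketch below is only a suggestion: `visual_path` is a hypothetical helper name, and the demo writes into a temporary directory standing in for the repository root.

```python
import tempfile
from pathlib import Path


def visual_path(repo_root, filename):
    """Return <repo_root>/visuals/<filename>, creating the folder if needed."""
    visuals_dir = Path(repo_root) / "visuals"
    visuals_dir.mkdir(parents=True, exist_ok=True)
    return visuals_dir / filename


# Demo against a throwaway directory that stands in for the repository root.
with tempfile.TemporaryDirectory() as tmp_root:
    out = visual_path(tmp_root, "trend.png")
    folder_created = out.parent.is_dir()
    relative = out.relative_to(tmp_root).as_posix()

print(relative)

# Inside a real script stored in scripts/, the repository root could be
# derived from the script's own location (assumed layout):
#   repo_root = Path(__file__).resolve().parent.parent
# and a matplotlib figure could then be saved with:
#   fig.savefig(visual_path(repo_root, "trend.png"), dpi=150)
```

Deriving the root from the script's location, rather than from the current working directory, means the figure lands in `visuals` no matter where the script is run from.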