Predicting Walmart's Performance

Overview

My Role
Data Visualization
Data Analysis
Team
1 Designer
1 Analyst
Platforms
Desktop
Tools
Python
HTML5
CSS3
JavaScript (D3.js)

This project presents an analysis of Walmart's store locations and performance, combining Python for data analysis with data visualizations created using HTML, CSS, and JavaScript (D3.js). It offers insights into Walmart's geographical spread, sales trends, and market strategy through engaging charts and maps.

Walmart's Store Locations

In my initial analysis, I used JavaScript (D3.js) to map Walmart's store locations across the United States, with color encoding the age of the stores. The density heatmap overlay, built from hexagonal bins sized by the number of stores each contains, revealed a notable concentration of stores in the Northeastern U.S., particularly in densely populated urban and suburban areas.

Oldest Stores in Darker Colors / Newest Stores in Lighter Colors

Code Block

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Walmart Store Locations</title>
    <style>
      .nation {
        fill: #ddd;
      }

      .states {
        fill: none;
        stroke: #fff;
        stroke-linejoin: round;
      }

      .hexagon {
        stroke: #fff;
      }
    </style>
</head>

<body>

  <svg width="960" height="600"></svg>
  <script src="https://d3js.org/d3.v4.min.js"></script>
  <script src="https://d3js.org/d3-hexbin.v0.2.min.js"></script>
  <script src="https://d3js.org/topojson.v2.min.js"></script>
  <script>
  
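  var svg = d3.select("svg"),
      width = +svg.attr("width"),
      height = +svg.attr("height");
  
  // Store opening dates use the locale date format (%x is %m/%d/%Y in en-US).
  var parseDate = d3.timeParse("%x");
  
  // Older stores map to darker colors (black), newer stores to lighter ones (steelblue).
  var color = d3.scaleTime()
      .domain([new Date(1962, 0, 1), new Date(2006, 0, 1)])
      .range(["black", "steelblue"])
      .interpolate(d3.interpolateLab);
  
  // Group projected store points into hexagonal bins with a 10px radius.
  var hexbin = d3.hexbin()
      .extent([[0, 0], [width, height]])
      .radius(10);
  
  // Square-root scale so a hexagon's area, not its radius, tracks the store count.
  var radius = d3.scaleSqrt()
      .domain([0, 12])
      .range([0, 10]);
  
  // Albers USA projection matching the pre-projected 960x600 us-10m.v1.json.
  var projection = d3.geoAlbersUsa()
      .scale(1280)
      .translate([480, 300]);
  
  // The TopoJSON is already projected, so the path generator needs no projection.
  var path = d3.geoPath();
  
  // Load the map geometry and the store CSV in parallel, then draw.
  d3.queue()
      .defer(d3.json, "https://d3js.org/us-10m.v1.json")
      .defer(d3.csv, "https://assets-global.website-files.com/63006383914e30c520fa33c3/65cd099e9a0de55d0f90bf80_walmart.csv", typeWalmart)
      .await(ready);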
  
  function ready(error, us, walmarts) {
    if (error) throw error;
  
    svg.append("path")
        .datum(topojson.feature(us, us.objects.nation))
        .attr("class", "nation")
        .attr("d", path);
  
    svg.append("path")
        .datum(topojson.mesh(us, us.objects.states, function(a, b) { return a !== b; }))
        .attr("class", "states")
        .attr("d", path);
  
    // One hexagon per bin, largest bins drawn first so smaller ones stay visible on top.
    // Hexagon size encodes the store count; fill encodes the median opening date in the bin.
    svg.append("g")
        .attr("class", "hexagon")
      .selectAll("path")
      .data(hexbin(walmarts).sort(function(a, b) { return b.length - a.length; }))
      .enter().append("path")
        .attr("d", function(d) { return hexbin.hexagon(radius(d.length)); })
        .attr("transform", function(d) { return "translate(" + d.x + "," + d.y + ")"; })
        .attr("fill", function(d) { return color(d3.median(d, function(d) { return +d.date; })); });
  }
  
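  // Row converter for d3.csv: expects columns "0" (longitude) and "1" (latitude),
  // projects them to screen coordinates, and stores the result in d[0]/d[1] for d3.hexbin.
  function typeWalmart(d) {
    var p = projection(d);
    if (!p) return null; // outside the Albers USA extent: returning null skips the row
    d[0] = p[0], d[1] = p[1];
    d.date = parseDate(d.date);
    return d;
  }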
  
  </script>

</body>
</html>

Data Analysis

I used a Walmart sales dataset from Kaggle to explore how four factors (temperature, fuel price, Consumer Price Index (CPI), and unemployment rate) relate to store performance.
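A minimal sketch of loading the dataset with Python's standard csv module. It assumes the Kaggle file has the columns Store, Date, Weekly_Sales, Holiday_Flag, Temperature, Fuel_Price, CPI, and Unemployment, with dates written day-month-year; both are assumptions to check against your copy. The inline sample holds illustrative values, not real Walmart data.

```python
import csv
import io
from datetime import datetime

# Assumed layout of the Kaggle Walmart CSV; verify against the actual file header.
NUMERIC_COLUMNS = ["Weekly_Sales", "Temperature", "Fuel_Price", "CPI", "Unemployment"]
DATE_FORMAT = "%d-%m-%Y"  # assumption: "05-02-2010" means 5 February 2010


def load_sales(f):
    """Parse a CSV file object into a list of dicts with typed values."""
    rows = []
    for raw in csv.DictReader(f):
        row = {
            "Store": int(raw["Store"]),
            "Date": datetime.strptime(raw["Date"], DATE_FORMAT).date(),
        }
        for col in NUMERIC_COLUMNS:
            row[col] = float(raw[col])
        rows.append(row)
    return rows


# Illustrative two-row sample standing in for the real file (not actual data).
SAMPLE = """Store,Date,Weekly_Sales,Holiday_Flag,Temperature,Fuel_Price,CPI,Unemployment
1,05-02-2010,1500000.00,0,42.0,2.50,210.0,8.0
2,05-02-2010,1800000.00,0,40.0,2.50,210.0,8.0
"""

sales = load_sales(io.StringIO(SAMPLE))
print(sales[0]["Date"], sales[0]["Weekly_Sales"])  # 2010-02-05 1500000.0
```

With the real download, `io.StringIO(SAMPLE)` would be replaced by an opened file, for example `open("Walmart.csv", newline="")` (the filename is hypothetical).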

Data Exploration

I began by visualizing the distribution of each variable. Weekly Sales is right-skewed, which is expected because sales peak sharply during certain periods. Temperature and Unemployment are approximately normally distributed, whereas CPI and Fuel_Price are bimodal.
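Skew can also be checked numerically rather than by eye: a positive sample skewness indicates a long right tail, as with Weekly Sales. The sketch below uses only the standard library; the 0.5 cutoff is an assumed rule of thumb, and skewness alone cannot detect bimodality like that seen in CPI and Fuel_Price, which still needs a histogram.

```python
from statistics import fmean


def skewness(values):
    """Fisher-Pearson skewness g1 = m3 / m2**1.5, using population central moments."""
    mean = fmean(values)
    m2 = fmean([(x - mean) ** 2 for x in values])
    m3 = fmean([(x - mean) ** 3 for x in values])
    if m2 == 0:
        return 0.0  # all values identical: no spread, so no skew
    return m3 / m2 ** 1.5


def describe_shape(values, threshold=0.5):
    """Label a distribution by its skewness; the 0.5 cutoff is a rule of thumb."""
    g1 = skewness(values)
    if g1 > threshold:
        return "right-skewed"
    if g1 < -threshold:
        return "left-skewed"
    return "roughly symmetric"


print(describe_shape([1, 1, 1, 2, 10]))  # right-skewed
print(describe_shape([1, 2, 3, 4, 5]))   # roughly symmetric
```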

Bar Chart: Top-Performing Stores

I also wanted to identify the highest-performing stores. The bar chart shows stores 19, 4, and 14 as the top three performers, while stores 33, 4, and 5 rank at the bottom.
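Ranking stores comes down to summing each store's weekly sales and sorting the totals. A minimal sketch, assuming rows are dicts with Store and Weekly_Sales keys; the sample values are illustrative only.

```python
from collections import defaultdict


def rank_stores(rows, n=3):
    """Return (top n stores, bottom n stores) by total weekly sales."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["Store"]] += row["Weekly_Sales"]
    ranked = sorted(totals, key=totals.get, reverse=True)  # highest total first
    return ranked[:n], ranked[::-1][:n]  # bottom list starts with the lowest store


rows = [
    {"Store": 1, "Weekly_Sales": 100.0},
    {"Store": 1, "Weekly_Sales": 50.0},
    {"Store": 2, "Weekly_Sales": 300.0},
    {"Store": 3, "Weekly_Sales": 20.0},
    {"Store": 4, "Weekly_Sales": 200.0},
    {"Store": 5, "Weekly_Sales": 10.0},
    {"Store": 6, "Weekly_Sales": 90.0},
]
top, bottom = rank_stores(rows)
print("top:", top, "bottom:", bottom)  # top: [2, 4, 1] bottom: [5, 3, 6]
```

Because each store has a single total, the top and bottom lists cannot share a store as long as there are at least 2n stores.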

Line Graph: Top-Performing Months

Similarly, I analyzed monthly sales to determine which months perform best for Walmart. The line graph shows that December is Walmart's top-performing month.
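A minimal sketch of the monthly comparison, assuming rows carry a `datetime.date` under Date and a float under Weekly_Sales (toy values below). It averages weekly sales per calendar month rather than summing them, a design choice that keeps months with more weeks, or more years of coverage, from looking stronger simply because they contribute more rows.

```python
import calendar
from collections import defaultdict
from datetime import date


def average_weekly_sales_by_month(rows):
    """Mean weekly sales per calendar month, pooled across all years and stores."""
    sums, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        month = row["Date"].month
        sums[month] += row["Weekly_Sales"]
        counts[month] += 1
    return {m: sums[m] / counts[m] for m in sorted(sums)}


rows = [
    {"Date": date(2010, 11, 26), "Weekly_Sales": 200.0},
    {"Date": date(2010, 12, 3), "Weekly_Sales": 300.0},
    {"Date": date(2010, 12, 24), "Weekly_Sales": 500.0},
    {"Date": date(2011, 1, 7), "Weekly_Sales": 100.0},
]
monthly = average_weekly_sales_by_month(rows)
best = max(monthly, key=monthly.get)
print(calendar.month_name[best])  # December
```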

In-Depth Analysis: Predicting Performance

After this initial exploration, I moved on to a more comprehensive analysis aimed at predicting future performance. I began by generating a correlation heatmap to show how strongly each pair of variables is related.

Additional Notes

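This heatmap makes it easier to spot which factors move together, which helps in understanding complex relationships in the data. Correlation coefficients range from -1 to 1: the closer a value is to 1 or -1, the stronger the positive or negative correlation, while values near 0 indicate little linear relationship. A strong correlation shows that two factors move together, not that one causes the other.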
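A correlation heatmap is a color-coded view of a matrix of Pearson coefficients, one per pair of variables. A minimal standard-library sketch of that matrix, assuming one equal-length list per variable and no constant columns; the toy columns below are illustrative only. In pandas, the same matrix comes from `DataFrame.corr()`, whose default method is Pearson.

```python
from math import sqrt
from statistics import fmean


def pearson(xs, ys):
    """Pearson correlation r, from -1 to 1 (undefined if either list is constant)."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """columns maps a variable name to its list of observations."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Toy columns for illustration only.
columns = {
    "x": [1.0, 2.0, 3.0, 4.0],
    "doubled": [2.0, 4.0, 6.0, 8.0],   # rises with x: r = 1
    "reversed": [4.0, 3.0, 2.0, 1.0],  # falls as x rises: r = -1
}
matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```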

Machine Learning Forecast

With this analysis in mind, I developed a machine learning model to predict future store performance. The model's output, shown in the graph, suggests that store performance will continue its upward trajectory over time.
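The project does not specify which model was used, so the sketch below substitutes the simplest possible forecaster: an ordinary least-squares linear trend fitted to a sales series and extrapolated forward. It is not the model behind the graph; it only illustrates how a fitted upward slope becomes a rising forecast. The history values are toy numbers.

```python
from statistics import fmean


def fit_linear_trend(ys):
    """Least-squares fit of y = intercept + slope * t, with t = 0, 1, ..., n-1."""
    if len(ys) < 2:
        raise ValueError("need at least two observations to fit a trend")
    ts = range(len(ys))
    mt, my = fmean(ts), fmean(ys)
    sxx = sum((t - mt) ** 2 for t in ts)
    sxy = sum((t - mt) * (y - my) for t, y in zip(ts, ys))
    slope = sxy / sxx
    return my - slope * mt, slope


def forecast(ys, horizon):
    """Extrapolate the fitted line `horizon` steps past the last observation."""
    intercept, slope = fit_linear_trend(ys)
    n = len(ys)
    return [intercept + slope * t for t in range(n, n + horizon)]


history = [10.0, 12.0, 14.0, 16.0]  # toy weekly sales, arbitrary units
print(forecast(history, 2))  # [18.0, 20.0]
```

A straight line ignores the December peak noted earlier; a model with seasonal terms would be needed to capture it.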

Conclusion

In conclusion, my analysis began by mapping Walmart's store distribution across the United States, using color to represent store age and a density heatmap to pinpoint areas with many stores, which revealed a significant concentration in the Northeast. That initial exploration led to a deeper investigation of store performance. Following a structured, keyword-based framework for generating ideas and insights, rather than starting each analysis from scratch, streamlined the process and laid a solid foundation for a machine learning model that predicts future store performance. The model indicates an upward trend in performance, underscoring the value of structured, data-driven analysis in retail strategy and operations.
