3  Results

3.2 Distribution of Churn

First, let’s use bar plots to look at the distributions of Churn Category and Churn Reason.

Code
ggplot(cleaned_data, aes(x = fct_infreq(`Churn Category`))) +
  geom_bar(fill = "steelblue", color = "black") +
  labs(title = "Distribution of Churn Categories", x = "Churn Category", y = "Count") +
  theme_minimal()

From the barplot for Churn Category, we can see that the largest category is Competitor, whose count is far higher than that of any other category. This suggests that the most common cause of customer churn is competition from other providers. We can check this further by looking at the frequencies of the specific churn reasons, as follows.
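To put numbers behind the bar heights, the counts can also be tabulated together with each category's share of all churned customers. The sketch below is only an illustration: it uses a small made-up vector in place of cleaned_data$`Churn Category`, so the values it prints are not results from this dataset.

```r
# Hypothetical toy data standing in for cleaned_data$`Churn Category`
churn_category <- c("Competitor", "Competitor", "Competitor", "Competitor",
                    "Attitude", "Attitude", "Dissatisfaction", "Price")

# Counts per category, largest first
counts <- sort(table(churn_category), decreasing = TRUE)

# Share of all churned customers that falls in each category
shares <- round(prop.table(counts), 3)

summary_table <- data.frame(
  category = names(counts),
  count = as.vector(counts),
  share = as.vector(shares)
)
print(summary_table)
```

Reading the share column next to the plot makes it easier to say how much larger the leading category is, rather than only that it is larger.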

Code
# Keep the 12 most frequent churn reasons and pool the rest into "Other"
cleaned_data$`Churn Reason_lumped` <- fct_lump_n(
  cleaned_data$`Churn Reason`, n = 12, other_level = "Other")

# Bars are ordered by frequency; x labels are rotated so long reasons stay legible
ggplot(cleaned_data, aes(x = fct_infreq(`Churn Reason_lumped`))) +
  geom_bar(fill = "steelblue", color = "black") +
  labs(title = "Distribution of Churn Reasons", x = "Churn Reason", y = "Count") +
  theme(axis.text.x = element_text(angle = 65, hjust = 1))

Since the Churn Reason variable has many unique values, we used fct_lump_n to show the 12 most frequent reasons in the graph; the remaining reasons fall into the Other level. From the graph, we can see that 4 of the 7 most frequent reasons are competitor related: better devices, better offer, more data, and higher download speeds. The attitude of the support person and of the provider are also frequent reasons, which is consistent with the graph for Churn Category.
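The effect of lumping is easiest to see on a tiny example. The sketch below applies fct_lump_n from forcats to a made-up factor (the values are hypothetical, not taken from cleaned_data) and keeps only the 2 most frequent levels.

```r
library(forcats)

# Hypothetical toy reasons; the real values come from cleaned_data$`Churn Reason`
reasons <- factor(c("better offer", "better offer", "better offer",
                    "more data", "more data",
                    "moved", "deceased"))

# Keep the 2 most frequent levels and pool everything else into "Other"
lumped <- fct_lump_n(reasons, n = 2, other_level = "Other")

print(table(lumped))
```

The two rare levels are merged into Other, which is placed last among the factor levels. Note that Other is itself a pooled level, so its bar can be tall even though each reason inside it is rare.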

3.3 Influences on Churn Categories

Now, let’s bring three important factors into the discussion: Internet Type, Contract Type, and Unlimited Data (Yes or No). Our hypothesis is that a customer’s churn reason may be influenced by the company plan they are on. We will create 3 stacked bar charts to see how Churn Category varies with these 3 factors.

Code
# Internet Type vs Churn
ggplot(cleaned_data, aes(x = `Internet Type`, fill = `Churn Category`)) +
  geom_bar(position = "fill") +
  labs(title = "Internet Type vs Churn", x = "Internet Type", y = "Proportion") +
  theme_minimal()

Code
# Contract Type vs Churn
ggplot(cleaned_data, aes(x = `Contract.y`, fill = `Churn Category`)) +
  geom_bar(position = "fill") +
  labs(title = "Contract Type vs Churn", x = "Contract Type", y = "Proportion") +
  theme_minimal()

Code
# Unlimited Data vs Churn
ggplot(cleaned_data, aes(x = `Unlimited Data`, fill = `Churn Category`)) +
  geom_bar(position = "fill") +
  labs(title = "Unlimited Data vs Churn", x = "Unlimited Data", y = "Proportion") +
  theme_minimal()

From the 3 stacked bar charts, we can see that for most plans, Competitor is the most common churn category. This suggests that whatever plan customers are paying for, they compare it with other companies’ plans and switch if they find one more favorable. However, there is one exception: among customers with None for Internet Type, the attitude of the support person or provider tends to be the most common reason for churn, and Price also accounts for a larger share than Competitor. This suggests that customers who do not have internet as part of their service may pay more attention to service attitude and price.
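The proportions drawn by geom_bar(position = "fill") can also be computed directly, which helps when two segments look similar in height. The sketch below is a minimal base R version on made-up data (the vectors are hypothetical stand-ins for two columns of cleaned_data); it computes the share of each churn category within each internet type and picks the largest one per row.

```r
# Hypothetical toy data standing in for two columns of cleaned_data
internet_type <- c("Fiber Optic", "Fiber Optic", "Fiber Optic", "Fiber Optic",
                   "None", "None", "None", "None")
churn_category <- c("Competitor", "Competitor", "Competitor", "Price",
                    "Attitude", "Attitude", "Price", "Competitor")

# Cross-tabulate, then convert each row to proportions (rows sum to 1)
counts <- table(internet_type, churn_category)
row_shares <- prop.table(counts, margin = 1)
print(round(row_shares, 2))

# Most common churn category within each internet type
top_category <- colnames(row_shares)[apply(row_shares, 1, which.max)]
names(top_category) <- rownames(row_shares)
print(top_category)
```

Using margin = 1 matches the stacked charts, where each bar is one level of the x variable and its segments sum to 1.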

3.5 Geographic Variables and Churn

Code
ggplot(cleaned_data, aes(x = City, fill = `Churn Category`)) +
  geom_bar(position = "fill") +
  coord_flip() +
  labs(title = "City vs Churn", x = "City", y = "Proportion") +
  theme(axis.text.y = element_text(size = 1))

Code
ggplot(cleaned_data, aes(x = State, fill = `Churn Category`)) +
  geom_bar(position = "fill") +
  labs(title = "State vs Churn", x = "State", y = "Proportion") +
  theme_minimal()

The two barplots provide insights into churn categories across cities and states. The first graph, “City vs Churn,” shows the proportion of each churn category for various cities. From this graph, we can observe that “Competitor” churn is quite widespread across many cities, with proportions close to 1 in several cities. However, cities such as “La Cruz” and “Hollis” show a more varied distribution, with churn reasons like “Attitude” and “Price” taking more prominence. This indicates that while competitor-related churn dominates in many locations, other reasons such as dissatisfaction or price issues are more significant in certain areas, highlighting regional variations in the reasons behind customer churn.
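One thing worth checking before reading too much into city-level proportions is how many churned customers each city has, since a proportion of 1 can come from a single customer. Whether that is the case here is not confirmed by the plots. The sketch below, on made-up data with hypothetical city names and an assumed threshold min_n, shows one way to list the Competitor share next to the number of churned customers and keep only cities with enough observations.

```r
# Hypothetical toy data; the city names and values are made up
city <- c(rep("City A", 10), "City B", "City C", "City C")
churn_category <- c(rep("Competitor", 6), rep("Attitude", 2), rep("Price", 2),
                    "Competitor", "Competitor", "Price")

counts <- table(city, churn_category)
n_churned <- rowSums(counts)
competitor_share <- counts[, "Competitor"] / n_churned

city_summary <- data.frame(
  city = rownames(counts),
  n_churned = as.vector(n_churned),
  competitor_share = as.vector(competitor_share)
)
print(city_summary)

# Assumed threshold: only interpret shares for cities with at least 5 churned customers
min_n <- 5
reliable <- city_summary[city_summary$n_churned >= min_n, ]
print(reliable)
```

In this toy example, the city with a Competitor share of 1 has only one churned customer, so it is dropped by the filter.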

The second graph, “State vs Churn,” covers only the state of California. It shows that the churn categories (“Attitude,” “Competitor,” “Dissatisfaction,” “Other,” and “Price”) are all represented in California, with “Competitor” churn the most common, followed by “Attitude.” This barplot suggests that churn at the state level is not as dominated by one specific category as it is in some cities. Instead, the distribution of churn reasons appears more balanced, with several factors contributing to customer churn.

Together, these graphs suggest that while competition-related churn is a dominant factor across both cities and states, regional variations exist in churn reasons. In some cities, factors such as attitude or price issues may play a more substantial role in driving churn, while in California, churn seems to be driven by a more balanced set of reasons. This insight could help businesses tailor their strategies depending on regional differences in customer behavior.