Statistics students win first place in data analytics competition offered by CBA in partnership with Imperial PFS
A team of three Statistics students and one senior in Management won the inaugural K-State Model-Building Data Competition, conducted by the College of Business Administration in partnership with Imperial PFS.
The purpose of the competition was to create a model to predict whether or not a premium finance loan will cancel. Imperial PFS provided the data, shared information about opportunities in their industry, and provided the judging and selection committee.
The Statistics team, who call themselves dataCats, won the first-place prize of $6,000 in scholarships. The team consisted of Huaiyu Zhang and Yiming Li, both graduate students in Statistics; Zhuoya Li, senior in Statistics; and Xiyue Zhang, senior in Management.
Professor Haiyan Wang and Associate Professor Perla Reyes, both of the Statistics department, assisted in assembling the winning team.
The competition hosted a total of 32 teams, including both graduate and undergraduate students, from a wide variety of academic disciplines. Each team had approximately one month to develop their model, and the winners were announced on May 3, 2019.
An interview with team member Huaiyu Zhang revealed their winning strategy.
Why did you decide to enter the competition?
For Statistics students, real-data problems are both challenging and exciting. I wanted to get more experience in business analytics to boost my future career.
How did the team come up with the name "dataCats"?
We are wildcats dealing with data.
How did you decide who would be on the team?
We tried to find highly motivated people. This competition emphasizes business sense, so we also invited Xiyue Zhang from the College of Business.
What did you have to do for the contest?
The goal of the project is twofold: predict loan cancellation accurately and understand the important features contributing to it. In the beginning, we did extensive preprocessing and feature engineering. For good prediction, we tried several popular classification models, including logistic regression, decision trees, and random forests, and eventually chose XGBoost, which is a powerful ensemble learning algorithm.
We achieved very good prediction performance with XGBoost but had difficulty interpreting the model, because there is no easy way to estimate the overall effect of each feature in an XGBoost model. We resolved this issue in two ways. First, we deployed the model as a website that takes transaction information as input and returns, in real time, the predicted cancellation probability and the effect of the important features for that specific transaction. Second, at the overall level, we fit a logistic regression model on the important features to estimate their main effects. These effects are mainly used to understand and evaluate the features at the macro level.
Our model can be applied to some business scenarios: internally the company can use our website to calculate the cancellation risk; externally, it can be deployed as a mobile app for customers to monitor their loan status.
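To illustrate the second interpretation step Zhang describes, here is a minimal sketch of fitting a logistic regression on a few features and reading the coefficients as main effects. It is not the team's code: the data are synthetic, the feature names (late_payments, down_payment_ratio) are hypothetical, and the plain gradient-descent fit is an assumption chosen only to keep the example self-contained.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=1000):
    """Fit logistic regression by batch gradient descent.

    Returns (intercept, weights). Illustrative only; a statistics
    package would normally be used for this step.
    """
    n, d = len(X), len(X[0])
    b, w = 0.0, [0.0] * d
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            # Prediction error for one observation.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return b, w


# Synthetic stand-in data; feature names and true effects are hypothetical.
random.seed(0)
FEATURES = ["late_payments", "down_payment_ratio"]
TRUE_INTERCEPT, TRUE_WEIGHTS = -0.5, [1.5, -1.0]

X, y = [], []
for _ in range(500):
    xi = [random.gauss(0, 1), random.gauss(0, 1)]
    p = sigmoid(TRUE_INTERCEPT + sum(w * x for w, x in zip(TRUE_WEIGHTS, xi)))
    X.append(xi)
    y.append(1 if random.random() < p else 0)  # 1 = loan cancelled

intercept, weights = fit_logistic(X, y)

# Each coefficient is a main effect on the log-odds of cancellation;
# exponentiating gives the odds ratio per one-unit increase in the feature.
for name, w in zip(FEATURES, weights):
    print(f"{name}: coefficient={w:+.2f}, odds ratio={math.exp(w):.2f}")
```

In this sketch, a positive coefficient means the feature raises the odds of cancellation and a negative one lowers them, which is the kind of macro-level reading the answer above refers to.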
Besides winning the scholarship money, do you think the experience was worthwhile?
Definitely! The competition is very close to the industry setting, where we have the chance to work on an end-to-end project. I also practiced communicating with a non-technical audience.
Congratulations to Huaiyu, Yiming, Zhuoya and Xiyue for an innovative approach to a complex problem. To learn more about the data science programs offered by the Statistics department, visit