PHD BBL: The big impact of data science on manufacturing intelligence, product development, and business analytics.
Published At: 2020-02-27
Updated: 2021-09-23
I am very grateful to Professor Shu-Jung Yang for inviting Professor Chih-Hsuan Wang from Jiaotong University to the Business Research Institute to give a wonderful speech. The main points of the speech are as follows.
During the 10 years from 1997 to 2007, my classmates entered industry one after another, but I remained committed to academic research. This path was not easy. Especially while studying for my PhD at the Business Research Institute, I kept thinking about how management differs from electrical engineering, which I had studied before.
Professor Wang's lecture was about data mining and data science. Data mining begins with data sources; after processing, the data is turned into explainable outcomes. A database is different from a data warehouse: a database is used for OLTP (online transaction processing), whereas a data warehouse is used for OLAP (online analytical processing). However, is becoming a data scientist a rational choice, or an emotional one?
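To make the OLTP/OLAP contrast concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `orders` table, its columns, and its values are hypothetical and were not part of the lecture. A real data warehouse is normally a separate system, so a single in-memory SQLite database is used here only to contrast the shape of the two workloads.

```python
import sqlite3

# Hypothetical in-memory table of orders; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)

# OLTP-style work: small writes committed together as one transaction.
rows = [("north", 120.0), ("south", 80.0), ("north", 45.5), ("east", 60.0)]
with conn:
    conn.executemany("INSERT INTO orders (region, amount) VALUES (?, ?)", rows)

# OLTP-style read: fetch one record by its key.
single_order = conn.execute(
    "SELECT region, amount FROM orders WHERE id = ?", (2,)
).fetchone()

# OLAP-style read: scan and aggregate across many records.
totals_by_region = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()

print(single_order)
print(totals_by_region)
```

The keyed lookup touches a single row, while the grouped sum reads the whole table, which is roughly the difference in access pattern between transaction processing and analytical processing.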
The skills that a data scientist must have include big data and cloud computing, storytelling, and communication skills, among others. Compared to engineering data, spatio-temporal data is less structured and is subject to time delays. Text mining becomes more complicated when it is applied to Chinese-language text. Attribute values in data mining have four properties: distinctness, order, addition, and multiplication. These correspond to four attribute types: nominal, ordinal, interval, and ratio, where each type also keeps the properties of the types before it. By function, data mining tasks include clustering, association, classification, regression, and anomaly detection.
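The relationship between the four attribute types and the four properties can be sketched in a few lines of Python. This follows the standard levels of measurement (nominal, ordinal, interval, ratio); the function name `supported_properties` and the example attributes are assumptions made for illustration and do not come from the lecture.

```python
# Measurement levels in increasing order of richness; each level keeps the
# properties of the levels before it and adds one more.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]
ADDED_PROPERTY = {
    "nominal": "distinctness",    # values can only be told apart (= / !=)
    "ordinal": "order",           # values can also be ranked (< / >)
    "interval": "addition",       # differences are also meaningful (+ / -)
    "ratio": "multiplication",    # ratios are also meaningful (* / /)
}


def supported_properties(level):
    """Return every property that is meaningful at the given level."""
    position = LEVELS.index(level)
    return [ADDED_PROPERTY[name] for name in LEVELS[: position + 1]]


# Illustrative attributes (assumed examples, not taken from the lecture).
EXAMPLES = {
    "product_category": "nominal",
    "defect_severity": "ordinal",
    "temperature_celsius": "interval",
    "weight_kg": "ratio",
}

for attribute, level in EXAMPLES.items():
    print(attribute, level, supported_properties(level))
```

For instance, a temperature in Celsius is an interval attribute: a difference of 10 degrees is meaningful, but 20 degrees is not "twice as hot" as 10 degrees. A weight in kilograms is a ratio attribute, so 20 kg really is twice 10 kg.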
According to IEEE ICDM in 2006, the top 10 data mining methods include C4.5, CART, and K-means. Among the top 10 methods used by data scientists, the first is regression, the second is clustering, and the third is decision trees/rules. The lecture also mentioned the Mahalanobis distance, a statistical measure that can be used to find outliers in a data distribution. In the context of linear regression, using the covariance matrix and the mean of the data distribution to detect extreme values can make the analysis more robust. For data mining, it is important to find association rules, such as the well-known relationship between diapers and beer. One example of classification is using decision trees to classify different damage patterns of wafers. There are also many challenges in data classification and regression. Big data comes from transactions, interactions, and observations. It is important to learn how to use the R language. It is equally important to have relevant background knowledge, such as how to make basic assumptions and set research goals.
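As an illustration of the Mahalanobis distance mentioned above, here is a minimal two-dimensional sketch in plain Python. The reference points and the two candidate points are made-up values, and the helper names are hypothetical. The distance is computed as the square root of (x - mean)^T S^-1 (x - mean), where S is the sample covariance matrix of the reference data.

```python
import math


def mean_vector(points):
    """Component-wise mean of a list of (x, y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def covariance_2d(points, mean):
    """Sample covariance terms (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    sxx = sum((x - mean[0]) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean[1]) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean[0]) * (y - mean[1]) for x, y in points) / (n - 1)
    return sxx, sxy, syy


def mahalanobis(point, mean, cov):
    """Mahalanobis distance of a 2-D point from the mean."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # The inverse of [[sxx, sxy], [sxy, syy]] is [[syy, -sxy], [-sxy, sxx]] / det.
    squared = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(squared)


# Made-up reference data in which y is roughly 2 * x.
reference = [(1, 2.0), (2, 4.1), (3, 5.9), (4, 8.2), (5, 9.8), (6, 12.1)]
mean = mean_vector(reference)
cov = covariance_2d(reference, mean)

on_trend = (5, 10.0)   # follows the y = 2x pattern
off_trend = (5, 4.0)   # breaks the pattern

on_dist = mahalanobis(on_trend, mean, cov)
off_dist = mahalanobis(off_trend, mean, cov)
print(on_dist, off_dist)
```

Both candidate points lie at almost the same Euclidean distance from the mean, but the off-trend point gets a much larger Mahalanobis distance because it contradicts the correlation captured in the covariance matrix. That is what makes the measure useful for flagging outliers in correlated data.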
Finally, we can think about the differences between practitioners and scholars, and between statistical learning and machine learning, and then think about what our future will look like.
Thanks again to Professor Chih-Hsuan Wang for giving this lecture, which prompted many thoughts and discussions and was a great inspiration to us! I would also like to thank Professor Shu-Jung Yang again for his dedication to the studies of doctoral students and for arranging this great academic event!