# A Model-Based Sampling Selection Method Based on Classical Multivariate Analysis Methods

• 林 翊涵

### Abstract

In survey sampling, collecting each sampling unit incurs a cost. Therefore, given a fixed total sample size, how to select a more representative sample is a basic problem in sampling theory and application. In some sampling designs, such as stratified sampling, the population is first divided into several strata, and the sample size is then allocated to each stratum. However, the allocation step raises a problem: the sample size in each stratum is usually not an integer, and in some cases it is zero. The sample size in each stratum must then be rounded to the nearest integer or otherwise adjusted. Nevertheless, the total sample size after rounding and/or adjustment often differs from the given one. In previous research, random sampling has therefore been used to remove excess sampling units or add missing ones. This research proposes two model-based sampling strategies, based on canonical correlation analysis from multivariate analysis, to adjust the sample. They extend the sampling methods proposed by Chao (2004) and Chao and Lin (2007). Given a known population covariance matrix and a fixed total sample size, one may use cluster analysis to partition the population into several clusters and allocate the sample size among them. One may then use principal component analysis to select the within-cluster sample according to the sample size after rounding and/or adjustment. Finally, the adjusted sample is selected when the total sample size after rounding and/or adjustment differs from the given one. These two sampling designs do not require an exact population distribution, only a population covariance matrix, and an appropriate sample can be selected for adjustment. Multiple simulation studies and three applications to real data show that the proposed sampling strategies yield better prediction results than simple random sampling without replacement and previously proposed methods. Under a fixed total sample size, selecting sampling units by these two sampling strategies can minimize the mean-squared prediction error. Neither a complicated algorithm nor an intensive computational load is required, which makes the procedures more flexible and simpler to implement in practice.
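The rounding problem described in the abstract, and the random adjustment used as a baseline in earlier work, can be illustrated with a minimal sketch. It is not the thesis's method. The function names, the use of proportional allocation, and the choice to adjust by picking clusters at random are all assumptions made for illustration. The `mspe` helper uses one common model-based criterion, the trace of the conditional covariance of the unsampled units given the sampled ones under a known covariance matrix; the abstract does not state the exact criterion, so this is also an assumption.

```python
import numpy as np


def proportional_allocation(cluster_sizes, n_total):
    """Return exact (real-valued) and rounded per-cluster sample sizes.

    Proportional allocation is an assumption made for this sketch.
    """
    sizes = np.asarray(cluster_sizes, dtype=float)
    exact = n_total * sizes / sizes.sum()
    rounded = np.rint(exact).astype(int)
    return exact, rounded


def random_adjust(rounded, cluster_sizes, n_total, rng):
    """Baseline adjustment (simplified): add or drop one unit at a time in a
    randomly chosen cluster until the total matches n_total."""
    adjusted = np.array(rounded, dtype=int)
    sizes = np.asarray(cluster_sizes)
    while adjusted.sum() != n_total:
        if adjusted.sum() > n_total:
            candidates = np.flatnonzero(adjusted > 0)
            adjusted[rng.choice(candidates)] -= 1
        else:
            candidates = np.flatnonzero(adjusted < sizes)
            adjusted[rng.choice(candidates)] += 1
    return adjusted


def mspe(cov, sample_idx):
    """Total mean-squared prediction error of the best linear predictor of the
    unsampled units given the sampled ones, for a known covariance matrix
    (assumed criterion)."""
    cov = np.asarray(cov, dtype=float)
    s = np.asarray(sample_idx)
    u = np.setdiff1d(np.arange(cov.shape[0]), s)
    cov_ss = cov[np.ix_(s, s)]
    cov_us = cov[np.ix_(u, s)]
    cov_uu = cov[np.ix_(u, u)]
    conditional = cov_uu - cov_us @ np.linalg.solve(cov_ss, cov_us.T)
    return float(np.trace(conditional))


# Three clusters of five units each, total sample size 4:
# the exact allocation is 4/3 per cluster, which rounds to 1 + 1 + 1 = 3.
exact, rounded = proportional_allocation([5, 5, 5], n_total=4)
adjusted = random_adjust(rounded, [5, 5, 5], n_total=4,
                         rng=np.random.default_rng(0))

# A toy covariance matrix: units 0 and 1 are strongly correlated, unit 2 is not.
toy_cov = np.array([[1.0, 0.9, 0.0],
                    [0.9, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
```

In the toy covariance matrix, sampling unit 0 predicts unit 1 well, so `mspe(toy_cov, [0])` is smaller than `mspe(toy_cov, [2])`. This is the kind of difference a covariance-based selection can exploit and a purely random adjustment ignores.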
Date of award: 17 July 2018 · English · Chang-Tai Chao (Supervisor)
