## Library Open Repository

# Statistical methodology for tab-charts : data reduction techniques in laser ablation analyses

Yung, Chi Ho
(2008)
*Statistical methodology for tab-charts : data reduction techniques in laser ablation analyses.*
Research Master thesis, University of Tasmania.

PDF (Whole thesis): whole_YungChiHo...pdf. Full text restricted. Available under University of Tasmania Standard License.

## Abstract

Researchers at CODES and the School of Earth Sciences operate a laboratory to study the composition of rock samples collected in the field. The samples are placed in an instrument that produces a series plot for each element (called a tab-chart), showing how the element is distributed in the sample. In a tab-chart, a significant signal change implies a change of composition in the sample, and a flat part implies a mineral layer (phase) in the sample. Currently, the researchers identify these features from their knowledge and experience. In some situations this judgement is difficult because the features are not obvious. An automatic and systematic method is therefore needed to help them with this problem.

In total, 1848 tab-charts (66 samples × 28 elements per sample) of primary and secondary standard samples were provided by the School of Earth Sciences. These are not natural samples but were prepared in the laboratory. Their tab-charts share the same shape: background noise (a flat part) in the first stage, a jump (a significant signal change) in the second stage, a plateau (another flat part) in the third stage and a drop (another significant signal change) in the last stage. Although this project focuses on the standard samples only, the analysis and results can be extended to real samples. The first four chapters describe the laboratory equipment, the mechanism and process of the geological analysis of the sample, and the tab-chart. These chapters provide background knowledge only and are not the main interest of the project, which concentrates on the mathematical analysis of the tab-chart.

The problem described in the first paragraph is a change point detection (or time series segmentation) problem in mathematics and statistics. Many methods have been proposed in the literature to solve it, including cumulative sums of differences (CUSUM), perceptually important points (PIP), fuzzy set theory and genetic algorithms. To the best of my knowledge, most of these methods focus on detecting change points only. In addition to detection, the researchers need statistical summaries of the flat parts of the tab-chart (i.e. the mean, standard deviation and trend of the element amount in the layer). A time series model is therefore a good candidate for achieving both targets, and it has the further advantage of being easy to implement in a spreadsheet. However, some algorithms and modifications are needed so that the time series model can identify change points in the tab-chart.

Among the various time series models, the linear Holt exponential smoothing model is selected in this project after comparing several common time series models. The simple exponential smoothing model has one estimate, or equation (the smooth), and only one parameter (α). It is rejected because the signal changes (jump and drop) have steep slopes while the flat parts (background and plateau) have gentle slopes, and one estimate is not enough to reflect this slope property of the tab-chart. The damped-trend linear exponential smoothing model has two estimates (smooth and trend) and three parameters (α, β and a damping parameter). Although two estimates are enough to reflect the slope property, three parameters may complicate the analysis, and a better choice is available: the linear Holt exponential smoothing model, which also has two estimates (two equations) but only two parameters (α and β). There is no guarantee that this model is the best choice, but the results and findings from it can help in exploring other time series models in further studies.
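As a minimal sketch of the standard (unmodified) Holt linear method, the two equations can be written with smooth `ℓ_t` and trend `b_t` as `ℓ_t = α·y_t + (1 - α)(ℓ_{t-1} + b_{t-1})` and `b_t = β(ℓ_t - ℓ_{t-1}) + (1 - β)·b_{t-1}`. The function name `holt_linear` and the initialization (first smooth equal to the first observation, zero initial trend) are assumptions for illustration, not the thesis's worksheet implementation.

```python
from typing import List, Sequence, Tuple


def holt_linear(
    y: Sequence[float], alpha: float, beta: float, trend0: float = 0.0
) -> Tuple[List[float], List[float]]:
    """Holt's linear exponential smoothing with fixed alpha and beta.

    Returns the smooth (level) and trend estimates after each observation.
    Assumed initialization: smooth starts at y[0], trend starts at trend0.
    """
    if not (0 < alpha <= 1 and 0 < beta <= 1):
        raise ValueError("alpha and beta must lie in (0, 1]")
    level, trend = y[0], trend0
    smooths: List[float] = []
    trends: List[float] = []
    for obs in y:
        prev_level = level
        # Smooth equation: weight the new observation against the one-step forecast.
        level = alpha * obs + (1 - alpha) * (prev_level + trend)
        # Trend equation: weight the latest change in level against the old trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
        smooths.append(level)
        trends.append(trend)
    return smooths, trends


smooth, trend = holt_linear([1.0, 2.0, 3.0, 4.0, 5.0], alpha=1.0, beta=1.0)
print(smooth)  # [1.0, 2.0, 3.0, 4.0, 5.0]
print(trend)   # [0.0, 1.0, 1.0, 1.0, 1.0]
```

With α = β = 1 the smooth simply tracks the data and the trend is the last step, which makes the role of each equation easy to see; smaller values give a smoother curve.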

The linear Holt exponential smoothing model is modified before it is fitted to the tab-chart. The modified model has variable (dynamic) parameters α and β and a threshold value T. During fitting, if the trend estimate of the model exceeds the threshold value, the parameters take the values α₁ and β₁; otherwise they take the values α₂ and β₂. This policy is based on the difference in slope between the significant signal changes (jump and drop) and the flat parts (background and plateau). The parameters and the threshold value are adjusted manually until the model fits the tab-chart well. After fitting the model to all tab-charts, it was found that the threshold value T is more influential in finding a well-fitted model than the parameters α and β. In addition, the fitted curve is spiky when the threshold value is small but becomes smooth as the value gets larger. Note that this parameter policy is only an initial trial and is not perfect; the experimental results reveal its drawbacks.
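The parameter-switching policy can be sketched by picking the parameter pair at each step from the current trend estimate. This is a hypothetical illustration: comparing the absolute trend with T (so that drops also select the steep set) is an assumption, since the text only says the trend "exceeds" the threshold, and the name `holt_switching` is illustrative.

```python
from typing import List, Sequence, Tuple

Params = Tuple[float, float]  # (alpha, beta)


def holt_switching(
    y: Sequence[float], T: float, steep: Params, flat: Params
) -> Tuple[List[float], List[float], List[int]]:
    """Holt smoothing whose (alpha, beta) switch on a trend threshold T.

    steep = (alpha1, beta1) is used when the current trend estimate exceeds T,
    flat = (alpha2, beta2) otherwise. Returns smooth, trend and the regime
    (1 or 2) used at each step.
    """
    level, trend = y[0], 0.0
    smooths: List[float] = []
    trends: List[float] = []
    regimes: List[int] = []
    for obs in y:
        # Assumption: compare |trend| with T so jumps and drops both count as steep.
        is_steep = abs(trend) > T
        alpha, beta = steep if is_steep else flat
        prev_level = level
        level = alpha * obs + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        smooths.append(level)
        trends.append(trend)
        regimes.append(1 if is_steep else 2)
    return smooths, trends, regimes


step = [0.0] * 5 + [100.0] * 5
_, _, regimes = holt_switching(step, T=1.0, steep=(0.9, 0.9), flat=(0.2, 0.1))
print(regimes)  # [2, 2, 2, 2, 2, 2, 1, 1, 1, 1]
```

On this made-up step series the model stays in the flat regime until the trend built up by the step exceeds T, then switches to the steep parameters. A smaller T makes such switches more frequent, which is one way the spiky fits mentioned above can arise.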

For convenience, the modified linear Holt smoothing model above is henceforth called the HOLT model. The first algorithm for detecting change points (time series segmentation) with the HOLT model is called the classification method. Its main idea is that a change of the variable parameters (α and β) indicates a stage change in the tab-chart, i.e. from a significant signal change to a flat part or vice versa. For example, in a standard sample the parameters switch from the flat-part values (α₂, β₂) to the steep values (α₁, β₁) as the tab-chart moves from the background stage to the jump stage. This classification method is neither practical nor automatic for the researchers because it requires manual adjustment of the parameters beforehand. However, it serves two purposes. First, it is a step towards developing an automatic classification method. Second, it is used as a tool to analyze the data reduction and fitted error of the HOLT model.
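Under the classification method, each switch of the active parameter set marks a candidate stage boundary. A hypothetical sketch, taking the regime sequence (1 = steep set, 2 = flat set) produced by a switching fit, splits it into runs and names them only when the flat/steep/flat/steep pattern of a standard sample is matched. The names `regime_runs` and `name_stages` are illustrative.

```python
from typing import List, Optional, Sequence, Tuple

Run = Tuple[int, int, int]  # (start, end_exclusive, regime)
STANDARD_STAGES = ("background", "jump", "plateau", "drop")


def regime_runs(regimes: Sequence[int]) -> List[Run]:
    """Split the regime sequence into maximal runs of the same parameter set."""
    runs: List[Run] = []
    start = 0
    for i in range(1, len(regimes) + 1):
        if i == len(regimes) or regimes[i] != regimes[start]:
            runs.append((start, i, regimes[start]))
            start = i
    return runs


def name_stages(runs: Sequence[Run]) -> Optional[List[Tuple[str, int, int]]]:
    """Name the runs of a standard-sample tab-chart, or return None when the
    flat/steep/flat/steep pattern is not matched (left for human review)."""
    expected = (2, 1, 2, 1)  # regime 2 = flat set, regime 1 = steep set
    if tuple(r for _, _, r in runs) != expected:
        return None
    return [(name, s, e) for name, (s, e, _) in zip(STANDARD_STAGES, runs)]


runs = regime_runs([2, 2, 2, 1, 1, 2, 2, 2, 1])
print(name_stages(runs))
# [('background', 0, 3), ('jump', 3, 5), ('plateau', 5, 8), ('drop', 8, 9)]
```

Returning `None` instead of forcing a label reflects the point above that the method still depends on human judgement when the fit is spiky.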

The second algorithm for detecting change points with the HOLT model is called the classification rules. It is automatic because it uses a set of rules to divide the tab-chart into stages. The rules are developed from the classification method as follows. After each tab-chart is well fitted by the HOLT model, the trend estimate is plotted against the smooth estimate for all tab-charts, and the rules are drawn by comparing these trend-smooth graphs with the tab-charts. The background stage has small trend and smooth values; the jump stage has larger trend and smooth values; the plateau stage has larger smooth values but small trend values; and the drop stage has large negative trend values. The performance of the classification rules is tested by applying them to the tab-charts of the standard samples. Guidelines for evaluating the performance are set to minimize personal and biased judgement, since different people may judge some borderline cases differently. The classification rules have success rates ranging from 45% to 80% across elements. However, after excluding tab-charts whose background and plateau levels are close, the success rate of the classification rules is at least 65%. This project therefore provides a good method and platform for identifying change points. One promising way to improve the performance is to refine and modify the rules. Although the rules seem to play a more important role in the performance than the parameters (α and β), an experiment should be carried out to investigate any effect of the parameters on the performance.
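The four rules can be sketched as a point-by-point classifier on the (smooth, trend) pair. The cut-offs `smooth_cut` and `trend_cut` are hypothetical placeholders that would be read off the trend-smooth graphs; they are not values from the thesis, and the jump rule here uses the trend only, a simplification of the stated rule.

```python
from typing import List, Sequence


def classify_point(smooth: float, trend: float, smooth_cut: float, trend_cut: float) -> str:
    """Assign one observation to a stage from its (smooth, trend) estimates.

    smooth_cut and trend_cut are hypothetical cut-offs, not thesis values.
    """
    if trend < -trend_cut:
        return "drop"        # large negative trend
    if trend > trend_cut:
        return "jump"        # large positive trend
    if smooth > smooth_cut:
        return "plateau"     # high level, small trend
    return "background"      # low level, small trend


def classify_series(smooths: Sequence[float], trends: Sequence[float],
                    smooth_cut: float, trend_cut: float) -> List[str]:
    """Apply the rules to every (smooth, trend) pair of a fitted tab-chart."""
    return [classify_point(s, t, smooth_cut, trend_cut) for s, t in zip(smooths, trends)]


print(classify_series([1.0, 60.0, 200.0, 150.0], [0.1, 25.0, 0.5, -30.0],
                      smooth_cut=50.0, trend_cut=5.0))
# ['background', 'jump', 'plateau', 'drop']
```

The sketch also shows why close background and plateau levels are hard: if both lie on the same side of `smooth_cut`, the last two rules cannot separate them.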

Both the classification method and the classification rules suffer from misclassification: they perform very poorly on some tab-charts, in which many observations are wrongly classified. All these tab-charts have background and plateau levels that are very close, which is a possible cause of the misclassification; a gentle signal change (a jump or drop with a gentle slope) is another possible cause. This problem suggests how to improve both the method and the rules; for example, a separate set of rules may be needed for these tab-charts. The classification method performs better than the classification rules because rules can hardly replace human visual judgement. However, the classification rules are automatic and more practical than the classification method.

Data reduction is another characteristic of the HOLT model. The model removes noise or variation from the tab-chart, so the researchers can gain useful information by observing its fitted curve (values). Apart from graphs, this thesis includes a quantitative analysis to examine data reduction from another angle. The standard error formula (sample standard deviation divided by the square root of the sample size) is used to calculate the standard error over the background and the plateau. This formula assumes independent data. Although many tab-charts are autocorrelated, since ARIMA models can be fitted to them, this violation does not invalidate the use of the formula here: the formula is not used to estimate statistical summaries of the underlying process, but only to measure the degree of variation and fluctuation over the background and plateau. Over the plateau, the standard error of the HOLT model (i.e. of the mean of the fitted values) is smaller than that of the tab-chart (i.e. of the mean of the actual observations), which supports the claim that the HOLT model reduces the variation over the plateau. However, the situation is reversed over the background: the standard error of the HOLT model is greater than that of the tab-chart. This implies that the HOLT model has difficulty with the background and with plateaus of small signal (i.e. trace amounts of an element). A constant trend (i.e. a smoother fitted curve) should be used on the background by choosing appropriate parameters; in other words, the background (or a plateau of small signal) should use different α and β parameters.
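The standard error comparison over one segment can be sketched as follows. The helper names and the small plateau values in the usage example are made up for illustration and are not thesis data; as the text notes, the formula treats the values as independent and is used only as a measure of fluctuation.

```python
import math
from statistics import mean, stdev
from typing import Dict, Sequence


def standard_error(values: Sequence[float]) -> float:
    """Sample standard deviation divided by sqrt(n); treats values as independent."""
    return stdev(values) / math.sqrt(len(values))


def compare_segment(raw: Sequence[float], fitted: Sequence[float],
                    start: int, end: int) -> Dict[str, float]:
    """Mean and standard error of raw and fitted values over indices [start, end)."""
    raw_seg, fit_seg = raw[start:end], fitted[start:end]
    return {
        "raw_mean": mean(raw_seg),
        "raw_se": standard_error(raw_seg),
        "fit_mean": mean(fit_seg),
        "fit_se": standard_error(fit_seg),
    }


# Made-up plateau: noisy observations and a smoother fitted curve.
raw = [98.0, 103.0, 99.0, 101.0, 100.0]
fitted = [100.0, 100.5, 100.0, 100.2, 100.0]
print(compare_segment(raw, fitted, 0, 5))
```

A fitted standard error below the raw one, as in this example, is the pattern reported for plateaus; the reverse pattern is what the text reports over the background.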

After fitting the time series model to the raw data, an analysis of the fitted error should also be provided. The results show that the fitted error of the HOLT model increases as the level of the plateau increases. The fitted error also decreases as the mass of the element increases, but this is because heavy elements are present in smaller amounts than light elements in the sample (i.e. they have lower plateaus). Therefore, the only factor affecting the fitted error is the level of the plateau. This result implies that the standard deviation of the fitting error of an element's concentration can be minimized if the background noise can be controlled or minimized. Controlling the background noise can therefore help to estimate trace elements in the sample more accurately, but it is unlikely to deliver a significant improvement for the major elements.
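One way to tabulate the fitted error against plateau level is to compute, for each tab-chart, the plateau mean and the standard deviation of the residuals (observed minus fitted) over the plateau. This is a minimal sketch; the function name `plateau_fit_error` and the three-point example are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def plateau_fit_error(raw: Sequence[float], fitted: Sequence[float],
                      start: int, end: int) -> Tuple[float, float]:
    """Return (plateau level, standard deviation of the fitting error) over [start, end)."""
    residuals = [r - f for r, f in zip(raw[start:end], fitted[start:end])]
    return mean(raw[start:end]), stdev(residuals)


level, err = plateau_fit_error([10.0, 12.0, 8.0], [10.0, 10.0, 10.0], 0, 3)
print(level, err)  # 10.0 2.0
```

Collecting such (level, error) pairs across elements is what allows the relationship between plateau level and fitted error to be examined.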

For a comprehensive analysis, ARIMA models should also be fitted to the tab-charts. Since there are many tab-charts to fit, a policy is devised to speed up the ARIMA fitting. It has three steps. The first step is to check whether the tab-chart is stationary; if it is not, the background of the tab-chart is differenced. The second step is to fit AR(1), AR(2), MA(1), MA(2) and ARMA(1,1) models to the background; the best model is the one with the lowest MSE and significant parameters. If none of the five models can be fitted to the background, it may be a random walk or follow another ARIMA model. The third step is to apply the above two steps to the plateau of the tab-chart. The results are as follows. Over the background, no differencing is needed, and most tab-charts are ARMA(1,1), random walk or inconclusive. Over the plateau, most plateaus with an upward or downward trend are ARIMA(0,1,1), whereas most plateaus with a horizontal trend are AR(1) or ARMA(1,1); the remaining plateaus are random walk or inconclusive. These results also indicate that most of the tab-charts are autocorrelated.
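The selection policy can be sketched independently of any particular fitting library. In this hypothetical sketch the caller supplies `is_stationary` (for example, a unit-root test) and `fit`, which returns the MSE and whether all parameters are significant, or `None` when the fit fails; these callbacks and all function names are assumptions, not the thesis's tooling. A differenced series is fitted as an ARMA model and reported as ARIMA(p, 1, q). Step 3 is simply calling the function once on the background and once on the plateau.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Order = Tuple[int, int, int]
FitResult = Optional[Tuple[float, bool]]  # (mse, all parameters significant) or None
# AR(1), AR(2), MA(1), MA(2), ARMA(1,1) as (p, d, q) before any differencing.
CANDIDATES: Tuple[Order, ...] = ((1, 0, 0), (2, 0, 0), (0, 0, 1), (0, 0, 2), (1, 0, 1))


def difference(y: Sequence[float]) -> List[float]:
    """First differences y[t] - y[t-1]."""
    return [b - a for a, b in zip(y, y[1:])]


def select_arima(segment: Sequence[float],
                 is_stationary: Callable[[Sequence[float]], bool],
                 fit: Callable[[Sequence[float], Order], FitResult]
                 ) -> Optional[Tuple[Order, float]]:
    """Steps 1-2 of the policy for one segment (background or plateau).

    Returns (order, mse) of the lowest-MSE candidate whose parameters are all
    significant, or None ("random walk or inconclusive").
    """
    series, d = list(segment), 0
    if not is_stationary(series):       # step 1: difference once if needed
        series, d = difference(series), 1
    best: Optional[Tuple[Order, float]] = None
    for p, _, q in CANDIDATES:          # step 2: try the five candidates
        result = fit(series, (p, 0, q))
        if result is None:              # the fit failed
            continue
        mse, significant = result
        if significant and (best is None or mse < best[1]):
            best = ((p, d, q), mse)
    return best


# Stub fitter with made-up results, just to exercise the selection logic.
stub = {(1, 0, 0): (2.0, True), (0, 0, 1): (1.0, False), (1, 0, 1): (1.5, True)}
print(select_arima([0.0] * 20, lambda s: True, lambda s, o: stub.get(o)))
# ((1, 0, 1), 1.5)
```

The MA(1) stub has the lowest MSE but a non-significant parameter, so it is skipped, which is how the "lowest MSE and significant parameters" criterion is read here.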

There are several reasons for not choosing an ARIMA model to tackle the researchers' problem. First, the equations and structure of ARIMA are too complicated to be modified to identify change points. Second, although the simple exponential smoothing model is a special case of ARIMA (equivalent to an ARIMA(0,1,1) model without a constant term), it was rejected in the model selection for the reason given earlier: one estimate cannot reflect the slope property of the tab-chart. Third, the researchers are looking for an automatic and fast method to extract information from the tab-chart, and ARIMA modelling does not seem practical for automated use. However, ARIMA can be used to detect autocorrelation in the tab-chart before the HOLT model is applied. The tab-chart could be preprocessed if the autocorrelation is serious and a precise estimate is required to make a judgement on the tab-chart or sample.
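As a lighter-weight screen for the same question, a segment can be checked with its sample autocorrelation function before the HOLT model is applied. This swaps in a simpler technique than the ARIMA fitting described above: it flags a segment when any of the first few sample autocorrelations falls outside the approximate 95% white-noise band of ±1.96/√n. The function names are hypothetical.

```python
import math
from typing import Sequence


def sample_acf(y: Sequence[float], lag: int) -> float:
    """Sample autocorrelation of y at the given lag (0.0 for a constant series)."""
    n = len(y)
    m = sum(y) / n
    denom = sum((v - m) ** 2 for v in y)
    if denom == 0:
        return 0.0
    num = sum((y[t] - m) * (y[t + lag] - m) for t in range(n - lag))
    return num / denom


def looks_autocorrelated(y: Sequence[float], max_lag: int = 5) -> bool:
    """True if any of the first max_lag autocorrelations lies outside +/- 1.96/sqrt(n)."""
    bound = 1.96 / math.sqrt(len(y))
    return any(abs(sample_acf(y, k)) > bound for k in range(1, max_lag + 1))


alternating = [1.0, -1.0] * 50
print(round(sample_acf(alternating, 1), 2), looks_autocorrelated(alternating))
# -0.99 True
```

A flagged segment would then be a candidate for the fuller ARIMA treatment or for preprocessing.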

Some parts of this project are inadequate and need more work in further studies. First, the classification rules are relatively approximate and should be refined; more advanced mathematical techniques could be employed to devise better rules. Second, other models, for example locally weighted regression, should be explored to see whether they perform better in change point detection and give the researchers more information from the tab-chart. Third, this project does not study the parameters of the HOLT model in enough depth. One direction is to investigate the impact of the parameters on the performance of the classification rules, because only one set of parameters is used in this analysis. Another is to investigate the relationship between the parameters and data reduction, because data reduction does not work on the background or on plateaus of small signal. Lastly, further studies should address tab-charts with autocorrelation and the influence of autocorrelation on the statistical estimates obtained from the tab-chart by the HOLT model.

Moreover, several directions are worth considering as long-term goals. First, the classification rules can be extended to tab-charts with multiple significant signal changes (jumps and drops) and multiple flat parts (backgrounds and plateaus). Second, in this project the tab-chart of each element in a standard sample is investigated independently. However, a mineral in a real sample is a chemical compound of several elements, so investigating changes of composition in real samples will involve more than one tab-chart. Third, the models and methods of this thesis are not limited to the composition analysis of minerals or rock samples; they can be generalized and applied to charts in other areas. Lastly, the HOLT model should be compared with other change point methods, especially in terms of performance. Since the data reduction of the HOLT model traces the trend of the tab-chart and removes variation or noise, the HOLT model may be incorporated into other methods to improve their performance.

Item Type: | Thesis (Research Master) |
---|---|
Copyright Holders: | The Author |
Copyright Information: | Copyright 2008 the Author - The University is continuing to endeavour to trace the copyright owner(s) and in the meantime this item has been reproduced here in good faith. We would be pleased to hear from the copyright owner(s). |
Additional Information: | Available for library use only and copying in accordance with the Copyright Act 1968, as amended. Thesis (MSc)--University of Tasmania, 2008. Includes bibliographical references |
Date Deposited: | 04 Feb 2015 23:34 |
Last Modified: | 11 Mar 2016 05:53 |