Open Access Repository

Statistical methodology for tab-charts : data reduction techniques in laser ablation analyses

Yung, Chi Ho (2008) Statistical methodology for tab-charts : data reduction techniques in laser ablation analyses. Research Master thesis, University of Tasmania.

PDF (Whole thesis)
whole_YungChiHo...pdf
Full text restricted
Available under University of Tasmania Standard License.

Abstract

Researchers at CODES and the School of Earth Sciences operate a laboratory that studies
the composition of rock samples collected in the field. Each rock sample is put into
a machine, which produces a series plot for every element (called a tab-chart) showing how the
element is distributed in the sample. In a tab-chart, a significant signal change implies a change
of composition in the sample, and a flat part implies a mineral layer (phase) in the sample.
Currently, the researchers identify these features using their own knowledge and experience. In some
cases the features are not clear-cut and the judgement is difficult. An automatic and systematic
method is therefore needed to help them.

A total of 1848 tab-charts (= 66 samples x 28 elements in each sample) of primary and secondary
samples were provided by the School of Earth Sciences. These primary and secondary samples are not
real rock samples; they are created in the laboratory. Their tab-charts show background noise (a
flat part) in the first stage, a jump (a significant signal change) in the second stage, a plateau
(another flat part) in the third stage and a drop (another significant signal change) in the last
stage. Although this project focuses on the standard samples only, the analysis and results can be
extended to real samples. The first four chapters describe the laboratory equipment, the mechanism
and process of geological analysis of a sample, and the tab-chart. These chapters are provided for
reference only and are not the main interest of this project, which concentrates on the
mathematical analysis of the tab-chart.

The problem described in the first paragraph is a change point analysis (detection) or
time series segmentation problem in mathematics and statistics. Many methods have been proposed in
the literature to solve it, including cumulative sums of differences (CUSUM),
perceptually important points (PIP), fuzzy set theory and genetic algorithms. To the best of my
knowledge, most of these methods focus on change point detection only. In addition to detection,
the method sought here should also give the researchers statistical summaries of the flat
parts of the tab-chart (i.e. the mean, standard deviation and trend of the element amount in the layer).
A time series model is therefore a good candidate for achieving both targets.
It also has the advantage of being easy to implement in a worksheet.
However, some algorithms and modifications are needed so that the time series model can
identify change points in the tab-chart.

Among various time series models, the linear Holt exponential smoothing model is selected in
this project, after comparing some common and popular time
series models. The simple exponential smoothing model has one estimate or equation (i.e. smooth)
and one parameter (i.e. α) only. It is rejected because signal changes (i.e. jump and drop)
have a large slope whereas flat parts (i.e. background and plateau) have a gentle slope, and one
estimate is not enough to reflect this slope property of the tab-chart. The damped-trend linear
exponential smoothing model has two estimates or equations (i.e. smooth and trend) and three
parameters (i.e. α, β and γ). Although two estimates are enough to reflect the slope property,
three parameters may complicate the analysis, and there is a better choice: the linear Holt
exponential smoothing model. This model also has two estimates (two equations) but only two
parameters (i.e. α and β). It is not guaranteed to be the best choice, but the results and
findings from this model can help to explore other time series models in further studies.
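
The two-estimate, two-parameter structure described above can be sketched as the standard Holt linear exponential smoothing recursion. This is a minimal sketch, not the thesis implementation: the function name `holt_linear`, the argument names `alpha` and `beta`, and the initialisation (first observation as the level, first difference as the trend) are assumptions made for illustration.

```python
def holt_linear(series, alpha, beta):
    """Standard Holt linear exponential smoothing.

    Returns (smooth, trend), two lists with the same length as `series`.
    The initialisation used here is an assumption of this sketch; the
    abstract does not state one.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    smooth = [series[0]]
    trend = [series[1] - series[0]]
    for y in series[1:]:
        prev_s, prev_t = smooth[-1], trend[-1]
        # Smooth (level) equation: blend the observation with the
        # one-step-ahead prediction prev_s + prev_t.
        s = alpha * y + (1 - alpha) * (prev_s + prev_t)
        # Trend equation: blend the latest change in level with the old trend.
        t = beta * (s - prev_s) + (1 - beta) * prev_t
        smooth.append(s)
        trend.append(t)
    return smooth, trend
```

On a perfectly linear series such as `[0, 1, 2, 3, 4]` the smooth estimate reproduces the data and the trend estimate stays at the constant slope, whatever the two parameters are.
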
The linear exponential smoothing model is modified before it is fitted to the tab-chart. The
modified model has variable (dynamic) parameters, α and β, and a threshold value, T. During
fitting, if the trend estimate of the model exceeds the threshold value, the variable parameters
take the values α1 and β1; otherwise, they take the values α2 and β2. This
policy is based on the difference in slope between significant signal changes (i.e. jump and
drop) and flat parts (i.e. background and plateau). The parameters and the threshold value are
adjusted manually until the model fits the tab-chart well. After fitting the model to all
tab-charts, it was found that the threshold value T is more influential in obtaining a well-fitted
model than the two parameters α and β. In addition, the fitted curve of the model is spiky when the
threshold value is small and becomes smoother as the value increases. This parameter policy is only
an initial trial and is not perfect; the experimental results reveal its drawbacks.
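
The threshold policy described above might look roughly like the following. It is a sketch under stated assumptions rather than the thesis code: the name `holt_switching`, the `fast` and `slow` parameter pairs, the use of the previous step's trend, and the comparison of its absolute value with the threshold (so that a drop with a negative trend also switches the parameters) are all choices made here for illustration.

```python
def holt_switching(series, fast, slow, threshold):
    """Holt smoothing whose two parameters depend on the trend estimate.

    `fast` and `slow` are (level weight, trend weight) pairs. `fast` is used
    when |previous trend| > threshold, `slow` otherwise. Comparing the
    absolute value of the previous trend is an assumption of this sketch.
    Returns (smooth, trend, regime); regime[i] is True where `fast` was used.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    smooth = [series[0]]
    trend = [series[1] - series[0]]
    regime = [False]
    for y in series[1:]:
        prev_s, prev_t = smooth[-1], trend[-1]
        use_fast = abs(prev_t) > threshold
        a, b = fast if use_fast else slow
        s = a * y + (1 - a) * (prev_s + prev_t)
        t = b * (s - prev_s) + (1 - b) * prev_t
        smooth.append(s)
        trend.append(t)
        regime.append(use_fast)
    return smooth, trend, regime
```

With a small threshold the model switches to the `fast` pair on minor fluctuations, which is consistent with the spiky fitted curves reported for small threshold values.
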
For convenience, the modified linear smoothing model above is henceforth called the HOLT
model. The first algorithm for detecting change points (or time series segmentation) with the HOLT
model is called the classification method. Its main idea is as follows. A
change in the variable parameters (i.e. α and β) indicates a stage change in the tab-chart (i.e.
from a significant signal change to a flat part or vice versa). For example, the parameter values
change from (α1, β1) to (α2, β2) as the tab-chart moves from the background stage to the jump stage
in a standard sample. The classification method is not automatic, and so not practical for the
researchers, because it requires manual adjustment of the parameters beforehand. However, it serves
two purposes. Firstly, it is a step towards developing an automatic classification method.
Secondly, it is used as a tool to analyse the data reduction and fitted error of the HOLT model.
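
The idea that a switch of parameter pair marks a stage boundary can be illustrated as follows. The sketch assumes a hypothetical per-observation list `regime` that records which parameter pair was in use at each point (True for one pair, False for the other); the function names are illustrative only.

```python
from itertools import groupby


def change_points(regime):
    """Indices i at which regime[i] differs from regime[i - 1]."""
    return [i for i in range(1, len(regime)) if regime[i] != regime[i - 1]]


def segments(regime):
    """Split the series into runs of constant regime.

    Returns a list of (start, end, flag) tuples with `end` exclusive.
    """
    out, start = [], 0
    for flag, run in groupby(regime):
        length = len(list(run))
        out.append((start, start + length, flag))
        start += length
    return out
```

For a standard sample one would hope to see four runs corresponding to background, jump, plateau and drop; short spurious runs are a sign that the parameters or the threshold still need manual adjustment.
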
The second algorithm for detecting change points with the HOLT model is called the classification
rules. It is automatic because it uses a set of rules to divide the tab-chart into
different stages. The classification rules are developed from the classification method as follows.
After each tab-chart is well fitted by the HOLT model, the trend estimate is plotted against the
smooth estimate. The rules are drawn up by comparing these trend-smooth graphs with
the tab-charts. In the trend-smooth graph, the background stage has small trend and smooth values,
the jump stage has larger trend and smooth values, the plateau stage has larger smooth values but
small trend values, and the drop stage has large negative trend values. The performance of the
classification rules is tested by applying them to the tab-charts of the standard samples. Some
guidelines for evaluating the performance are set to minimize personal and biased judgement, since
different people may judge borderline cases differently. The success rate of the classification
rules ranges from 45% to 80% across the elements. However, after excluding the tab-charts in which
the background and plateau levels are close, the success rate of the classification rules is at
least 65%. This project thus provides a workable method and platform for identifying change
points. One promising way to improve the performance is to refine the rules. Although the rules
seem to play a more important role in the performance than the parameters (i.e. α and β), an
experiment should be carried out to investigate the effect of the parameters on the performance.
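
A rule set of the kind described above could be sketched like this. The abstract does not give the actual rules or cut-off values, so the function name and the two cut-offs `smooth_cut` and `trend_cut` are hypothetical, and the jump rule is simplified to depend on the trend estimate only.

```python
def classify_point(smooth, trend, smooth_cut, trend_cut):
    """Assign one observation to a stage from its smooth and trend estimates.

    Hypothetical rules for illustration only:
      large negative trend           -> "drop"
      large positive trend           -> "jump"
      small trend, large smooth      -> "plateau"
      small trend, small smooth      -> "background"
    """
    if trend < -trend_cut:
        return "drop"
    if trend > trend_cut:
        return "jump"
    return "plateau" if smooth > smooth_cut else "background"


def classify_series(smooth_values, trend_values, smooth_cut, trend_cut):
    """Apply classify_point to every (smooth, trend) pair."""
    return [
        classify_point(s, t, smooth_cut, trend_cut)
        for s, t in zip(smooth_values, trend_values)
    ]
```

Rules of this form rely on the plateau level being clearly above the background level; when the two are close, a single `smooth_cut` cannot separate them.
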
Both the classification method and the classification rules suffer from misclassification:
they perform very poorly on some tab-charts, in which many observations are wrongly
classified. All of these tab-charts share the property that the levels of the background and the
plateau are very close. This is therefore a likely cause of the misclassification, although a
gentle signal change (i.e. a jump or drop with a gentle slope) is another possible cause. This
problem suggests how to improve the performance of both the method and the rules. For example,
another set of rules is needed to handle these tab-charts. The classification method performs
better than the classification rules because rules can hardly replace human
visual judgement. However, the classification rules are automatic and more practical than the
classification method.

Data reduction is another characteristic of the HOLT model. The model is capable of
removing noise or variation from the tab-chart, and the researchers can gain useful information by
inspecting the fitted curve (values) of the model. Apart from graphs, a quantitative analysis is
also included in this thesis to examine the data reduction from another angle. The standard error
over the background and the plateau is calculated with the formula that divides the standard
deviation by the square root of the sample size. This formula assumes independent data. Many
tab-charts are autocorrelated, as shown by the fact that an ARIMA model can be fitted to them, but
this violation does not prevent the use of the formula, because it is not used to estimate
statistical summaries of the underlying process; it is used only to measure the degree of variation
and fluctuation in the background and plateau. Over the plateau, the standard error of the HOLT
model (i.e. of the mean of the fitted values) is smaller than that of the tab-chart (i.e. of the
mean of the actual observations). This supports the claim that the HOLT model reduces the variation
over the plateau. However, the situation is reversed over the background: the standard error of the
HOLT model is greater than that of the tab-chart. This implies that the HOLT
model has difficulty with a background or plateau of small signal (i.e. a trace amount of the
element). A constant trend (i.e. a smoother fitted curve) should be used on the background by
choosing appropriate parameters. In other words, the background (or a plateau of small signal)
should use different α and β parameters.
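
The comparison described above can be sketched with the usual standard error of the mean, s / sqrt(n). The function names are illustrative, and the inputs are assumed to be the observations and the fitted values over one flat stage (background or plateau) that has already been identified.

```python
import math
import statistics


def standard_error(values):
    """Sample standard deviation divided by sqrt(n).

    The formula assumes independent data; here it serves only as a measure
    of variation over a flat stage, as in the text above.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    return statistics.stdev(values) / math.sqrt(n)


def reduces_variation(observed, fitted):
    """True if the fitted values vary less than the raw observations."""
    return standard_error(fitted) < standard_error(observed)
```

A result of `False` on a background segment would correspond to the reversed situation reported above for stages of small signal.
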
After fitting the time series model to the raw data, the fitted error should also
be analysed. The analysis shows that the fitted error of the HOLT model increases as the level of
the plateau increases. The fitted error also decreases as the mass of the element increases, but
this is because heavy elements are present in smaller amounts than light elements in the sample.
Therefore, the only factor affecting the fitted error is the level of the plateau. This result
implies that the standard deviation of the fitting error of an element's concentration can be
minimized if the background noise can be controlled or minimized. Controlling the background noise
can thus help to estimate trace elements in the sample more accurately, but it cannot deliver a
significant improvement for the major elements in the sample.

For a comprehensive analysis, ARIMA models are also fitted to the tab-charts. Since
there are many tab-charts to fit, a policy is devised to speed up the ARIMA model fitting. It
has three steps. The first step is to check whether the tab-chart
is stationary; if it is not, differencing is carried out on the background of the tab-chart.
The second step is to fit AR(1), AR(2), MA(1), MA(2) and ARMA(1,1) models to the background and
choose the best model by the lowest MSE and significant parameters. If none of the five models can
be fitted to the background, it may be a random walk or follow another ARIMA model. The third step
is to repeat the first two steps on the plateau of the tab-chart. The results are as follows. Over
the background, no differencing is needed and most of the tab-charts
are ARMA(1,1), random walk or inconclusive. Over the plateau, most plateaus with an upward or
downward trend are ARIMA(0,1,1), whereas most plateaus with a horizontal trend are AR(1) or
ARMA(1,1); the other plateaus are random walk or inconclusive. These results also indicate that
most of the tab-charts are autocorrelated.
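
As a lighter-weight companion to the ARIMA fitting above, autocorrelation in a flat stage can be screened with the sample autocorrelation coefficient. This is a simpler diagnostic swapped in for illustration, not the three-step ARIMA procedure itself; the function names are illustrative, and the bound 1.96 / sqrt(n) is the usual approximate 95% limit for white noise.

```python
import math


def autocorrelation(values, lag=1):
    """Sample autocorrelation coefficient at the given lag."""
    n = len(values)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(values)")
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        raise ValueError("series is constant")
    num = sum((values[i] - mean) * (values[i - lag] - mean) for i in range(lag, n))
    return num / denom


def looks_autocorrelated(values, lag=1):
    """True if the coefficient lies outside the approximate 95% white-noise band."""
    return abs(autocorrelation(values, lag)) > 1.96 / math.sqrt(len(values))
```

A segment flagged by this check is a candidate for the fuller model identification described above.
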
In the method (model) selection, there are several reasons for not choosing the ARIMA model to
tackle the researchers' problem. Firstly, the equation and structure of ARIMA are too complicated
to be modified to identify change points. Secondly, although the simple exponential smoothing model
is a special case of the ARIMA model (equivalent to an ARIMA(0,1,1) model without a constant term),
it was rejected in the selection for the reason given earlier: one estimate cannot reflect the
slope property of the tab-chart. Thirdly, the researchers are looking for an automatic and fast
method to extract information from the tab-chart, and the ARIMA model does not seem practical for
automated use. However, ARIMA can be used to detect autocorrelation in the tab-chart before the
HOLT model is applied. The tab-chart could be preprocessed if the autocorrelation is serious and a
precise estimate is required to make a judgement on the tab-chart or sample.
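
The equivalence between simple exponential smoothing and ARIMA(0,1,1) without a constant term, mentioned above, follows from the one-step forecast. The derivation below is a short worked sketch; the symbols (white-noise error e_t, moving-average coefficient θ written with a minus sign, forecast ŷ) are notation chosen here and not taken from the thesis.

```latex
% ARIMA(0,1,1) without a constant term, with white-noise errors e_t:
\begin{align*}
y_t - y_{t-1} &= e_t - \theta\, e_{t-1} \\
\hat{y}_{t+1} &= y_t - \theta\, e_t, \qquad e_t = y_t - \hat{y}_t \\
              &= (1 - \theta)\, y_t + \theta\, \hat{y}_t
\end{align*}
% Setting \alpha = 1 - \theta gives the simple exponential smoothing update
% \hat{y}_{t+1} = \alpha\, y_t + (1 - \alpha)\, \hat{y}_t .
```
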
This project has some shortcomings, and more work is needed on them in
further studies. Firstly, the classification rules are rather approximate and should be refined;
more advanced mathematical techniques could be employed to devise better rules. Secondly,
other models, for example locally weighted regression, should be explored to find ones that
perform better in change point detection and let the researchers gain more information from the
tab-chart. Thirdly, this project does not study the parameters of the HOLT model sufficiently. One
direction is to investigate the impact of the parameters on the performance of the classification
rules, because only one set of parameters is used in this analysis. Another is to investigate the
relationship between the parameters and data reduction, because the data reduction
does not work on a background or plateau of small signal. Lastly, further study is needed on
tab-charts with autocorrelation and on the influence of autocorrelation on the statistical
estimates obtained from the tab-chart by the HOLT model.

Moreover, several directions are worth considering as long-term goals.
Firstly, the classification rules can be extended to tab-charts with multiple significant
signal changes (i.e. jumps and drops) and multiple flat parts (i.e. backgrounds and plateaus).
Secondly, in the study of the standard samples in this project, the tab-chart of each element in a
sample is investigated independently. However, a mineral in a real sample is a chemical compound of
several elements, so detecting a change of composition in real samples will involve more
than one tab-chart. Thirdly, the knowledge (i.e. model and method) in this thesis is not limited to
the composition of minerals or rock samples; it can be generalized and applied to other
areas (e.g. charts in other problems). Lastly, the HOLT model should be compared with other methods
of change point analysis, especially in terms of performance. Since the data reduction of the HOLT
model traces the trend of the tab-chart and removes variation or noise, the HOLT model may be
incorporated into other methods to obtain better performance.

Item Type: Thesis (Research Master)
Copyright Holders: The Author
Copyright Information:

Copyright 2008 the Author - The University is continuing to endeavour to trace the copyright owner(s) and in the meantime this item has been reproduced here in good faith. We would be pleased to hear from the copyright owner(s).

Additional Information:

Available for library use only and copying in accordance with the Copyright Act 1968, as amended. Thesis (MSc)--University of Tasmania, 2008. Includes bibliographical references

Date Deposited: 04 Feb 2015 23:34
Last Modified: 11 Mar 2016 05:53