Update Seminar Form
I presented a seminar on Jan. 15, '04 to apprise the faculty supervisor and classmates of the progress made on my MS Project pertaining to Data Mining.
The following is a brief description of the topics discussed in the seminar:
When it comes to mining different types of data, there are six branches of study:
Multi-dimensional Analysis and Descriptive Mining of Complex Data Objects
Mining Spatial Databases
Mining Multimedia Databases
Mining Time-Series and Sequence Data
Mining Text Databases
Mining the World Wide Web
Of the above, my project focuses on the techniques of mining Time-Series and Sequence Data.
One of the most important tasks facing business people and economists today is making estimates for the future.
Therefore, I have undertaken to develop software that can deduce a trend (and, eventually, a forecast) from data given in the form of a time series. Analysing time series and making forecasts falls under the broad category of DATA MINING.
Time Series defined
“A time series is a set of statistical observations arranged in chronological order.”
- Morris Hamburg
What is Trend measurement?
Given any time series, the first step towards forecasting is to determine and present the direction it takes, i.e. whether it is growing or declining. This is TREND measurement.
Method of Moving Averages to Determine Trend
The method of moving averages is one of the most widely used methods of measuring trend, and it is also used in measuring seasonal variations. In determining trend by the method of moving averages, the average over a predetermined number of periods (years, months or weeks) is computed, and this average then represents the normal, or trend, value for the unit of time lying at the middle of the periods covered by the calculation.
The effect of averaging is to give a smoother curve when the trend line is drawn, and to lessen the influence of the fluctuations that pull the time series' figures away from the general trend.
When applying the moving average method, one must select a period for the moving average, such as a 200-day, 3-month, 4-quarter or 5-year moving average.
Criteria for the Selection of Period for the Moving Average
The period of the moving average should be decided in the light of the cycle length, i.e. the distance between any two consecutive peaks on the ACTUAL TREND LINE (the line formed by plotting the actual data values of the time series). Once the cycle length is known, the ‘ideal’ period for the moving average is given by:
Ideal Period OR Length of Moving Average = (Cycle Length)/2 + 1
Exponentially Smoothed Weighted Moving Average
The simple moving average method discussed so far gives equal significance to every day in the average. However, this need not be so. On reflection, equal weighting does not make much sense, especially if one is using a longer-term MA to smooth out random bumps in the trend. If one is using a 20-day moving average, why should data from almost three weeks ago be considered as relevant as data recorded this morning?
Various forms of ‘weighted’ moving averages have been developed to address this objection. One such technique is the ‘Exponentially Smoothed Weighted Moving Average’ (or ESWMA, for short).
Exponential smoothing gives maximum significance to the most recent data and decreasing significance to older data. The weight factors in an exponentially smoothed MA are successive powers of a number called the ‘smoothing constant’. A smoothing constant less than 1 weights recent data more heavily, with the bias towards the most recent measurements increasing as the smoothing constant decreases towards 0. If the smoothing constant exceeds 1, older data are weighted more heavily than recent measurements.
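The formula for ESWMA is given by:
$$E = \frac{c^{1} t_{1} + c^{2} t_{2} + \cdots + c^{n} t_{n}}{n}$$
where
$c$: smoothing constant (for all practical purposes, $0 < c < 1$),
$t_1, t_2, \ldots, t_n$: time-series data, with $t_1$ being the most recent value,
$n$: period of the MA, and
$E$: the resulting exponentially smoothed value.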
Implementation in a Software Application:
Using C++ and Windows MFC programming, a software application has been developed to deduce trend values using the ESWMA technique and to plot these values on a two-dimensional plane (using OpenGL), yielding the ‘Exponentially Smoothed Trend Curve’.
Prof. Ashay Dharwadker