
Research Note: Boolean functions f can be implemented by a Threshold Logic Unit (TLU). We have the following equation:

f(x) = w1x1 + w2x2 + ... + wnxn

A function computed this way is linearly separable. For machines with only two actions (0 and 1), a single TLU suffices. For more complex action sets (0, 1, 2, ...) we need a network of TLUs, which is called a neural network.
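As a small sketch (the function name tlu and the AND example are mine, not from the note), a single TLU just thresholds the weighted sum of its inputs. AND is linearly separable, so one TLU with suitable weights and threshold computes it:

```python
# Minimal TLU sketch: fire 1 if the weighted sum exceeds the threshold, else 0.
def tlu(x, w, theta=0.0):
    s = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s > theta else 0

# AND is linearly separable, so one TLU suffices:
w_and = (1.0, 1.0)
theta_and = 1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "->", tlu(x, w_and, theta_and))
```

XOR, by contrast, is not linearly separable, which is why a single TLU cannot compute it and a network is needed.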

Example:

How can a machine learn?

Inputs                             Desired output
X1 = (x1^1, x2^1, ..., xn^1)       d1
X2 = (x1^2, x2^2, ..., xn^2)       d2
...
Xk = (x1^k, x2^k, ..., xn^k)       dk

Objective:

To find a function f such that f(Xi) = di for i = 1, 2, ..., k. Experimental evidence suggests that if the training set is "typical", then f will generate approximately correct responses for all x. Frank Rosenblatt, and Widrow and Hoff, gave the following approach in 1962.

Consider a single TLU:

Weights  W = (W1, W2, ..., Wn), threshold theta (assume theta = 0)
Input    X = (x1, x2, ..., xn)
         X.W = x1W1 + x2W2 + ... + xnWn
Output   = 1 if X.W > theta
         = 0 if X.W <= theta

Calculus in the weight space

Start with an arbitrary W = (W1, W2, ..., Wn) and take an X (say X = X1) from the training set.
Define the error      ε = (f(X) - d)²
Define the gradient   dε/dW = (dε/dW1, dε/dW2, ..., dε/dWn)
Define S = X.W; then  dS/dW = X

Chain Rule
dε/dW = (dε/dS).(dS/dW) = (dε/dS).X
But dε/dS = 2(f(X) - d).df/dS
So dε/dW = 2(f(X) - d).(df/dS).X
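The chain-rule gradient can be checked numerically. The sketch below (all names are mine) assumes the linear case f = S that the note adopts later, compares the analytic gradient 2(f - d).X against finite differences, and is illustrative only:

```python
# Error of a linear unit f = S = X.W against target d.
def error(w, x, d):
    s = sum(wi * xi for wi, xi in zip(w, x))
    f = s                      # linear unit: f = S, so df/dS = 1
    return (f - d) ** 2

# Analytic gradient from the chain rule: dε/dW = 2(f - d).X
def analytic_grad(w, x, d):
    s = sum(wi * xi for wi, xi in zip(w, x))
    return [2 * (s - d) * xi for xi in x]

w, x, d, h = [0.2, -0.1, 0.4], [1.0, 1.0, 0.0], 1.0, 1e-6
numeric = []
for i in range(len(w)):
    wp = w[:]
    wp[i] += h                 # perturb one weight at a time
    numeric.append((error(wp, x, d) - error(w, x, d)) / h)
print(analytic_grad(w, x, d))
print(numeric)                 # should closely match the analytic gradient
```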
Aim: to minimize ε. Use the method of steepest descent, which moves W along the negative gradient (the direction of steepest decrease in ε).
Problem: What is df/dS?
Widrow-Hoff Algorithm

Assume we want to adjust the weights so that
  if d = 1, then X.W = 1
  if d = 0, then X.W = -1
and take the output to be linear: f = S.
Since f = S, we have df/dS = 1, and with ε = (f - d)²,
dε/dW = 2(f - d).X

Steepest Descent Method

Step 1: Start with an arbitrary W.
Step 2: Replace W by W + c(d - f).X, where c is the learning rate parameter.
Repeat Step 2 until W converges. This is the Widrow-Hoff algorithm.
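The two steps above can be sketched as a training loop. This is my own minimal rendering (the function name, the +/-1 targets, the learning rate c, and the toy data are illustrative choices, not from the note):

```python
# Widrow-Hoff sketch: linear output f = X.W, targets d = +1/-1,
# update W <- W + c(d - f).X for each training sample.
def widrow_hoff(samples, n, c=0.1, epochs=1000):
    w = [0.0] * n                                   # Step 1: start from zero weights
    for _ in range(epochs):
        for x, d in samples:
            f = sum(wi * xi for wi, xi in zip(w, x))        # f = S = X.W
            w = [wi + c * (d - f) * xi for wi, xi in zip(w, x)]  # Step 2
    return w

# Toy linearly separable data (first component is a constant bias input):
samples = [((1, 1, 0), 1), ((1, 0, 1), 1), ((1, 0, 0), -1)]
w = widrow_hoff(samples, 3)
print(w)
```

After enough passes the sign of X.W agrees with each target d, so thresholding the trained linear unit at 0 classifies the training set correctly.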

Example: X = (1, 1, 0), W = (0, 0, 0), desired output f(x) = 1, i.e. we want
w1.1 + w2.1 + w3.0 = 1.
Repeating Step 2 drives w1 and w2 from 0 toward 0.5 (w3 multiplies a zero input, so it stays 0), at which point
0.5(1) + 0.5(1) + 0(0) = 0.5 + 0.5 + 0 = 1.
The weights converge from 0 to 0.5 over the iterations.
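The example can be reproduced directly with the update rule W <- W + c(d - f).X (the learning rate c = 0.25 and the iteration count are my choices for illustration):

```python
# Reproducing the worked example: X = (1,1,0), target d = 1, linear f = X.W.
x, d, c = (1.0, 1.0, 0.0), 1.0, 0.25
w = [0.0, 0.0, 0.0]
for _ in range(50):
    f = sum(wi * xi for wi, xi in zip(w, x))     # current output
    w = [wi + c * (d - f) * xi for wi, xi in zip(w, x)]
print(w)   # w1 and w2 approach 0.5, w3 stays 0, so X.W approaches 1
```

By symmetry w1 and w2 receive identical updates, so they settle at 0.5 each, giving X.W = 1 as required.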

Generalized Delta Rule

Here Wi+1 = Wi + c(d - f).f(1 - f).X
where f = 1/(1 + e^(-S)) is the sigmoid function.
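The generalized delta rule above can be sketched as follows (the function names, learning rate, and single-sample data are my illustrative choices). The extra factor f(1 - f) is exactly df/dS for the sigmoid, replacing the df/dS = 1 of the linear Widrow-Hoff case:

```python
import math

# Sigmoid output unit: f = 1/(1 + e^(-S)).
def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

# One generalized-delta-rule step: W <- W + c(d - f).f(1 - f).X
def delta_step(w, x, d, c=0.5):
    s = sum(wi * xi for wi, xi in zip(w, x))
    f = sigmoid(s)
    return [wi + c * (d - f) * f * (1 - f) * xi for wi, xi in zip(w, x)]

w = [0.0, 0.0, 0.0]
x, d = (1.0, 1.0, 0.0), 1.0
for _ in range(2000):
    w = delta_step(w, x, d)
print(w, sigmoid(sum(wi * xi for wi, xi in zip(w, x))))
```

Because the sigmoid saturates, f(1 - f) shrinks as f nears 0 or 1, so the updates slow down near the targets; the output creeps toward 1 rather than jumping there as in the linear case.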
