| Home | Sign up! | Projects | Seminars | Research Notes | Today's Lecture | About |
Member ID: 30

Srijani Kirti

Email
srijanik@yahoo.com

Profile: I graduated from Presidency College, Calcutta, with Honours in Economics. I am currently studying for my MS in Computer Information Systems at Ansal Institute of Technology.

Artificial Intelligence


Master’s Thesis
by
Srijani Kirti

Thesis Supervisor
Prof. Ashay Dharwadker

Ansal Institute of Technology
15th April 2004


Abstract

This thesis is a survey of the field of Artificial Intelligence vis-à-vis natural language communication between humans and machines. We start with the Turing Test as a definition of AI. We then explore Chomsky’s concept of Universal Grammar. Is it really possible to design a machine that passes the Turing Test? We study the famous Eliza program written by Weizenbaum, its grammar and transformation rules.  Finally, we implement the Eliza program in C++ and give an overview of how our version of the  program works. A detailed manual and sample conversations with Eliza are included in the appendix. The source code of the software is provided under the GNU public license for noncommercial use on the accompanying CD.

Contents

1. The Turing Test
2. Noam Chomsky’s Universal Grammar
3. Chatbot Eliza
4. Overview of how the program works
5. Actual Code
6. Analysis of the code
7. Appendix – Eliza Manual
8. Appendix – Conversations with Eliza
9. References

Prof. Ashay Dharwadker's Courses (3):

CourseSemesterGrade
MS ProjectFall 2003View
Database SystemsFall 2003View
Computer NetworksSpring 2004View



Projects (2)


Project ID: 10
Course: Database Systems
Topic: Cricket Plus!
Description: This is going to be a joint project with Prateeti Raj. We will create a MySQL database on "Cricket" and then create a web page as a front end.
The SQL syntax for creating the database is--
mysql>create database cricketplus;
mysql>use database cricketplus;
mysql>create table countries(country_name char(40) not null, uniform_color char(40), cricket_board char(40), famous_playgrounds char(40), world_cups_won integer);
mysql>create table cricket_players(player_id integer not null, player_name char(40), highest_score integer, average integer, no_of_centuries integer, no_of half_centuries integer, bowling_average integer, wickets_taken integer);
mysql>create table country_players(player_id char(40) not null, country_name char(40) not null, name char(40) not null, primary key(player_id, country_name), foreign key (country_name) references countries);

Project ID: 11
Course: MS Project
Topic: Chatbots
Description: This is the major project for MS students that I'm doing with Ashay Sir. I will attempt to make a Chatbot using C++. A Chatbot is a computer program that can reply to one's queries through an interface, much like a real person. It comes under the jurisdiction of AI. The world's first chatbot was the celebrated 'ELIZA' program by Prof. Weizenbaum at MIT.

Reference: Weizenbaum's original paper



Seminars (2)


Seminar ID: 27
Course: MS Project
Topic: Eliza Program
Description: Presentation of Weizenbaum's paper on "ELIZA -- A Computer Program For the Study of Natural Language Communication Between Man and Machine" to the class.
  • How Eliza works:
    • User gives input statement S.
    • S is examined for key words (key words are ranked).
    • S is transformed according to rules associated with keywords.
    • Transformation rules are also specified for content-free statements.
    • Transformed S is output by Eliza.
  • Technical Problems:
    • Identify the "most important" key word in S.
    • Identify the context of key word, e.g. if "YOU" is followed by "ARE" then the context is that an assertion is being made.
    • Choice of transformation rule.
    • Making Eliza respond "intelligently" when no key word is found.
    • Editing, extension of Eliza program (we call this "Educating Eliza").
  • Example of Transformation:
    • Input S="IT SEEMS THAT YOU HATE ME"
    • Template
      1
      2
      3
      4
      IT SEEMS THAT YOU HATE ME
    • Identification of key words: 2, 4. Nothing else is understood by Eliza.
    • Decomposition rule: __ YOU __ ME
    • Reassembly rule: Change YOU to I and ME to YOU. Replace the first blank by the stock phrase "WHAT MAKES YOU THINK", replace the second blank by 3:
      Stock Phrase
      2'
      3
      4'
      WHAT MAKES YOU THINK I HATE YOU
    • Output S'="WHAT MAKES YOU THINK I HATE YOU".
  • Transformation Tree Diagram:

Seminar ID: 57
Course: MS Project
Topic: Some conversations with Eliza
Description: Conversation 1

User: Who are you, Eliza?
Eliza: DOES THAT QUESTION INTEREST YOU?
User: Are you a machine?
Eliza: WOULD YOU PREFER IT IF I WERE NOT A MACHINE?
User: yes
Eliza: I SEE
User: Are you offended?
Eliza: PERHAPS IN YOUR FANTASIES I AM OFFENDED
User: Let us change the topic.
Eliza: COME COME ELUCIDATE YOUR THOUGHTS.
User: I had a dream last night.
Eliza: DO YOU DREAM OFTEN?
User: yes I do.
Eliza: I SEE.
User: In my dream I saw that I was drowning.
Eliza: WHAT PERSONS APPEAR IN YOUR DREAMS?
User: My friend.
Eliza: COULD YOU ELABORATE ON THAT?
User: Her name is Rita.
Eliza: IM NOT INTERESTED IN NAMES-PLEASE GO ON.
User: I see her asking for help.
Eliza: THAT IS QUITE INTERESTING
User: Then I go to help her.
Eliza: WHYAT DOES THAT SUGGEST TO YOU?
User: That I care for her.
Eliza: I SEE.
User: I want to make the dream stop coming.
Eliza: WHY DO YOU WANT TO MAKE THE DREAM STOP COMING?
User: Because it bothers me.
Eliza: DON’T ANY OTHER REASOND COME TO MIND?
User: I feel insecure.
Eliza: DO YOU OFTEN FEEL INSECURE?


Conversation 2

User: I am getting homesick.
Eliza: HOW LONG HAVE YOU BEEN GETTING HOMESICK?
User: Since I joined the hostel.
Eliza: I SEE.
User: What do you think of love
Eliza: OH, I THINK OF LOVE.
User: What is love?
Eliza: DOES THAT QUESTION INTEREST YOU?
User: Yes, because I want to find out whether I am in love or not.
Eliza: DO YOU THINK IT IS NORMAL TO BE IN LOVE OR NOT?
User: I think it is normal.
Eliza: ARE YOU SURE?
User: Yes I am very sure.
Eliza: I SEE.
User: Do you think one should be honest to his or her lover?
Eliza: YOU’RE NOT REALLY TALKING ABOUT ME, ARE YOU?
User: I am not talking about any specific individual.
Eliza: DO YOU ENJOY NOT TALKING ABOUT ANY SPECIFIC INDIVIDUAL?
User: I am talking about human beings.
Eliza: CAN YOU ELABORATE ON THAT?
User: Why should I tell you everything?
Eliza: WHAT WOULD IT MEAN TO YOU IF YOU GOT EVERYTHING?
User: It will be great for me.
Eliza: THAT IS QUITE INTERESTING.
User: Do you really mean it?
Eliza: OH, I REALLY MEAN IT
User: What else can you tell me about love?
Eliza: YOU’RE NOT REALLY TALKING ABOUT ME, ARE YOU?
User: What is your real name, Eliza?
Eliza: WHAT ANSWER WOULD PLEASE YOU THE MOST?
User: Tell me the truth.
Eliza: WHAT DOES THAT SUGGEST TO YOU?
User: Truth is related to reality.
Eliza: I SEE.


Research Notes (5)


Research Note ID: 10
Course: MS Project
Topic: Noam Chomsky
Description: Few people have had a greater impact on 20th century thought as MIT linguist, activist, and political dissident Noam Chomsky. It's often been said that Chomsky is to linguistics what Einstein is to physics. His 1957 treatise, Syntactic Structures, proposed that there is a kind of "universal grammar" underlying all languages and he called it “Transformational Grammar”. According to transformational grammar, every intelligible sentence conforms not only to grammatical rules peculiar to its particular language, but also to "deep structures," a universal grammar underlying all languages and corresponding to an innate capacity of the human brain. That is, if an alien came to Earth this alien would perceive all languages on Earth as pretty much the same, with only superficial variations. Chomsky’s goal in linguistics was to discover and describe this universal grammar.

Biography

On linguistics

Criticisms


Research Note ID: 21
Course: MS Project
Topic: Progress till date
Description: Till date I have…
1) Researched Eliza chatbot and Joseph Weizenbaum pretty thouroughly.
2) Presented Wiezenbaum’s paper on Eliza to the class.
3) Analyzed how the program is written with the help of Ashay Sir.
4) Copied three files containing Keywords and reassembly rules from the program to Notepad so that they can be extended and/or manipulated as required.
5) Learnt how to draw a tree-diagram (with Root, Leaf etc)
Next, I plan to….
1) Import the keywords & reassembly rules of the Eliza program into a MySQL Database so that Eliza can “learn” new words & phrases while conversing with people.
2) Finally, translate the Eliza program from C++ to PHP.

Research Note ID: 28
Course: Database Systems
Topic: Optimization
Description: Optimization is one of the strengths of Relational Databases. This is because, in such a System, relational expressions are at a sufficiently high semantic level to make optimization feasible in the first place. The overall purpose of the optimizer is to is to chaoose an efficient strategy for evaluating a given relational expression. However, there is no guarantee that the implementation strategy chosen is truly optimal, all that can be said is that it is an improvement on the original unoptimized version.
A good optimizer will have Statistical information such as --
The number of values in each domain
The current number of tuples in each base relvar
The current number of distinct values in each attribute in a base relvar
The number of times ach such value occurs in each attribute
There are four broad stages in query processing
1. Cast the query into internal form
The internal form chosen is usually in the form of a query tree.
2. Convert to canonical form
Relational languages usually allow queries to be expressed in a variety of ways. The purpose of this step is to convert these into some equivalent canonical form to eliminate such superficial differences.
3. Choose candidate low level procedures
This is for implementing the various operations in the canonical representation of the query.
4. Genetate query plans and choose the cheapest
Some transformation rules used in the optimization process are
Restrictions , projections, union, intersection, difference, join, commutativity, associativity, idempotencescalar computational expressions, boolean expressions and semantic transformations
The next step is to adopt a ‘divide & conquer’strategy called query decompositionpioneered by the Ingres Prototype. The idea behind this is to break a query involving Range variables into a sequence of smaller queries involving one or two such variables each using Detachment & Touple substitution.
Next, we implement the relational operators using pseudocode alogarithms for five join techniques--
1.brute force 2.index lookup 3.hash lookup 4.merge 5.hash 6.combinations of the above.

Research Note ID: 59
Course: Computer Networks
Topic: ARPANET
Description: In 1962, the United States and the earstwhile USSR were engaged in bitter rivalry. Both the US and the USSR were in the process of building hair-trigger nuclear ballistic missile systems and a nuclear attack seemed imminent. Each country pondered post-nuclear attack scenarios. The million dollar question was—how could any sort of "command and control network" survive the aftermath of a nuclear attack?
In 1968, the Advanced Research Projects Agency (ARPA) sent out a Request for Quotation (RFQ) to build a network of four Interface Message Processors (IMPs). Many of the large computer and telecommunications organizations didn't even respond--they thought it was impossible.
But at BBN, Frank Heart was sure BBN could build it. Heart assembled a first rate proposal team:
Dave Walden, a young programmer with expertise in real-time systems
Bernie Cosell, a world-class debugger with a reputation for fixing problems
• Severo Ornstein, a perfectionist hardware ace
• Will Crowther, an exceptional programmer who specialized in producing complex, tight code
• Bob Kahn, the consummate theoretician who understood error-control and the problems associated with sending data over telephone lines
• BBN's detailed proposal filled 200 pages, replete with flowcharts, equations, and tables detailing timing, routing, transmission delays, and packet queuing. Although a dozen companies bid on the contract, the BBN team was confident that their proposal would be the best. They were right. Even though much larger and better-known organizations had bid on the contract, ARPA awarded it to BBN in January 1969.
It was based on the method of "packet switching" ,a rapid store-and-forward design by Paul Baran. When a node receives a packet it stores it, determines the best route to its destination, and sends it to the next node on that path. If there was a problem with a node (or if it had been destroyed) packets would simply be routed around it.
In l969, this "distributed" concept was given its first large-scale test, with the first node installed at UCLA and the seventh node at RAND in Santa Monica Called ARPANET, it was intended for scientists and researchers who wanted to share one another's computers remotely.
In the autumn of 1969 the first ARPANET computer was connected to the ARPANET's IMP node at the University of California at Los Angeles (UCLA). Doug Engelbart's hypertext-project computer at Stanford Research Institute (SRI) was the next. On October 1, 1969, the first characters were transmitted over the new network. The ARPANET was born. When IMPs number three and four were installed at UC Santa Barbara and the University of Utah, IMP installations were beginning to seem routine and there was little fanfare. The backbone of the ARPANET consisted of packet-switching computers, called IMPs (Interface Message Processors), connected by, for the time, superfast 56 Kbit/s lines. Conventional computers with appropriate communications software were then connected to these IMP nodes
All the computers used different operating systems and they were able to talk to each other across the network with equal status. The ARPANET grew to connect research institutes and laboratories supported by the Department of Defense in various parts of the USA. Many of these were university laboratories studying data processing and computer networks, which developed the TCP/IP network and its applications for the Internet The network quietly expanded to thirteen sites by January 1971 and twenty-three by April 1972.
This network that would change the world was virtually unknown until the International Conference on Computer Communication in Washington, DC in October 1972. The ARPANET was the only demonstration and by the time it was over, packet switching skeptics were transformed into believers, and several hundred more people had an inkling of the digital age on the horizon.
Within two years, the network's users had turned it into something unforeseen: a high-speed, electronic post office for exchanging everything from technical to personal information.

In 1989, the ARPANET was renamed "Internet," which had also been described as the "information superhighway." These days, the Internet continues to expand, stringing together the World Wide Web, an all-encompassing, affordable, universal communications network.


Research Note ID: 75
Course: MS Project
Topic: Transformational Grammar
Description: What is Noam Chomsky's transformational grammar?
Several things; it really comprises several layers of theory:
(1) The hypothesis that much of the structure of human language is inborn ('built-in') in the human brain, so that a baby learning to talk only has to learn the vocabulary and the structural 'parameters' of his native language -- he doesn't have to learn how language works from scratch.
The main evidence consists of:
The fact that babies learn to talk remarkably well from what seems to be inadequate exposure to language; it is claimed that babies acquire some rules of grammar that they could never have 'learned' from what is available to them, if the structure of language were not partly built-in.
• The fact that the structure of language on different levels (vocabulary, ability to connect words, etc.) can be lost by injury to specific areas of the brain.
• The fact that there are unexpected structural similarities between all known languages
• This theory is by no means accepted by all linguists, though many would agree that some core part of language is innate.
(2) The hypothesis that to adequately describe the grammar of a human language, you have to give each sentence at least two different structures, called deep structure and surface structure, together with rules called transformations that relate them.
This is hotly debated. Some theories of grammar use two levels and some don't.
Let us take a closer look at this concept. It comprised three sections, or components: the phrase-structure component, the transformational component, and the morphophonemic component. Each of these components consisted of a set of rules operating upon a certain "input" to yield a certain "output." The notion of phrase structure may be dealt with independently of its incorporation in the larger system. In the following system of rules, S stands for Sentence, NP for Noun Phrase, VP for Verb Phrase, Det for Determiner, Aux for Auxiliary (verb), N for Noun, and V for Verb stem.
This is a simple phrase-structure grammar. It generates and thereby defines as grammatical such sentences as "The man will hit the ball," and it assigns to each sentence that it generates a structural description. The kind of structural description assigned by a phrase-structure grammar is, in fact, a constituent structure analysis of the sentence.
In these rules, the arrow can be interpreted as an instruction to rewrite (this is to be taken as a technical term) whatever symbol appears to the left of the arrow as the symbol or string of symbols that appears to the right of the arrow. For example, rule (2) rewrites the symbol VP (Verb Phrase) as the string of symbols Verb + NP (Noun Phrase), and it thereby defines Verb + NP to be a construction of the type VP. Or, alternatively and equivalently, it says that constructions of the type VP may have as their immediate constituents constructions of the type Verb and NP (combined in that order). Rule (2) can be thought of as creating or being associated with the following tree structure.

Rules (1)-(8) do not operate in isolation but constitute an integrated system. The symbol S (standing mnemonically for "sentence") is designated as the initial symbol. This information is not given in the rules (1)-(8), but it can be assumed either that it is given in a kind of protocol statement preceding the grammatical rules or that there is a universal convention according to which S is always the initial symbol. It is necessary to begin with a rule that has the initial symbol on the left. Thereafter any rule may be applied in any order until no further rule is applicable; in doing so, a derivation can be constructed of one of the sentences generated by the grammar. If the rules are applied in the following order: (1), (2), (3), (3), (4), (5), (5), (6), (6), (7), (8), then assuming that "the" is selected on both applications of (5), "man" on one application of (6), and "ball" on the other, "will" on the application of (7), and "hit" on the application of (8), the following derivation of the sentence "The man will hit the ball" will have been constructed:
Many other derivations of this sentence are possible, depending on the order in which the rules are applied. The important point is that all these different derivations are equivalent in that they can be reduced to the same tree diagram shown below.

fig4 If this is compared with the system of rules, it will be seen that each application of each rule creates or is associated with a portion (or subtree) of the tree. The tree diagram, may now be considered as a structural description of the sentence "The man hit the ball." It is a description of the constituent structure, or phrase structure, of the sentence, and it is assigned by the rules that generate the sentence.
Till now i’ve only been explaining phrase-structure grammar consisting entirely of phrase-structure rules. However, a Transformational grammar (as formalized by Chomsky) includes both phrase-structure and transformational rules (as well as morphophonemic rules). The transformational rules depend upon the prior application of the phrase-structure rules and have the effect of converting, or transforming, one phrase marker into another. What is meant by this statement may be clarified first with reference to a purely abstract and very simple transformational grammar, in which the letters stand for constituents of a sentence (and S stands for "sentence"):
The first five rules are phrase-structure rules (PS rules); rule (6) is a transformational rule (T rule). The output of rules (1)-(5) is the terminal string a + b + c + e + f + d + g + h, which has associated with it the structural description indicated by the phrase marker shown in Figure 5 (left). Rule (6) applies to this terminal string of the PS rules and the associated phrase marker. It has the effect of deleting C (and the constituents of C) and permuting A and D (together with their constituents). The result is the string of symbols d + g + h + a + b, with the associated phrase marker shown in the following figure.

The phrase marker shown in the figure (left) may be described as underlying, and the phrase marker shown in the Figure(right) as derived with respect to rule (6). One of the principal characteristics of a transformational rule is its transformation of an underlying phrase marker into a derived phrase marker in this way. Transformational rules, in contrast with phrase-structure rules, are also formally more heterogeneous and may have more than one symbol on the left-hand side of the arrow. The linguistic importance of these abstract considerations may be explained with reference to the relationship that holds in English between active and passive sentences.
Chomsky's rule for relating active and passive sentences (as given in Syntactic Structures) is very similar, at first sight, to Harris's, discussed above. Chomsky's rule is:
This rule, called the passive transformation, presupposes and depends upon the prior application of a set of phrase-structure rules. For simplicity, the passive transformation may first be considered in relation to the set of terminal strings generated by the phrase-structure rules (1)-(8) given earlier. The string "the + man + will + hit + the + ball" (with its associated phrase marker, as shown in Figure 4) can be treated not as an actual sentence but as the structure underlying both the active sentence "The man will hit the ball" and the corresponding passive "The ball will be hit by the man." The passive transformation is applicable under the condition that the underlying, or "input," string is analyzable in terms of its phrase structure as NP - Aux - V - NP (the use of subscript numerals to distinguish the two NPs in the formulation of the rule is an informal device for indicating the operation of permutation). In the phrase marker in Figure 4 "the" + "man" are constituents of NP, "will" is a constituent of Aux, "hit" is a constituent of V, and "the" + "ball" are constituents of NP. The whole string is therefore analyzable in the appropriate sense, and the passive transformation converts it into the string "the + ball + will + be + en + hit + by + the + man." A subsequent transformational rule will permute "en + hit" to yield "hit + en," and one of the morphophonemic rules will then convert "hit + en" to "hit" (as "ride + en" will be converted to "ridden"; "open + en" to "opened," and so on).
Every transformational rule has the effect of converting an underlying phrase marker into a derived phrase marker. The manner in which the transformational rules assign derived constituent structure to their input strings is one of the major theoretical problems in the formalization of transformational grammar. Here it can be assumed not only that "be + en" is attached to Aux and "by" to NP (as indicated by the plus signs in the rule as it has been formulated above) but also that the rest of the derived structure is as shown in Figure 6.

Fig 6 The phrase marker in Figure 6 formalizes the fact, among others, that "the ball" is the subject of the passive sentence "The ball will be hit by the man," whereas "the man" is the subject of the corresponding active "The man will hit the ball" (cf. Figure 4)
. Although the example above is a very simple one, and only a single transformational rule has been considered independently of other transformational rules in the same system, the passive transformation must operate, not only upon simple noun phrases like "the man" or "the ball," but upon noun phrases that contain adjectives ("the old man"), modifying phrases ("the man in the corner"), relative clauses ("the man who checked in last night"), and so forth. The incorporation, or embedding, of these other structures with the noun phrase will be brought about by the prior application of other transformational rules. It should also be clear that the phrase-structure rules require extension to allow for the various forms of the verb ("is hitting," "hit," "was hitting," "has hit," "has been hitting," etc.) and for the distinction of singular and plural.
It is important to note that, unlike Harris's, Chomsky's system of transformational grammar does not convert one sentence into another: the transformational rules operate upon the structures underlying sentences and not upon actual sentences. A further point is that even the simplest sentences (i.e.,kernel sentences) require the application of at least some transformational rules. Corresponding active and passive sentences, affirmative and negative sentences, declarative and interrogative sentences, and so on are formally related by deriving them from the same underlying terminal string of the phrase-structure component. The difference between kernel sentences and nonkernel sentences in Syntactic Structures (in Chomsky's later system the category of kernel sentences is not given formal recognition at all) resides in the fact that kernel sentences are generated without the application of any optional transformations. Non kernel sentences require the application of both optional and obligatory transformations, and they differ one from another in that a different selection of optional transformations is made.
http://www.ifi.unizh.ch/groups/CL/volk/SyntaxVorl/Chomsky.html
http://www.zompist.com/lang9.html

Last updated on Friday, 30th April 2004, 08:11:44 AM.

Prof. Ashay Dharwadker