COMP569+Artificial+Intelligence

= Assignment 1 =

Date: 9/1/2007

Student: Michael Weingarden

Course ID: COMP569 Artificial Intelligence

Semester/Year: Fall 2007

Title/Topic: MU Puzzle

Computer Simulation of MIU System

 * [|Description of Theoretical Computer Simulation of MIU System]
 * [|String manipulation related to rule processing]
 * [|Answers to assignment #1 questions]
 * [|References]

Overview of desired logic for main routine

Rules:

 * 1) xI -> xIU
 * 2) Mx -> Mxx
 * 3) xIIIy -> xUy
 * 4) xUUy -> xy

Thoughts regarding implementation of an algorithm to create this system:

 * Create functions that represent the four MIU rules
 * Get the initial input string from the user (or hard code it)
 * Apply all possible combinations of rules until the desired result is found
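A sketch of what those rule functions might look like in Java (the class and method names are just my own placeholders, not part of the assignment):

```java
/** Sketch of the four MIU rules as string transformations.
    Each method returns the transformed string, or null if the rule
    does not apply at the given spot. */
public class MiuRules {

    // Rule 1: xI -> xIU (append U if the string ends in I)
    static String rule1(String s) {
        return s.endsWith("I") ? s + "U" : null;
    }

    // Rule 2: Mx -> Mxx (double everything after the leading M)
    static String rule2(String s) {
        return s.startsWith("M") ? "M" + s.substring(1) + s.substring(1) : null;
    }

    // Rule 3: xIIIy -> xUy (replace the occurrence of III at index `at` with U)
    static String rule3(String s, int at) {
        return s.startsWith("III", at)
                ? s.substring(0, at) + "U" + s.substring(at + 3) : null;
    }

    // Rule 4: xUUy -> xy (delete the occurrence of UU at index `at`)
    static String rule4(String s, int at) {
        return s.startsWith("UU", at)
                ? s.substring(0, at) + s.substring(at + 2) : null;
    }
}
```

Rules 3 and 4 take a position argument because, as discussed below, the same rule can apply at more than one spot in a string.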


 * Since I'm already having problems with implementing a simulation, I'm going to focus my thoughts here on forward chaining only.
 * I said that all possible combinations of rules must be tried, but based on the course material (and some common sense), not all combinations need to be tried (or even can be tried).
 * My first thought was to create a tree with one of the four rules performed at each node and then search the tree for the desired output string (for example, MU). The tree could represent all possible combinations of rules (like 1, 1, 1, 1, ... or 1, 2, 3, 4 or 4, 3, 2, 1), but after reviewing the course materials, it's obvious that we have to eliminate some combinations for the following reason:
 * The string provided by the parent node must be analyzed to determine which rules may be applied. For example, if you start with MI, applying rule 1 gives MIU, and applying rule 2 gives MIUIU; at this point, applying either rule 3 or rule 4 is not an option.
 * Also, even though there are only 4 rules, the tree could have unlimited levels (at least on some paths like 1, 1, 1, 1, 1, etc.) because we don't know how many permutations of the rules might be required to get the desired result. So, creating a tree with all possible combinations and then searching is impractical.
 * Instead, a rule must be applied to the given input string at a particular node and then the comparison to the desired result must be made at that node.
 * If the comparison fails, you then go to the next node which is created on the fly.
 * The next question is whether the tree traversal should proceed breadth first, depth first, or in some hybrid manner.
 * An even bigger question is, "how do you create an algorithm that knows when to stop trying combinations?"
 * Observation: the computer simulation seems useful for producing the various strings that result from applying the rules, but it's not clear to me how to get the computer to realize whether it is heading down a path of no return. I can see that the strings continue to get longer when you go down a certain path in the tree, and for that reason, I can tell that I'm going down a futile path. But I'm not sure how to create an algorithm that says "this path in my depth first search is futile." For example, do I as the programmer eliminate the path 1, 1, 1, 1, 1, etc. because I know it's futile, or do I create some criteria that the computer can use to abandon that search? To me this is a dilemma because I'm not sure you can merely say, "if you traverse four nodes deep and at each node the resulting string is longer than the previous, abandon this path."
 * Thought: The initial concept presented in GEB is that the MIU system is a typographical way of representing a number system. So, if values were assigned to M, I, and U (with their positions representing place values), it's possible that you could use the value of the resulting string to determine whether you were converging on the desired result or diverging from it.

String manipulation related to rule processing
 * I thought that at least this part would be straightforward, but there are some subtle nuances.
 * First, you have to parse the string at a particular node to see whether part of it matches the constant part of one of the rules. Then inserting, appending or removing characters can be done.
 * But the real challenge comes from the fact that you may have to look for occurrences of more than one matching pattern, or multiple occurrences of the same pattern. And each additional pattern occurrence at a node represents an additional node.
 * Also, the pattern matching has to be done one character at a time, even after a particular pattern is found. For example, MIIII is a string that could require rule #3 to be applied in two ways with the following two resulting strings: MUI and MIU.
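Here's a sketch of how that one-match-per-position expansion might look in Java, so that MIIII really does produce both MUI and MIU (class and method names are my own):

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Sketch: generate every string reachable in ONE rule application,
    scanning each character position so that multiple occurrences of
    the same pattern each yield their own successor node. */
public class MiuSuccessors {

    static Set<String> successors(String s) {
        Set<String> out = new LinkedHashSet<>();
        if (s.endsWith("I"))                                        // rule 1: xI -> xIU
            out.add(s + "U");
        if (s.startsWith("M"))                                      // rule 2: Mx -> Mxx
            out.add("M" + s.substring(1) + s.substring(1));
        for (int i = 0; i + 3 <= s.length(); i++)                   // rule 3: every III
            if (s.startsWith("III", i))
                out.add(s.substring(0, i) + "U" + s.substring(i + 3));
        for (int i = 0; i + 2 <= s.length(); i++)                   // rule 4: every UU
            if (s.startsWith("UU", i))
                out.add(s.substring(0, i) + s.substring(i + 2));
        return out;
    }
}
```

For MIIII this yields four successors: MIIIIU (rule 1), MIIIIIIII (rule 2), and the two rule 3 results MUI and MIU.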

Answers to assignment #1 questions


 * **1. Why is the MIU System called typographical?**

In GEB, Hofstadter clearly describes the types of operations (on p.64) you might find in a typographical system. And, the MIU system clearly includes those types of operations. However, I think the simplest answer is described in the course materials like so, "a formal system with no obvious interpretation of the symbols." In other words, the letters M, I and U are merely treated as objects, not symbolic of something else.
 * **2. How does the MIU System use deduction?**

It uses rules to deduce new facts from knowledge of other facts (Dr. Wolfe).
 * **3. Does the MIU System use induction in any way? Explain.**

I don't think the system uses induction. If you try to solve the MU puzzle using only data from the computer simulation, you use inductive logic to determine the outcome. However, by analyzing the rules, the starting point and the desired result, you can formally prove the solution to the puzzle.
 * **4. Explain any realistic interpretation of the symbols, or strings of symbols, in the MIU System.**

The system seems genetic. You've got these fundamental building blocks and a set of rules, and the rules can be applied in a variety of orders for an unlimited amount of time. I think the strings are like DNA.
 * **5. Explain how conflict resolution is implemented in your simulation.**

Definition for conflict resolution from course material: The process by which the control structure chooses a single rule to fire from among the rules that are currently applicable to the data.

To me, there are two issues here:


 * **Q1)** What happens when you can apply more than one rule to a given "theorem" or string that results from a previous theorem or axiom?


 * **Q2)** How do you decide when to proceed to the next generation (or level in the tree) or instead to try more theorems at a previous generation?


 * **A1)** The answer to Q1 is easier. If you can apply more than one rule, then apply each one, one at a time, and add the resulting theorems to the existing tree. You merely apply each rule you can, in order, as you parse through the given string.


 * **A2)** The answer to Q2 is more difficult. It is not entirely obvious to me how you tell the computer when to stop moving to the next generation (especially if you're going down a depth first path). However, I did run across two implementations of the MIU system on the Internet: one in Python and the other in a language called Icon. Both programs ask for the number of levels or generations before they begin their processing. So, regardless of what kind of traversal you use, the human imposes a size limitation on the tree initially. This seems like the most practical way to implement this system. I've included the links to these programs in my references below.
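A sketch of that human-limited search in Java (my own naming; the cap on generations plays the role of the level count those programs ask for up front):

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

/** Sketch: expand the derivation tree breadth first, one generation at a
    time, stopping after maxGen generations (the human-imposed size limit). */
public class MiuSearch {

    // All strings reachable from s by ONE rule application.
    static Set<String> successors(String s) {
        Set<String> out = new LinkedHashSet<>();
        if (s.endsWith("I")) out.add(s + "U");                                   // rule 1
        if (s.startsWith("M")) out.add("M" + s.substring(1) + s.substring(1));   // rule 2
        for (int i = 0; i + 3 <= s.length(); i++)                                // rule 3
            if (s.startsWith("III", i))
                out.add(s.substring(0, i) + "U" + s.substring(i + 3));
        for (int i = 0; i + 2 <= s.length(); i++)                                // rule 4
            if (s.startsWith("UU", i))
                out.add(s.substring(0, i) + s.substring(i + 2));
        return out;
    }

    // True if goal shows up within maxGen generations of start.
    static boolean reachable(String start, String goal, int maxGen) {
        if (goal.equals(start)) return true;
        Set<String> frontier = new LinkedHashSet<>(Collections.singleton(start));
        Set<String> seen = new HashSet<>(frontier);
        for (int gen = 0; gen < maxGen; gen++) {
            Set<String> next = new LinkedHashSet<>();
            for (String s : frontier)
                for (String t : successors(s))
                    if (seen.add(t)) next.add(t);
            if (next.contains(goal)) return true;
            frontier = next;
        }
        return false;
    }
}
```

With this shape, reachable("MI", "MU", n) comes back false for any generation cap n, which is the simulation-side hint that MU is not a theorem.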


 * **6. Does your implementation of the MIU System use forward or backward chaining? Explain.**

My implementation uses forward chaining: it starts from the axiom and applies the rules to generate new theorems. From what I've read about backward chaining, I would not have considered that type of implementation, mainly because I don't see any particular advantage to it for this application.


 * **7. The MU Puzzle: Given only the axiom MI, is MU deducible in the MIU System?**

Based on the course materials, Wikipedia and other sources, no, it is not. I agree with this conclusion, and I'll try to state my perspective in my own way while answering question #8 below.


 * **8. Provide a convincing argument that MU is, or is not, deducible by the MIU System given only the axiom MI to start with.**

The answer to the question is based on the "I count." The only way to get rid of all the I's is to have a number of I's that is a multiple of 3 (so that rule #3 can be applied repeatedly). Now look at how the rules change the I count: rules #1 and #4 don't touch the I's at all; rule #3 removes exactly three I's, so it never changes the I count modulo 3; and rule #2 doubles the I count, which only toggles the count between 1 and 2 modulo 3 (doubling 1 gives 2, and doubling 2 gives 4, which is 1 modulo 3). Since you start with a single I, the I count modulo 3 is always 1 or 2 and can never reach 0, so the I count is never a multiple of 3. (My first intuition was that the possible counts are just the powers of two: 1, 2, 4, 8, 16, etc., and 2^n has only 2 as a prime factor, so 3 can never divide it; the modulo argument also covers the case where rule #3 has already removed some I's.) So, if you can't have the I's in a multiple of 3, you can never get rid of all the I's, and you must do that to get MU.
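A small Java sketch of this counting argument, tracking only the I count modulo 3 (rules 1 and 4 don't touch I's, and rule 3 subtracts exactly three, so only the doubling of rule 2 can change the residue):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

/** Sketch: starting from one I (the axiom MI), explore every residue of
    the I count mod 3 that the rules can produce. Rule 2 maps n -> 2n;
    rule 3 maps n -> n-3, which leaves n mod 3 unchanged. */
public class MuInvariant {

    static Set<Integer> reachableResidues() {
        Set<Integer> seen = new HashSet<>();
        Deque<Integer> todo = new ArrayDeque<>();
        todo.add(1);                       // MI has exactly one I
        while (!todo.isEmpty()) {
            int r = todo.poll();
            if (seen.add(r)) {
                todo.add((2 * r) % 3);     // rule 2: n -> 2n
                // rule 3 (n -> n-3) adds nothing new: the residue is unchanged
            }
        }
        return seen;
    }
}
```

The reachable residues are exactly {1, 2}; residue 0 (an I count divisible by 3, the prerequisite for eliminating all I's) never appears.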


 * **9. Explain the relationship between the results of your simulation and automated theorem proving. For example, how would the computer know if it was getting closer to proving the theorem or just spinning its wheels?**

I'm not sure that there is a way for the computer to know that it is getting closer to proving the theorem without adding additional rules. For instance, you could add a rule that said, "the theorem can't be proved if the current string's I count is not a multiple of 3 and the desired result contains no I's."

It seems like you could program the computer to realize that, as the string grows along a certain path, continuing down that path will only diverge from the desired goal. Even so, I still think you would have to add additional rules for the computer to know that every permutation of rules will fail on a particular path.

References


 * Applications in Programming in ANSI C, Johnsonbaugh, Kalin, 1990
 * Algorithms in C, Sedgewick, 1990
 * [|Excellent description of MU Puzzle]
 * [|Analysis with description of iterations]
 * [|What is a typographical system?]
 * [|How to Solve the MU Puzzle]
 * [|Nice binary tree modules for Perl]
 * [|Python code for MIU System]
 * [|Icon code for MIU System]

= Assignment 2 =

Date: 9/16/2007

Student: Michael Weingarden

Course ID: COMP569 Artificial Intelligence

Semester/Year: Fall 2007

Title/Topic: Expert System/Solving Sudoku 4x4 Puzzles

Disclaimer regarding Java Code:

[|Java Code PDF]
 * I had a hard time thinking of a rule based exercise, so I thought I would try Sudoku
 * When I thought up the idea, I wrote down a series of "rules" or IF statements, but somehow when I tried to write the code, all the "rules" got buried
 * Even so, I was too far down a path and too little time was left to go down a new path
 * For these reasons, please excuse my loose interpretation of "rules"
 * The code as written does not completely solve a puzzle; it merely finds the first number that already exists in the puzzle and then looks to see whether it can place that number in other parts of the puzzle

[|Java Source Code File]


 * **1. Define the domain of your choosing and list the rules.**

The domain I chose was puzzle solving- specifically Sudoku. Here is the list of "rules" I started with:


 * Fill entire puzzle with 0's as place holders
 * Enter initial values (from edhelper.com puzzle generator)
 * If a single cell in a box is empty, fill that cell with the missing number
 * If a single cell in a row is empty, fill it with the missing number
 * If a single cell in a column is empty, fill it with the missing number
 * Choose a **number** in the grid
 * Find all occurrences of that **number**
 * If there are open cells in the **number's** row, mark all those cells as unavailable
 * If there are open cells in the **number's** column, mark those cells as unavailable
 * If there are open cells in the **number's** "box", mark those cells as unavailable
 * Set all unavailable cells equal to 5
 * If there is only one 5 in a box (group of four cells), change that 5 to the currently relevant **number**
 * After the current **number** is placed in all possible boxes, replace all 5's with 0's
 * Go back and repeat process, starting with Choose a **number**, for all numbers 1 through 4 (for the type of puzzle I'm working with)
 * After going through all the numbers 1 through 4, repeat the whole process from the beginning until the puzzle is solved
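As a sketch, the "single empty cell in a row" rule above could look like this in Java for a 4x4 grid (this is an illustration of the rule, not my actual submitted code; 0 marks an empty cell, following my place-holder convention):

```java
/** Sketch of the "single empty cell in a row" rule for a 4x4 Sudoku.
    0 marks an empty cell; a complete row must sum to 1+2+3+4 = 10,
    so the missing value is 10 minus the row's current sum. */
public class SudokuRowRule {

    // Applies the rule to every row; returns true if any cell was filled.
    static boolean fillRowSingles(int[][] g) {
        boolean changed = false;
        for (int r = 0; r < 4; r++) {
            int empties = 0, col = -1, sum = 0;
            for (int c = 0; c < 4; c++) {
                if (g[r][c] == 0) { empties++; col = c; }
                sum += g[r][c];
            }
            if (empties == 1) {           // exactly one gap: fill it
                g[r][col] = 10 - sum;
                changed = true;
            }
        }
        return changed;
    }
}
```

The column and box versions of the rule would follow the same pattern, just iterating over a column or a 2x2 box instead of a row.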
 * **2. Sketch the design of the system (block-level design, showing the inference process).**

[|Sketch of system]

[|Sketch of system (update)]

[|Scenario Diagrams]


 * **3. Explain how your system decides which rules are triggered on any one cycle, and how it picks a particular rule to fire (i.e.: explain how it handles conflict resolution).**


 * My algorithm is mostly linear, brute force
 * It just plows through a series of rules and I don't really see the potential for conflict
 * In fact, the big problem is not having enough rules or appropriate rules
 * If I had more rules, I still don't think there would be a conflict unless the puzzle had more than one solution; however, most Sudoku puzzles are designed to have a single unique solution
 * **4. Explain whether the "facts" in your system are binary (assigned only true or false) -- are there any variations? (fuzziness? confidence factors? probability?)**


 * For the most part, all the facts in my system were binary. Either there was a number in a cell or not.
 * Interestingly, I believe that the Sudoku puzzle may be considered a typographic system
 * **5. Explain how any given rule might have exceptions (that is, situations where the rule is incorrect). More generally, explain the major limitations of your system.**


 * I think it's simple enough to ensure that I only create rules that are correct
 * The real problem is creating enough rules and appropriate rules to actually solve the puzzle

= Assignment 3 =

Date: 9/20/2007

Student: Michael Weingarden

Course ID: COMP569 Artificial Intelligence

Semester/Year: Fall 2007

Title/Topic: Elementary Blocks World


 * **1. Find the shortest path for the example given above. If there is more than one shortest path, present them all.**

The example looked like this:

code format="escaped"
F                  C
C  D            D  F
A  B            A  B
G  E            G  E

Initial State   Goal State
code

The shortest path would be four moves:

 * 1) Move D to table
 * 2) Move F to B
 * 3) Move C to F
 * 4) Move D to A

 * **2. Show that the following heuristic will create valid, but not necessarily optimal, solutions. Use these rules: If any available block (any block at the top of a stack in the initial state) can be moved directly to its final position in the goal state, then do it. If not, then take the block on the top of the largest stack and move it to the table (if there are two largest stacks, then break ties by using the first stack as counted left to right). Iterate in this fashion until the goal state is achieved. Show that this method can produce an optimal solution in some cases, but suboptimal in others (give an example of each).**

I believe that the problem with this heuristic is that it's difficult to move a block to its final position because of possible deadlocks (discussed below). I believe that this heuristic may produce an optimal solution when blocks don't get in each other's way, but only a suboptimal one when you have to resolve two conflicting block placements.

 * **3. Explain why, in an optimal solution, no block should be moved more than twice.**

Because in the worst case scenario, all blocks are moved to the table and then to their final positions. That is moving no block more than twice. So, the optimal solution should be no worse than the worst case scenario.

 * **4. Explain why finding the shortest path reduces to deciding which blocks will be moved twice and which ones will be moved only once.**

Well, if the worst case scenario is to move no block more than twice, then the best case scenario would be to move blocks only once where possible.

 * **5. Explain how the concept of "deadlock" applies to this problem. Give examples.**

From Wikipedia: "A deadlock is a situation wherein two or more competing actions are waiting for the other to finish, and thus neither ever does."

In the paper by Nau and Gupta, deadlock is related to the EBW problem in this simple manner: "a is in d's way and d is in a's way, so {a,d} is deadlocked."

In other words, if two blocks are in the way of one another's final positions, they are deadlocked. This made implementing H2 quite difficult.

 * **6. Create a computer simulation of this blocks world problem (graphical display is extra credit). Implement the heuristics described above: H1: all blocks go to the table, then are moved into final positions; H2: described in 2 above. Invent a better heuristic if you can.**

[|Java Code for H1:]

[|Java Code for H2:]

I was able to implement the code for H1 and it runs quite nicely. However, I did not allow for user input; the data is hardcoded and the total number of blocks is limited to 7.

I put a lot of time into trying to implement H2 and I've included the code I wrote, but I was not able to complete the heuristic. It's entirely possible that I spent too much time overthinking the problem, and I'm curious to see other people's implementations. I was trying to use many arrays to store stack info, information about final positions, information about which blocks were on top (clear), etc. But every time I started to make some progress, I ran into another problem with the overall logic. I've tried to comment my code thoroughly and to name variables and methods carefully so that my logic can be followed easily.

 * **7. a) Create a graphical representation of this problem using a directed graph with nodes that correspond to the blocks and arcs that indicate a constraint created by the initial or goal state configurations.**

I have to admit that I am really not sure what is expected here. Are we supposed to create a directed graph for just one scenario (the one given in the assignment)? If so, do we draw the graph for one possible solution for the one scenario? And even then, I'm not really sure what the directed graph is supposed to look like. I've included a sketch of one idea I had, but I have questions about my own sketch. I've tried to represent my 4 move solution listed above. In my sketch, I have tried to represent D -> T -> F -> B -> C -> F -> D -> A. But it just doesn't feel right. It seems like D -> T should be a node, and then F -> B should be a node, etc., and then edges should connect those nodes. However, 7a) says that nodes correspond to blocks.

[|GIF of directed graph]

[|GIF of second idea for directed graph]

Also, I wasn't entirely sure what was meant by "arcs that indicate a constraint created by...". In my mind the edges represent moving a block from one place to another. But moves aren't really constraints. It seems like the STRIPS heuristic is more constraint oriented, but I'm not sure if that is what we were supposed to graph.

 * **b) Research the Feedback Arc Set problem. This problem boils down to finding the fewest arcs to remove from a directed graph that will make it acyclic (no more cycles), and it is known to be NP complete. Explain how EBW reduces to the Feedback Arc Set problem, and thereby show that EBW is NP complete. (This is actually difficult, so I am not expecting perfect answers, just some rationale -- make an attempt to understand the Feedback Arc Set problem, and then try to tie it together with the graphical representation you came up with in part a.) All of this shows that this simple blocks world planning problem is of equal complexity to the better-known traveling salesman problem.**

I've spent a fair amount of time reading Nau and Gupta, looking at Wikipedia (for Feedback Arc Set) and trying to find other sites to learn more about the Feedback Arc Set, but I'm still not sure how to produce one for this problem. I understand that a Feedback Arc Set (also known as a Feedback Edge Set) is the set of arcs you remove from a directed graph to break its cycles and leave a directed acyclic graph. But since I wasn't perfectly sure about the directed graph, it's difficult for me to do much with the Feedback Arc Set.

With respect to complexity and NP completeness, I did run across a site that really does illustrate the complexity of graphing (and solving) the blocks world problem:

[]

At this site, they show the complexity of merely graphing a two block undirected graph. Of course, not all the nodes are required to solve such a simple blocks world problem, but all of them are possible. Also, the two block problem does have added complexity, like table placement being a factor, and all possible scenarios are contained in the graph.
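For reference, a deliberately naive Java sketch of H1 as described (unstack everything, then rebuild); this is an illustration, not my submitted code, and it assumes the bottom block of each goal stack is already on the table, which holds in the assignment's example:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Sketch of H1: move every block above the table to the table (top-down),
    then rebuild each goal stack bottom-up. Stacks are ordered bottom -> top. */
public class BlocksH1 {

    static List<String> h1Moves(List<Deque<Character>> init, List<List<Character>> goal) {
        List<String> moves = new ArrayList<>();
        for (Deque<Character> stack : init)           // phase 1: clear every stack
            while (stack.size() > 1)                  // the bottom block is already on the table
                moves.add("Move " + stack.pollLast() + " to table");
        for (List<Character> stack : goal)            // phase 2: rebuild the goal stacks
            for (int i = 1; i < stack.size(); i++)    // bottom block stays put
                moves.add("Move " + stack.get(i) + " to " + stack.get(i - 1));
        return moves;
    }
}
```

On the example above (initial stacks G-A-C-F and E-B-D, goal stacks G-A-D and E-B-F-C), this H1 produces 10 moves versus the optimal 4, partly because it even unstacks blocks like A that were already in their final positions; that gap is exactly why H2's "move directly when possible" idea is attractive.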

= Assignment 4 =

Date: 10/13/2007

Student: Michael Weingarden

Course ID: COMP569 Artificial Intelligence

Semester/Year: Fall 2007

Title/Topic: Bayesian Networks Part I

Part A


 * **For each of these examples, explain them as if you were the "teacher" and the other students in the class were the "students". In particular, after explaining the given example, add a variation of your own that shows the same principle but with different assumptions/numbers/givens.**


 * **A.1. Wikipedia search: Bayesian Network**


 * **Explain Sprinkler Example:**

I'm going to try to focus my explanation on this question in the sprinkler example, "What is the likelihood that it is raining, given the grass is wet?" Hopefully, this will give me the ability to try to answer the other questions as well.

The formula used to compute the answer is P(Rain | GrassWet) = P(GrassWet, Rain) / P(GrassWet), where the numerator and denominator are each summed over the sprinkler variable. This ratio represents the sum of probabilities where the grass is wet and it is raining, divided by the sum of all probabilities where the grass is wet (raining, sprinkling, both or neither). By dividing one by the other, we get the likelihood that it is raining, given that the grass is wet.
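Here is a small Java sketch of that computation, using the conditional probability table values from the Wikipedia sprinkler example as I recall them (so treat the specific numbers as assumptions):

```java
/** Sketch of P(Rain | GrassWet) for the sprinkler network.
    Assumed CPTs (from my reading of the Wikipedia example):
    P(Rain)=0.2; P(Sprinkler|Rain)=0.01, P(Sprinkler|no Rain)=0.4;
    P(Wet|S,R)=0.99, P(Wet|S,~R)=0.9, P(Wet|~S,R)=0.8, P(Wet|~S,~R)=0.0. */
public class SprinklerExample {

    static double pRainGivenWet() {
        double pR = 0.2;
        double[] pSgivenR = {0.4, 0.01};            // index: R = 0 (false), 1 (true)
        double[][] pW = {{0.0, 0.8}, {0.9, 0.99}};  // [S][R]
        double num = 0, den = 0;
        for (int s = 0; s <= 1; s++)
            for (int r = 0; r <= 1; r++) {
                // joint probability of this (S, R) case together with Wet
                double joint = (r == 1 ? pR : 1 - pR)
                             * (s == 1 ? pSgivenR[r] : 1 - pSgivenR[r])
                             * pW[s][r];
                den += joint;                        // all ways the grass is wet
                if (r == 1) num += joint;            // wet AND raining
            }
        return num / den;
    }
}
```

With these numbers the posterior comes out to roughly 0.358, i.e. wet grass raises the probability of rain from the 0.2 prior to about 36%.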


 * **A.2. Wikipedia search: Bayes' Theorem**


 * **Explain Examples:**

We will use the following formula to answer the question, "what's the probability that Fred picked bowl #1, given that he has a plain cookie?": P(Bowl1 | plain) = P(plain | Bowl1) * P(Bowl1) / P(plain). So, here's the simple explanation: if you multiply the probability of getting a plain cookie (given that you pick from bowl #1) times the probability of picking bowl #1, and then divide the result by the probability of getting a plain cookie regardless of any bowl information, you get the answer to the question above.
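A quick Java sketch of the cookie computation (using the bowl contents from the Wikipedia example as I recall them: bowl #1 holds 30 plain and 10 chocolate chip cookies, bowl #2 holds 20 of each, and each bowl is equally likely to be picked):

```java
/** Sketch of Bayes' theorem on the cookie example:
    P(Bowl1 | plain) = P(plain | Bowl1) * P(Bowl1) / P(plain). */
public class CookieExample {

    static double pBowl1GivenPlain() {
        double pB1 = 0.5, pB2 = 0.5;               // bowls picked at random
        double pPlainGivenB1 = 30.0 / 40.0;        // 0.75
        double pPlainGivenB2 = 20.0 / 40.0;        // 0.50
        // total probability of a plain cookie, over both bowls
        double pPlain = pPlainGivenB1 * pB1 + pPlainGivenB2 * pB2;
        return pPlainGivenB1 * pB1 / pPlain;
    }
}
```

The plain cookie shifts the probability of bowl #1 from the 0.5 prior up to 0.6, which matches the intuition that plain cookies are more common in bowl #1.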
 * **Cookies**
 * **Drug Testing**
 * **Monty Hall**

Part B


 * **Get the "Traveler's Restaurant Selection Problem" notes that are on reserve at the CSUCI library (call the library first to make sure they are there before you make the trip). Notes: This problem comes from the text The Elements of Artificial Intelligence, by Steve Tanimoto. Lisp code for this problem is called INFNET and there is a link on the Links page that will take you there. You might be able to figure out the whole network by simply reading the Lisp code, but that might be painful.**


 * **B.1: Explain the purpose of each node in the network.**

There are several different kinds of nodes, including input, lumped, predictor and output nodes. All the nodes represent attributes of restaurants. The attributes are intended to allow a traveller to make an informed choice about where to eat while on the road. The referenced LISP code also refers to the nodes as evidential, intermediate, and summary variables.


 * **B.2: Explain the numerical value associated with each node.**

At first glance, you would think that the values would be ratings for a particular attribute. But upon closer examination of the class notes, I found that the numerical values represent a probabilistic measure of the relationship between nodes. For example, you can use the numbers to determine what influence sounds and clientele might have on popularity.
 * **B.3: Explain how each node computes.**

According to the class notes, quite a few things are done to compute each node, including establishing the hypothesis and evidence, computing the likelihood ratio (λ), the sufficiency (λ) and necessity (λ') factors, and finally the odds, O(H) = P(H)/(1 - P(H)), for each node.
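A minimal sketch of that odds bookkeeping in Java (just the generic odds/likelihood-ratio formulas, with λ treated as a single effective likelihood ratio; this is not the full INFNET machinery):

```java
/** Sketch of odds-based updating: O(H) = P(H)/(1-P(H)),
    evidence multiplies the odds by an effective likelihood ratio λ
    (λ > 1 makes H more likely, λ < 1 less likely), and the updated
    odds convert back to a probability via P = O/(1+O). */
public class OddsUpdate {

    static double odds(double p)    { return p / (1 - p); }

    static double prob(double odds) { return odds / (1 + odds); }

    // Posterior probability of H after evidence with likelihood ratio lambda.
    static double update(double priorP, double lambda) {
        return prob(lambda * odds(priorP));
    }
}
```

For example, a 50/50 hypothesis (odds 1) hit with λ = 2 moves to odds 2, i.e. a probability of 2/3.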


 * **B.4: Come up with a similar problem that could use a similar computational structure and explain how it would be applied (define the input and output nodes first, and then the lumped and evidentiary nodes next). For example, instead of selecting a restaurant, you could be selecting a team player, a new hire, a vacation spot, etc.**

It seems like a very practical problem for a Bayesian Network would be SPAM filtering. Bayesian filters are already thought to be one of the best ways to filter SPAM. There are certain probabilities associated with particular words contained in an email, and if the posterior probability is high enough, the message is judged to be SPAM. The drawback is that spammers know about Bayesian filtering and are now putting more legitimate text into their emails to alter the posterior probability.

There are a number of attributes that could be associated with SPAM including words, phrases, and domain of origin. Phrases themselves could be broken down into porn, get rich quick schemes, phishing, other junk and legitimate solicitations that you might actually be interested in (but are still SPAM). So, I guess the input nodes could be words or phrases that have a certain probability of being spam, that would lead to intermediate nodes that might help categorize what type of spam and then predictor nodes might be used to determine the overall odds that the email is desirable or not.
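As a sketch, the per-word input nodes could be combined in the naive Bayes style popularized by Paul Graham's "A Plan for Spam" essay (the word probabilities here are made-up illustrations, not real statistics):

```java
/** Sketch of combining per-word SPAM probabilities:
    P(spam) = (p1*...*pn) / (p1*...*pn + (1-p1)*...*(1-pn)),
    where each p_i is the probability that a mail containing
    word i is SPAM. */
public class SpamScore {

    static double combined(double[] wordProbs) {
        double spam = 1.0, ham = 1.0;
        for (double p : wordProbs) {
            spam *= p;        // evidence that the mail is SPAM
            ham  *= 1 - p;    // evidence that it is legitimate
        }
        return spam / (spam + ham);
    }
}
```

For example, two strongly spammy words with (hypothetical) probabilities 0.9 and 0.8 combine to about 0.97, which shows how a few suspicious words can dominate; adding legitimate-looking words with low probabilities pulls the score back down, which is exactly the spammers' trick mentioned above.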

Fun Facts from yudkowsky.net:

I was curious where all the probabilities came from (obviously they are just made up for examples like the sprinkler problem), but here are some insights on where some of the medical probabilities come from:

Q. How can I find the priors for a problem?

A. Many commonly used priors are listed in the Handbook of Chemistry and Physics.

Q. Where do priors originally come from?

A. Never ask that question.

Q. Uh huh. Then where do scientists get their priors?

A. Priors for scientific problems are established by annual vote of the AAAS. In recent years the vote has become fractious and controversial, with widespread acrimony, factional polarization, and several outright assassinations. This may be a front for infighting within the Bayes Council, or it may be that the disputants have too much spare time. No one is really sure.

Q. I see. And where does everyone else get their priors?

A. They download their priors from Kazaa.

Q. What if the priors I want aren't available on Kazaa?

A. There's a small, cluttered antique shop in a back alley of San Francisco's Chinatown. Don't ask about the bronze rat.

References:


 * [|An Intuitive Explanation of Bayesian Reasoning]
 * [|Tools, Genie]
 * [|Tools, SamIam]
 * [|Probability for Dummies]

= Assignment 5 =

Date: 10/29/2007

Student: Michael Weingarden

Course ID: COMP569 Artificial Intelligence

Semester/Year: Fall 2007

Title/Topic: Bayesian Networks Part II

I have to admit that I didn't have any great ideas on simulating Bayesian Networks. I did look more thoroughly through the Fernandez code and Robert Wood's code, and my code below is merely a scaled down version of Robert Wood's code. I went through and cleaned up the code a bit so that I could at least understand what was going on. Also, I reduced the number of variables so that the results of the simulations were more obvious to me. Another small thing I did was to reformat the output so that there weren't so many decimal places. It seemed to me that 3 decimal places were sufficient. So, here is the code:

[|Java Simulation Code]

and here are some of the sample outputs from running the code:

code format="escaped"
Popularity Odds = 3.875
Popularity Probability = .795
Cleanliness Odds = 1.234
Cleanliness Probability = .552

Taste P(H|E) = .964
Taste P(H|~E) = .730
Taste P(E|E') = .795
Taste P(H|E') = .920
Taste O(H|E') = 11.552
Taste Leff = 1.284

Quantity P(H|E) = .964
Quantity P(H|~E) = .730
Quantity P(E|E') = .795
Quantity P(H|E') = .920
Quantity O(H|E') = 11.552
Quantity Leff = 1.284

Nutrition P(H|E) = .909
Nutrition P(H|~E) = .545
Nutrition P(E|E') = .552
Nutrition P(H|E') = .746
Nutrition O(H|E') = 2.942
Nutrition Leff = .735

Overall_Quality P(H|E) = .964
Overall_Quality P(H|~E) = .730
Overall_Quality P(E|E') = .632
Overall_Quality P(H|E') = .840
Overall_Quality O(H|E') = 5.239
Overall_Quality Leff = .582
code

The way I configured things, service had an impact on cleanliness, cleanliness had an impact on nutrition, and nutrition was essential to overall food quality (not very realistic, but one possible scenario). So, I thought I would change the service odds to see how they affected the overall quality. In the run above, the service odds were 1.234.

Here's a run with the service odds going down to 0.6:

code format="escaped"
Popularity Odds = 1.884
Popularity Probability = .653
Cleanliness Odds = .600
Cleanliness Probability = .375

Taste P(H|E) = .964
Taste P(H|~E) = .730
Taste P(E|E') = .653
Taste P(H|E') = .890
Taste O(H|E') = 8.090
Taste Leff = .899

Quantity P(H|E) = .964
Quantity P(H|~E) = .730
Quantity P(E|E') = .653
Quantity P(H|E') = .890
Quantity O(H|E') = 8.090
Quantity Leff = .899

Nutrition P(H|E) = .909
Nutrition P(H|~E) = .545
Nutrition P(E|E') = .375
Nutrition P(H|E') = .682
Nutrition O(H|E') = 2.143
Nutrition Leff = .536

Overall_Quality P(H|E) = .964
Overall_Quality P(H|~E) = .730
Overall_Quality P(E|E') = .427
Overall_Quality P(H|E') = .833
Overall_Quality O(H|E') = 4.977
Overall_Quality Leff = .553
code

So, even though the service odds went down by about 50%, the effective likelihood of overall quality only went from .582 to .553. So, I tried increasing the service odds to 2.4:

code format="escaped"
Popularity Odds = 7.537
Popularity Probability = .883
Cleanliness Odds = 2.400
Cleanliness Probability = .706

Taste P(H|E) = .964
Taste P(H|~E) = .730
Taste P(E|E') = .883
Taste P(H|E') = .939
Taste O(H|E') = 15.443
Taste Leff = 1.716

Quantity P(H|E) = .964
Quantity P(H|~E) = .730
Quantity P(E|E') = .883
Quantity P(H|E') = .939
Quantity O(H|E') = 15.443
Quantity Leff = 1.716

Nutrition P(H|E) = .909
Nutrition P(H|~E) = .545
Nutrition P(E|E') = .706
Nutrition P(H|E') = .802
Nutrition O(H|E') = 4.054
Nutrition Leff = 1.014

Overall_Quality P(H|E) = .964
Overall_Quality P(H|~E) = .730
Overall_Quality P(E|E') = .779
Overall_Quality P(H|E') = .914
Overall_Quality O(H|E') = 10.581
Overall_Quality Leff = 1.176
code

That seemed to have a huge impact on the effective likelihood of overall food quality.

Since it is possible to make changes in service and have them trickle through the Bayesian Network with large impact, the question this leaves begging is: what dictates the original number for service, and is it reliable?

One last note: while researching Bayesian Networks, I ran across an interesting article on using Bayesian Networks to analyze multiple choice test questions. This is of particular interest to me because I use multiple choice questions a lot in my math classes with my clicker (classroom response) system. I couldn't find the link to add it to my references, but if I find it again, I will add it below.

= Assignment 6 =

Date: 11/15/2007

Student: Michael Weingarden

Course ID: COMP569 Artificial Intelligence

Semester/Year: Fall 2007

Title/Topic: Fuzzy Systems

For this assignment you are asked to create a fuzzy reasoning simulation.

I think I have a good idea for a fuzzy system, but I was not able to come up with a stellar simulation for this idea. As a high school math teacher, I started using a clicker system in my classroom in order to assess my students' progress and also to provide a method for review that gives me and the students instant feedback on their strengths and weaknesses.

In using such a system, it quickly becomes obvious that the Achilles' heel of the system lies in the multiple choice questions. For a given topic, you could create a set of questions that is completely useless, a set that is magical, or, the more common result, a set of reasonable questions with a few magical ones mixed in. By magical, I mean that you get a diverse response from the students. When such a phenomenon occurs, you can be fairly sure that you have successfully predicted a problem area in the given topic and successfully anticipated some of the common mistakes that students would make.

Over time, the database of available questions will grow, and it will become useful to have a system to analyze the "goodness" of each question. With so many factors influencing the evaluation of a question, some sort of system for analysis would be useful. One of the influencing factors in analyzing the goodness of a question is the human response. Since this could vary over time, and the track record of most students is already known, this type of situation seems well suited for a fuzzy analysis. It would be difficult to claim that most questions are entirely good or entirely bad; they lie somewhere in between.

When I presented this idea to Dr. Wolfe, he made a good suggestion. He said that the system could allow a sort of banking game where students receive credit based on previous responses (did they do well in the past or did they do poorly?). The clicker system also allows students to express the level of confidence that they have in their own answers. So, another factor is, "how confident was the student in the answer they gave?" For instance, if a previously successful student answers a current question correctly and expresses that they were very confident, some level of goodness must be assigned to the question. Dr. Wolfe suggested that the system could be a betting game where a student with a lot in the bank who is very confident in the correctness of his answer would bet a lot on the next question. If, for some reason, they get the question wrong, this would have an impact on both the student's bank and the goodness level of the question.

With these thoughts in mind, I whipped together a little Java code to generate some random data and then a very basic algorithm for measuring fuzzy goodness of a question. The source code can be viewed by clicking on the link below:
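The linked code isn't reproduced here, but the core idea can be sketched. Everything below (the class name, the stake rule, the averaging) is my own illustration of the betting-game idea described above, not the linked source: each response is weighted by the student's track record and confidence, so a confident, strong student answering correctly counts toward the question's goodness, while the same student missing the question counts heavily against it.

```java
// Hypothetical sketch of a fuzzy "goodness" measure for a question.
// All inputs are fuzzy membership values in [0, 1]; the scoring rule
// is illustrative, following the betting-game idea, not the linked code.
public class FuzzyGoodness {
    // Contribution of one response: the student "bets" a stake based on
    // track record and confidence; a correct answer returns the stake as
    // goodness, a wrong answer from a confident, strong student hurts most.
    static double contribution(boolean correct, double trackRecord, double confidence) {
        double stake = (trackRecord + confidence) / 2.0;
        return correct ? stake : 1.0 - stake;
    }

    // Aggregate goodness: the mean contribution over all responses.
    static double goodness(boolean[] correct, double[] track, double[] conf) {
        double sum = 0.0;
        for (int i = 0; i < correct.length; i++)
            sum += contribution(correct[i], track[i], conf[i]);
        return sum / correct.length;
    }

    public static void main(String[] args) {
        boolean[] correct = { true, true, false };
        double[]  track   = { 0.9, 0.8, 0.3 };   // past performance
        double[]  conf    = { 0.9, 0.7, 0.2 };   // self-reported confidence
        System.out.printf("goodness = %.2f%n", goodness(correct, track, conf));
        // prints: goodness = 0.80
    }
}
```

Note the fuzzy flavor of the rule: a weak, unconfident student missing the question actually nudges goodness up, since such a miss is what a discriminating ("magical") question is supposed to produce.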

[] References:
 * []
 * []
 * []
 * []

= Assignment 7 =

Date: 12/2/2007

Student: Michael Weingarden

Course ID: COMP569 Artificial Intelligence

Semester/Year: Fall 2007

Title/Topic: Neural Networks


 * [|Java Code for Simulation]
 * [|Sample Outputs from Simulations]
 * [|1. Explain the application domain]
 * [|2. Explain the training set]
 * [|3. Explain the training process]
 * [|4. Explain the effectiveness (%correct on the training set) of the system after it has been trained]
 * [|5. Explain how well the system generalized to data that was not in the training set]
 * [|6. References]
**Java Code for Simulation** [|Back to Top]

[]

[]

This class performs the core neural network operations: adding training data for the input and output neurons, calculating weights and output error, and even saving and retrieving neural networks that have already been trained. It is based on a neural network with one hidden layer.

[]

[]

This class lets you test the neural network class from Neural_1H.java. It provides input and output neuron data for training purposes, along with several test cases. The output is sent to the console for simplicity; it shows the resulting error for each cycle of data processing and, finally, the results.

[]

[]

This is a class like Neural_1H except it supports the use of two hidden layers.

[]

[]

This is a class like Test_1H except it tests Neural_2H.java.
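The classes themselves are behind the links above, but the general shape of a one-hidden-layer backpropagation network like Neural_1H can be sketched. This is a minimal illustration of the technique, not Mark Watson's actual code: the class and method names are my own, there are no bias weights, and the save/retrieve functionality is omitted. The toy training task (the output follows the first input) is chosen only to show the training loop.

```java
import java.util.Random;

// Minimal sketch of a one-hidden-layer backpropagation network.
// Illustrative only; not the Neural_1H implementation.
public class TinyNet1H {
    final int nIn, nHid, nOut;
    final double[][] w1, w2;      // input->hidden and hidden->output weights
    final double lr = 0.5;        // learning rate (tunable parameter)

    TinyNet1H(int nIn, int nHid, int nOut) {
        this.nIn = nIn; this.nHid = nHid; this.nOut = nOut;
        Random rng = new Random(42);
        w1 = new double[nIn][nHid]; w2 = new double[nHid][nOut];
        for (double[] row : w1) for (int j = 0; j < nHid; j++) row[j] = rng.nextDouble() - 0.5;
        for (double[] row : w2) for (int k = 0; k < nOut; k++) row[k] = rng.nextDouble() - 0.5;
    }

    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    double[] hidden(double[] in) {
        double[] h = new double[nHid];
        for (int j = 0; j < nHid; j++) {
            double s = 0;
            for (int i = 0; i < nIn; i++) s += in[i] * w1[i][j];
            h[j] = sigmoid(s);
        }
        return h;
    }

    double[] forward(double[] in) {
        double[] h = hidden(in);
        double[] out = new double[nOut];
        for (int k = 0; k < nOut; k++) {
            double s = 0;
            for (int j = 0; j < nHid; j++) s += h[j] * w2[j][k];
            out[k] = sigmoid(s);
        }
        return out;
    }

    // One backpropagation step for a single (input, target) pair;
    // returns the summed absolute output error before the update.
    double train(double[] in, double[] target) {
        double[] h = hidden(in);
        double[] out = forward(in);
        double[] dOut = new double[nOut];
        double err = 0;
        for (int k = 0; k < nOut; k++) {
            double e = target[k] - out[k];
            err += Math.abs(e);
            dOut[k] = e * out[k] * (1 - out[k]);          // sigmoid derivative
        }
        double[] dHid = new double[nHid];
        for (int j = 0; j < nHid; j++) {
            double s = 0;
            for (int k = 0; k < nOut; k++) s += dOut[k] * w2[j][k];
            dHid[j] = s * h[j] * (1 - h[j]);              // error pushed back to hidden layer
        }
        for (int j = 0; j < nHid; j++)                    // update hidden->output weights
            for (int k = 0; k < nOut; k++) w2[j][k] += lr * dOut[k] * h[j];
        for (int i = 0; i < nIn; i++)                     // update input->hidden weights
            for (int j = 0; j < nHid; j++) w1[i][j] += lr * dHid[j] * in[i];
        return err;
    }

    public static void main(String[] args) {
        // Trivial toy task: the target simply follows the first input.
        double[][] ins  = { {0.1, 0.1}, {0.1, 0.9}, {0.9, 0.1}, {0.9, 0.9} };
        double[][] outs = { {0.1},      {0.1},      {0.9},      {0.9}      };
        TinyNet1H net = new TinyNet1H(2, 4, 1);
        for (int cycle = 0; cycle < 10000; cycle++)
            for (int p = 0; p < ins.length; p++) net.train(ins[p], outs[p]);
        for (double[] in : ins)
            System.out.printf("%.1f %.1f -> %.2f%n", in[0], in[1], net.forward(in)[0]);
    }
}
```

The weight-update step is the same rule quoted from Mark Watson later in this write-up (W[i,j] += learning_rate * output_error[j] * I[i]), just applied twice: once from the output layer back to the hidden layer, and once from the hidden layer back to the inputs.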


**Sample Outputs from Simulations** [|Back to Top]


**For Test_1H:**

code format="escaped"
cycle 100 error is 1.2405982637405395
cycle 200 error is 1.2371082770824433
cycle 300 error is 1.25460902094841
cycle 400 error is 1.2587165319919587
cycle 500 error is 1.2524464589357376
cycle 600 error is 1.251769893169403
cycle 700 error is 1.2180115187168121
cycle 800 error is 1.1794416189193726
cycle 900 error is 1.1574158143997193
cycle 1000 error is 1.1281897848844529
cycle 1100 error is 1.1101750040054321
cycle 1200 error is 1.119013681411743
cycle 1300 error is 1.1275454068183899
cycle 1400 error is 1.149181392788887
cycle 1500 error is 1.1736127460002899
cycle 1600 error is 1.183838232755661
cycle 1700 error is 1.1963338875770568
cycle 1800 error is 1.2095998120307923
cycle 1900 error is 1.2135937058925628
cycle 2000 error is 1.2195653104782105
cycle 2100 error is 1.2268711531162262
cycle 2200 error is 1.2280428206920624
cycle 2300 error is 1.2309472346305848
cycle 2400 error is 1.2352565181255342
cycle 2500 error is 1.2352071046829223
cycle 2600 error is 1.2366057300567628
cycle 2700 error is 1.2392954981327058
cycle 2800 error is 1.2387159609794616
cycle 2900 error is 1.239330507516861
cycle 3000 error is 1.2410799312591552
cycle 3100 error is 1.240280431509018
cycle 3200 error is 1.2404668831825256
cycle 3300 error is 1.2416367614269257
cycle 3400 error is 1.2407623851299285
cycle 3500 error is 1.240707573890686
cycle 3600 error is 1.2415026462078094
cycle 3700 error is 1.240622903108597
cycle 3800 error is 1.240430076122284
cycle 3900 error is 1.2409731590747832
cycle 4000 error is 1.2401213431358338
cycle 4100 error is 1.2398500525951386
cycle 4200 error is 1.2402183270454408
cycle 4300 error is 1.2394094288349151
cycle 4400 error is 1.2390950751304626
cycle 4500 error is 1.2393391382694245
cycle 4600 error is 1.2385790264606475
cycle 4700 error is 1.2382431900501252
cycle 4800 error is 1.238397114276886
cycle 4900 error is 1.2376868498325349
cycle 5000 error is 1.2373429226875305
cycle 5100 error is 1.2374303889274598
cycle 5200 error is 1.2367685151100158
cycle 5300 error is 1.2364248073101043
cycle 5400 error is 1.2364626824855804
cycle 5500 error is 1.2358466947078706
cycle 5600 error is 1.2355083787441254
cycle 5700 error is 1.235508850812912
cycle 5800 error is 1.2349356520175934
cycle 5900 error is 1.2346059036254884
cycle 6000 error is 1.2345779621601105
cycle 6100 error is 1.234044350385666
cycle 6200 error is 1.233725004196167
cycle 6300 error is 1.2336753761768342
cycle 6400 error is 1.2331782150268555
cycle 6500 error is 1.2328702878952027
cycle 6600 error is 1.2328040659427644
cycle 6700 error is 1.2323403656482697
cycle 6800 error is 1.2320443189144135
cycle 6900 error is 1.2319654405117035
cycle 7000 error is 1.231532448530197
cycle 7100 error is 1.2312483608722686
cycle 7200 error is 1.2311598289012908
cycle 7300 error is 1.230754953622818
cycle 7400 error is 1.2304826772212982
cycle 7500 error is 1.2303868651390075
cycle 7600 error is 1.2300078177452087
cycle 7700 error is 1.2297470319271087
cycle 7800 error is 1.229645791053772
cycle 7900 error is 1.2292904841899872
cycle 8000 error is 1.2290408086776734
cycle 8100 error is 1.2289355611801147
cycle 8200 error is 1.2286020457744598
cycle 8300 error is 1.228363047838211
cycle 8400 error is 1.2282549667358398
cycle 8500 error is 1.2279415154457092
cycle 8600 error is 1.2277126932144165
cycle 8700 error is 1.2276027023792266
cycle 8800 error is 1.2273077380657196
cycle 8900 error is 1.2270886313915252
cycle 9000 error is 1.226977458000183
cycle 9100 error is 1.2266995525360107
cycle 9200 error is 1.2264896535873413
cycle 9300 error is 1.2263779032230377
cycle 9400 error is 1.2261157643795013
cycle 9500 error is 1.2259145855903626
cycle 9600 error is 1.225802743434906
cycle 9700 error is 1.2255552124977112
cycle 9800 error is 1.2253623175621033
cycle 9900 error is 1.2252507436275482
Test case: 0.10  0.10  0.90  results:  0.50  0.51  0.51
Test case: 0.10  0.90  0.10  results:  0.52  0.54  0.54
Test case: 0.90  0.10  0.10  results:  0.52  0.54  0.54
Reload a previously trained NN from disk and re-test:
CachedExamples: failed to open 'cache.dat' in JAR file
code

**For Test_2H:**

code format="escaped"
cycle 100 error is 1.2104608416557312
cycle 200 error is 1.1989377653598785
cycle 300 error is 1.1991546332836152
cycle 400 error is 1.1998853814601897
cycle 500 error is 1.2003008127212524
cycle 600 error is 1.2004814112186433
cycle 700 error is 1.201362099647522
cycle 800 error is 1.2019440412521363
cycle 900 error is 1.2021414136886597
cycle 1000 error is 1.20330904006958
cycle 1100 error is 1.2042146813869476
cycle 1200 error is 1.2045543706417083
cycle 1300 error is 1.2063373863697051
cycle 1400 error is 1.208011201620102
cycle 1500 error is 1.2089345896244048
cycle 1600 error is 1.2125002360343933
cycle 1700 error is 1.2169910633563996
cycle 1800 error is 1.2212953960895538
cycle 1900 error is 1.2317328095436095
cycle 2000 error is 1.0408174008131028
cycle 2100 error is 0.8564836922287941
cycle 2200 error is 0.8501860311627388
cycle 2300 error is 0.8318332275748253
cycle 2400 error is 0.8322290790081024
cycle 2500 error is 0.8317758044600487
cycle 2600 error is 0.8194497507810593
cycle 2700 error is 0.8219471222162247
cycle 2800 error is 0.8212101632356643
cycle 2900 error is 0.8069465574622154
cycle 3000 error is 0.8047270104289055
cycle 3100 error is 0.7971457317471504
cycle 3200 error is 0.7692749190330506
cycle 3300 error is 0.7340040165185928
cycle 3400 error is 0.6764363253116608
cycle 3500 error is 0.8076555904746056
cycle 3600 error is 1.028657106757164
cycle 3700 error is 1.1053302311897277
cycle 3800 error is 1.129347419142723
cycle 3900 error is 1.1383358311653138
cycle 4000 error is 1.1524378216266633
cycle 4100 error is 1.1700886577367782
cycle 4200 error is 1.165547902584076
cycle 4300 error is 1.159196569919586
cycle 4400 error is 1.1578674459457396
cycle 4500 error is 1.1576119363307953
cycle 4600 error is 1.158631715774536
cycle 4700 error is 1.1589783644676208
cycle 4800 error is 1.1577294552326203
cycle 4900 error is 1.1590128302574159
cycle 5000 error is 1.1591267693042755
cycle 5100 error is 1.1593438732624053
cycle 5200 error is 1.1615733075141907
cycle 5300 error is 1.174698269367218
cycle 5400 error is 1.1998552250862122
cycle 5500 error is 1.1999999690055847
cycle 5600 error is 1.1999999678134918
cycle 5700 error is 1.1999999678134918
cycle 5800 error is 1.1999999690055847
cycle 5900 error is 1.1999999678134918
cycle 6000 error is 1.1999999678134918
cycle 6100 error is 1.1999999690055847
cycle 6200 error is 1.1999999678134918
cycle 6300 error is 1.1999999678134918
cycle 6400 error is 1.1999999690055847
cycle 6500 error is 1.1999999678134918
cycle 6600 error is 1.1999999678134918
cycle 6700 error is 1.1999999690055847
cycle 6800 error is 1.1999999678134918
cycle 6900 error is 1.1999999678134918
cycle 7000 error is 1.1999999690055847
cycle 7100 error is 1.1999999678134918
cycle 7200 error is 1.1999999678134918
cycle 7300 error is 1.1999999690055847
cycle 7400 error is 1.1999999678134918
cycle 7500 error is 1.1999999678134918
cycle 7600 error is 1.1999999690055847
cycle 7700 error is 1.1999999678134918
cycle 7800 error is 1.1999999678134918
cycle 7900 error is 1.1999999690055847
cycle 8000 error is 1.1999999678134918
cycle 8100 error is 1.1999999678134918
cycle 8200 error is 1.1999999690055847
cycle 8300 error is 1.1999999678134918
cycle 8400 error is 1.1999999678134918
cycle 8500 error is 1.1999999690055847
cycle 8600 error is 1.1999999678134918
cycle 8700 error is 1.1999999678134918
cycle 8800 error is 1.1999999690055847
cycle 8900 error is 1.1999999678134918
cycle 9000 error is 1.1999999678134918
cycle 9100 error is 1.1999999690055847
cycle 9200 error is 1.1999999678134918
cycle 9300 error is 1.1999999678134918
cycle 9400 error is 1.1999999690055847
cycle 9500 error is 1.1999999678134918
cycle 9600 error is 1.1999999678134918
cycle 9700 error is 1.1999999690055847
cycle 9800 error is 1.1999999678134918
cycle 9900 error is 1.1999999678134918
Test case: 0.10  0.10  0.90  results:  0.50  0.50  0.50
Test case: 0.10  0.90  0.10  results:  0.50  0.50  0.50
Test case: 0.90  0.10  0.10  results:  0.50  0.50  0.50
Reload a previously trained NN from disk and re-test:
CachedExamples: failed to open 'cache.dat' in JAR file
code

**1. Explain the application domain.** [|Back to Top]

Before this assignment, I had a view of neural networks that was mistaken. I primarily thought of them as systems that facilitated parallel processing for problem solving. However, now that I know more about them, it seems that although that view is partially correct, they serve a more specific purpose in the area of recognition. It seems that neural networks are great for any application that requires pattern recognition. One example that seems to be common is optical character recognition; however, it is obvious that neural networks could be used for many other types of pattern recognition including voice recognition, fingerprint matching, face identification and, heck, you could probably even use neural networks for scent identification!

I should have known that neural networks targeted the specific application of pattern recognition because a few years ago, a friend of mine was telling me that they were researching the use of neural networks for monitoring petri dish experiments at Amgen. He worked in the research automation group where they used robots to automate experiments. Apparently certain experiments provide visual clues as to the level of their success, and the biologists were spending a great deal of their time monitoring the numerous experiments that the robots were performing. The use of neural networks and cameras to visually identify the most interesting experiments turns out to be a great use of the technology.

I can also see an application for neural networks with more entertaining and accessible robot experimentation. Dr. AJ has written grants to acquire humanoid robots that would play soccer. Possibly neural networks could be used to identify fellow teammates, opposing teammates, the opposing goal and, of course, the ball.

When I first looked at this assignment, I thought that neural networks worked a bit differently, so I came up with the idea that it would be great to use such a system for classroom seating arrangements. This is a problem that many teachers struggle with, especially with at-risk populations. The students in the classroom definitely could be represented as a set of nodes where each node is interrelated (or connected) to adjoining nodes. Each student or node could be represented by several numbers (weights?), each one representing a specific characteristic like talkative, fidgety, relatively high performing, or easily distracted. The most important consideration, though, is how students relate to those they sit next to. Are they friends? Are they enemies? Are they compatible? Could there be a mutually beneficial relationship (one student needs help, the other is doing well in the class)? And, of course, each of those characteristics would be represented by a range of numbers (not binary). It's entirely possible that this application would be better suited for fuzzy analysis; however, I did find it interesting that the ultimate goal of a neural network is to find the lowest energy state of the system, and that's the same goal that I have for my classroom!

Of course, the classroom application has several problems that would keep it from being a great neural network application. One, I don't know how to provide it training data. I have no idea which classroom arrangement would be optimal; that's why I need some help in automating the task. Two, I don't see how the relationship between the nodes is taken into account in the neural network. If I could understand that, the classroom application might seem more feasible, because a well-behaved classroom has a lot to do with the optimal arrangement of students who sit next to one another. If a neural network could work for this application, it would have at least two different uses: 1) optimal behavior arrangement, 2) optimal learning arrangement.

The goal in a good classroom arrangement system would be to re-arrange all students until you have achieved a preset threshold for expected behavior or learning. So, the ultimate question is: in a neural network, aside from evaluating the data, can you re-arrange the neurons until you reach the desired threshold? Based on the write-up at Wikipedia for Artificial Neural Networks, it seems like you could use a neural network for a system like this if you used reinforcement learning as opposed to supervised learning, which requires up-front training. With reinforcement learning, there is no up-front training and you can provide feedback to the system over time. This is how the classroom arrangement system should work. You first seat students in an arbitrary fashion. You then assign students weights for various factors like those mentioned above. You then use a neural network to re-arrange students until an optimal (lowest energy) point is reached. You then re-assign the seats and observe. Student weights and relationships are then adjusted manually (by the teacher) and re-entered into the system for a new optimization analysis. It seems to me that this problem is similar to the Traveling Salesman problem in its complexity. I also ask myself the question, "is it possible that a Kohonen network could be used for its self-organizing map capability?"

Another interesting application might be document tagging. We had an interesting presentation the other day about the semantic web and about how the whole WWW would be so much more useful if all the content of all documents were tagged and the tags were all related in some kind of object oriented structure. When I first started working for the Navy, I worked on a project where we were trying to convert all of our electronic documents (MS Word, WordPerfect, Interleaf and a variety of others) into SGML. There was a company that had some software that touted the ability to look at a chunk of text and, based on the shape of the text, propose a tag; if this shape recurred, the tag would be re-used for all similar-looking chunks of text. It's possible that neural networks could be used for examining the shape of paragraphs, tables and other objects in a group of documents and then help the system to determine which type of tag or object identifier should be assigned to the chunk of text based on prior training. Also, the system could learn based on the repeated occurrences of similar-looking chunks of information in the document.

Ok, enough pondering. It seems like the most common, straightforward application for neural networks is optical character recognition. I found two nice, comprehensive tutorials on neural networks. One was a book by Jeff Heaton and, fortunately, most of the book is online (see references). Also, the sample Java code is online. Unfortunately, the book and the code rely on JOONE, and I had a problem getting the JOONE editor to work with my new Microsoft Vista-based laptop. As I understand it, I could have created some apps without the JOONE editor, but I decided to keep looking for a simpler, library-independent example of neural networks. I found it in spades in the form of an open source book by Mark Watson. His online book (see references) actually provides quite a nice supplement to all of the topics we've covered in this course, and he offers sample Java code as a download along with the PDF version of the book. So, my simulations are based on Mark Watson's code. His examples are pretty generic, but easily could be applied to optical character recognition.

**2. Explain the training set.** [|Back to Top]

According to Mark Watson, "For back propagation networks, training data consists of matched sets of input and output values. We want to train a network to not only produce the same outputs for training data inputs as appear in the training data, but also to generalize its pattern matching ability based on the training data."

So, in the case of optical character recognition, you might have several examples of a particular letter like "a", which come from a variety of print sources, and then an example of what you expect an "a" to look like (in digital terms). So, ultimately, you'll end up with multiple inputs with the same output.

Mark Watson goes on to provide some tips and tricks regarding the training data: "A key here is to balance the size of the network against how much information it must hold. A common mistake when using back propagation networks is to use too large of a network (more on what this means later); a network that contains too many neurons will simply memorize the training examples, including any noise in the training data. However, if we use a smaller number of neurons, with a very large number of training data examples, then we force the network to generalize, ignoring noise in the training data."

**3. Explain the training process.** [|Back to Top]

The training process takes the sample inputs and outputs and establishes the appropriate weights for all the connections between neurons in the network. Once these weights have been established, then new data can be added and processed appropriately.

Mark Watson explains the process in a more detailed manner. In order to train the network, we repeat the following learning cycle several times:
 * Zero out temporary arrays for holding the error at each neuron. The error, starting at the output layer, is the difference between the output value for a specific output layer neuron and the calculated value from setting the input layer neuron's activation values to the input values in the current training example, and letting activation spread through the network.
 * Update the weight W[i,j] (where i is the index of an input neuron, and j is the index of an output neuron) using the formula W[i,j] += learning_rate * output_error[j]*I[i], where the learning_rate is a tunable parameter, output_error[j] was calculated in step 1, and I[i] is the activation of input neuron at index i.

**4. Explain the effectiveness (%correct on the training set) of the system after it has been trained.** [|Back to Top]

It seems like the best answer here is to say that results may vary. When running the training data, you have two possible end conditions: 1) you've met the threshold you've designated for output error (hopefully low), or 2) you've run through a specified number of cycles and never got the output error low enough. Obviously, in the latter case, it's entirely possible for a neural network to be completely ineffective with a particular set of data.

Mark Watson also provides several clues about how to improve the results for a neural network: "Effectively using back propagation neural networks in applications is somewhat of an acquired art. The following ad hoc notes are derived from my experience of using neural networks over the last 18 years:"
 * Get as much training data as possible: an effective neural network has the ability to generalize from training data, usually in the sense of ignoring noise and spurious training examples. For this to occur, you might need thousands of training examples, depending on the complexity of the problem. Also, some training data can be set aside (i.e., not used for training) and saved for testing the network.
 * For very large training data sets, try using only 10% of the data initially for training. After the output errors are small, add in more training data sets; repeat this process until training on the entire training data set produces small errors.
 * Do not use too many hidden layer neurons: if there are too many hidden layer neurons, then the network will not learn to generalize; rather, it will just remember all of the training data, including noise and bad training examples. Start with a very small number of hidden neurons, and see if the training data set can be learned with very small output errors; if necessary, slowly increase the number of hidden neurons, and repeat the training process.

**5. Explain how well the system generalized to data that was not in the training set.** [|Back to Top]

**6. References** [|Back to Top]
 * [|An Introduction to Computing with Neural Networks (Lippman)]
 * [|Practical Artificial Intelligence Programming in Java by Mark Watson (open source book)]
 * [|Mark Watson's sample Java code]
 * [|Introduction to Artificial Neural Networks]
 * [|Introduction to Neural Networks with Java (book online)]
 * [|Jeff Heaton's sample Java Code]
 * [|Joone (Java Object Oriented Neural Engine)]
 * [|Of course, Wikipedia entry on Artifical Neural Networks]