Apply all possible combinations of rules until desired result is found

Thoughts regarding implementation of an algorithm to create this system:

Since I'm already having problems with implementing a simulation, I'm going to focus my thoughts here on forward chaining only.

I say that all possible combinations of rules must be tried, but based on the course material (and some common sense), not all possible combinations must be tried (or even can be tried).

My first thought was to create a tree with one of the four rules performed at each node and then search the tree for the desired output string (for example, MU). The tree could represent all possible combinations of rules (like 1, 1, 1, 1, ... or 1, 2, 3, 4 or 4, 3, 2, 1), but after reviewing the course materials, it's obvious that we have to eliminate some combinations for the following reason:

The string provided by the parent node must be analyzed to determine which rules may be applied. For example, if you start with MI and apply rule 1, you get MIU; apply rule 2 and you get MIUIU. At this point, applying either rule 3 or rule 4 is not an option.
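To make this concrete, here is a small Python sketch (my own illustration, with rule numbering following GEB) of how a node's string can be analyzed to decide which rules apply:

```python
# Rule numbering follows GEB:
#   1: xI -> xIU    2: Mx -> Mxx    3: III -> U    4: UU -> (nothing)
def applicable_rules(s):
    """Which of the four MIU rules can legally be applied to string s."""
    rules = []
    if s.endswith("I"):
        rules.append(1)
    if s.startswith("M") and len(s) > 1:
        rules.append(2)
    if "III" in s:
        rules.append(3)
    if "UU" in s:
        rules.append(4)
    return rules
```

For MIUIU this returns [2], matching the observation above that rules 3 and 4 are not options at that node.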

Also, even though there are only 4 rules, the tree could have unlimited depth (at least on some paths, like 1, 1, 1, 1, 1, etc.) because we don't know how many rule applications might be required to get the desired result. So, creating a tree with all possible combinations and then searching it is impractical.

Instead, a rule must be applied to the given input string at a particular node and then the comparison to the desired result must be made at that node.

If the comparison fails, you then go to the next node which is created on the fly.

The next question is whether the tree traversal should proceed in a breadth first manner or a depth first manner or some hybrid.

The bigger question is, "how do you create an algorithm that knows when to stop trying combinations?"

Observation: the computer simulation seems to be useful in producing the various resulting strings from applying the rules, but it's not clear to me about how to get the computer to realize whether it is heading down a path of no return. I can see that the strings continue to get longer when you go down a certain path in the tree and for that reason, I can tell that I'm going down a futile path. But I'm not sure how to create an algorithm that says "this path in my depth first search is futile." For example, do I as the programmer eliminate the path 1, 1, 1, 1, 1, etc. because I know it's futile or do I create some criteria that the computer can use to abandon that search? To me this is a dilemma because I'm not sure that you can merely say, "if you traverse four nodes deep and at each node, the resulting string is longer than the previous, abandon this path."

Thought: The initial concept presented in GEB is that the MIU system is a typographical way of representing a number system. So, if values were assigned to M, I, and U (with their positions representing place values), it's possible that you could use the value of the resulting string to determine whether you were converging on the desired result or diverging.

String manipulation related to rule processing

I thought that at least this part would be straightforward, but there are some subtle nuances.

First you have to parse the string at a particular node to see whether part of it matches the constant part of one of the rules. Then characters can be inserted, appended, or removed.

But the real challenge comes from the fact that you may have to look for occurrences of more than one matching pattern, or multiple occurrences of the same pattern. And each additional match found at a node represents an additional child node.

Also, the pattern matching has to be done one character at a time, even after a particular pattern is found. For example, MIIII is a string that could require rule #3 to be applied in two ways with the following two resulting strings: MUI and MIU.
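This character-by-character matching can be sketched in Python (my own illustration): a successor function has to record a separate result for every match position, not just the first one found.

```python
def successors(s):
    """Every string reachable from s by a single rule application,
    with a separate result for each match position of rules 3 and 4."""
    out = set()
    if s.endswith("I"):                   # rule 1: xI -> xIU
        out.add(s + "U")
    if s.startswith("M") and len(s) > 1:  # rule 2: Mx -> Mxx
        out.add(s + s[1:])
    for i in range(len(s) - 2):           # rule 3: scan one character at a time
        if s[i:i + 3] == "III":
            out.add(s[:i] + "U" + s[i + 3:])
    for i in range(len(s) - 1):           # rule 4: likewise
        if s[i:i + 2] == "UU":
            out.add(s[:i] + s[i + 2:])
    return out
```

For example, successors("MIIII") contains both MUI and MIU, the two rule-3 results mentioned above.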

Answers to assignment #1 questions

1. Why is the MIU System called typographical?

In GEB, Hofstadter clearly describes the types of operations (on p. 64) you might find in a typographical system, and the MIU system clearly includes those types of operations. However, I think the simplest answer is the one given in the course materials: "a formal system with no obvious interpretation of the symbols." In other words, the letters M, I and U are merely treated as objects, not as symbols for something else.

2. How does the MIU System use deduction?

It uses rules to deduce new facts from knowledge of other facts (Dr. Wolfe).

3. Does the MIU System use induction in any way? Explain.

I don't think the system itself uses induction. If you try to solve the MU puzzle using only data from the computer simulation, you use inductive logic to guess the outcome. However, by analyzing the rules, the starting point and the desired result, you can formally prove the solution to the puzzle.

4. Explain any realistic interpretation of the symbols, or strings of symbols, in the MIU System.

The system seems genetic: you've got these fundamental building blocks and a set of rules, and the rules can be applied in a variety of orders for an unlimited amount of time, much like DNA.

5. Explain how conflict resolution is implemented in your simulation.

Definition for conflict resolution from course material: The process by which the control structure chooses a single rule to fire from among the rules that are currently applicable to the data.

To me, there are two issues here:

Q1) What happens when you can apply more than one rule to a given "theorem" (a string that results from a previous theorem or axiom)?

Q2) How do you decide when to proceed to the next generation (or level in the tree) or instead to try more theorems at a previous generation?

A1) The answer to Q1 is easier. If you can apply more than one rule, then apply each one, one at a time, and add the resulting theorems to the existing tree. You merely apply each applicable rule in order as you parse through the given string.

A2) The answer to Q2 is more difficult. It is not entirely obvious to me how to tell the computer when to stop moving to the next generation (especially if you're going down a depth-first path). However, I did run across two implementations of the MIU system on the Internet: one in Python and the other in a language called Icon. Both programs ask for the number of levels or generations before they begin processing. So, regardless of what kind of traversal you use, the human initially imposes a size limit on the tree. This seems like the most practical way to implement this system. I've included the links to these programs in my references below.
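A generation-capped breadth-first search along those lines can be sketched in a few lines of Python (my own illustration, not the Python or Icon programs themselves):

```python
def miu_successors(s):
    """All strings reachable from s by one application of rules 1-4."""
    out = set()
    if s.endswith("I"):                   # rule 1: xI -> xIU
        out.add(s + "U")
    if s.startswith("M") and len(s) > 1:  # rule 2: Mx -> Mxx
        out.add(s + s[1:])
    for i in range(len(s) - 2):           # rule 3: III -> U, each position
        if s[i:i + 3] == "III":
            out.add(s[:i] + "U" + s[i + 3:])
    for i in range(len(s) - 1):           # rule 4: UU -> (nothing), each position
        if s[i:i + 2] == "UU":
            out.add(s[:i] + s[i + 2:])
    return out

def derive(start, goal, max_generations):
    """Breadth-first forward chaining with a human-imposed generation cap.
    Returns the derivation path to goal, or None if not found in time."""
    paths = {start: [start]}
    level = [start]
    for _ in range(max_generations):
        next_level = []
        for s in level:
            for t in miu_successors(s):
                if t in paths:
                    continue              # already derived this theorem
                paths[t] = paths[s] + [t]
                if t == goal:
                    return paths[t]
                next_level.append(t)
        level = next_level
    return None
```

Here derive("MI", "MIU", 1) finds MIU in one generation, while derive("MI", "MU", 3) gives up after three generations, as the cap intends.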

6. Does your implementation of the MIU System use forward or backward chaining? Explain.

My implementation uses forward chaining. From what I've read about backward chaining, I would not have considered that type of implementation, mainly because I don't see any particular advantage to it for this application.

7. The MU Puzzle: Given only the axiom MI, is MU deducible in the MIU System?

Based on the course materials, Wikipedia and other sources, no, it is not. Of course, I agree with this conclusion, and I'll try to state my perspective in my own way while answering question #8 below.

8. Provide a convincing argument that MU is, or is not, deducible by the MIU System given only the axiom MI to start with.

The answer to the question is based on the "I count." The only way to get rid of all the I's is to reach a number of I's that is a multiple of 3 (so rule #3 can eliminate them). Now look at how the rules change the I count: rules #1 and #4 leave it unchanged, rule #2 doubles it, and rule #3 subtracts 3. If you never used rule #3, you could only have 1, 2, 4, 8, 16, 32, 64, etc. I's, i.e. 2^n, and 2^n is never a multiple of 3 because its only prime factor is 2. Rule #3 doesn't help either: subtracting 3 never changes whether the count is a multiple of 3. So, starting from one I, the count can never become a multiple of 3, you can never get rid of all the I's, and you must do that to get MU.
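The argument can be checked mechanically by iterating the rules' effect on the I count modulo 3 (a small Python sketch of my own):

```python
# How each rule changes the number of I's:
#   rules 1 and 4: unchanged; rule 2: doubled; rule 3: minus 3.
# Starting from MI (one I), compute every reachable I-count mod 3.
reachable = {1 % 3}
for _ in range(10):  # iterate to a fixed point (only 3 residues, so this is plenty)
    reachable |= {(2 * r) % 3 for r in reachable}   # rule 2 doubles the count
    reachable |= {(r - 3) % 3 for r in reachable}   # rule 3 subtracts three
print(sorted(reachable))   # 0 never appears, so MU (zero I's) is unreachable
```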

9. Explain the relationship between the results of your simulation and automated theorem proving. For example, how would the computer know if it was getting closer to proving the theorem or just spinning its wheels?

I'm not sure that there is a way for the computer to know that it is getting closer to proving the theorem without adding additional rules. For instance, a rule like: "the theorem can't be proved if the number of I's in the current string is not a multiple of 3 and the goal string has no I's."

It seems like you could program the computer to recognize that, as the string grows along a certain path, continuing down that path will only diverge from the desired goal. Even so, I still think you would have to add additional rules for the computer to know that every permutation of rules will fail on a particular path.

References

Applications Programming in ANSI C, Johnsonbaugh and Kalin, 1990

I had a hard time thinking of a rule-based exercise, so I thought I would try Sudoku.

When I thought up the idea, I wrote down a series of "rules" or IF statements, but somehow when I tried to write the code, all the "rules" got buried.

Even so, I was too far down a path and too little time was left to go down a new one.

For these reasons, please excuse my loose interpretation of "rules".

The code as written does not completely solve a puzzle; it merely finds the first number that already exists in the puzzle and then looks to see whether it can place that number in other parts of the puzzle.

3. Explain how your system decides which rules are triggered on any one cycle, and how it picks a particular rule to fire (i.e.: explain how it handles conflict resolution).

My algorithm is mostly linear brute force.

It just plows through a series of rules, and I don't really see the potential for conflict.

In fact, the big problem is not having enough rules, or appropriate rules.

If I had more rules, I still don't think there would be a conflict unless the puzzle has more than one solution; however, most Sudoku puzzles are designed for one unique solution.

4. Explain whether the "facts" in your system are binary (assigned only true or false) -- are there any variations? (fuzziness? confidence factors? probability?).

For the most part, all the facts in my system were binary. Either there was a number in a cell or not.

Interestingly, I believe that the Sudoku puzzle may be considered a typographical system.

5. Explain how any given rule might have exceptions (that is, situations where the rule is incorrect). More generally, explain the major limitations of your system.

I think it's simple enough to ensure that I only create rules that are correct.

The real problem is creating enough rules, and appropriate rules, to actually solve the puzzle.

Assignment 3

Date: 9/20/2007

Student: Michael Weingarden

Course ID: COMP569 Artificial Intelligence

Semester/Year: Fall 2007

Title/Topic: Elementary Blocks World

1. Find the shortest path for the example given above. If there is more than one shortest path, present them all.

The example looked like this:

<span style="background-color: #f9f9f9;">F                  C
C  D            D  F
A  B            A  B
G  E            G  E
=====           =====
Initial State   Goal State</span>

The shortest path would be:


Move D to table

Move F to B

Move C to F

Move D to A

Four moves


2. Show that the following heuristic will create valid, but not necessarily optimal, solutions: Use these rules: If any available block (any block at the top of a stack in the initial state) can be moved directly to its final position in the goal state, then do it. If not, then take the block on the top of the largest stack and move it to the table (if there are two largest stacks, break ties by using the first stack as counted left to right). Iterate in this fashion until the goal state is achieved. Show that this method can produce an optimal solution in some cases, but suboptimal in others (give an example of each).


I believe that the problem with this heuristic is that it's difficult to move a block to its final position because of possible deadlocks (discussed below). I believe that this heuristic may produce an optimal solution when blocks don't get in each other's way, but only a suboptimal one when you have to resolve two conflicting block placements.

3. Explain why, in an optimal solution, no block should be moved more than twice.


Because in the worst-case scenario, all blocks are moved to the table and then to their final positions, which moves no block more than twice. So, the optimal solution can be no worse than this worst case.

4. Explain why finding the shortest path reduces to deciding which blocks will be moved twice and which ones will be moved only once.


Well, if the worst case is moving every block twice, then the best case is to move each block only once if possible.

5. Explain how the concept of "deadlock" applies to this problem. Give examples.


From Wikipedia: "A deadlock is a situation wherein two or more competing actions are waiting for the other to finish, and thus neither ever does."


In the paper by Nau and Gupta, deadlock is related to the EBW problem in this simple manner, "a is in d's way and d is in a's way, so {a,d} is deadlocked."
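That mutual-blocking condition can be sketched in Python (my own rough formalization of the idea, not code from the paper; stacks are lists with the bottom block first):

```python
def above(block, base, stacks):
    """True if `block` sits somewhere above `base` in one of the stacks."""
    for stack in stacks:
        if block in stack and base in stack:
            return stack.index(block) > stack.index(base)
    return False

def goal_support(block, goal):
    """The block (or 'table') that `block` must rest on in the goal state."""
    for stack in goal:
        if block in stack:
            i = stack.index(block)
            return stack[i - 1] if i > 0 else "table"
    return None

def deadlocked(a, d, initial, goal):
    """a is in d's way and d is in a's way: each sits above the block
    the other must eventually be placed on."""
    sa, sd = goal_support(a, goal), goal_support(d, goal)
    return (sd not in (None, "table") and above(a, sd, initial) and
            sa not in (None, "table") and above(d, sa, initial))
```

For example, with initial stacks [["c","a"], ["b","d"]] and goal [["b","a"], ["c","d"]], blocks a and d are deadlocked: a sits on c where d must go, and d sits on b where a must go.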


In other words, if two blocks are in the way of one another's final positions, they are deadlocked. This made implementing H2 quite difficult.

6. Create a computer simulation of this blocks world problem (graphical display is extra credit). Implement the heuristics described above: H1: all blocks go to the table, then are moved into final positions; H2: described in 2 above. Invent a better heuristic if you can.


I was able to implement the code for H1 and it runs quite nicely. However, I did not allow for user input; the data is hardcoded, and the total number of blocks is limited to 7.
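For comparison, here is a minimal Python sketch of H1 (my own illustration, not the code written for this assignment; stacks are lists with the bottom block first, and table positions are ignored). It makes at most two moves per block:

```python
def h1_moves(initial, goal):
    """H1: send every stacked block to the table, then build each goal
    stack bottom-up. Deliberately naive: blocks already in place are
    still unstacked, which is why it stays within two moves per block."""
    moves = []
    # phase 1: unstack everything onto the table (bottom blocks already rest there)
    for stack in initial:
        for block in reversed(stack[1:]):
            moves.append((block, "table"))
    # phase 2: build each goal stack bottom-up (bottom blocks stay on the table)
    for stack in goal:
        for below, block in zip(stack, stack[1:]):
            moves.append((block, below))
    return moves
```

On the 7-block example above (initial [["G","A","C","F"], ["E","B","D"]], goal [["G","A","D","C"], ["E","B","F"]]) this produces 10 moves, well under the 2n = 14 bound, but more than the 4-move optimum.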


I put a lot of time into trying to implement H2 and I've included the code I wrote, but I was not able to complete the heuristic. It's entirely possible that I spent too much time overthinking the problem, and I'm curious to see other people's implementations. I was trying to use many arrays to store stack info, information about final positions, information about which blocks were on top (clear), etc. But every time I started to make some progress, I ran into another problem with the overall logic. I've tried to comment my code thoroughly and to name variables and methods carefully so that my logic can be followed easily.

7. a) Create a graphical representation of this problem using a directed graph with nodes that correspond to the blocks and arcs that indicate a constraint created by the initial or goal state configurations.


I have to admit that I am really not sure what is expected here. Are we supposed to create a directed graph for just one scenario (the one given in the assignment)? If so, do we draw the graph for one possible solution for that scenario? And even then, I'm not really sure what the directed graph is supposed to look like. I've included a sketch of one idea I had, but I have questions about my own sketch. I've tried to represent my 4-move solution listed above: D -> T -> F -> B -> C -> F -> D -> A. But it just doesn't feel right. It seems like D -> T should be a node, and then F -> B should be a node, etc., with edges connecting those nodes. However, 7a) says that nodes correspond to blocks. (GIF of directed graph)

Also, I wasn't entirely sure what was meant by "arcs that indicate a constraint created by...". In my mind the edges represent moving a block from one place to another, but moves aren't really constraints. It seems like the STRIPS approach is more constraint oriented, but I'm not sure if that is what we were supposed to graph.

b) Research the Feedback Arc Set problem. This problem boils down to finding the fewest arcs to remove from a directed graph that will make it acyclic (no more cycles), and it is known to be NP-complete. Explain how EBW reduces to the Feedback Arc Set problem, and thereby show that EBW is NP-complete. (This is actually difficult, so I am not expecting perfect answers, just some rationale -- make an attempt to understand the Feedback Arc Set problem, and then try to tie it together with the graphical representation you came up with in part a.) All of this shows that this simple blocks world planning problem is of equal complexity to the better-known traveling salesman problem.


I've spent a fair amount of time reading Nau and Gupta, looking at Wikipedia (for Feedback Arc Set) and trying to find other sites to learn more, but I'm still not sure how to produce one for this problem. I understand that a feedback arc set (also known as a feedback edge set) is a set of arcs whose removal from a directed graph leaves a directed acyclic graph. But since I wasn't perfectly sure about the directed graph itself, it's difficult for me to do much with the feedback arc set.


With respect to complexity and NP completeness, I did run across a site that really does illustrate the complexity of graphing (and solving) the blocks world problem:

At this site, they show the complexity of merely graphing a two-block state graph. Of course, not all the nodes are required to solve such a simple blocks world problem, but all of them are possible. Also, the two-block graph does have added complexity: table placement is a factor, and all possible scenarios are contained in the graph.

Explain each of these examples as if you were the "teacher" and the other students in the class were the "students". In particular, after explaining the given example, add a variation of your own that shows the same principle but with different assumptions/numbers/givens.

A.1. Wikipedia search: Bayesian Network

Explain Sprinkler Example:

I'm going to try to focus my explanation on this question in the sprinkler example, "What is the likelihood that it is raining, given the grass is wet?" Hopefully, this will give me the ability to try to answer the other questions as well.

The formula used to compute the answer is given as:

P(R=T | G=T) = P(G=T, R=T) / P(G=T) = [ sum over S of P(G=T, S, R=T) ] / [ sum over S and R of P(G=T, S, R) ]

This ratio is the sum of the joint probabilities where the grass is wet and it is raining, divided by the sum of all joint probabilities where the grass is wet (raining, sprinkling, both or neither). By dividing one by the other, we get the likelihood that it is raining, given that the grass is wet.
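Using the conditional probability tables from the Wikipedia sprinkler example (the specific numbers below are taken from that page, so treat them as assumptions), the computation looks like this in Python:

```python
P_R = {True: 0.2, False: 0.8}                             # P(Rain)
P_S_given_R = {True: 0.01, False: 0.4}                    # P(Sprinkler=T | Rain)
P_G_given_SR = {(True, True): 0.99, (True, False): 0.9,
                (False, True): 0.8, (False, False): 0.0}  # P(Grass=T | S, R)

def joint(g, s, r):
    """P(G=g, S=s, R=r) via the chain rule of the network."""
    ps = P_S_given_R[r] if s else 1 - P_S_given_R[r]
    pg = P_G_given_SR[(s, r)] if g else 1 - P_G_given_SR[(s, r)]
    return P_R[r] * ps * pg

num = sum(joint(True, s, True) for s in (True, False))                      # G=T and R=T
den = sum(joint(True, s, r) for s in (True, False) for r in (True, False))  # G=T
print(round(num / den, 4))  # P(Rain | GrassWet) is about 0.3577
```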

A.2. Wikipedia search: Bayes' Theorem

Explain Examples:

Cookies

We will use the following formula to answer the question, "what's the probability that Fred picked bowl #1, given that he has a plain cookie?":

P(bowl 1 | plain) = P(plain | bowl 1) * P(bowl 1) / P(plain)

So, here's the simple explanation: if you multiply the probability of getting a plain cookie given that you pick from bowl #1 by the probability of picking bowl #1, and then divide the result by the probability of getting a plain cookie regardless of any bowl information, you get the answer to the question above.
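With the bowl contents from the classic version of the example (an assumption: bowl 1 holds 30 plain and 10 chocolate cookies, bowl 2 holds 20 of each), the arithmetic is:

```python
p_bowl1 = 0.5                      # Fred picks a bowl at random
p_plain_given_b1 = 30 / 40         # plain cookies in bowl 1
p_plain_given_b2 = 20 / 40         # plain cookies in bowl 2

# total probability of drawing a plain cookie, over both bowls
p_plain = p_plain_given_b1 * p_bowl1 + p_plain_given_b2 * (1 - p_bowl1)

# Bayes' theorem: P(bowl 1 | plain)
p_b1_given_plain = p_plain_given_b1 * p_bowl1 / p_plain
print(p_b1_given_plain)  # 0.6
```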

Drug Testing

Monty Hall

Part B

Get the "Traveler's Restaurant Selection Problem" notes that are on reserve at the CSUCI library (call the library first to make sure they are there before you make the trip). Notes: This problem comes from the text The Elements of Artificial Intelligence by Steve Tanimoto. Lisp code for this problem is called INFNET and there is a link on the Links page that will take you there. You might be able to figure out the whole network by simply reading the Lisp code, but that might be painful.

B.1: Explain the purpose of each node in the network.

There are several different kinds of nodes, including input, lumped, predictor and output nodes. All the nodes represent attributes of restaurants. The attributes are intended to allow a traveler to make an informed choice about where to eat while on the road. The referenced Lisp code also refers to the nodes as evidential, intermediate, and summary variables.

B.2: Explain the numerical value associated with each node.

At first glance, you would think that the values would be ratings for a particular attribute. But upon closer examination of the class notes, I found that the numerical values represent a probabilistic measure of the relationship between nodes. For example, you can use the numbers to determine what effect sounds and clientele might have on popularity.

B.3: Explain how each node computes.

According to the class notes, there are quite a few steps in computing each node, including establishing the hypothesis and evidence, computing the likelihood ratio (λ), sufficiency (λ), necessity (λ'), and finally the odds, O(H) = P(H) / (1 − P(H)), for each node.
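In odds form, the update each node performs can be sketched like this (the prior and λ below are made-up numbers for illustration):

```python
def odds(p):
    """Convert a probability to odds: O = P / (1 - P)."""
    return p / (1 - p)

def prob(o):
    """Convert odds back to a probability: P = O / (1 + O)."""
    return o / (1 + o)

prior = 0.1   # assumed prior probability of the hypothesis
lam = 4.0     # assumed likelihood ratio: evidence is 4x as likely if H is true

# Bayesian updating in odds form: O(H|E) = lambda * O(H)
posterior = prob(lam * odds(prior))
print(round(posterior, 3))
```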

B.4: Come up with a similar problem that could use a similar computational structure and explain how it would be applied (define the input and output nodes first, and then the lumped and evidentiary nodes next). For example, instead of selecting a restaurant, you could be selecting a team player, a new hire, a vacation spot, etc.

It seems like a very practical problem for a Bayesian network would be SPAM filtering. Bayesian filters are already thought to be one of the best ways to filter SPAM. There are certain probabilities associated with particular words contained in an email, and if the posterior probability is high enough, the message is classified as SPAM. The drawback is that spammers know about Bayesian filtering and are now putting more legitimate text into their emails to alter the posterior probability.

There are a number of attributes that could be associated with SPAM including words, phrases, and domain of origin. Phrases themselves could be broken down into porn, get rich quick schemes, phishing, other junk and legitimate solicitations that you might actually be interested in (but are still SPAM). So, I guess the input nodes could be words or phrases that have a certain probability of being spam, that would lead to intermediate nodes that might help categorize what type of spam and then predictor nodes might be used to determine the overall odds that the email is desirable or not.
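A word-level version of that idea can be sketched in Python (the per-word probabilities below are made-up numbers, and the independence assumption is the usual naive-Bayes simplification):

```python
from math import prod  # Python 3.8+

# Toy per-word spam probabilities -- invented for illustration only.
p_spam_given_word = {"viagra": 0.95, "winner": 0.80, "meeting": 0.15}

def spam_probability(words, prior=0.5):
    """Combine per-word spam probabilities naive-Bayes style,
    treating the words as independent evidence."""
    ps = [p_spam_given_word[w] for w in words if w in p_spam_given_word]
    if not ps:
        return prior                    # no known words: fall back on the prior
    num = prod(ps)
    den = num + prod(1 - p for p in ps)
    return num / den
```

An email containing "viagra" and "winner" scores near 1, while one containing only "meeting" scores low, which is the behavior a filter's predictor node would want.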

Fun Facts from yudkowsky.net:

I was curious where all the probabilities came from. Obviously they are just made up for examples like the sprinkler problem, but here are some insights on where some of the medical probabilities come from:

Q. How can I find the priors for a problem?

A. Many commonly used priors are listed in the Handbook of Chemistry and Physics.
Q. Where do priors originally come from?

A. Never ask that question.
Q. Uh huh. Then where do scientists get their priors?

A. Priors for scientific problems are established by annual vote of the AAAS. In recent years the vote has become fractious and controversial, with widespread acrimony, factional polarization, and several outright assassinations. This may be a front for infighting within the Bayes Council, or it may be that the disputants have too much spare time. No one is really sure.
Q. I see. And where does everyone else get their priors?

A. They download their priors from Kazaa.
Q. What if the priors I want aren't available on Kazaa?

A. There's a small, cluttered antique shop in a back alley of San Francisco's Chinatown. Don't ask about the bronze rat.

I have to admit that I didn't have any great ideas on simulating Bayesian networks. I did look more thoroughly through the Fernandez code and Robert Wood's code, and my code below is merely a scaled-down version of Robert Wood's code. I went through and cleaned up the code a bit so that I could at least understand what was going on. Also, I reduced the number of variables so that the results of the simulations were more obvious to me. Another small thing I did was to reformat the output so that there weren't so many decimal places; it seemed to me that 3 decimal places were sufficient. So, here is the code:

The way I configured things, service had an impact on cleanliness, cleanliness had an impact on nutrition, and nutrition was essential to overall food quality (not very realistic, but one possible scenario). So, I thought I would change the service odds to see how they affected the overall quality. In the run above, the service odds were 1.234. Here's a run with the service odds going down to 0.6:

So, even though the service odds went down by about 50%, the effective likelihood of overall quality only went from 0.582 to 0.553. So, I tried increasing the service odds to 2.4:

That seemed to have a huge impact on the effective likelihood of overall food quality. Since it is possible to make changes in service and have them trickle through the Bayesian Network with large impact, the question left begging is this: what dictates the original number for service and is it reliable?
One last note: while researching the Bayesian Networks, I ran across an interesting article on using Bayesian Networks to analyze multiple choice test questions. This is of particular interest to me because I am using multiple choice questions a lot in my math classes with my clicker (classroom response) system. I couldn't find the link to add it to my references, but if I find it again, I will add it below.

Assignment 6

Date: 11/15/2007

Student: Michael Weingarden

Course ID: COMP569 Artificial Intelligence

Semester/Year: Fall 2007

Title/Topic: Fuzzy Systems

For this assignment you are asked to create a fuzzy reasoning simulation.

I think I have a good idea for a fuzzy system, but I was not able to come up with a stellar simulation for this idea. As a high school math teacher, I started using a clicker system in my classroom in order to assess my students' progress and also to provide a method for review that gives me and the students instant feedback on their strengths and weaknesses.

In using such a system, it quickly becomes obvious that the Achilles' heel of the system lies in the multiple choice questions. You could create a set of questions for a topic that are completely useless, or a set that are magical; the more common result is a set of reasonable questions with a few that are magical. By magical, I mean that you get a diverse response from the students. When such a phenomenon occurs, you can be fairly sure that you have successfully predicted a problem area in the given topic and anticipated some of the common mistakes that students make.

Over time, the database of available questions will grow, and it will become useful to have a system to analyze the "goodness" of each question. With so many factors influencing the evaluation of a question, some sort of system for analysis would be useful. One of the influencing factors in analyzing the goodness of a question is the human response. Since this can vary over time, and the track record of most students is already known, this type of situation seems well suited to fuzzy analysis. It would be difficult to claim that most questions are either good or bad; they must lie somewhere in between.

When I presented this idea to Dr. Wolfe, he made a good suggestion. He said that the system could allow a sort of banking game where students receive credit based on previous responses (did they do well in the past or did they do poorly). The clicker system also allows students to express the level of confidence that they have in their own answers. So, another factor is, "how confident was the student in the answer they gave?" For instance, if a previously successful student answers a current question correctly and expresses that they were very confident, some level of goodness must be assigned to the question. Dr. Wolfe suggested that the system could be a betting game where a student with a lot in the bank who is very confident in the correctness of his answer should bet a lot on the next question. If for some reason they get the question wrong, this would have an impact on the student and on the goodness level of the question.
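One way to sketch that betting idea (a Python illustration of my own, separate from the code linked below; the weighting scheme is an assumption, not a standard fuzzy operator):

```python
def question_goodness(responses):
    """Toy fuzzy 'goodness' score for one clicker question.
    Each response is a tuple (correct, confidence, track_record),
    with confidence and track_record in [0, 1]."""
    if not responses:
        return 0.5                      # no evidence: neutral goodness
    score = 0.0
    for correct, confidence, record in responses:
        bet = confidence * record       # how much this student "bets"
        score += bet if correct else -bet
    # squash the average bet outcome into [0, 1]; 0.5 means neutral
    return 0.5 + 0.5 * score / len(responses)
```

A strong student answering confidently and correctly pushes the score toward 1; a strong, confident student getting it wrong pushes it toward 0, so the question's goodness is a matter of degree rather than a binary fact.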

With these thoughts in mind, I whipped together a little Java code to generate some random data and then a very basic algorithm for measuring fuzzy goodness of a question. The source code can be viewed by clicking on the link below:

This is a class that performs various neural network activities: adding training data for input and output neurons, calculating weights and output error, and even saving and retrieving neural networks that have already been trained. It is based on a neural network with one hidden layer.
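To make the structure concrete, here is a minimal Python sketch of a forward pass through such a one-hidden-layer network (my own illustration with random weights; this is not the Neural_1H.java code):

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_ih, w_ho):
    """One forward pass: inputs -> hidden layer -> outputs,
    with a sigmoid activation at each layer."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_ih]
    return [sigmoid(sum(w * h for w, h in zip(ws, hidden))) for ws in w_ho]

random.seed(0)
# 3 inputs -> 4 hidden units -> 3 outputs, small random initial weights
w_ih = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
w_ho = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]

out = forward([0.1, 0.1, 0.9], w_ih, w_ho)
target = [0.9, 0.1, 0.1]
# squared error like the per-cycle "error" reported in the training log
error = sum((t - o) ** 2 for t, o in zip(target, out))
```

Training would repeatedly adjust w_ih and w_ho to reduce this error, which is what the cycle-by-cycle error values in the console output below are tracking.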

This is a class that allows you to test the neural network class from Neural_1H.java. This class provides input and output neuron data for training purposes and also several test cases. The output is sent to the console for simplicity. The output shows the resulting error for each cycle of data processing and finally the results.

<span style="background-color: #f9f9f9;">cycle 100 error is 1.2405982637405395
cycle 200 error is 1.2371082770824433
cycle 300 error is 1.25460902094841
cycle 400 error is 1.2587165319919587
cycle 500 error is 1.2524464589357376
cycle 600 error is 1.251769893169403
cycle 700 error is 1.2180115187168121
cycle 800 error is 1.1794416189193726
cycle 900 error is 1.1574158143997193
cycle 1000 error is 1.1281897848844529
cycle 1100 error is 1.1101750040054321
cycle 1200 error is 1.119013681411743
cycle 1300 error is 1.1275454068183899
cycle 1400 error is 1.149181392788887
cycle 1500 error is 1.1736127460002899
cycle 1600 error is 1.183838232755661
cycle 1700 error is 1.1963338875770568
cycle 1800 error is 1.2095998120307923
cycle 1900 error is 1.2135937058925628
cycle 2000 error is 1.2195653104782105
cycle 2100 error is 1.2268711531162262
cycle 2200 error is 1.2280428206920624
cycle 2300 error is 1.2309472346305848
cycle 2400 error is 1.2352565181255342
cycle 2500 error is 1.2352071046829223
cycle 2600 error is 1.2366057300567628
cycle 2700 error is 1.2392954981327058
cycle 2800 error is 1.2387159609794616
cycle 2900 error is 1.239330507516861
cycle 3000 error is 1.2410799312591552
cycle 3100 error is 1.240280431509018
cycle 3200 error is 1.2404668831825256
cycle 3300 error is 1.2416367614269257
cycle 3400 error is 1.2407623851299285
cycle 3500 error is 1.240707573890686
cycle 3600 error is 1.2415026462078094
cycle 3700 error is 1.240622903108597
cycle 3800 error is 1.240430076122284
cycle 3900 error is 1.2409731590747832
cycle 4000 error is 1.2401213431358338
cycle 4100 error is 1.2398500525951386
cycle 4200 error is 1.2402183270454408
cycle 4300 error is 1.2394094288349151
cycle 4400 error is 1.2390950751304626
cycle 4500 error is 1.2393391382694245
cycle 4600 error is 1.2385790264606475
cycle 4700 error is 1.2382431900501252
cycle 4800 error is 1.238397114276886
cycle 4900 error is 1.2376868498325349
cycle 5000 error is 1.2373429226875305
cycle 5100 error is 1.2374303889274598
cycle 5200 error is 1.2367685151100158
cycle 5300 error is 1.2364248073101043
cycle 5400 error is 1.2364626824855804
cycle 5500 error is 1.2358466947078706
cycle 5600 error is 1.2355083787441254
cycle 5700 error is 1.235508850812912
cycle 5800 error is 1.2349356520175934
cycle 5900 error is 1.2346059036254884
cycle 6000 error is 1.2345779621601105
cycle 6100 error is 1.234044350385666
cycle 6200 error is 1.233725004196167
cycle 6300 error is 1.2336753761768342
cycle 6400 error is 1.2331782150268555
cycle 6500 error is 1.2328702878952027
cycle 6600 error is 1.2328040659427644
cycle 6700 error is 1.2323403656482697
cycle 6800 error is 1.2320443189144135
cycle 6900 error is 1.2319654405117035
cycle 7000 error is 1.231532448530197
cycle 7100 error is 1.2312483608722686
cycle 7200 error is 1.2311598289012908
cycle 7300 error is 1.230754953622818
cycle 7400 error is 1.2304826772212982
cycle 7500 error is 1.2303868651390075
cycle 7600 error is 1.2300078177452087
cycle 7700 error is 1.2297470319271087
cycle 7800 error is 1.229645791053772
cycle 7900 error is 1.2292904841899872
cycle 8000 error is 1.2290408086776734
cycle 8100 error is 1.2289355611801147
cycle 8200 error is 1.2286020457744598
cycle 8300 error is 1.228363047838211
cycle 8400 error is 1.2282549667358398
cycle 8500 error is 1.2279415154457092
cycle 8600 error is 1.2277126932144165
cycle 8700 error is 1.2276027023792266
cycle 8800 error is 1.2273077380657196
cycle 8900 error is 1.2270886313915252
cycle 9000 error is 1.226977458000183
cycle 9100 error is 1.2266995525360107
cycle 9200 error is 1.2264896535873413
cycle 9300 error is 1.2263779032230377
cycle 9400 error is 1.2261157643795013
cycle 9500 error is 1.2259145855903626
cycle 9600 error is 1.225802743434906
cycle 9700 error is 1.2255552124977112
cycle 9800 error is 1.2253623175621033
cycle 9900 error is 1.2252507436275482
Test case: 0.10 0.10 0.90 results: 0.50 0.51 0.51
Test case: 0.10 0.90 0.10 results: 0.52 0.54 0.54
Test case: 0.90 0.10 0.10 results: 0.52 0.54 0.54
Reload a previously trained NN from disk and re-test:
CachedExamples(): failed to open 'cache.dat' in JAR file

For Test_2H:


cycle 100 error is 1.2104608416557312
cycle 200 error is 1.1989377653598785
cycle 300 error is 1.1991546332836152
cycle 400 error is 1.1998853814601897
cycle 500 error is 1.2003008127212524
cycle 600 error is 1.2004814112186433
cycle 700 error is 1.201362099647522
cycle 800 error is 1.2019440412521363
cycle 900 error is 1.2021414136886597
cycle 1000 error is 1.20330904006958
cycle 1100 error is 1.2042146813869476
cycle 1200 error is 1.2045543706417083
cycle 1300 error is 1.2063373863697051
cycle 1400 error is 1.208011201620102
cycle 1500 error is 1.2089345896244048
cycle 1600 error is 1.2125002360343933
cycle 1700 error is 1.2169910633563996
cycle 1800 error is 1.2212953960895538
cycle 1900 error is 1.2317328095436095
cycle 2000 error is 1.0408174008131028
cycle 2100 error is 0.8564836922287941
cycle 2200 error is 0.8501860311627388
cycle 2300 error is 0.8318332275748253
cycle 2400 error is 0.8322290790081024
cycle 2500 error is 0.8317758044600487
cycle 2600 error is 0.8194497507810593
cycle 2700 error is 0.8219471222162247
cycle 2800 error is 0.8212101632356643
cycle 2900 error is 0.8069465574622154
cycle 3000 error is 0.8047270104289055
cycle 3100 error is 0.7971457317471504
cycle 3200 error is 0.7692749190330506
cycle 3300 error is 0.7340040165185928
cycle 3400 error is 0.6764363253116608
cycle 3500 error is 0.8076555904746056
cycle 3600 error is 1.028657106757164
cycle 3700 error is 1.1053302311897277
cycle 3800 error is 1.129347419142723
cycle 3900 error is 1.1383358311653138
cycle 4000 error is 1.1524378216266633
cycle 4100 error is 1.1700886577367782
cycle 4200 error is 1.165547902584076
cycle 4300 error is 1.159196569919586
cycle 4400 error is 1.1578674459457396
cycle 4500 error is 1.1576119363307953
cycle 4600 error is 1.158631715774536
cycle 4700 error is 1.1589783644676208
cycle 4800 error is 1.1577294552326203
cycle 4900 error is 1.1590128302574159
cycle 5000 error is 1.1591267693042755
cycle 5100 error is 1.1593438732624053
cycle 5200 error is 1.1615733075141907
cycle 5300 error is 1.174698269367218
cycle 5400 error is 1.1998552250862122
cycle 5500 error is 1.1999999690055847
cycle 5600 error is 1.1999999678134918
cycle 5700 error is 1.1999999678134918
cycle 5800 error is 1.1999999690055847
cycle 5900 error is 1.1999999678134918
cycle 6000 error is 1.1999999678134918
cycle 6100 error is 1.1999999690055847
cycle 6200 error is 1.1999999678134918
cycle 6300 error is 1.1999999678134918
cycle 6400 error is 1.1999999690055847
cycle 6500 error is 1.1999999678134918
cycle 6600 error is 1.1999999678134918
cycle 6700 error is 1.1999999690055847
cycle 6800 error is 1.1999999678134918
cycle 6900 error is 1.1999999678134918
cycle 7000 error is 1.1999999690055847
cycle 7100 error is 1.1999999678134918
cycle 7200 error is 1.1999999678134918
cycle 7300 error is 1.1999999690055847
cycle 7400 error is 1.1999999678134918
cycle 7500 error is 1.1999999678134918
cycle 7600 error is 1.1999999690055847
cycle 7700 error is 1.1999999678134918
cycle 7800 error is 1.1999999678134918
cycle 7900 error is 1.1999999690055847
cycle 8000 error is 1.1999999678134918
cycle 8100 error is 1.1999999678134918
cycle 8200 error is 1.1999999690055847
cycle 8300 error is 1.1999999678134918
cycle 8400 error is 1.1999999678134918
cycle 8500 error is 1.1999999690055847
cycle 8600 error is 1.1999999678134918
cycle 8700 error is 1.1999999678134918
cycle 8800 error is 1.1999999690055847
cycle 8900 error is 1.1999999678134918
cycle 9000 error is 1.1999999678134918
cycle 9100 error is 1.1999999690055847
cycle 9200 error is 1.1999999678134918
cycle 9300 error is 1.1999999678134918
cycle 9400 error is 1.1999999690055847
cycle 9500 error is 1.1999999678134918
cycle 9600 error is 1.1999999678134918
cycle 9700 error is 1.1999999690055847
cycle 9800 error is 1.1999999678134918
cycle 9900 error is 1.1999999678134918
Test case: 0.10 0.10 0.90 results: 0.50 0.50 0.50
Test case: 0.10 0.90 0.10 results: 0.50 0.50 0.50
Test case: 0.90 0.10 0.10 results: 0.50 0.50 0.50
Reload a previously trained NN from disk and re-test:
CachedExamples(): failed to open 'cache.dat' in JAR file

Before this assignment, I had a mistaken view of neural networks. I primarily thought of them as systems that facilitated parallel processing for problem solving. Now that I know more about them, it seems that although that view is partially correct, they serve a more specific purpose in the area of recognition. Neural networks are great for any application that requires pattern recognition. One common example is optical character recognition, but neural networks could be used for many other types of pattern recognition, including voice recognition, fingerprint matching, face identification and, heck, probably even scent identification!


I should have known that neural networks targeted the specific application of pattern recognition because, a few years ago, a friend of mine told me that he was researching the use of neural networks for monitoring petri dish experiments at Amgen. He worked in the research automation group, where they used robots to automate experiments. Certain experiments provide visual clues about their level of success, and the biologists were spending a great deal of time monitoring the numerous experiments the robots were performing. Using neural networks and cameras to visually identify the most interesting experiments turned out to be a great use of the technology.


I can also see an application for neural networks with more entertaining and accessible robot experimentation. Dr. AJ has written grants to acquire humanoid robots that would play soccer. Possibly neural networks could be used to identify fellow teammates, opposing teammates, the opposing goal and, of course, the ball.


When I first looked at this assignment, I thought that neural networks worked a bit differently, so I came up with the idea that it would be great to use such a system for classroom seating arrangements. This is a problem that many teachers struggle with, especially with at-risk populations. The students in the classroom could be represented as a set of nodes, where each node is connected to adjoining nodes. Each student (node) could be represented by several numbers (weights?), each one representing a specific characteristic like talkative, fidgety, relatively high performing, or easily distracted. The most important consideration, though, is how students relate to those they sit next to. Are they friends? Are they enemies? Are they compatible? Could there be a mutually beneficial relationship (one student needs help, the other is doing well in the class)? Each of those characteristics would be represented by a range of numbers (not binary), so it's entirely possible that this application would be better suited for fuzzy analysis. However, I did find it interesting that the ultimate goal of a neural network is to find the lowest energy state of the system, and that's the same goal that I have for my classroom!


Of course, the classroom application has several problems that would keep it from being a great neural network application. One, I don't know how to provide it with training data. I have no idea which classroom arrangement would be optimal; that's why I need help automating the task. Two, I don't see how the relationships between the nodes are taken into account in the neural network. If I could understand that, the classroom application might seem more feasible, because a well-behaved classroom has a lot to do with the optimal arrangement of students who sit next to one another. If a neural network could work for this application, it would have at least two different uses: 1) optimal behavior arrangement, and 2) optimal learning arrangement.


The goal in a good classroom arrangement system would be to re-arrange all students until you have achieved a preset threshold for expected behavior or learning. So, the ultimate question is: in a neural network, aside from evaluating the data, can you re-arrange the neurons until you reach the desired threshold? Based on the Wikipedia write-up on artificial neural networks, it seems like you could use a neural network for a system like this if you used reinforcement learning, as opposed to supervised learning, which requires up-front training. With reinforcement learning, there is no up-front training, and you can provide feedback to the system over time. This is how the classroom arrangement system should work. You first seat students in an arbitrary fashion. You then assign students weights for various factors like those mentioned above. You then use a neural network to re-arrange students until an optimal (lowest-energy) point is reached. You then re-assign the seats and observe. Student weights and relationships are then adjusted manually (by the teacher) and re-entered into the system for a new optimization analysis. It seems to me that this problem is similar to the Traveling Salesman problem in its complexity. I also ask myself the question, "is it possible that a Kohonen network could be used for its self-organizing map capability?"


Another interesting application might be document tagging. We had an interesting presentation the other day about the semantic web and about how the whole WWW would be so much more useful if all the content of all documents were tagged and the tags were all related in some kind of object oriented structure. When I first started working for the Navy, I worked on a project where we were trying to convert all of our electronic documents (MS Word, WordPerfect, Interleaf and a variety of others) into SGML. There was a company that had some software that touted the ability to look at a chunk of text and based on the shape of the text, a tag would be proposed and if this shape recurred, the tag would be re-used for all similar looking chunks of text. It's possible that neural networks could be used for examining the shape of paragraphs, tables and other objects in a group of documents and then help the system to determine which type of tag or object identifier should be assigned to the chunk of text based on prior training. Also, the system could learn based on the repeated occurrences of similar looking chunks of information in the document.


Ok, enough pondering. It seems like the most common, straightforward application for neural networks is optical character recognition. I found two nice, comprehensive tutorials on neural networks. One was a book by Jeff Heaton and, fortunately, most of the book is online (see references). Also, the sample Java code is online. Unfortunately, the book and the code rely on JOONE and I had a problem getting the JOONE editor to work with my new Microsoft Vista-based laptop. As I understand it, I could have created some apps without the JOONE editor, but I decided to keep looking for a simpler, library independent example of neural networks. I found it in spades in the form of an open source book by Mark Watson. His online book (see references) actually provides quite a nice supplement to all of the topics we've covered in this course and he offers sample Java code as a download along with the PDF version of the book. So, my simulations are based on Mark Watson's code. His examples are pretty generic, but easily could be applied to optical character recognition.

According to Mark Watson, "For back propagation networks, training data consists of matched sets of input and output values. We want to train a network to not only produce the same outputs for training data inputs as appear in the training data, but also to generalize its pattern matching ability based on the training data."


So, in the case of optical character recognition, you might have several examples of a particular letter like, "a", which come from a variety of print sources and then an example of what you expect an "a" to look like (in digital terms). So, ultimately, you'll end up with multiple inputs with the same output.
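As a sketch of what such matched training pairs might look like in Java (the encoding, the feature values, and the class and method names here are all invented for illustration):

```java
import java.util.List;

public class TrainingPairs {
    static class Pair {
        final double[] input, target;
        Pair(double[] input, double[] target) { this.input = input; this.target = target; }
    }

    // Several hypothetical renderings of the letter "a" from different print
    // sources, all mapped to the same target output vector.
    static List<Pair> examplesForLetterA() {
        double[] targetA = {1.0, 0.0};   // invented encoding of "a"
        return List.of(
            new Pair(new double[]{0.9, 0.1, 0.8}, targetA),  // "a" from print source 1
            new Pair(new double[]{0.8, 0.2, 0.9}, targetA),  // "a" from print source 2
            new Pair(new double[]{1.0, 0.0, 0.7}, targetA)); // "a" from print source 3
    }

    public static void main(String[] args) {
        System.out.println(examplesForLetterA().size() + " inputs, one shared target");
    }
}
```

The point is only the shape of the data: multiple distinct inputs paired with one identical target output.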


Mark Watson goes on to provide some tips and tricks regarding the training data: "A key here is to balance the size of the network against how much information it must hold. A common mistake when using back propagation networks is to use too large of a network (more on what this means later); a network that contains too many neurons will simply memorize the training examples, including any noise in the training data. However, if we use a smaller number of neurons, with a very large number of training data examples, then we force the network to generalize, ignoring noise in the training data."

3. Explain the training process.


The training process takes the sample inputs and outputs and establishes the appropriate weights for all the connections between neurons in the network. Once these weights have been established, then new data can be added and processed appropriately.
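The heart of this weight-establishing step can be sketched as a single delta-rule update in Java, W[i][j] += learningRate * outputError[j] * input[i] (my own class and array names; an illustration, not Watson's code):

```java
// One delta-rule weight update over a weight matrix w[input][output]:
// w[i][j] += learningRate * outputError[j] * input[i].
public class DeltaRule {
    static void updateWeights(double[][] w, double[] input,
                              double[] outputError, double learningRate) {
        for (int i = 0; i < input.length; i++)           // input neuron index
            for (int j = 0; j < outputError.length; j++) // output neuron index
                w[i][j] += learningRate * outputError[j] * input[i];
    }

    public static void main(String[] args) {
        double[][] w = {{0.0, 0.0}, {0.0, 0.0}};
        updateWeights(w, new double[]{1.0, 0.5}, new double[]{0.2, -0.1}, 0.3);
        System.out.println(w[0][0]); // 0.3 * 0.2 * 1.0
    }
}
```

Repeating this update over many training examples is what gradually establishes the weights described above.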


Mark Watson explains the process in a more detailed manner. In order to train the network, we repeat the following learning cycle several times:

1. Zero out temporary arrays for holding the error at each neuron. The error, starting at the output layer, is the difference between the output value for a specific output layer neuron and the calculated value from setting the input layer neuron's activation values to the input values in the current training example, and letting activation spread through the network.

2. Update the weight W[i,j] (where i is the index of an input neuron, and j is the index of an output neuron) using the formula W[i,j] += learning_rate * output_error[j]*I[i], where the learning_rate is a tunable parameter, output_error[j] was calculated in step 1, and I[i] is the activation of the input neuron at index i.

4. Explain the effectiveness (%correct on the training set) of the system after it has been trained.
It seems like the best answer here is to say that results may vary. When running the training data, you have two possible end conditions: 1) you've met the threshold you've designated for output error (hopefully low) or 2) you've run through a specified number of cycles and never got the output error low enough. Obviously, if this is the case, it's entirely possible to have a neural network be completely ineffective with a particular set of data.
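The two end conditions described above can be sketched as the shell of a training loop (my own hypothetical names; the supplier stands in for one full pass over the training data):

```java
// Shell of a training loop with two end conditions:
// (1) the error falls below a threshold, or (2) the cycle limit is reached.
import java.util.function.DoubleSupplier;

public class TrainingLoop {
    // Returns the number of cycles run, or -1 if the threshold was never met.
    static int train(DoubleSupplier oneTrainingCycle, double errorThreshold, int maxCycles) {
        for (int cycle = 1; cycle <= maxCycles; cycle++) {
            double error = oneTrainingCycle.getAsDouble(); // one pass; total error
            if (error < errorThreshold) return cycle;      // end condition 1: converged
        }
        return -1;                                         // end condition 2: gave up
    }

    public static void main(String[] args) {
        // Stand-in "network" whose error halves each cycle.
        double[] err = {1.0};
        System.out.println(train(() -> err[0] /= 2, 0.01, 100)); // 7, since 1/2^7 < 0.01
    }
}
```

Hitting end condition 2, as in the Test_2H run above, is exactly the "completely ineffective" case: the cycle budget runs out while the error is still high.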
Mark Watson also provides several clues about how to improve the results for a neural network:
Effectively using back propagation neural networks in applications is somewhat of an acquired art. The following ad hoc notes are derived from my experience of using neural networks over the last 18 years:
Get as much training data as possible: an effective neural network has the ability to generalize from training data, usually in the sense of ignoring noise and spurious training examples. For this to occur, you might need thousands of training examples, depending on the complexity of the problem. Also, some training data can be set aside (i.e., not used for training) and saved for testing the network.
For very large training data sets, try using only 10% of the data initially for training. After the output errors are small, add in more training data sets; repeat this process until training on the entire training data set produces small errors.
Do not use too many hidden layer neurons: if there are too many hidden layer neurons, then the network will not learn to generalize; rather, it will just remember all of the training data, including noise and bad training examples. Start with a very small number of hidden neurons, and see if the training data set can be learned with very small output errors; if necessary, slowly increase the number of hidden neurons, and repeat the training process.

5. Explain how well the system generalized to data that was not in the training set.

6. References

## Assignment 1

Date: 9/1/2007

Student: Michael Weingarden

Course ID: COMP569 Artificial Intelligence

Semester/Year: Fall 2007

Title/Topic: MU Puzzle

- Description of Theoretical Computer Simulation of MIU System
- String manipulation related to rule processing
- Answers to assignment #1 questions
- References

Computer Simulation of MIU System

Overview of desired logic for main routine

- Create functions that represent the four MIU rules
- Rules:

1) xI -> xIU

2) Mx -> Mxx

3) xIIIy -> xUy

4) xUUy -> xy

- Get initial input string from user (or hard code)
- Apply all possible combinations of rules until desired result is found
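The four rules above could be sketched as string operations in Java (my own illustration, not the assignment's simulation code; each method returns every string derivable by one application of its rule, since rules 3 and 4 can apply at several positions):

```java
// The four MIU rules as string operations.
import java.util.ArrayList;
import java.util.List;

public class MIURules {
    // Rule 1: xI -> xIU (only if the string ends in I)
    static List<String> rule1(String s) {
        List<String> out = new ArrayList<>();
        if (s.endsWith("I")) out.add(s + "U");
        return out;
    }

    // Rule 2: Mx -> Mxx (double everything after the leading M)
    static List<String> rule2(String s) {
        List<String> out = new ArrayList<>();
        if (s.startsWith("M")) out.add("M" + s.substring(1) + s.substring(1));
        return out;
    }

    // Rule 3: xIIIy -> xUy (replace any occurrence of III with U)
    static List<String> rule3(String s) {
        List<String> out = new ArrayList<>();
        for (int i = s.indexOf("III"); i >= 0; i = s.indexOf("III", i + 1))
            out.add(s.substring(0, i) + "U" + s.substring(i + 3));
        return out;
    }

    // Rule 4: xUUy -> xy (delete any occurrence of UU)
    static List<String> rule4(String s) {
        List<String> out = new ArrayList<>();
        for (int i = s.indexOf("UU"); i >= 0; i = s.indexOf("UU", i + 1))
            out.add(s.substring(0, i) + s.substring(i + 2));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(rule1("MI"));    // [MIU]
        System.out.println(rule2("MIU"));   // [MIUIU]
        System.out.println(rule3("MIIII")); // III matches at two positions
        System.out.println(rule4("MUUU"));  // UU matches at two positions
    }
}
```

Returning lists makes the point from the discussion below concrete: for a given string, some rules produce no successors at all and others produce several.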

Thoughts regarding implementation of an algorithm to create this system:

String manipulation related to rule processing

Answers to assignment #1 questions

1. Why is the MIU System called typographical?

In GEB, Hofstadter clearly describes the types of operations (on p.64) you might find in a typographical system, and the MIU system clearly includes those types of operations. However, I think the simplest answer is the one described in the course materials: "a formal system with no obvious interpretation of the symbols." In other words, the letters M, I and U are merely treated as objects, not as symbols for something else.

2. How does the MIU System use deduction?

It uses rules to deduce new facts from knowledge of other facts (Dr. Wolfe).

3. Does the MIU System use induction in any way? Explain.

I don't think the system uses induction. If you try to solve the MU puzzle using only data from the computer simulation, you use inductive logic to determine the outcome. However, by analyzing the rules, the starting point and the desired result, you can formally prove the solution to the puzzle.

4. Explain any realistic interpretation of the symbols, or strings of symbols, in the MIU System.

The system seems genetic. You've got these fundamental building blocks and a set of rules, and these rules can be applied in a variety of orders for an unlimited amount of time. I think the strings are like DNA.

5. Explain how conflict resolution is implemented in your simulation.

Definition of conflict resolution from the course material: the process by which the control structure chooses a single rule to fire from among the rules that are currently applicable to the data.

To me, there are two issues here:

Q1) What happens when you can apply more than one rule to a given "theorem" or string that results from a previous theorem or axiom?

Q2) How do you decide when to proceed to the next generation (or level in the tree) versus trying more theorems at a previous generation?

A1) The answer to Q1 is easier. If you can apply more than one rule, then apply each one, one at a time, and add the resulting theorems to the existing tree. You merely apply each rule you can, in order, as you parse through the given string.

A2) The answer to Q2 is more difficult. It is not entirely obvious to me how you tell the computer when to stop moving to the next generation (especially if you're going down a depth-first path). However, I did run across two implementations of the MIU system on the Internet: one in Python and the other in a language called Icon. Both programs ask for the number of levels or generations before they begin processing. So, regardless of what kind of traversal you use, the human imposes a size limitation on the tree up front. This seems like the most practical way to implement this system. I've included the links to these programs in my references below.

6. Does your implementation of the MIU System use forward or backward chaining? Explain.

From what I've read about backward chaining, I would not have considered that type of implementation, mainly because I don't see any particular advantage for this application.
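A level-limited search of the kind those Python and Icon programs perform could be sketched like this in Java (my own sketch, not their code; breadth-first, with the human-supplied generation limit as the stopping criterion):

```java
// Breadth-first MIU search with a human-imposed level (generation) limit.
import java.util.*;

public class MIUSearch {
    // All strings reachable from s by one rule application.
    static List<String> successors(String s) {
        List<String> next = new ArrayList<>();
        if (s.endsWith("I")) next.add(s + "U");                                 // rule 1
        if (s.startsWith("M")) next.add("M" + s.substring(1) + s.substring(1)); // rule 2
        for (int i = s.indexOf("III"); i >= 0; i = s.indexOf("III", i + 1))
            next.add(s.substring(0, i) + "U" + s.substring(i + 3));             // rule 3
        for (int i = s.indexOf("UU"); i >= 0; i = s.indexOf("UU", i + 1))
            next.add(s.substring(0, i) + s.substring(i + 2));                   // rule 4
        return next;
    }

    // True if goal is derivable from start within maxLevels generations.
    static boolean reachable(String start, String goal, int maxLevels) {
        Set<String> seen = new HashSet<>(List.of(start));
        List<String> frontier = List.of(start);
        for (int level = 0; level < maxLevels; level++) {
            List<String> next = new ArrayList<>();
            for (String s : frontier)
                for (String t : successors(s))
                    if (seen.add(t)) {           // skip already-visited theorems
                        if (t.equals(goal)) return true;
                        next.add(t);
                    }
            frontier = next;
        }
        return false; // not found within the limit (says nothing about deeper levels)
    }

    public static void main(String[] args) {
        System.out.println(reachable("MI", "MIUIU", 3)); // true: MI -> MIU -> MIUIU
        System.out.println(reachable("MI", "MU", 6));    // false within 6 levels
    }
}
```

Note that a `false` answer only means "not found within the limit"; it is the human-imposed cutoff, not a proof, which is exactly the limitation discussed in A2.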

7. The MU Puzzle: Given only the axiom MI, is MU deducible in the MIU System?

Based on the course materials, Wikipedia and other sources, no, it is not. I agree with this conclusion, and I'll try to state my perspective in a slightly more unique way while answering question #8 below.

8. Provide a convincing argument that MU is, or is not, deducible by the MIU System given only the axiom MI to start with.

The answer to the question is based on the "I count." The only way to get rid of all the I's is to have a number of I's that is a multiple of 3 (so that rule #3 can remove them). However, the only way to increase the number of I's is to double them using rule #2. So, if you start with one I, then you can only have I's in the following quantities: 1, 2, 4, 8, 16, 32, 64, etc. This sequence can be represented by 2^n, and 2^n can never be a multiple of 3: since 2 is prime, 2^n has only 2 as a prime factor, so 3 can never be a factor. If you can't have a multiple of 3 I's, you can never get rid of all the I's, and you must do that to get MU.
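Restating the argument in congruence notation (my own addition; this form also covers interleavings where rule #3 fires before all the doubling is done, since removing three I's preserves the count modulo 3):

```latex
\#I(\mathrm{MI}) = 1, \qquad
\text{rule 2: } \#I \mapsto 2\,\#I, \qquad
\text{rule 3: } \#I \mapsto \#I - 3, \qquad
\text{rules 1, 4: } \#I \text{ unchanged.}

\text{Modulo } 3:\quad
2 \cdot 1 \equiv 2, \qquad
2 \cdot 2 \equiv 1, \qquad
\#I - 3 \equiv \#I \pmod{3},

\text{so } \#I \bmod 3 \in \{1, 2\} \text{ is invariant. But MU requires }
\#I = 0 \equiv 0 \pmod{3},
```

which is unreachable, so MU is not deducible.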

9. Explain the relationship between the results of your simulation and automated theorem proving. For example, how would the computer know if it was getting closer to proving the theorem or just spinning its wheels?

I'm not sure that there is a way for the computer to know that it is getting closer to proving the theorem without adding additional rules. For instance, you could add a rule that says, "the theorem can't be proved if the given string has a non-multiple of 3 I's and the desired result has no I's."

It seems like you could program the computer to realize that, as the string grows while you go down a certain path, continuing that path will only diverge from the desired goal. Even so, I still think you would have to add additional rules for the computer to know that every permutation of rules will fail on a particular path.
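A growing-string cutoff of the kind pondered here could be sketched like this (my own illustration; as the text says, it is a heuristic judgment call, not a proof of futility):

```java
// Heuristic cutoff: flag a depth-first path as (probably) futile if the
// string has grown strictly longer for `limit` consecutive generations.
// This cannot prove futility -- it only suggests abandoning the path.
import java.util.List;

public class FutilePath {
    static boolean looksFutile(List<String> path, int limit) {
        int growthStreak = 0;
        for (int i = 1; i < path.size(); i++) {
            if (path.get(i).length() > path.get(i - 1).length()) growthStreak++;
            else growthStreak = 0; // any shrink or plateau resets the streak
            if (growthStreak >= limit) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(looksFutile(
            List.of("MI", "MII", "MIIII", "MIIIIIIII", "MIIIIIIIIU"), 4)); // true
        System.out.println(looksFutile(List.of("MI", "MII", "MUI"), 4));   // false
    }
}
```

The weakness is visible in the code itself: a single rule-3 or rule-4 application resets the streak, so the cutoff can both fire on paths that would eventually shrink and miss paths that grow slowly.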

References

## Assignment 2

Date: 9/16/2007

Student: Michael Weingarden

Course ID: COMP569 Artificial Intelligence

Semester/Year: Fall 2007

Title/Topic: Expert System/Solving Sudoku 4x4 Puzzles

Disclaimer regarding Java Code:

- I had a hard time thinking of a rule based exercise, so I thought I would try Sudoku
- When I thought up the idea, I wrote down a series of "rules" or IF statements, but somehow when I tried to write the code, all the "rules" got buried
- Even so, I was too far down a path and too little time was left to go down a new path
- For these reasons, please excuse my loose interpretation of "rules"
- The code as written does not completely solve a puzzle; it merely finds the first number that already exists in the puzzle and then looks to see if it can place that number in other parts of the puzzle

Java Code PDF

Java Source Code File

1. Define the domain of your choosing and list the rules.

The domain I chose was puzzle solving, specifically Sudoku. Here is the list of "rules" I started with:

- Find a number in the grid
- In that number's row, mark all those cells as unavailable
- In that number's column, mark those cells as unavailable
- In that number's "box", mark those cells as unavailable
- When the number is placed in all possible boxes, replace all 5's with 0's
- Repeat for the next number, for all numbers 1 through 4 (for the type of puzzle I'm working with)

2. Sketch the design of the system (block-level design, showing the inference process).

Sketch of system
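The "mark as unavailable" rules could be sketched like this for a 4x4 grid (an illustration of the intended rules, not the assignment's actual code; names are my own):

```java
// For a 4x4 Sudoku: given the position of a placed number, mark its row,
// its column, and its 2x2 box as unavailable for that number.
public class SudokuMark {
    static boolean[][] unavailable(int row, int col) {
        boolean[][] blocked = new boolean[4][4];
        for (int k = 0; k < 4; k++) {
            blocked[row][k] = true;  // same row
            blocked[k][col] = true;  // same column
        }
        int br = (row / 2) * 2, bc = (col / 2) * 2; // top-left of the 2x2 box
        for (int r = br; r < br + 2; r++)
            for (int c = bc; c < bc + 2; c++)
                blocked[r][c] = true; // same box
        return blocked;
    }

    public static void main(String[] args) {
        boolean[][] b = unavailable(0, 0);
        // row, column, and box cells are blocked; an unrelated cell is not
        System.out.println(b[0][3] + " " + b[3][0] + " " + b[1][1] + " " + b[2][2]);
        // prints "true true true false"
    }
}
```

Any cell left unblocked for every row/column/box constraint is a candidate placement for that number, which is the step the assignment's code performs next.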

Sketch of system (update)

Scenario Diagrams

3. Explain how your system decides which rules are triggered on any one cycle, and how it picks a particular rule to fire (i.e.: explain how it handles conflict resolution).

4. Explain whether the "facts" in your system are binary (assigned only true or false) -- are there any variations? (fuzziness? confidence factors? probability?)

5. Explain how any given rule might have exceptions (that is, situations where the rule is incorrect). More generally, explain the major limitations of your system.

## Assignment 3

Date: 9/20/2007

Student: Michael Weingarden

Course ID: COMP569 Artificial Intelligence

Semester/Year: Fall 2007

Title/Topic: Elementary Blocks World

1. Find the shortest path for the example given above. If there is more than one shortest path, present them all.

The example looked like this:

The shortest path would be:

- Move D to table
- Move F to B
- Move C to F
- Move D to A

Four moves.

2. Show that the following heuristic will create valid, but not necessarily optimal, solutions: Use these rules: If any available block (any block at the top of a stack in the initial state) can be moved directly to its final position in the goal state, then do it. If not, then take the block on the top of the largest stack and move it to the table (if there are two largest stacks, then break ties by using the first stack as counted left to right). Iterate in this fashion until the goal state is achieved. Show that this method can produce an optimal solution in some cases, but suboptimal in others (give an example of each).

I believe that the problem with this heuristic is that it's difficult to move a block to its final position because of possible deadlocks (discussed below). This heuristic may produce an optimal solution when blocks don't get in each other's way, but only a suboptimal one when you have to resolve two conflicting block placements.

3. Explain why, in an optimal solution, no block should be moved more than twice.

Because in the worst case scenario, all blocks are moved to the table and then to their final positions. That is moving no block more than twice. So, the optimal solution should be at least as good as the worst case scenario.

4. Explain why finding the shortest path reduces to deciding which blocks will be moved twice and which ones will be moved only once.

Well, if the worst case scenario is that no block is moved more than twice, then the best case scenario is to move blocks only once where possible.

5. Explain how the concept of "deadlock" applies to this problem. Give examples.

From Wikipedia: "A deadlock is a situation wherein two or more competing actions are waiting for the other to finish, and thus neither ever does."

In the paper by Nau and Gupta, deadlock is related to the EBW problem in this simple manner, "a is in d's way and d is in a's way, so {a,d} is deadlocked."

In other words, if two blocks are in the way of one another's final positions, they are deadlocked. This made implementing H2 quite difficult.

6. Create a computer simulation of this blocks world problem (graphical display is extra credit). Implement the heuristics described above: H1: all blocks go to the table, then are moved into final positions; H2: described in 2 above. Invent a better heuristic if you can.

I was able to implement the code for H1 and it runs quite nicely. However, I did not allow for user input; the data is hardcoded and the total number of blocks is limited to 7.
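H1 ("all blocks to the table, then to final positions") could be sketched as follows (a simplified illustration, not my actual assignment code; stacks are lists with the bottom block first and the top at the end):

```java
// Heuristic H1: unstack every block onto the table, then build each goal
// stack from the bottom up. Every block moves at most twice.
import java.util.ArrayList;
import java.util.List;

public class BlocksH1 {
    static List<String> solve(List<List<Character>> initial, List<List<Character>> goal) {
        List<String> moves = new ArrayList<>();
        // Phase 1: move every block that is not already on the table to the table.
        for (List<Character> stack : initial)
            for (int i = stack.size() - 1; i >= 1; i--)
                moves.add("Move " + stack.get(i) + " to table");
        // Phase 2: rebuild each goal stack bottom-up.
        for (List<Character> stack : goal)
            for (int i = 1; i < stack.size(); i++)
                moves.add("Move " + stack.get(i) + " to " + stack.get(i - 1));
        return moves;
    }

    public static void main(String[] args) {
        // Initial: stack A-B (B on top) and C alone; goal: single stack C-B-A.
        List<String> moves = solve(
            List.of(List.of('A', 'B'), List.of('C')),
            List.of(List.of('C', 'B', 'A')));
        moves.forEach(System.out::println); // Move B to table / Move B to C / Move A to B
    }
}
```

This also makes question 3's point concrete: H1 never moves a block more than twice, so its move count is an upper bound on the optimum.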

I put a lot of time into trying to implement H2 and I've included the code I wrote, but I was not able to complete the heuristic. It's entirely possible that I spent too much time overthinking the problem, and I'm curious to see other people's implementations. I was trying to use many arrays to store stack info, information about final positions, information about which blocks were on top (clear), etc. But every time I started to make some progress, I ran into another problem with the overall logic. I've tried to comment my code thoroughly and named variables and methods carefully so that my logic can be followed easily.

Java Code for H1:

Java Code for H2:

7. a) Create a graphical representation of this problem using a directed graph with nodes that correspond to the blocks and arcs that indicate a constraint created by the initial or goal state configurations.

I have to admit that I am really not sure of what is expected here. Are we supposed to create a directed graph for just one scenario (the one given in the assignment)? If so, do we draw the graph for one possible solution for that scenario? And even then, I'm not really sure what the directed graph is supposed to look like. I've included a sketch of one idea I had, but I have questions about my own sketch. I've tried to represent my 4-move solution listed above. In my sketch, I have tried to represent D -> T -> F -> B -> C -> F -> D -> A, but it just doesn't feel right. It seems like D -> T should be a node and then F -> B should be a node, etc., and then edges should connect those nodes. However, 7a) says that nodes correspond to blocks.

GIF of directed graph

GIF of second idea for directed graph

Also, I wasn't entirely sure what was meant by "arcs that indicate a constraint created by...". In my mind, the edges represent moving a block from one place to another, but moves aren't really constraints. It seems like the STRIPS heuristic is more constraint oriented, but I'm not sure if that is what we were supposed to graph.

b) Research the Feedback Arc Set problem. This problem boils down to finding the fewest arcs to remove from a directed graph that will make it acyclic (no more cycles), and it is known to be NP complete. Explain how EBW reduces to the Feedback Arc Set problem, and thereby show that EBW is NP complete. (This is actually difficult, so I am not expecting perfect answers, just some rationale -- make an attempt to understand the Feedback Arc Set problem, and then try to tie it together with the graphical representation you came up with in part a). All of this shows that this simple blocks world planning problem is of equal complexity to the better-known traveling salesman problem.

I've spent a fair amount of time reading Nau and Gupta, looking at Wikipedia (for Feedback Arc Set) and trying to find other sites to learn more about the Feedback Arc Set, but I'm still not sure how to produce one for this problem. I understand that a Feedback Arc Set (also known as a Feedback Edge Set) is a set of edges whose removal from a directed graph leaves a directed acyclic graph. But since I wasn't perfectly sure about the directed graph, it's difficult for me to do much with the Feedback Arc Set.

With respect to complexity and NP completeness, I did run across a site that really does illustrate the complexity of graphing (and solving) the blocks world problem:

http://www.d.umn.edu/~ddunham/cs4521f05/assignments/a11/assignment.html

At this site, they show the complexity of merely graphing a two-block undirected graph. Of course, not all the nodes are required to solve such a simple blocks world problem, but all of them are possible. Also, the two-block problem has added complexity, like table placement being a factor, and all possible scenarios are contained in the graph.


## Assignment 4

Date: 10/13/2007

Student: Michael Weingarden

Course ID: COMP569 Artificial Intelligence

Semester/Year: Fall 2007

Title/Topic: Bayesian Networks Part I

Part A

For each of these examples, explain them as if you were the "teacher" and the other students in the class were the "students". In particular, after explaining the given example, add a variation of your own that shows the same principle but with different assumptions/numbers/givens.

A.1. Wikipedia search: Bayesian Network

Explain Sprinkler Example:

I'm going to try to focus my explanation on this question in the sprinkler example, "What is the likelihood that it is raining, given the grass is wet?" Hopefully, this will give me the ability to try to answer the other questions as well.

The formula used to compute the answer is given as:

P(Rain=T | Grass=T) = P(Grass=T, Rain=T) / P(Grass=T) = [ sum over Sprinkler of P(Grass=T, Sprinkler, Rain=T) ] / [ sum over Sprinkler and Rain of P(Grass=T, Sprinkler, Rain) ]

This ratio represents the sum of probabilities where the grass is wet and it is raining, divided by the sum of all probabilities where the grass is wet (raining, sprinkling, both or neither). Apparently, by dividing one by the other, we get the likelihood that it is raining, given that the grass is wet.
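
To make the arithmetic concrete, here is a small Java sketch that brute-forces this ratio using the conditional probability tables from the Wikipedia sprinkler example (P(Rain)=0.2; P(Sprinkler|Rain=T)=0.01, P(Sprinkler|Rain=F)=0.4; and the wet-grass table 0.99/0.9/0.8/0.0):

```java
// Brute-force computation of P(Rain=T | Grass=T) for the Wikipedia
// sprinkler network. CPT numbers are taken from that example.
public class SprinklerExample {
    static double pRain = 0.2;
    // P(Sprinkler=T | Rain): index 0 = Rain=F, 1 = Rain=T
    static double[] pSprinklerGivenRain = {0.4, 0.01};
    // P(Grass=T | Sprinkler, Rain): [sprinkler][rain]
    static double[][] pWet = {{0.0, 0.8}, {0.9, 0.99}};

    public static double posteriorRainGivenWet() {
        double num = 0.0, den = 0.0;
        for (int r = 0; r <= 1; r++) {
            double pr = (r == 1) ? pRain : 1 - pRain;
            for (int s = 0; s <= 1; s++) {
                double ps = (s == 1) ? pSprinklerGivenRain[r]
                                     : 1 - pSprinklerGivenRain[r];
                double joint = pWet[s][r] * ps * pr;  // P(G=T, S=s, R=r)
                den += joint;                // every way the grass is wet
                if (r == 1) num += joint;    // wet AND raining
            }
        }
        return num / den;
    }

    public static void main(String[] args) {
        System.out.printf("P(Rain | Wet grass) = %.4f%n", posteriorRainGivenWet());
    }
}
```

Running this gives a posterior of about 0.3577, so observing wet grass raises the probability of rain from the 0.2 prior to roughly 0.36.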

A.2. Wikipedia search: Bayes' TheoremExplain Examples:

We will use the following formula to answer the question, "what's the probability that Fred picked bowl #1, given that he has a plain cookie?":

P(Bowl1 | plain) = P(plain | Bowl1) × P(Bowl1) / P(plain)

So, here's the simple explanation: if you multiply the probability of getting a plain cookie- given that you pick from bowl #1- times the probability of picking bowl #1, and then divide the result by the probability of getting a plain cookie regardless of any bowl information, you get the answer to the proposed question above.
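
As a sanity check, the standard numbers for this example (bowl #1 holds 30 plain and 10 chocolate cookies, bowl #2 holds 20 of each, and each bowl is equally likely to be picked) can be plugged into Bayes' theorem in a few lines of Java:

```java
// Bayes' theorem on the cookie example: bowl #1 holds 30 plain and 10
// chocolate cookies, bowl #2 holds 20 of each; each bowl is equally
// likely. What is P(bowl #1 | plain cookie)?
public class CookieExample {
    public static double posteriorBowl1GivenPlain() {
        double pBowl1 = 0.5, pBowl2 = 0.5;
        double pPlainGivenB1 = 30.0 / 40.0;   // 0.75
        double pPlainGivenB2 = 20.0 / 40.0;   // 0.50
        // P(plain) regardless of bowl (law of total probability)
        double pPlain = pPlainGivenB1 * pBowl1 + pPlainGivenB2 * pBowl2;
        return pPlainGivenB1 * pBowl1 / pPlain;   // Bayes' theorem
    }

    public static void main(String[] args) {
        System.out.println(posteriorBowl1GivenPlain());
    }
}
```

The result is 0.6: seeing a plain cookie nudges bowl #1 up from the 0.5 prior.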

Drug Testing

Monty Hall

Part B

Get the "Traveler's Restaurant Selection Problem" notes that are on reserve at the CSUCI library (call the library first to make sure they are there before you make the trip). Notes: This problem comes from the text The Elements of Artificial Intelligence, by Steve Tanimoto. Lisp code for this problem is called INFNET and there is a link on the Links page that will take you there. You might be able to figure out the whole network by simply reading the Lisp code, but that might be painful.

B.1: Explain the purpose of each node in the network.

There are several different kinds of nodes, including input, lumped, predictor and output nodes. All the nodes represent attributes of restaurants. The attributes are intended to allow a traveler to make an informed choice about where to eat while on the road. The referenced Lisp code also refers to the nodes as evidential, intermediate, and summary variables.

B.2: Explain the numerical value associated with each node.

At first glance, you would think that the values would be ratings for a particular attribute. But upon closer examination of the class notes, I found that the numerical values represent a probabilistic measure of the relationship between nodes. For example, you can use the numbers to determine what influence sounds and clientele might have on popularity.

B.3: Explain how each node computes.

According to the class notes, there are quite a few things that are done to compute each node, including establishing the hypothesis and evidence, computing the likelihood ratio (λ), sufficiency (λ), necessity (λ'), and finally the odds ( O(H) = P(H)/(1 - P(H)) ) for each node.
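
I haven't reproduced the class notes here, but the core odds arithmetic is easy to illustrate. In this sketch the prior and the likelihood ratio are made-up numbers; the point is just the convert-to-odds, scale-by-λ, convert-back pattern:

```java
// Illustration of the odds/likelihood-ratio arithmetic at a node:
// O(H) = P(H)/(1 - P(H)), O(H|E) = lambda * O(H), and back to a
// probability via P = O/(1 + O). Prior and lambda are invented.
public class OddsUpdate {
    public static double toOdds(double p)    { return p / (1.0 - p); }
    public static double toProb(double odds) { return odds / (1.0 + odds); }

    // Posterior probability after observing evidence with ratio lambda.
    public static double update(double prior, double lambda) {
        return toProb(lambda * toOdds(prior));
    }

    public static void main(String[] args) {
        double prior = 0.5;   // hypothetical prior for the node
        double lambda = 3.0;  // hypothetical sufficiency ratio (evidence seen)
        System.out.println(update(prior, lambda));  // odds 1 -> 3, P = 0.75
    }
}
```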

B.4: Come up with a similar problem that could use a similar computational structure and explain how it would be applied (define the input and output nodes first, and then the lumped and evidentiary nodes next). For example, instead of selecting a restaurant, you could be selecting a team player, a new hire, a vacation spot, etc.

It seems like a very practical problem for a Bayesian network would be spam filtering. Bayesian filters are already thought to be one of the best ways to filter spam. There are certain probabilities associated with particular words contained in an email, and if the posterior probability is high enough, the message must be spam. The drawback is that spammers know about Bayesian filtering and are now putting more legitimate text into their emails to alter the posterior probability.

There are a number of attributes that could be associated with spam, including words, phrases, and domain of origin. Phrases themselves could be broken down into porn, get-rich-quick schemes, phishing, other junk, and legitimate solicitations that you might actually be interested in (but are still spam). So, I guess the input nodes could be words or phrases that have a certain probability of being spam; those would lead to intermediate nodes that might help categorize the type of spam, and then predictor nodes might be used to determine the overall odds that the email is desirable or not.
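
As an illustration (the word list and the probabilities are invented here, and this is a Graham-style naive combination rather than any particular filter's actual method), the per-word scoring idea might look like:

```java
import java.util.Map;

// Toy naive-Bayes spam score. Each known word carries an invented
// probability of indicating spam; the scores are combined with
// P = s / (s + h), where s is the product of the word probabilities
// and h the product of their complements.
public class SpamScore {
    static final Map<String, Double> WORD_SPAM_PROB = Map.of(
        "viagra", 0.99, "winner", 0.90, "meeting", 0.10, "lunch", 0.15);

    public static double score(String[] words) {
        double s = 1.0, h = 1.0;
        for (String w : words) {
            Double p = WORD_SPAM_PROB.get(w.toLowerCase());
            if (p == null) continue;   // ignore words with no statistics
            s *= p;
            h *= 1.0 - p;
        }
        return s / (s + h);            // posterior that the mail is spam
    }

    public static void main(String[] args) {
        System.out.println(score(new String[]{"winner", "viagra"}));  // high
        System.out.println(score(new String[]{"meeting", "lunch"}));  // low
    }
}
```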

Fun Facts from yudkowsky.net:

I was curious where all the probabilities came from (obviously they are just made up for examples like the sprinkler problem), but here are some insights on where some of the medical probabilities come from:

Q. How can I find the priors for a problem?

A. Many commonly used priors are listed in the Handbook of Chemistry and Physics.

Q. Where do priors originally come from?

A. Never ask that question.

Q. Uh huh. Then where do scientists get their priors?

A. Priors for scientific problems are established by annual vote of the AAAS. In recent years the vote has become fractious and controversial, with widespread acrimony, factional polarization, and several outright assassinations. This may be a front for infighting within the Bayes Council, or it may be that the disputants have too much spare time. No one is really sure.

Q. I see. And where does everyone else get their priors?

A. They download their priors from Kazaa.

Q. What if the priors I want aren't available on Kazaa?

A. There's a small, cluttered antique shop in a back alley of San Francisco's Chinatown. Don't ask about the bronze rat.


## Assignment 5

Date: 10/29/2007

Student: Michael Weingarden

Course ID: COMP569 Artificial Intelligence

Semester/Year: Fall 2007

Title/Topic: Bayesian Networks Part II

I have to admit that I didn't have any great ideas on simulating the Bayesian networks. I did look more thoroughly through the Fernandez code and Robert Wood's code, and my code below is merely a scaled-down version of Robert Wood's code. I went through and cleaned up the code a bit so that I could at least understand what was going on. Also, I reduced the number of variables so that the results of the simulations were more obvious to me. Another small thing I did was to reformat the output so that there weren't so many decimal places. It seemed to me that three decimal places were sufficient. So, here is the code:

Java Simulation Code

and here are some of the sample outputs from running the code:

The way I configured things, service had an impact on cleanliness, cleanliness had an impact on nutrition, and nutrition was essential to overall food quality (not very realistic, but one possible scenario). So, I thought I would change the service odds to see how it affected the overall quality. In the run above, the service odds were 1.234. Here's a run with the service odds going down to 0.6:

So, even though the service odds went down by about 50%, the effective likelihood of overall quality only went from .582 to .553. So, I tried increasing the service odds to 2.4:

That seemed to have a huge impact on the effective likelihood of overall food quality. Since it is possible to make changes in service and have them trickle through the Bayesian network with large impact, the question left begging is this: what dictates the original number for service, and is it reliable?
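
As a purely illustrative sketch (this is not the simulation code linked above, and the damping factors are invented), the trickle-through effect can be mimicked by a chain in which each node only partially passes along a change in its parent's odds, which is one way a 50% swing in service could move overall quality only slightly:

```java
// Hypothetical damped propagation through the chain
// service -> cleanliness -> nutrition -> overall quality.
// Each link passes on only a fraction of the parent's deviation
// from neutral odds (1.0); all factors are made up.
public class ChainedOdds {
    public static double effectiveQualityOdds(double serviceOdds) {
        double cleanliness = 1.0 + 0.3 * (serviceOdds - 1.0);
        double nutrition   = 1.0 + 0.3 * (cleanliness - 1.0);
        return 1.0 + 0.5 * (nutrition - 1.0);   // overall quality
    }

    public static void main(String[] args) {
        for (double s : new double[]{0.6, 1.234, 2.4})
            System.out.printf("service=%.3f -> quality=%.3f%n",
                              s, effectiveQualityOdds(s));
    }
}
```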

One last note: while researching the Bayesian Networks, I ran across an interesting article on using Bayesian Networks to analyze multiple choice test questions. This is of particular interest to me because I am using multiple choice questions a lot in my math classes with my clicker (classroom response) system. I couldn't find the link to add it to my references, but if I find it again, I will add it below.

## Assignment 6

Date: 11/15/2007

Student: Michael Weingarden

Course ID: COMP569 Artificial Intelligence

Semester/Year: Fall 2007

Title/Topic: Fuzzy Systems

For this assignment you are asked to create a fuzzy reasoning simulation.

I think I have a good idea for a fuzzy system, but I was not able to come up with a stellar simulation for this idea. As a high school math teacher, I started using a clicker system in my classroom in order to assess my students' progress and also to provide a method for review that gives me and the students instant feedback on their strengths and weaknesses.

In using such a system, it quickly becomes obvious that the Achilles' heel of the system lies in the multiple choice questions. You could create a set of questions for a topic that are completely useless, or a set that are magical, but the more common result is a set of reasonable questions with a few that are magical. By magical, I mean that you get a diverse response from the students. When such a phenomenon occurs, you can be fairly sure that you have successfully predicted a problem area in the given topic and you have successfully anticipated some of the common mistakes that students would make.

Over time, the database of available questions will grow and it will become useful to have a system to analyze the "goodness" of each question. With so many factors influencing the evaluation of a question, some sort of system for analysis would be useful. One of the influencing factors in analyzing the goodness of a question is the human response. Since this could vary over time, and the track record of most students is already known, this type of situation seems well suited for a fuzzy analysis. It would be difficult to claim that most questions are either good or bad; they must lie somewhere in between.

When I presented this idea to Dr. Wolfe, he made a good suggestion. He said that the system could allow a sort of banking game where students receive credit based on previous responses (did they do well in the past or did they do poorly). The clicker system also allows students to express the level of confidence that they have in their own answers. So, another factor is, "how confident was the student in the answer they gave?" For instance, if a previously successful student answers a current question correctly and expresses that they were very confident, some level of goodness must be assigned to the question. Dr. Wolfe suggested that the system could be a betting game where a student with a lot in the bank who is very confident in the correctness of his answer should bet a lot on the next question. If, for some reason, they get the question wrong, this would have an impact on the student and on the goodness level of the question.
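
As a hypothetical sketch (not the code linked below), a fuzzy goodness measure might blend a "diverse responses" membership with a "confident but wrong" signal; all of the weights and membership shapes here are invented:

```java
// Minimal sketch of a fuzzy "goodness" measure for a multiple-choice
// question. Inputs: fraction of students answering correctly, and
// their average self-reported confidence in [0,1]. A "magical"
// question splits the class, so goodness peaks when correctness is
// near 0.5 and confident students are caught by the distractors.
public class QuestionGoodness {
    // Triangular membership: peaks at c, falls to 0 at c-w and c+w.
    static double tri(double x, double c, double w) {
        return Math.max(0.0, 1.0 - Math.abs(x - c) / w);
    }

    public static double goodness(double fractionCorrect, double avgConfidence) {
        double diverse = tri(fractionCorrect, 0.5, 0.5);  // diverse responses
        double surprising = avgConfidence * (1.0 - fractionCorrect); // confident, wrong
        return Math.min(1.0, 0.7 * diverse + 0.6 * surprising); // invented blend
    }

    public static void main(String[] args) {
        System.out.println(goodness(0.5, 0.8));   // splits the class: high
        System.out.println(goodness(0.98, 0.9));  // everyone right: low
    }
}
```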

With these thoughts in mind, I whipped together a little Java code to generate some random data and then a very basic algorithm for measuring fuzzy goodness of a question. The source code can be viewed by clicking on the link below:

http://www.thinkatorium.com/nphs/project6FuzzyMW.doc


## Assignment 7

Date: 12/2/2007

Student: Michael Weingarden

Course ID: COMP569 Artificial Intelligence

Semester/Year: Fall 2007

Title/Topic: Neural Networks

Java Code for Simulation

http://www.thinkatorium.com/nphs/Neural_1H.java

http://www.thinkatorium.com/nphs/Neural_1H.rtf

This is a class that performs various neural network activities like allowing for the addition of training data for input and output neurons, weight and output error calculation and even allows for saving and retrieving neural networks that have already been trained. It is based on a neural network with one hidden layer.
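The heart of such a one-hidden-layer network is the forward pass. The sketch below is my own minimal version of that pass, not Mark Watson's actual code:

```java
// Minimal forward pass for a network with one hidden layer:
// input -> hidden -> output, sigmoid activation at each layer.
public class ForwardPass {
    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    // weightsIH[i][h]: input i -> hidden h; weightsHO[h][o]: hidden h -> output o.
    public static double[] forward(double[] input,
                                   double[][] weightsIH, double[][] weightsHO) {
        double[] hidden = new double[weightsIH[0].length];
        for (int h = 0; h < hidden.length; h++) {
            double sum = 0.0;
            for (int i = 0; i < input.length; i++)
                sum += input[i] * weightsIH[i][h];
            hidden[h] = sigmoid(sum);           // hidden layer activation
        }
        double[] out = new double[weightsHO[0].length];
        for (int o = 0; o < out.length; o++) {
            double sum = 0.0;
            for (int h = 0; h < hidden.length; h++)
                sum += hidden[h] * weightsHO[h][o];
            out[o] = sigmoid(sum);              // output layer activation
        }
        return out;
    }

    public static void main(String[] args) {
        double[] out = forward(new double[]{1.0, 0.0},
                               new double[][]{{0.5, -0.5}, {0.3, 0.8}},
                               new double[][]{{1.0}, {-1.0}});
        System.out.println(out[0]);
    }
}
```

Training then amounts to adjusting the two weight matrices until the outputs match the training targets.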

http://www.thinkatorium.com/nphs/Test_1H.java

http://www.thinkatorium.com/nphs/Test_1H.rtf

This is a class that allows you to test the neural network class from Neural_1H.java. This class provides input and output neuron data for training purposes and also several test cases. The output is sent to the console for simplicity. The output shows the resulting error for each cycle of data processing and finally the results.

http://www.thinkatorium.com/nphs/Neural_2H.java

http://www.thinkatorium.com/nphs/Neural_2H.rtf

This is a class like Neural_1H except it supports the use of two hidden layers.

http://www.thinkatorium.com/nphs/Test_2H.java

http://www.thinkatorium.com/nphs/Test_2H.rtf

This is a class like Test_1H except it tests Neural_2H.java.

Sample Outputs from Simulation

For Test_1H:

For Test_2H:

1. Explain the application domain.

Before this assignment, I had a view of neural networks that was mistaken. I primarily thought of them as systems that facilitated parallel processing for problem solving. However, now that I know more about them, it seems that although that view is partially correct, they serve a more specific purpose in the area of recognition. It seems that neural networks are great for any application that requires pattern recognition. One example that seems to be common is optical character recognition; however, it is obvious that neural networks could be used for many other types of pattern recognition, including voice recognition, fingerprint matching, face identification and, heck, you could probably even use neural networks for scent identification!

I should have known that neural networks targeted the specific application of pattern recognition because a few years ago, a friend of mine was telling me that they were researching the use of neural networks for monitoring petri dish experiments at Amgen. He worked in the research automation group where they used robots to automate experiments. Apparently certain experiments provide visual clues as to the level of their success and the biologists were spending a great deal of their time monitoring the numerous experiments that the robots were performing. The use of neural networks and cameras to visually identify the most interesting experiments turns out to be a great use of the technology.

I can also see an application for neural networks with more entertaining and accessible robot experimentation. Dr. AJ has written grants to acquire humanoid robots that would play soccer. Possibly neural networks could be used to identify fellow teammates, opposing teammates, the opposing goal and, of course, the ball.

When I first looked at this assignment, I thought that neural networks worked a bit differently, so I came up with the idea that it would be great to use such a system for classroom seating arrangements. This is a problem that many teachers struggle with, especially with at-risk populations. The students in the classroom definitely could be represented as a set of nodes where each node is interrelated (or connected) to adjoining nodes. Each student or node could be represented by several numbers (weights?), each one representing a specific characteristic like talkative, fidgety, relatively high performing, or easily distracted. The most important consideration, though, is how students relate to those they sit next to. Are they friends? Are they enemies? Are they compatible? Could there be a mutually beneficial relationship (one student needs help, the other is doing well in the class)? And, of course, each of those characteristics would be represented by a range of numbers (not binary). It's entirely possible that this application would be better suited for fuzzy analysis; however, I did find it interesting that the ultimate goal of a neural network is to find the lowest energy state of the system, and that's the same goal that I have for my classroom!

Of course, the classroom application has several problems that would keep it from being a great neural network application. One, I don't know how to provide it with training data. I have no idea which classroom arrangement would be optimal- that's why I need some help in automating the task. Two, I don't see how the relationship between the nodes is taken into account in the neural network. If I could understand that, the classroom application might seem more feasible, because a well-behaved classroom has a lot to do with the optimal arrangement of students who sit next to one another. If a neural network could work for this application, it would have at least two different uses: 1) optimal behavior arrangement, 2) optimal learning arrangement.

The goal in a good classroom arrangement system would be to re-arrange all students until you have achieved a preset threshold for expected behavior or learning. So, the ultimate question is: in a neural network, aside from evaluating the data, can you re-arrange the neurons until you reach the desired threshold? Based on the write-up at Wikipedia for artificial neural networks, it seems like you could use a neural network for a system like this if you used reinforcement learning as opposed to supervised learning, which requires up-front training. With reinforcement learning, there is no up-front training, and you can provide feedback to the system over time. This is how the classroom arrangement system should work. You first seat students in an arbitrary fashion. You then assign students weights for various factors like those mentioned above. You then use a neural network to re-arrange students until an optimal (lowest energy) point is reached. You then re-assign the seats and observe. Student weights and relationships are then adjusted manually (by the teacher) and re-entered into the system for a new optimization analysis. It seems to me that this problem is similar to the Traveling Salesman problem in its complexity. I also ask myself the question, "is it possible that a Kohonen network could be used for its self-organizing map capability?"
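
As a rough sketch of the "lowest energy" idea (this is plain local search with random swaps rather than a true neural network, and the friction numbers are invented), seat rearrangement might look like:

```java
import java.util.Random;

// Hypothetical seating optimizer: the energy of an arrangement is the
// summed "friction" between adjacent students in a row (the friction
// matrix would be filled in by the teacher), and random pairwise
// swaps are kept only when they lower the energy - simple local
// search, not an actual neural network.
public class Seating {
    public static double energy(int[] seats, double[][] friction) {
        double e = 0.0;
        for (int i = 0; i + 1 < seats.length; i++)   // neighbors in a row
            e += friction[seats[i]][seats[i + 1]];
        return e;
    }

    public static int[] optimize(int[] seats, double[][] friction,
                                 int steps, long seed) {
        Random rnd = new Random(seed);
        int[] best = seats.clone();
        for (int s = 0; s < steps; s++) {
            int a = rnd.nextInt(best.length), b = rnd.nextInt(best.length);
            int[] trial = best.clone();
            int tmp = trial[a]; trial[a] = trial[b]; trial[b] = tmp; // swap
            if (energy(trial, friction) < energy(best, friction))
                best = trial;   // keep only energy-lowering swaps
        }
        return best;
    }

    public static void main(String[] args) {
        // Students 0 and 1 clash badly; student 2 gets along with both.
        double[][] friction = {{0, 9, 1}, {9, 0, 1}, {1, 1, 0}};
        int[] result = optimize(new int[]{0, 1, 2}, friction, 200, 42);
        System.out.println(energy(result, friction));
    }
}
```

With enough swap attempts, the search separates the clashing pair by seating student 2 between them.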

Another interesting application might be document tagging. We had an interesting presentation the other day about the semantic web and about how the whole WWW would be so much more useful if all the content of all documents were tagged and the tags were all related in some kind of object oriented structure. When I first started working for the Navy, I worked on a project where we were trying to convert all of our electronic documents (MS Word, WordPerfect, Interleaf and a variety of others) into SGML. There was a company that had some software that touted the ability to look at a chunk of text and based on the shape of the text, a tag would be proposed and if this shape recurred, the tag would be re-used for all similar looking chunks of text. It's possible that neural networks could be used for examining the shape of paragraphs, tables and other objects in a group of documents and then help the system to determine which type of tag or object identifier should be assigned to the chunk of text based on prior training. Also, the system could learn based on the repeated occurrences of similar looking chunks of information in the document.

Ok, enough pondering. It seems like the most common, straightforward application for neural networks is optical character recognition. I found two nice, comprehensive tutorials on neural networks. One was a book by Jeff Heaton and, fortunately, most of the book is online (see references). Also, the sample Java code is online. Unfortunately, the book and the code rely on JOONE and I had a problem getting the JOONE editor to work with my new Microsoft Vista-based laptop. As I understand it, I could have created some apps without the JOONE editor, but I decided to keep looking for a simpler, library independent example of neural networks. I found it in spades in the form of an open source book by Mark Watson. His online book (see references) actually provides quite a nice supplement to all of the topics we've covered in this course and he offers sample Java code as a download along with the PDF version of the book. So, my simulations are based on Mark Watson's code. His examples are pretty generic, but easily could be applied to optical character recognition.

2. Explain the training set.

According to Mark Watson, "For back propagation networks, training data consists of matched sets of input and output values. We want to train a network to not only produce the same outputs for training data inputs as appear in the training data, but also to generalize its pattern matching ability based on the training data."

So, in the case of optical character recognition, you might have several examples of a particular letter, like "a", which come from a variety of print sources, and then an example of what you expect an "a" to look like (in digital terms). So, ultimately, you'll end up with multiple inputs mapping to the same output.

Mark Watson goes on to provide some tips and tricks regarding the training data, "A key here is to balance the size of the network against how much information it must hold. A common mistake when using back propagation networks is to use too large of a network (more on what this means later); a network that contains too many neurons will simply memorize the training examples, including any noise in the training data. However, if we use a smaller number of neurons, with a very large number of training data examples, then we force the network to generalize, ignoring noise in the training data."

3. Explain the training process.

The training process takes the sample inputs and outputs and establishes the appropriate weights for all the connections between neurons in the network. Once these weights have been established, then new data can be added and processed appropriately.

Mark Watson explains the process in a more detailed manner:

In order to train the network, we repeat the following learning cycle several times:

1. Zero out temporary arrays for holding the error at each neuron. The error, starting at the output layer, is the difference between the output value for a specific output layer neuron and the calculated value from setting the input layer neuron's activation values to the input values in the current training example, and letting activation spread through the network.

2. Update the weight W[i,j] (where i is the index of an input neuron, and j is the index of an output neuron) using the formula W[i,j] += learning_rate * output_error[j]*I[i], where the learning_rate is a tunable parameter, output_error[j] was calculated in step 1, and I[i] is the activation of the input neuron at index i.
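
That update step, taken on its own, is only a few lines of Java. This is my paraphrase of the quoted formula, not Watson's actual code:

```java
// The quoted weight update: for each (input i, output j) connection,
// nudge the weight in proportion to the output error and the input
// activation: W[i][j] += learning_rate * output_error[j] * I[i].
public class DeltaRule {
    public static void updateWeights(double[][] w, double[] input,
                                     double[] outputError, double learningRate) {
        for (int i = 0; i < input.length; i++)
            for (int j = 0; j < outputError.length; j++)
                w[i][j] += learningRate * outputError[j] * input[i];
    }

    public static void main(String[] args) {
        double[][] w = {{0.0}, {0.0}};
        // Input neuron 0 is active and output 0 is too low (positive
        // error), so w[0][0] rises while w[1][0] stays put.
        updateWeights(w, new double[]{1.0, 0.0}, new double[]{0.5}, 0.2);
        System.out.println(w[0][0] + " " + w[1][0]);
    }
}
```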

4. Explain the effectiveness (%correct on the training set) of the system after it has been trained.

It seems like the best answer here is to say that results may vary. When running the training data, you have two possible end conditions: 1) you've met the threshold you've designated for output error (hopefully low), or 2) you've run through a specified number of cycles and never got the output error low enough. If the latter happens, it's entirely possible for a neural network to be completely ineffective with a particular set of data.
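
The two end conditions can be shown as a training-loop skeleton; the "training pass" below is a stand-in that just shrinks the error geometrically, since the real error curve depends on the data:

```java
// Training-loop skeleton with the two stopping conditions described
// above: stop when the output error drops below a threshold
// (success), or give up after a maximum number of cycles (the
// network may simply be ineffective on this data). The error decay
// here is a stand-in for a real training pass.
public class TrainingLoop {
    public static int cyclesToConverge(double initialError, double threshold,
                                       int maxCycles, double decayPerCycle) {
        double error = initialError;
        for (int cycle = 1; cycle <= maxCycles; cycle++) {
            error *= decayPerCycle;              // stand-in training pass
            if (error < threshold) return cycle; // condition 1: error low enough
        }
        return -1;                               // condition 2: never converged
    }

    public static void main(String[] args) {
        System.out.println(cyclesToConverge(1.0, 0.01, 1000, 0.9)); // converges
        System.out.println(cyclesToConverge(1.0, 0.01, 10, 0.99));  // gives up
    }
}
```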

Mark Watson also provides several clues about how to improve the results for a neural network:

Effectively using back propagation neural networks in applications is somewhat of an acquired art. The following ad hoc notes are derived from my experience of using neural networks over the last 18 years:

Get as much training data as possible: an effective neural network has the ability to generalize from training data, usually in the sense of ignoring noise and spurious training examples. For this to occur, you might need thousands of training examples, depending on the complexity of the problem. Also, some training data can be set aside (i.e., not used for training) and saved for testing the network.

For very large training data sets, try using only 10% of the data initially for training. After the output errors are small, add in more training data sets; repeat this process until training on the entire training data set produces small errors.

Do not use too many hidden layer neurons: if there are too many hidden layer neurons, then the network will not learn to generalize, rather, it will just remember all of the training data, including noise and bad training examples. Start with a very small number of hidden neurons, and see if the training data set can be learned with very small output errors; if necessary, slowly increase the number of hidden neurons, and repeat the training process.

5. Explain how well the system generalized to data that was not in the training set.

6. References