My takes & notes | Coursera | Algorithms: Design and Analysis , part 1 | Introduction

This is my take and notes on the course offered by Professor Tim Roughgarden of Standford on the Coursera platform. You can find the course here.

These are my notes and takes on the “I. Introduction (week1)“.

 

Introduction

Since I have a background in algorithms, I will give you my personal take on algorithms. Algorithms are basically like recipes. It is how you can turn the raw ingredients into a great edible meal. The ingredients are the input of the algorithm and the meal is the output. Just as we may have little or no ingredients at all, at the same time we can have limited inputs or nothing. Also do again notice that no matter how much ingredient we have, the meal (output) may or may not be tasty or satisfactory and what you intended in the first place.

One thing that you must know is that computers are STUPID! In fact very very stupid. If you were to ask a robot to bring you a glass of water, they would just sit and watch you. They cannot relate in any way, shape or form. You have to break it down for them.  Break it down to the level they understand. Firstly you break down to:

  1. Walk to the kitchen
  2. Get water
  3. Come back

Let’s break it down for them, since the above is still vague to machines:

  1. Walk to the kitchen
    1. Walk out of the room
    2. Walk into the kitchen
    3. Walk to the sink
  2. Get water
    1. Get a clean glass
    2. Turn on the tap
    3. Fill the glass with water
    4. Turn off the tap
  3. Come back
    1. Walk out of the kitchen
    2. Walk into the room
    3. Hand it over to me

Now you think that this is enough for a machine to comprehend your intentions? If you answered yes, you are sadly mistaken. Given the above instructions (aka recipe, aka algorithm), the machine would say such things as:

  • What is water?
  • What is room?
  • What is kitchen?
  • What is tap?
  • What does turn on mean?
  • Where is the kitchen?
  • What is a glass?
  • Where is the glass?

You may ask what this machine knows then. Aha! It knows some terms, and you need to speak in those terms in order to be able to instruct the machine.

Firstly different machines have different capabilities. Some machines “know”  things beforehand.

Then what do machines/computers understand? What are their capabilities? They are good at math and logic. The entire machines are based upon Boolean algebra. They are very very fast at very tiny small stuff, which ultimately makes them ideal for humanistic uses.

Let’s say our machine in the example above understands location x and y, understand to go from location A (at x0, y0) to go to B (at x1, y1), can raise its hand, lower its hand (yea, it’s a robot), twists its wrists clockwise or counter-clockwise. Now all you need to do is to define the kitchen sink at x, y and the room (you) at the x, y the location the clean glass at x, y and such. Also you need to define turning on the tap as:

  1. Raise your hand
  2. Twist the knob to turn on the tap

Or something in those lines. Both of the above instructions are comprehendible by the machine. It understands them.

In order to program we have to go from a natural language that is understandable by all humans, to the machine level; simple logic and mathematics.

I need to redefine the above sentence; machines are just as smart as their instructors. They are just as smart as their programmers. (Yes, programmers are, in fact, instructors.)

Now after this long personal introduction, let’s go the course that has been offered.

“Perhaps the most important principle for the good algorithm designer is to refuse to be content.” (Aho, Hopcro‘, and Ullman, The Design and Analysis of Computer Algorithms, 1974)

There are more than one solution to any problem. There are different algorithms that solve a particular problem. We must understand which is the best. and defining best is also important. What do we mean when we say that a particular algorithm is the best? is it the fastest? does it use the least amount of space? Does it only return one answer? is that one answer the most optimum? does it return all the possible answers?

For different situations, different algorithms are useful. There might not be any “best” algorithms, but suitable ones for suitable conditions.

 

Merge Sort

The idea of divide and conquer is introduced. Meaning that a problem is divided to many small sized problems. Solving (conquering) all the smaller problems will result in solving the entire problem.

 

Merge Sort

Merge Sort – Overview

 

Merge Sort - Pseudocode

Merge Sort – Pseudocode

 

through an example of of merge sort, the algorithm is analysed.

Also a good definition of fast algorithm is given:

A fast algorithm:  worst-case running time grows slowly with input size

stay tuned for future chapters.

Lesson 10: Reinforcement Learning notes

These are some of my personal notes on the “Lesson 10. Reinforcement Learning” that is offered at www.AI-class.com. Please be reminded that this is in no sense a complete summary of the lesson. It is purely a selection of the parts I find it to be important.

 

Forms of Learning

Three forms of (machine) learning:

  1. Supervised
  2. Unsupervised
  3. Reinforced
In supervised learning we provide data with the appropriate “labels“. We know what these data is putting out and what is the exact output. There is no guessing about their type or category.
In unsupervised learning the presented data isn’t given with a particular label. The task is to find the label and label the data. Unsupervised learning might be considered as some sort of classification. The data is given with some features, the outcome is to classify the data (according to the features) and to put them in appropriate clusters. The algorithm couldn’t care less about what is the actual label here, it just knows that (with some accuracy) it belongs to a certain group or cluster.
Lastly, reinforced learning involves some sort of reward or punishment in its algorithm. If the agent gets to the goal, it will be awarded, if not it will be punished. This is the same system in which animals are trained to do certain tasks. e.g. rewarding dolphins for every trick they do correctly.

Reinforced Learning

The right chart is supposed to reach zero but, alas it only comes close and after 80 number of trials stays at the same value. One might say it converged to a wrong value.
The left chart’s horizontal label actually reads “Utility estimates“. Here we know that the correct and actual true value is one, but again it doesn’t converge correctly and has a huge amount of error.
Passive TD algorithm on the 4×3 maze – simulation results
This is actually two charts crunched into one chart for the greedy agent. Both the “RMS error” and “Policy loss” should ideally be equal to zero and meet the horizontal axis. It is evident that after the first 40 trials, it dropped steeply close to the ideal values. Needless to say after that it almost stays at the same level with some minor variations.
Greedy Agent: simulation results

 

This algorithm is operating more efficiently. The above charts are mostly with 500 number of trials. However, this one provides better results in merely 100 iterations. Even before the 20th iteration, it reaches to errors of less than 0.2.
Exploratory Agent: simulation results
Each S (state) is a collection of features. These features are arbitrary and mostly try to quantize (in some sense)  the state. e.g. distance of the state from the Goal (desirable), distance of an enemy (undesirable)
each state is defined by a number of features

The weight (Wi) is a measure of importance of a feature.
It can be:
  1. Positive
  2. Negative
  3. Zeroes
If it is positive it is desirable and takes the agent closer to the goal.
If it is negative it is undesirable and takes the agent further away from goal.
if it is zero, then … well … it is not effective at all. It is not a feature at all. It doesn’t take us any closer or further.

 

Also the magnitude of the value |Wi| is significant. The bigger the absolute value of the weight, the bigger role the feature plays for better (positive) or worse (negative).
each feature is multiplied by the “weight”

 

Q-learning, each state is divided into 4 different actions: North,West,South and East.

 

Relevant Videos

As simply a single search would provide you with tons of videos, I found these interesting enough to share here.
The first one is of a goal keeper that learns through reinforce learning algorithms. At first, the performance is quite poor. However, through the algorithm, it learns from its previous mistakes to make a  better move. It is quite interesting to see that by the end it is so efficient that actually follows the ball before it leaves the hands of the person.

   

The next video is of the same type but here the robot learns to move on the current ground. At first, it is making lots of mistakes. By the end it manages to move smoothly.

    

And the last one is of Pacman itself. It is very entertaining to see Pacman going back and forth in some cases.

   

Thanks for reading. If you liked it, please do share it with others.
References:

 

  1. AI Class
  2. https://www.ai-class.com/course/video/videolecture/129