Thursday, September 30, 2021

Random walks in Python

 

Random Walks in Python using Matplotlib

In this blog post, we’ll use Python to generate data for a random walk and then use Matplotlib to create a visual representation of that data. A random walk is a path that has no clear direction. Instead, it is determined by a series of random decisions, each of which is left entirely to chance. Take, for example, the path a confused ant would take if it took every step in a random direction. Random walk models are used in many real-world situations. Here is a real-world application of random walks.

Importing the modules

For this, we need two modules: Matplotlib and random.

import matplotlib.pyplot as plt
from random import choice

To make random decisions, we’ll store the possible moves in a list. Each time a step is taken, the choice() function from the random module decides which move to make. That is why we import the random module.
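To see what choice() does on its own, here is a small sketch that picks a direction ten times. The call to seed() does not appear in the post. It is added here only so that repeated runs produce the same sequence.

```python
from random import choice, seed

# Fix the seed so every run produces the same "random" sequence
# (added for reproducibility; the walk itself does not need it).
seed(1)

moves = [1, -1]
directions = [choice(moves) for _ in range(10)]
print(directions)  # ten values, each either 1 or -1
```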

 The RandomWalk() Function

def RandomWalk(num_points):
    x_values = [0]
    y_values = [0]

When calling the function, we pass in the number of points the random walk should contain. We then make two lists to hold the x- and y-values, and we start each walk at the point (0, 0).

def RandomWalk(num_points):
    x_values = [0]
    y_values = [0]

    # Keep taking steps until the walk reaches the desired length.
    while len(x_values) < num_points:

        # Decide which direction to go and how far to go in that direction.
        x_direction = choice([1, -1])
        x_distance = choice([0, 1, 2, 3, 4])
        x_step = x_direction * x_distance

        y_direction = choice([1, -1])
        y_distance = choice([0, 1, 2, 3, 4])
        y_step = y_direction * y_distance

        # Reject a move that goes nowhere.
        if x_step == 0 and y_step == 0:
            continue

        # The next point is the last point plus this step.
        x = x_values[-1] + x_step
        y = y_values[-1] + y_step

        x_values.append(x)
        y_values.append(y)

Next, we set up a loop that runs until the walk contains the correct number of points. The main part of the loop tells Python how to simulate four random decisions: Will the walk go right or left? How far will it go in that direction? Will it go up or down? How far will it go in that direction?

We use choice([1, -1]) to choose a value for x_direction, which returns either 1 for movement to the right or −1 for movement to the left. Next, choice([0, 1, 2, 3, 4]) tells Python how far to move in that direction (x_distance) by randomly selecting an integer between 0 and 4. (Including 0 in the list of distances lets a step move along only one axis.) We determine the length of each step in the x and y directions by multiplying the direction of movement by the distance chosen. A positive result for x_step means move right, a negative result means move left, and 0 means move vertically. A positive result for y_step means move up, a negative result means move down, and 0 means move horizontally.

If both x_step and y_step are 0, the walk doesn’t go anywhere, so we continue the loop to skip this move. To get the next x-value for the walk, we add the value in x_step to the last value stored in x_values, and we do the same for the y-values. Once we have these values, we append them to x_values and y_values.
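The step logic can also be separated from the plotting code, which makes it easy to check on its own. The sketch below is a hypothetical variant and not the function used in this post. `generate_walk` applies the same step rules as RandomWalk() but returns the two lists instead of plotting them. It also accepts an optional `random.Random` instance so a run can be made reproducible.

```python
import random


def generate_walk(num_points, rng=None):
    """Return (x_values, y_values) for a random walk of num_points points.

    Hypothetical helper: same step rules as RandomWalk(), without the plotting.
    """
    if rng is None:
        rng = random.Random()
    x_values = [0]
    y_values = [0]

    while len(x_values) < num_points:
        # Direction (+1 or -1) times a distance from 0 to 4 on each axis.
        x_step = rng.choice([1, -1]) * rng.choice([0, 1, 2, 3, 4])
        y_step = rng.choice([1, -1]) * rng.choice([0, 1, 2, 3, 4])

        # Reject a move that goes nowhere.
        if x_step == 0 and y_step == 0:
            continue

        x_values.append(x_values[-1] + x_step)
        y_values.append(y_values[-1] + y_step)

    return x_values, y_values


xs, ys = generate_walk(5000, random.Random(42))
print(len(xs), len(ys))  # 5000 5000
print(xs[0], ys[0])      # 0 0
```

Keeping data generation separate from plotting is only a design choice. The single function in the post is simpler to follow, while the split version can be tested without opening a plot window.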

        x_values.append(x)
        y_values.append(y)

    # Plot the points in the walk.
    plt.style.use("classic")
    fig, ax = plt.subplots(figsize=(15, 9))
    point_numbers = range(num_points)
    ax.scatter(x_values, y_values, c=point_numbers,
               cmap=plt.cm.Blues, edgecolors='none', s=15)

    # Emphasize the first and last points.
    ax.scatter(0, 0, c='green', edgecolors='none', s=100)
    ax.scatter(x_values[-1], y_values[-1], c='red',
               edgecolors='none', s=100)

    # Hide the axes so the walk itself stands out.
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    plt.show()

Next, we create a scatter plot using Matplotlib. The first step is to specify the style we will use. The variable fig represents the entire figure or collection of plots that are generated. The variable ax represents a single plot in the figure and is the variable we use most of the time. In the ax.scatter() call, we pass the x- and y-values to be plotted, along with a color map. Pyplot includes a set of built-in color maps. To use one of them, you need to specify how Pyplot should assign a color to each point in the data set. Here is how to assign each point a color based on its position in the walk.

ax.scatter(x_values, y_values, c=point_numbers,
           cmap=plt.cm.Blues, edgecolors='none', s=15)

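Pass the list of point numbers to c, and then use the cmap argument to tell Pyplot which color map to use. Because the points are numbered in the order they were generated, this code colors the earliest points of the walk light blue and the latest points dark blue, so you can follow the path the walk took. The s argument sets the size of each dot, and edgecolors='none' removes the outline around each point. We also emphasize the starting and ending points of the walk, using green for the beginning and red for the end.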
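To see why low numbers come out light and high numbers come out dark, you can ask the color map directly. A Matplotlib colormap object is callable: given a value between 0 and 1, it returns an RGBA tuple. When scatter() colors by point_numbers, the first point maps to 0 and the last to 1. This check is an illustration added here. The `Agg` backend line only avoids needing a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt

cmap = plt.cm.Blues

start_color = cmap(0.0)  # color for the lowest value (the first point)
end_color = cmap(1.0)    # color for the highest value (the last point)

# Sum of the red, green and blue channels as a rough brightness measure.
start_brightness = sum(start_color[:3])
end_brightness = sum(end_color[:3])

print(f"start: {start_brightness:.2f}, end: {end_brightness:.2f}")
print(start_brightness > end_brightness)  # True: the first point is lighter
```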

RandomWalk(5000)

Finally, we call the function with the number of points we want in our random walk.




 



Wednesday, September 29, 2021

Web 3.0

 


What is web 3.0?

TechTarget defines web 3.0 as the third generation of internet services for websites and applications that will focus on using a machine-based understanding of data to provide a data-driven and Semantic Web; the ultimate goal of web 3.0 is to create more intelligent, connected and open websites.

But before we discuss web 3.0 further, let's look at the earlier generations of the web.

Web 1.0

This is the first stage of the World Wide Web (WWW) revolution, and it is also referred to as the read-only web. The web began as an informational place for people to broadcast their information, and it only allowed users to search for information and read it. There were no advertisements while surfing the web, and it was mainly used for personal websites. It consisted of static web pages built with basic Hypertext Markup Language (HTML).

Web 2.0

This is basically the social web. It contains tools and platforms for people to share their perspectives, opinions, thoughts and experiences. Web 2.0 applications tend to be more interactive with the end user. “Web 2.0 is the business revolution in the computer industry caused by the move to the internet as a platform, and an attempt to understand the rules for success on that new platform.” – Tim O’Reilly.


Web 3.0 is characterised by some exciting features:

Semantic Web

The next evolution of the Internet involves the semantic web. The Semantic Web will improve web technologies to generate, share and combine content through search and analysis based on the ability to understand the meaning of words, not keywords or figures.

Artificial Intelligence

By combining this capability with natural language processing, computers will understand information much like humans do, delivering faster and more relevant results. They are getting smarter to meet the needs of their users.

3D Graphics
More connected information
Ubiquity - Content is accessible by multiple applications

There is no clear definition or single explanation of web 3.0. I think that to really understand what it is, we will have to wait a while longer. Then we will be able to compare it with web 1.0 and web 2.0 and see the notable differences.


Monday, September 27, 2021

Why you need math for programming

Do I need math for programming?

If you are an HTML code ninja, you will rarely need math in your work. But for most programmers, math is inevitable. Without mathematical knowledge, you are at a real disadvantage. Here are a few math topics that are essential for programmers.

Linear algebra

Linear algebra is one of the most important areas of mathematics, and it shows up often in programming. This is especially true for data scientists, because matrices are widely used to represent data in machine learning tasks. As a programmer, you should be familiar with terms such as matrix, vector, identity matrix, transpose, inverse, linear equations and linear transformations, as they are all part of basic linear algebra.
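Here is a short sketch of those terms in code. It uses NumPy, which the post does not mention. NumPy is simply a common choice for matrix work in Python. The matrix and vector values are arbitrary.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [4.0, 3.0]])   # a 2x2 matrix
b = np.array([3.0, 7.0])     # a vector

I = np.eye(2)                # identity matrix
A_T = A.T                    # transpose: rows become columns
A_inv = np.linalg.inv(A)     # inverse

# A matrix times its inverse gives the identity (up to rounding).
print(np.allclose(A @ A_inv, I))  # True

# Solve the linear equations A x = b, i.e. 2x + y = 3 and 4x + 3y = 7.
x = np.linalg.solve(A, b)
print(x)  # x = 1, y = 1

# A linear transformation: applying A to a vector.
print(A @ np.array([1.0, 0.0]))  # [2. 4.]
```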

Probability and statistics

Probability and statistics show up all the time. The entire field of machine learning is built on them. Many machine learning algorithms assume that the observable data is produced by an underlying probability distribution.
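As a small illustration of an underlying distribution producing observable data, the sketch below draws samples from a normal distribution with a known mean and standard deviation. It then estimates those two parameters from the data. The parameter values are arbitrary, and the fixed seed only makes the run repeatable.

```python
import random
import statistics

rng = random.Random(0)

# The "true" distribution: normal with mean 5 and standard deviation 2.
true_mean, true_sd = 5.0, 2.0

# Observable data: 10,000 samples drawn from that distribution.
data = [rng.gauss(true_mean, true_sd) for _ in range(10_000)]

# Estimate the parameters back from the data.
est_mean = statistics.mean(data)
est_sd = statistics.stdev(data)

print(round(est_mean, 2), round(est_sd, 2))  # close to 5 and 2
```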

Boolean Logic 

Programming borrows several Boolean algebra concepts from mathematics. For example, logical operations such as AND, OR, NOT, XOR and XNOR all come from Boolean algebra. They form a basis for understanding programming.
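A quick way to see these operations is to print their truth tables. In Python, `and`, `or` and `not` are built in. Python has no XOR or XNOR keyword, so this sketch expresses them as `!=` and `==` on booleans (for `bool` values, the `^` operator would also work for XOR).

```python
from itertools import product

# Each two-input operation maps a pair of booleans to a boolean.
operations = {
    "AND": lambda a, b: a and b,
    "OR": lambda a, b: a or b,
    "XOR": lambda a, b: a != b,   # true when exactly one input is true
    "XNOR": lambda a, b: a == b,  # true when both inputs match
}

print("a      b      " + "  ".join(f"{name:<5}" for name in operations))
for a, b in product([False, True], repeat=2):
    results = "  ".join(f"{str(op(a, b)):<5}" for op in operations.values())
    print(f"{str(a):<6} {str(b):<6} {results}")

# NOT takes a single input.
for a in [False, True]:
    print(f"NOT {a} = {not a}")
```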

Calculus

Calculus is another important part of programming, and it shows up constantly in machine learning. In most machine learning problems, the ultimate goal is to optimize a cost function. This optimization makes extensive use of multivariate calculus, which is taught as part of a university curriculum. Calculus is also widely used in simulation-based programs where different objects interact with each other. Those interactions are governed by the laws of physics, which ultimately rest on heavy math.
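To make optimizing a cost function concrete, here is a minimal gradient descent sketch. Gradient descent is one common optimization technique; the post does not name a specific one. The data, learning rate and number of steps are made up for illustration. We fit a single weight w to data that follows y = 3x, using the mean squared error as the cost. Its derivative, which comes from calculus, tells us which way to move w.

```python
# Toy data that follows y = 3x exactly (an illustrative assumption).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]


def cost(w):
    """Mean squared error of the model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient(w):
    """Derivative of the cost with respect to w: mean of 2 * (w*x - y) * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


w = 0.0              # starting guess
learning_rate = 0.01

for _ in range(200):
    w -= learning_rate * gradient(w)  # step against the slope of the cost

print(round(w, 4), round(cost(w), 6))  # w ends up close to 3
```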

Whether you need math for programming depends on what you want to do with programming. While most programming doesn't require that much math, it is important to understand the mathematical concepts that give coding its foundations. Most of the math in everyday programming is basic arithmetic (addition, subtraction, multiplication and division). It is therefore safe to say that unless you are looking to dig deep into programming, you already know enough math to write meaningful code.




Friday, September 24, 2021

A Statistics Student's Take on Data Science


What is Data Science?

Data science is not about making complicated models, it is not about making awesome visualizations and it is not about writing code. Data science is about using data to create as much impact as possible for your company. That impact can take multiple forms: insights, data products or product recommendations for a company. To deliver those, you need tools such as complicated models, data visualization or code. But essentially, as a data scientist, your job is to solve real company problems using data. And which tools do you use to do it? Companies don't care!

There are a lot of misconceptions about data science, especially on YouTube. I think the reason for this is that there is a huge misalignment between what's popular to talk about and what's needed in the industry.

Before data science, we popularized the term data mining. In the article "From Data Mining to Knowledge Discovery in Databases", data mining is described as the overall process of discovering useful knowledge from data. In 2001, William S. Cleveland wanted to take data mining to another level. He did that by combining statistics with computer science. Basically, he made statistics a lot more technical, which he believed would expand the possibilities of data mining and produce a powerful force for innovation. Statistics could now take advantage of computing power, and he called this combination Data Science.

It was also around this time that web 2.0 emerged. Websites were no longer just digital pamphlets but a medium for shared experience among millions and millions of users. These were websites like MySpace (2003), Facebook (2004) and YouTube (2005). People could now interact with these websites, meaning they could contribute, post, comment, like, upload and share, leaving their footprints in the digital landscape we call the internet. Eventually, so much data was created that it became too much to handle using traditional technologies. We call this Big Data. That in turn opened a world of possibilities for finding insights in data. But it also meant that even the simplest questions required sophisticated data infrastructure just to handle the data. We needed parallel computing technologies like Hadoop, MapReduce and Spark. The rise of big data in 2010 sparked the rise of data science, which grew to support the needs of businesses and to draw insights from their massive unstructured datasets.

The Journal of Data Science describes data science as almost everything that has something to do with data: collecting, analyzing, modelling... yet the most important part is its applications, all sorts of applications, e.g. Machine Learning. In 2010, with the new abundance of data, it became possible to train machines with a data-driven approach rather than a knowledge-driven approach. All the theoretical papers about recurrent neural networks and support vector machines (SVMs) became feasible, and some of them changed the way we live and how we experience the world. Deep learning was no longer an academic concept confined to thesis papers. It became a tangible, useful class of machine learning that would affect our everyday lives. Machine Learning and Artificial Intelligence dominated the media, overshadowing every other aspect of Data Science, such as exploratory analysis, experimentation and the skills we traditionally called Business Intelligence. Now the general public thinks of data scientists as researchers focused on machine learning and AI, but the industry is hiring data scientists as analysts, so there is a misalignment. The reason is this: most of these data scientists can work on more technical problems, but big companies like Google, Facebook and Amazon have so much low-hanging fruit for improving their products that they don't need advanced machine learning or statistical knowledge to find impact in their analyses.

Being a good data scientist isn't about how advanced your models are. It is about how much impact you can have with your work. You are not a data cruncher; you are a problem solver. You are a strategist. Companies will give you the most ambiguous and difficult data problems and expect you to guide them in the right direction.
