Data Science (Due 04/25/2021)
Assignment 4
Part 1: Regression
Correlation and regression analysis are related in the sense that both deal with relationships among variables. The correlation coefficient is a measure of linear association between two variables. Its values always lie between -1 and +1, inclusive.
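For reference, the (Pearson) correlation coefficient is r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). This assumes, as is standard, that the r shown in the applet is Pearson's coefficient. The short Python sketch below computes r directly from that formula. The function name pearson_r is our own and is not part of the applet.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations (numerator) and the two sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # → 1.0
```

Because the numerator is divided by the product of the two spreads, r does not depend on the units of x or y. By the Cauchy-Schwarz inequality it can never leave the interval [-1, +1].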
In the following Linear Regression applet, 10 points are plotted in the coordinate plane. The line in the graph is the best-fit line for these 10 points. The correlation coefficient is denoted by r.
https://www.geogebra.org/m/rJj6yr6C#material/nFJp7McJ
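The applet does not state how its best-fit line is computed. Assuming it is the usual ordinary least-squares line, its slope and intercept can be computed as in the sketch below. The ten points are made up for illustration and are not the ones shown in the applet.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through the point of means (x̄, ȳ)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Ten hypothetical points, standing in for the ones in the applet
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2, 4, 5, 4, 5, 7, 8, 9, 9, 11]
slope, intercept = least_squares_line(xs, ys)
print(f"y = {slope:.3f}x + {intercept:.3f}")
```

The slope has the same sign as r, so a rising line goes with a positive r and a falling line with a negative r.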
Interact with this applet by repositioning the points (dragging them) before you start answering the following questions:
1. Reposition the points so that the correlation coefficient (r) equals 1. What does it mean to have r = 1?
2. Reposition the points so that the correlation coefficient (r) equals -1. What does it mean to have r = -1?
3. Reposition the points so that the correlation coefficient (r) equals 0 or is very close to zero. What does it mean to have r = 0?
Include a screenshot for every part, and compare the three scenarios in terms of the correlation between the two variables. Discuss your results.
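Before you drag the points, it can help to see the three target cases on small made-up datasets. In the sketch below, points on a rising line give r = +1 and points on a falling line give r = -1. A symmetric U-shape gives r = 0 even though y is completely determined by x. This is a reminder that r measures only linear association. The datasets are hypothetical, and the function is the same Pearson formula given earlier, repeated so the snippet runs on its own.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = list(range(1, 11))  # ten hypothetical x values, like the applet's ten points
scenarios = {
    "r = 1 (rising line)": [3 * x + 2 for x in xs],      # every point on one increasing line
    "r = -1 (falling line)": [-2 * x + 30 for x in xs],  # every point on one decreasing line
    "r = 0 (U-shape)": [(x - 5.5) ** 2 for x in xs],     # strong but non-linear pattern
}
for name, ys in scenarios.items():
    print(f"{name}: computed r = {pearson_r(xs, ys):+.3f}")
```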
Part 2: K-Means Clustering
In the following link you will find a visualization of the K-Means Clustering algorithm.
https://www.naftaliharris.com/blog/visualizing-k-means-clustering/
Read the article and try out the visualization before you start answering the following questions:
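The visualization steps through the standard k-means loop, often called Lloyd's algorithm. It assigns every point to its nearest centroid, moves each centroid to the mean of its points, and repeats until no point changes cluster. Below is a minimal Python sketch of that loop that also counts iterations. The function names are our own. Here, the iteration count includes the final pass that confirms nothing changed, which is an assumption. The applet may count its steps differently.

```python
def assign(points, centroids):
    """Index of the nearest centroid for each point (squared Euclidean distance)."""
    labels = []
    for px, py in points:
        dists = [(px - cx) ** 2 + (py - cy) ** 2 for cx, cy in centroids]
        labels.append(dists.index(min(dists)))
    return labels

def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm from the given initial centroids.

    Returns (centroids, labels, iterations). Stops when no point changes cluster.
    """
    centroids = [tuple(c) for c in centroids]
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: attach each point to its nearest centroid
        new_labels = assign(points, centroids)
        if new_labels == labels:
            # No point moved, so the centroids would not move either: converged
            return centroids, labels, iteration
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its old centroid
                centroids[k] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return centroids, labels, max_iter

# Hypothetical two-blob dataset with hand-picked starting centroids
points = [(1, 1), (1.5, 2), (2, 1), (8, 8), (8.5, 9), (9, 8)]
centroids, labels, iterations = kmeans(points, [(0, 0), (5, 5)])
print(labels, iterations)  # [0, 0, 0, 1, 1, 1] 2
```

The loop always stops, because each step can only lower or keep the total squared distance. However, the clusters it stops at depend on the starting centroids, which is why the choice of initialization strategy matters.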
In the following questions, use the same dataset to compare the three strategies for initializing the centroids: (1) choosing the centroids yourself, (2) choosing them randomly, and (3) choosing the farthest point.
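The three initialization strategies can be sketched as follows. Choosing the centroids yourself simply means passing a hand-picked list of points. For the other two strategies, we assume the following definitions. "Randomly" means picking k distinct data points at random. "Farthest point" means picking one point at random, then repeatedly adding the point that is farthest from its nearest already-chosen centroid. The visualization's exact rules may differ; for example, it may place random centroids anywhere in the plane. The function names and the dataset are hypothetical.

```python
import random

def random_init(points, k, rng):
    """Strategy (2), assumed: k distinct data points chosen uniformly at random."""
    return rng.sample(points, k)

def sq_dist_to_nearest(p, centroids):
    """Squared Euclidean distance from point p to its closest centroid."""
    return min((p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids)

def farthest_point_init(points, k, rng):
    """Strategy (3), assumed: a random first centroid, then repeatedly the point
    farthest from all centroids chosen so far."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: sq_dist_to_nearest(p, centroids)))
    return centroids

# Hypothetical dataset; strategy (1) would simply be a hand-picked list such as
# manual = [(0, 0), (5, 5), (0, 10)]
points = [(1, 1), (1.5, 2), (2, 1), (8, 8), (8.5, 9), (9, 8), (1, 9), (1.5, 8.5)]
rng = random.Random(42)  # fixed seed so runs are repeatable
print("random:  ", random_init(points, 3, rng))
print("farthest:", farthest_point_init(points, 3, rng))
```

Farthest-point initialization tends to spread the starting centroids across separate groups. This often, though not always, means fewer iterations than a random start. Note that it also favors outliers, because an isolated point is by definition far from everything else.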
1. Use the first strategy and initialize the centroids by “choosing them yourself”. Include screenshots of the steps. How many iterations did the algorithm take to reach its final clusters?
2. Use the second strategy and choose the centroids randomly. How many iterations did the algorithm take to reach its final clusters?
3. Use the third strategy and choose the centroids with the Farthest Point option. How many iterations did the algorithm take to reach its final clusters?
Discuss your conclusions about the three strategies. Add any interesting facts or notes you found while trying this visualization.