Changes in the standard Heatmap plot - symmetric color bar, show only upper-triangle values, and column names at x,y axis ticks

I have a heatmap image (correlation between all matrix columns), and I'm struggling to perform all of the changes below within the same image: bar colors should be symmetric around zero (e.g., correlations of 1 and -1 should have the same color); change the correlation matrix to a triangular matrix, since correlation values are symmetric, and show only the upper triangle (mask out the lower triangle); show the correlation value in every cell of the triangular matrix; x,y …
Category: Data Science
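For the heatmap question above, a minimal sketch in plain matplotlib and NumPy (seaborn left out to keep it self-contained). Assumptions: the data is a 2-D array whose columns are the variables, the column names and random data are hypothetical, and "symmetric around zero" is read as a diverging colormap on a fixed [-1, 1] range, so r and -r get equally strong colors of opposite hue. If 1 and -1 should literally share one color, plotting np.abs(corr) with a sequential colormap on [0, 1] would do that instead. With seaborn, sns.heatmap(corr, mask=..., vmin=-1, vmax=1, center=0, annot=True) covers the same points.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def upper_triangle_heatmap(data, names):
    """Correlation heatmap with a [-1, 1] color scale and the lower triangle hidden."""
    corr = np.corrcoef(data, rowvar=False)                 # columns are the variables
    lower = np.tril(np.ones_like(corr, dtype=bool), k=-1)  # True strictly below the diagonal
    masked = np.ma.masked_array(corr, mask=lower)          # masked cells are left blank

    fig, ax = plt.subplots(figsize=(6, 5))
    # Fixing vmin/vmax at -1/1 puts 0 in the middle of the diverging colormap
    im = ax.imshow(masked, cmap="coolwarm", vmin=-1, vmax=1)
    fig.colorbar(im, ax=ax)

    n = len(names)
    for i in range(n):
        for j in range(i, n):  # annotate the diagonal and the upper triangle only
            ax.text(j, i, f"{corr[i, j]:.2f}", ha="center", va="center")

    # Column names as tick labels on both axes
    ax.set_xticks(np.arange(n))
    ax.set_xticklabels(names, rotation=45, ha="right")
    ax.set_yticks(np.arange(n))
    ax.set_yticklabels(names)
    fig.tight_layout()
    return fig, ax, corr


# Usage with hypothetical data: 100 rows, 4 named columns
rng = np.random.default_rng(0)
names = ["col_a", "col_b", "col_c", "col_d"]
data = rng.normal(size=(100, 4))
fig, ax, corr = upper_triangle_heatmap(data, names)
```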

Need an explanation of the for loop in the DBSCAN algorithm demo

In the following code for the DBSCAN algorithm, as a beginner, I need an explanation of what happens to the data in the bottom for loop, and why. Generate sample data import numpy as np from sklearn.cluster import DBSCAN from sklearn import metrics from sklearn.datasets import make_blobs from sklearn.preprocessing import StandardScaler centers = [[1, 1], [-1, -1], [1, -1]] X, labels_true = make_blobs(n_samples=750, centers=centers, cluster_std=0.4, random_state=0) X = StandardScaler().fit_transform(X) Compute DBSCAN db = DBSCAN(eps=0.3, min_samples=10).fit(X) core_samples_mask = np.zeros_like(db.labels_, dtype=bool) …
Category: Data Science
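The loop itself is cut off above. Assuming it is the plotting loop from scikit-learn's DBSCAN example (one iteration per cluster label, with noise label -1 drawn in black and core points drawn larger than non-core points), the data handling inside it comes down to boolean masking. This sketch uses a tiny hypothetical labels array and core mask in place of db.labels_ and core_samples_mask, and only counts what each iteration would hand to plt.plot.

```python
import numpy as np

# Hypothetical toy stand-ins for X, db.labels_ and core_samples_mask
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [9.0, 0.0]])
labels = np.array([0, 0, 0, 1, 1, -1])  # -1 marks noise in DBSCAN
core_samples_mask = np.array([True, True, False, True, False, False])

groups = {}
for k in sorted(set(labels)):  # one iteration per cluster label (plus noise)
    class_member_mask = labels == k                      # points assigned to cluster k
    core = X[class_member_mask & core_samples_mask]      # core points of k: big markers
    border = X[class_member_mask & ~core_samples_mask]   # non-core points of k: small markers
    groups[k] = (len(core), len(border))
    print(k, "core:", len(core), "non-core:", len(border))
```

Each iteration therefore selects a cluster's points twice, once for the core points and once for the rest, so the two groups can be drawn with different marker sizes in the same color.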

Why is matplotlib not plotting some boxplots?

I am trying to plot some data to get statistics about it, but matplotlib simply won't draw it as boxplots. I tried histograms and they worked well: But when I change the code to plot boxplots, it just doesn't work: I know that the y axis is in the wrong place, but even when I looked where the boxes should be (for example, SAQRS in the range of -150 to 50), there is nothing. The plotting code …
Topic: matplotlib
Category: Data Science
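The plotting code for the boxplot question is cut off, so this is only a guess at the cause: missing values are a common reason a histogram of a column can still look fine while its boxplot comes out empty, because the quartiles matplotlib computes for a box become NaN when the column contains NaN. The values below are hypothetical; with pandas, df[col].dropna() does the same cleanup.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical column with a missing value (the real data is not shown)
values = np.array([-120.0, -80.0, np.nan, -40.0, 10.0, 35.0])

clean = values[~np.isnan(values)]  # drop NaN before computing box statistics

fig, ax = plt.subplots()
result = ax.boxplot([values, clean])  # first box has NaN stats, second is drawn normally
ax.set_xticks([1, 2])
ax.set_xticklabels(["with NaN", "NaN dropped"])
```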

How can I adjust the legend when visualizing clusters in two dimensions?

How can I change the legend? As you can see, the legend is currently missing some cluster numbers. How can I adjust it so that it shows all of the cluster numbers (such as Cluster 1, Cluster 2, etc.; right now it shows only 0, 3, 6, 9)? (For the code, I followed this link: Perform k-means clustering over multiple columns.) kmeans = KMeans(n_clusters=10) y2 = kmeans.fit_predict(scaled_data) reduced_scaled_data = PCA(n_components=2).fit_transform(scaled_data) results = pd.DataFrame(reduced_scaled_data,columns=['pca1','pca2']) sns.scatterplot(x="pca1", y="pca2", hue=y2, data=results) #y2 is my cluster number plt.title('K-means …
Category: Data Science
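For the legend question: with a numeric hue, seaborn shows a "brief" legend that lists only a subset of values, which matches the 0, 3, 6, 9 seen here; passing legend="full" to sns.scatterplot, or converting the labels to strings, should list every cluster. A seaborn-free sketch that makes one legend entry per cluster with plain matplotlib and names them "Cluster 1" to "Cluster 10"; the PCA coordinates and y2 are random hypothetical stand-ins.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-ins for the 2-D PCA coordinates and the k-means labels y2
pca = rng.normal(size=(300, 2))
y2 = rng.integers(0, 10, size=300)

fig, ax = plt.subplots()
cmap = plt.get_cmap("tab10")  # 10 distinct categorical colors
for k in np.unique(y2):
    pts = pca[y2 == k]
    # One scatter call per cluster gives one legend entry per cluster
    ax.scatter(pts[:, 0], pts[:, 1], s=10, color=cmap(k), label=f"Cluster {k + 1}")
ax.legend(title="K-means cluster", bbox_to_anchor=(1.02, 1), loc="upper left")
```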

Please help: I got an error while trying to plot my data, "x and y must be the same size"

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
data = pd.read_csv('housing.csv')
data.drop('ocean_proximity', axis=1, inplace=True)
data.head()

   longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value
0    -122.23     37.88                41.0        880.0           129.0       322.0       126.0         8.3252            452600.0
1    -122.22     37.86                21.0       7099.0          1106.0      2401.0      1138.0         8.3014            358500.0
2    -122.24     37.85                52.0       1467.0           190.0       496.0       177.0         7.2574            352100.0
3    -122.25     37.85                52.0       1274.0           235.0       558.0       219.0         5.6431            341300.0
4    -122.25     37.85                52.0       1627.0           280.0       565.0       259.0         3.8462            342200.0
…
Category: Data Science
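The plotting call for the housing question is cut off, so the cause is an assumption: this error commonly appears when the whole feature DataFrame is passed as x to plt.scatter and a single column as y, because the frame holds rows × columns values while the column holds only rows. A sketch using the five rows shown by data.head() (a subset of columns for brevity) that plots one feature at a time instead:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import pandas as pd

# The first five rows shown by data.head(), a subset of columns for brevity
data = pd.DataFrame({
    "median_income": [8.3252, 8.3014, 7.2574, 5.6431, 3.8462],
    "housing_median_age": [41.0, 21.0, 52.0, 52.0, 52.0],
    "median_house_value": [452600.0, 358500.0, 352100.0, 341300.0, 342200.0],
})
y = data["median_house_value"]
X = data.drop("median_house_value", axis=1)

# The whole X frame holds 2 columns x 5 rows = 10 values, y holds 5:
# that kind of mismatch is what "x and y must be the same size" reports.
print(X.size, y.size)

# One 1-D column per scatter call keeps x and y the same length
fig, axes = plt.subplots(1, X.shape[1], figsize=(8, 3), sharey=True)
for ax, col in zip(axes, X.columns):
    ax.scatter(X[col], y, s=15)
    ax.set_xlabel(col)
axes[0].set_ylabel("median_house_value")
fig.tight_layout()
```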

How to plot segmented bar chart (stacked bar graph) with Python?

cat = {'A':1, 'B':2, 'C':3} dog = {'A':2, 'B':2, 'C':4} owl = {'A':3, 'B':3, 'C':3} Suppose I have 3 dictionaries, each containing (subcategory, count) pairs. How can I plot a segmented bar chart (i.e., a stacked bar graph) in Python, with x being the 3 categories (cat, dog, owl) and y being the proportion of each subcategory? What I have in mind looks like this:
Category: Data Science
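For the stacked bar question, a minimal matplotlib sketch using the three dictionaries from the question: each animal's counts are normalized to proportions, and each subcategory is drawn on top of the previous ones through the bottom argument of ax.bar.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

cat = {'A': 1, 'B': 2, 'C': 3}
dog = {'A': 2, 'B': 2, 'C': 4}
owl = {'A': 3, 'B': 3, 'C': 3}
animals = {"cat": cat, "dog": dog, "owl": owl}
subcats = ["A", "B", "C"]

# Rows = animals, columns = subcategories; each row normalized to sum to 1
counts = np.array([[d[s] for s in subcats] for d in animals.values()], dtype=float)
props = counts / counts.sum(axis=1, keepdims=True)

fig, ax = plt.subplots()
bottom = np.zeros(len(animals))
for j, s in enumerate(subcats):
    ax.bar(list(animals), props[:, j], bottom=bottom, label=s)  # stack on earlier segments
    bottom += props[:, j]
ax.set_ylabel("Proportion")
ax.legend(title="Subcategory")
```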

How to plot the bar charts of precision, recall, and f-measure?

I have used 4 machine learning models on a task, and now I am struggling to plot their bar charts like the one shown below in the image. I am printing a classification report to get precision, recall, etc. My code is shown below: def Statistics(data): # Classification Report print("Classification Report is shown below") print(classification_report(data['actual labels'],data['predicted labels'])) # Confusion matrix print("Confusion matrix is shown below") cm=confusion_matrix(data['actual labels'],data['predicted labels']) plt.figure(figsize=(10,7)) sn.heatmap(cm, annot=True,cmap='Blues', fmt='g') plt.xlabel('Predicted') plt.ylabel('Truth') Statistics(data) How can I plot this type of chart …
Category: Data Science
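For the precision/recall/F-measure question, a sketch of a grouped bar chart with one group per model and one bar per metric. To stay self-contained it computes binary precision, recall and F1 for the positive class with NumPy; sklearn.metrics.precision_recall_fscore_support could supply the same numbers. The labels and the four models' predictions are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np


def binary_scores(y_true, y_pred):
    """Precision, recall and F1 for the positive class (label 1)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels and predictions for four models (replace with your own)
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
predictions = {
    "Model 1": [1, 0, 1, 0, 0, 1, 1, 0],
    "Model 2": [1, 1, 1, 1, 1, 0, 0, 0],
    "Model 3": [1, 0, 0, 0, 0, 1, 0, 0],
    "Model 4": [1, 0, 1, 1, 0, 1, 0, 0],
}
scores = np.array([binary_scores(y_true, p) for p in predictions.values()])  # shape (4, 3)

fig, ax = plt.subplots(figsize=(8, 4))
x = np.arange(len(predictions))
width = 0.25
for j, metric in enumerate(["Precision", "Recall", "F1-measure"]):
    ax.bar(x + (j - 1) * width, scores[:, j], width, label=metric)  # bars side by side
ax.set_xticks(x)
ax.set_xticklabels(list(predictions))
ax.set_ylim(0, 1.05)
ax.legend()
```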

Visualization with many lines, colors, and markers

I have a bunch of plots like the one shown below. The data comes from measurements taken at different times on different days. In the plot (which is a cumulative distribution function, if that matters), the colors distinguish data from different days, and the markers further distinguish the data within each day. The problem is that the plot is very crowded and a bit ugly; some markers can barely be seen. Question: Any idea how I can …
Category: Data Science
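One common way to declutter a plot like this (a suggestion, not the only option) is small multiples: one panel per day with shared axes, so each panel only has to distinguish a few runs by color and the markers can be dropped. A sketch with hypothetical measurements (3 days, 4 runs per day), drawing each run as an empirical CDF step line:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical measurements: 3 days, 4 runs per day, 50 samples per run
days = {f"Day {d + 1}": [rng.normal(loc=d, size=50) for _ in range(4)] for d in range(3)}

fig, axes = plt.subplots(1, len(days), figsize=(10, 3), sharex=True, sharey=True)
for ax, (day, runs) in zip(axes, days.items()):
    for i, run in enumerate(runs):
        x = np.sort(run)
        y = np.arange(1, len(x) + 1) / len(x)  # empirical CDF
        ax.step(x, y, where="post", label=f"run {i + 1}")
    ax.set_title(day)
axes[0].set_ylabel("CDF")
axes[-1].legend(loc="lower right", fontsize="small")
```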

How can I overlay a contour map over a picture of a country?

So, for context, I have a massive dataset of over 2.7 million rows of average download/upload speeds of individuals in Canada, with province/city columns. I would like to plot a contour map of average download/upload speed over a picture of Canada, kind of like this: https://www.floodmap.net/Elevation/ElevationMap/CountryMaps/?cz=US_1 But unfortunately I have no idea how to make something like that. I would really appreciate it if someone could point me in the right direction.
Category: Data Science
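For the contour-map question, a sketch of the core steps under stated assumptions: the data only has province/city columns, so each city is assumed to have been given latitude/longitude first (for example from a geocoding lookup table); the rows below are hypothetical. Speeds are averaged per city, and matplotlib's tricontourf interpolates filled contours over the scattered city locations. Drawing the country itself is left to a map image or a mapping library such as cartopy, as noted in the comments.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows; real data would need lat/lon added per city first
df = pd.DataFrame({
    "city": ["A", "A", "B", "C", "D", "E", "E", "F"],
    "lat":  [49.3, 49.3, 43.7, 45.5, 51.0, 53.5, 53.5, 46.8],
    "lon":  [-123.1, -123.1, -79.4, -73.6, -114.1, -113.5, -113.5, -71.2],
    "down": [120.0, 80.0, 150.0, 95.0, 60.0, 70.0, 90.0, 110.0],
})

# Average download speed per city: millions of rows collapse to one value per location
per_city = df.groupby(["city", "lat", "lon"], as_index=False)["down"].mean()

fig, ax = plt.subplots(figsize=(8, 5))
# Filled contours interpolated over a triangulation of the city points
cs = ax.tricontourf(per_city["lon"], per_city["lat"], per_city["down"], levels=8, cmap="viridis")
fig.colorbar(cs, ax=ax, label="Mean download speed")
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
# To show the country underneath, one option is to draw a map image first with
# ax.imshow(img, extent=(lon_min, lon_max, lat_min, lat_max)) and give tricontourf
# alpha=0.6; cartopy handles borders and map projections properly.
```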

Distribution of Regression Residuals: Is this a normal distribution?

I've created a histogram as well as a Q-Q plot from the residuals of my regression model: Mean: 0.35 Standard Deviation: 18.14 Judging from these plots, is it okay to say that my residuals are normally distributed? Or what else can I conclude from these plots? Update: I created the histogram using sns.distplot(x, hist=True). Here's the result:
Category: Data Science
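For the residuals question, a sketch of how the Q-Q comparison can be backed by a few numbers: standardized residuals are sorted against standard normal quantiles (from Python's statistics.NormalDist), and the sample skewness, excess kurtosis and Q-Q correlation summarize how far they depart from normality. The residuals here are simulated with the mean and standard deviation quoted in the question; replace them with the model's real residuals. scipy.stats.probplot draws the same kind of plot, and seaborn's distplot has been deprecated since 0.11 in favor of histplot/displot.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)
# Simulated stand-in for the real residuals (mean 0.35, std 18.14 as in the question)
residuals = rng.normal(loc=0.35, scale=18.14, size=500)

z = np.sort((residuals - residuals.mean()) / residuals.std(ddof=1))  # standardized, sorted
n = len(z)
probs = (np.arange(1, n + 1) - 0.5) / n                           # plotting positions
theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])  # standard normal quantiles

skew = np.mean(z ** 3)                 # near 0 for a symmetric distribution
excess_kurt = np.mean(z ** 4) - 3      # near 0 for normal tails; > 0 means heavier tails
r = np.corrcoef(theoretical, z)[0, 1]  # close to 1 when points follow the Q-Q line

fig, ax = plt.subplots()
ax.scatter(theoretical, z, s=8)
ax.plot(theoretical, theoretical, "r--")  # reference line y = x
ax.set_xlabel("Theoretical normal quantiles")
ax.set_ylabel("Standardized residuals")
```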

Unsynchronized time series visualization

I would like to visualize a large number of events, each composed of time-series windows. A typical event looks like this: The problem is that my events are not synchronized, so if I plot them all, it looks like this: Question: Is there any way to visualize all my events so that I can see their original/"typical" shape (preferably in the time domain) even though they are not synchronized? What I have tried so far: Visualize features: the approach is good, but I have to guess …
Category: Data Science
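For the unsynchronized events, one possible technique (swapped in here, not taken from the question) is to align every event to a reference event by the lag that maximizes their cross-correlation, then overlay or average the aligned events to see the typical shape. The sketch assumes equal-length events and uses np.roll, a circular shift, which is only harmless when the event sits away from the window edges; real data may need padding or cropping instead. The pulse-shaped events are hypothetical. Other options include aligning on the peak or on a threshold crossing, or dynamic time warping.

```python
import numpy as np


def align_to_reference(events, ref):
    """Shift each event so that its cross-correlation with `ref` peaks at lag 0."""
    n = len(ref)
    aligned, lags = [], []
    for e in events:
        xc = np.correlate(e - e.mean(), ref - ref.mean(), mode="full")
        lag = int(np.argmax(xc)) - (n - 1)  # > 0 means e is delayed relative to ref
        aligned.append(np.roll(e, -lag))    # circular shift back into line with ref
        lags.append(lag)
    return np.array(aligned), lags


# Hypothetical events: the same pulse, delayed by different amounts, plus noise
rng = np.random.default_rng(0)
t = np.arange(200)
pulse = np.exp(-0.5 * ((t - 100) / 8.0) ** 2)
shifts = [0, -30, 15, 40]
events = [np.roll(pulse, s) + rng.normal(0, 0.02, t.size) for s in shifts]

aligned, lags = align_to_reference(events, ref=events[0])  # first event as reference
typical = aligned.mean(axis=0)                            # averaged "typical" shape
```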

Converting RGB values to contour values

I am using matplotlib to generate a filled contour plot; please consider the example below as a sample contour plot. I want to read off the contour values from such a filled contour plot using OpenCV's mouse interaction modules. For example, if the user hovers the mouse over this contour image, it should dynamically display the contour values as the mouse moves over the image. I have the OpenCV part figured out, but I am struggling to link the RGB …
Category: Data Science
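For the RGB-to-contour-value question, a sketch of a nearest-color lookup: sample the colormap over the plotted value range, then map a pixel's RGB to the value whose color is closest. Assumptions: the colormap name and vmin/vmax must match what the plot actually used (for contourf, the returned ContourSet exposes them through its cmap and norm attributes), and because each filled band is drawn in a single color, the lookup identifies the band rather than the exact value at that pixel. OpenCV returns pixels in BGR order, so they need reversing first. If the mouse position can be converted to data coordinates, reading the original Z array directly avoids color matching altogether.

```python
import matplotlib
import numpy as np
from matplotlib import colors


def make_rgb_to_value(cmap_name, vmin, vmax, n=256):
    """Return a function mapping an (R, G, B) pixel (0-255) to the nearest colormap value."""
    cmap = matplotlib.colormaps[cmap_name]                   # colormap registry, matplotlib >= 3.5
    values = np.linspace(vmin, vmax, n)                      # candidate data values
    lut = cmap(colors.Normalize(vmin, vmax)(values))[:, :3]  # their RGB colors in [0, 1]

    def rgb_to_value(rgb255):
        rgb = np.asarray(rgb255, dtype=float) / 255.0
        idx = np.argmin(np.sum((lut - rgb) ** 2, axis=1))    # nearest color in RGB space
        return float(values[idx])

    return rgb_to_value


# Usage: with an OpenCV BGR pixel, call rgb_to_value(pixel[::-1])
rgb_to_value = make_rgb_to_value("viridis", vmin=0.0, vmax=10.0)
probe = np.round(np.array(matplotlib.colormaps["viridis"](0.75)[:3]) * 255)  # color drawn for 7.5
estimate = rgb_to_value(probe)
```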

How do I add a dropdown menu to display GOES images in a tkinter popup window?

I'm trying to add a dropdown menu to my tkinter popup window, but whenever I run it in my Visual Studio Code IDE, nothing displays. When I run the code by itself in Jupyter, everything works fine. What is going on? def btn4(): newWindow4 = Toplevel(root) newWindow4.title("GOES NOAA V1.0 ") newWindow4.geometry("1620x1300") fig = plt.figure(figsize=(6, 6)) canvas = FigureCanvasTkAgg(fig, master=newWindow4) canvas.get_tk_widget().pack(side=tkinter.TOP, fill=tkinter.BOTH, expand=1) channel_list = {u'1 - Blue Band 0.47 \u03BCm': 1, u'2 - Red Band 0.64 …
Category: Data Science

Plot multiple time series from single dataframe

I have a dataframe with multiple time series and columns with labels. My goal is to plot all time series in a single plot, where the labels should be used in the legend of the plot. The important point is that the x-data of the time series do not match each other, only their ranges roughly do. See this example: import pandas as pd import matplotlib.pyplot as plt df = pd.DataFrame([[1, 2, "A", "A"], [2, 3, "A", "A"], [3, 1, …
Category: Data Science
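For plotting multiple time series from one DataFrame, a sketch that groups rows by their label columns and draws one line per group, so each series keeps its own x values and the group label goes into the legend. The column names x, y, label1, label2 and the values are hypothetical, laid out like the question's four-column example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: x, y and two label columns; the series' x values do not
# line up, only their ranges overlap
df = pd.DataFrame(
    [[1.0, 2, "A", "A"], [2.0, 3, "A", "A"], [3.0, 2, "A", "A"],
     [1.4, 1, "B", "A"], [2.7, 2, "B", "A"], [3.2, 4, "B", "A"]],
    columns=["x", "y", "label1", "label2"],
)

fig, ax = plt.subplots()
for (l1, l2), g in df.groupby(["label1", "label2"]):
    g = g.sort_values("x")  # each series is drawn against its own x values
    ax.plot(g["x"], g["y"], marker="o", label=f"{l1}/{l2}")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend(title="label1/label2")
```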

Plot three series on the same plot grouping data by day and month

I have a dataset containing three years of data, which I would like to plot and compare by day and month, but I am having a hard time with the final result. I am nearly there, but for some strange reason, when plotting I keep getting an annoying gap between the data points, even though the gap does not seem to be in the data series. The whole dataset is this:
   Day         Visits
0  2018-04-01  1
1  2018-04-02  …
Category: Data Science
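The plotting code behind the gap isn't shown, so this does not diagnose it; it sketches one common way to compare years by day and month: move every date onto one shared dummy leap year (2000, so 29 February still exists) and draw one line per year against real dates. The visits data is hypothetical; Day and Visits follow the question's column names.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical daily visits over three calendar years
days = pd.date_range("2018-04-01", "2020-12-31", freq="D")
df = pd.DataFrame({"Day": days, "Visits": np.arange(len(days)) % 50})

df["year"] = df["Day"].dt.year
# Same month and day, shared year: all years overlay on one date axis
df["common_date"] = df["Day"].map(lambda d: d.replace(year=2000))

fig, ax = plt.subplots(figsize=(10, 4))
for year, g in df.groupby("year"):
    ax.plot(g["common_date"], g["Visits"], label=str(year))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d %b"))  # show day and month only
ax.set_ylabel("Visits")
ax.legend(title="Year")
```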

Plotting the confidence interval for a plot in python

I have a curve and I want to create the confidence interval for the curve. Here, I provide a simple example: mean, lower, upper = [],[],[] ci = 0.2 for i in range (20): a = np.random.rand(100) MEAN = np.mean(a) mean.append(MEAN) std = np.std(a) Upper = MEAN+ci*std Lower = MEAN-ci*std lower.append(Lower) upper.append(Upper) plt.figure(figsize=(20,8)) plt.plot(mean,'-b', label='mean') plt.plot(upper,'-r', label='upper') plt.plot(lower,'-g', label='lower') plt.xlabel("Value", fontsize = 30) plt.ylabel("Loss", fontsize = 30) plt.xticks(fontsize= 30) plt.yticks(fontsize= 30) plt.legend(loc=4, prop={'size': 30}) In the above example, I drew …
Category: Data Science
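In the confidence-interval example above, MEAN ± 0.2*std is a band of 0.2 standard deviations of the raw data rather than a confidence interval of the mean. A sketch that swaps in the usual normal-approximation 95% interval, mean ± 1.96 × std/√n, keeps the question's 20 positions of 100 random samples each, and shades the interval with fill_between instead of drawing separate upper and lower lines:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
z = 1.96  # standard normal quantile for an approximately 95% interval
mean, lower, upper = [], [], []
for i in range(20):
    a = rng.random(100)                     # 100 samples per x position, as in the question
    m = a.mean()
    sem = a.std(ddof=1) / np.sqrt(len(a))   # standard error of the mean
    mean.append(m)
    lower.append(m - z * sem)
    upper.append(m + z * sem)

x = np.arange(len(mean))
fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(x, mean, "-b", label="mean")
ax.fill_between(x, lower, upper, color="b", alpha=0.2, label="95% CI")  # shaded band
ax.set_xlabel("Value")
ax.set_ylabel("Loss")
ax.legend(loc=4)
```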

Time Series Plot for floating values

I have a DataFrame that looks as shown below. I am trying to make a line plot to look at the peaks in both columns (a, b). I have gotten as far as sns.set_style("darkgrid") plt.plot(wr['a'][:100]) plt.show() but the plot looks shabby. I also tried wr.set_index(['Date_x'],inplace=True) wr['a'][:100].plot() wr['b'][:100].plot() I am looking for something like this. Any help is appreciated.
Category: Data Science
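For the peaks question, a sketch that plots columns a and b in two stacked panels sharing a date axis and marks the highest point of each. Assumptions: wr is replaced by a hypothetical random-walk DataFrame with the question's column names, and the index is sorted after set_index because an unsorted date index makes a line plot zig-zag back and forth.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for `wr`: a Date_x column plus two float columns a and b
rng = np.random.default_rng(0)
wr = pd.DataFrame({
    "Date_x": pd.date_range("2021-01-01", periods=150, freq="D"),
    "a": rng.normal(size=150).cumsum(),
    "b": rng.normal(size=150).cumsum(),
})

wr = wr.set_index("Date_x").sort_index()  # a sorted date index gives a clean line
subset = wr[["a", "b"]].iloc[:100]        # first 100 rows, as in the question

fig, axes = plt.subplots(2, 1, sharex=True, figsize=(10, 5))
for ax, col in zip(axes, subset.columns):
    s = subset[col]
    ax.plot(s.index, s.values, linewidth=1)
    peak = s.idxmax()                     # timestamp of the highest value
    ax.plot(peak, s[peak], "ro")          # mark the highest peak
    ax.set_ylabel(col)
fig.autofmt_xdate()                       # tilt the date labels so they don't overlap
```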

Python: How to plot time interval from a Dataframe in Pandas

I have a DataFrame (df) that holds data about a job executed at different times. It includes the following details about each execution of the job: Job Start Time (START), Job End Time (END), and Time Interval (interval), i.e., END - START. A small part of the DataFrame is shown below.
Dataframe(df):
END     | START   | interval
1423.0  | 1357.0  | 66.0
33277.0 | 33325.0 | -48.0
42284.0 | 42250.0 | 34.0
53466.0 | 53218.0 | 248.0
62158.0 | …
Category: Data Science
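For the job-interval question, a sketch that draws each execution as a stem at its START time with height END - START, built from the four complete rows shown above. A stem plot is one choice among several (a Gantt-style ax.barh(..., left=START) is another, though bars this short almost vanish on an axis spanning about 50,000 time units); here the zero line makes the negative interval, where END is earlier than START, stand out.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import pandas as pd

# The four complete rows shown in the question
df = pd.DataFrame({"END": [1423.0, 33277.0, 42284.0, 53466.0],
                   "START": [1357.0, 33325.0, 42250.0, 53218.0]})
df["interval"] = df["END"] - df["START"]

fig, ax = plt.subplots(figsize=(10, 3))
ax.vlines(df["START"], 0, df["interval"], color="tab:blue")  # one stem per execution
ax.scatter(df["START"], df["interval"], color="tab:blue", zorder=3)
ax.axhline(0, color="k", linewidth=0.8)  # stems below zero flag END < START
ax.set_xlabel("Job start time (START)")
ax.set_ylabel("interval (END - START)")
```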

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.