Jupyter Notebook not importing pandas module

I am running Jupyter on a server in a virtual environment. I then tunnel my connection so I can access Jupyter in my browser. When I SSH into the server, I can use the pandas module in both IPython and Python 3. I ran this code in IPython: import pandas as pd In [2]: print(pd.__file__) /home/ubuntu/.local/lib/python3.6/site-packages/pandas/__init__.py Then I tried adding it to my path in Jupyter with the code below, but still no luck: import os os.getcwd() import sys sys.path.append('/home/ubuntu/.local/lib/python3.6/site-packages/pandas/__init__.py') import pandas …
Category: Data Science
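
One possible angle on the import question above: entries on sys.path are expected to be directories (or zip archives), so appending the path of pandas/__init__.py itself would not make the package importable. The entry that matters is the site-packages directory that contains the pandas folder. It can also help to compare sys.executable in the notebook and in the SSH session, since the kernel may be running a different interpreter. A minimal sketch, assuming the helper name add_site_packages (hypothetical, not from the question):

```python
import os
import sys


def add_site_packages(directory):
    """Append a site-packages directory to sys.path if it is missing.

    Hypothetical helper for illustration; `directory` must be the folder
    that contains the package folder, not a file inside the package.
    """
    directory = os.path.abspath(directory)
    if not os.path.isdir(directory):
        raise ValueError(f"not a directory: {directory}")
    if directory not in sys.path:
        sys.path.append(directory)
    return directory


# Which interpreter is this kernel or shell actually running?
print(sys.executable)

# Path taken from the question, with the trailing pandas/__init__.py removed.
candidate = "/home/ubuntu/.local/lib/python3.6/site-packages"
if os.path.isdir(candidate):
    add_site_packages(candidate)
```

If the two sys.executable values differ, installing pandas into the kernel's own environment is usually a cleaner route than patching sys.path by hand.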

Two-sided Grubbs test KeyError: 355

I am trying to run this two-sided Grubbs test by passing in a pandas.Series object and an appropriate alpha value. Whenever I run the test on the whole dataset, I have no problem. However, when I divide the dataset by a criterion, let's say id, into a dictionary of the form id: subset-df, and pass a series from a subset dataframe, it gives me a KeyError. This is my code: for k, v in sensor_id_to_data.items(): # print(f'FOR SENSOR_ID: {k}') # …
Category: Data Science

How to download a Jupyter Notebook from GitHub?

This is a fairly basic question. I am working on a data science project inside of a Pandas tutorial. I can access my Jupyter notebooks through my Anaconda installation. The only problem is that the tutorial notebooks (exercise files) are on GitHub. My question: how do I download the exercise files from GitHub and then have them display in the Jupyter notebook section on my computer so that I can use them interactively? I am very new to Jupyter Notebooks. …
Category: Data Science
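
For the download question above, one approach is to fetch the raw .ipynb file rather than the GitHub page that renders it. The sketch below converts a github.com "blob" page URL into the matching raw.githubusercontent.com URL. The repository and file names are hypothetical, and the helper assumes a branch name without slashes:

```python
from urllib.parse import urlparse


def to_raw_url(blob_url):
    """Convert a github.com 'blob' page URL into its raw-file URL."""
    parts = urlparse(blob_url)
    segments = parts.path.strip("/").split("/")
    # Expected shape: <user>/<repo>/blob/<branch>/<path to file>
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {blob_url}")
    user, repo, _, branch, *file_path = segments
    return "https://raw.githubusercontent.com/" + "/".join(
        [user, repo, branch, *file_path]
    )


# Hypothetical repository and file names, used only for illustration.
page = "https://github.com/some-user/pandas-tutorial/blob/master/exercises/01.ipynb"
print(to_raw_url(page))
```

The resulting URL could then be saved with urllib.request.urlretrieve into the folder from which Jupyter was started, so that the file appears in the notebook file browser.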

How to export the tables into a CSV file with pandas

The following is a piece of code I wrote to create a pivot table for categorical vs continuous variables. for row in categorical: for col in numeric: ptable = pd.pivot_table(df, values = col, index = row, aggfunc = ['min','max','median','mean','std',lambda x: 100*x.count()/df.shape[0]]) print(ptable) writer = pd.ExcelWriter('report.xlsx') ptable.to_excel(writer, 'Sheet1') writer.save() It displays the output as in the image, but this is not a data frame, and when writing to an Excel file it contains only the values from the last iteration. How do I …
Category: Data Science
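
If the writer in the question above is created inside the loop, as the flattened code suggests, each iteration would replace the file written by the previous one, which would explain why only the last table survives. A minimal sketch of one alternative, assuming a single CSV output and a reduced set of aggregations (the function name and sample data are made up for illustration):

```python
import io

import pandas as pd


def write_pivot_tables(df, categorical, numeric, buffer):
    """Write one pivot table per (categorical, numeric) pair into one CSV buffer."""
    for row in categorical:
        for col in numeric:
            ptable = pd.pivot_table(
                df, values=col, index=row, aggfunc=["min", "max", "mean"]
            )
            # A comment line separates the tables inside the single file.
            buffer.write(f"# {col} by {row}\n")
            ptable.to_csv(buffer)
            buffer.write("\n")


# Made-up data for illustration only.
df = pd.DataFrame({"group": ["a", "a", "b"], "height": [1.0, 3.0, 5.0]})
out = io.StringIO()
write_pivot_tables(df, ["group"], ["height"], out)
print(out.getvalue())
```

In real use the buffer would be a file opened once, before the loops, for example with open('report.csv', 'w').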

Generate PDF from Jupyter notebook without code

I have a Jupyter notebook that contains markdown, code, and outputs (graphs). I would like to generate a PDF from this notebook. I tried to hide the code using HTML code that I got from here, and then I tried to download the notebook as a PDF, but the code still shows up. When I download it as HTML, it doesn't show any code, but when I convert that HTML to PDF, the code shows up again.
Category: Data Science

How to use SHAP Kernel Explainer with Pipeline models?

I have a pandas DataFrame X. I would like to find the prediction explanation of a particular model. My model is given below: pipeline = Pipeline(steps=[('imputer', imputer_function()), ('classifier', RandomForestClassifier())]) x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0) y_pred = pipeline.fit(x_train, y_train).predict(x_test) Now for the prediction explainer, I use the Kernel Explainer from SHAP, as follows: # use Kernel SHAP to explain test set predictions shap.initjs() explainer = shap.KernelExplainer(pipeline.predict_proba, x_train, link="logit") shap_values = explainer.shap_values(x_test, nsamples=10) # …
Category: Data Science

How to group data and plot line graphs

This is the first time I am using pandas and IPython notebook, and I was not able to figure out the correct search terms for my problem. I have a .xls file of compile-time data for 3 build servers located at 3 sites: A, B and C. These build servers compile multiple projects, so I will pick any specific project. Hence I need to plot data like this (for a specific project - not all in one graph, to keep it …
Category: Data Science

JupyterLab inline interactive plot

I am trying to make my inline plots in JupyterLab interactive. So far, I have tried the suggestions pointed out here, among others: https://stackoverflow.com/questions/50149562/jupyterlab-interactive-plot # %matplotlib notebook - does not work: Javascript Error: IPython is not defined # %matplotlib widget - works, but plots are overwritten The widget magic works in making the plots interactive, but unfortunately my plots are overwritten. Subsequent cells render plots on top of the output of cell 1, as below: plt.scatter(trainData['x'], trainData['y'], color='Red', …
Category: Data Science

Open-source interactive dashboard in Python

I am trying to find a package to construct a dashboard with interactive graphs (including widgets such as sliders) in Python (mainly IPython notebook). I know there is plotly, but I would like a fully open-source solution without constraints (i.e. having a public repository as with plotly, but without the subscription fee). I have looked at the IPython Dashboard package, but it is not compatible with Python 3 (because of MySQL-python). Has anyone had any luck with any other package?
Category: Data Science

How to export one cell of a jupyter notebook?

I'm currently working/prototyping in a Jupyter notebook. I want to run some of my code in a standalone IPython shell. For now, I export my whole notebook as code (File > Download as) and then execute it in IPython (with %run). It works, but I would like to export only one cell or a set of cells, so that I can run only what I modified in my Jupyter notebook.
Category: Data Science
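
Because an .ipynb file is plain JSON with a "cells" list, one way to approach the question above is to pull out only the cells of interest. A minimal sketch, assuming zero-based cell positions and a hypothetical helper name cells_to_source; the in-memory notebook stands in for the result of json.load on a real file:

```python
def cells_to_source(notebook, indices):
    """Return the code of the selected cells as one script string.

    `notebook` is the parsed JSON of an .ipynb file; `indices` are
    zero-based positions in its "cells" list.
    """
    chunks = []
    for i in indices:
        cell = notebook["cells"][i]
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell["source"]
        # The source is stored either as one string or as a list of lines.
        chunks.append(source if isinstance(source, str) else "".join(source))
    return "\n\n".join(chunks) + "\n"


# Tiny in-memory notebook used only for illustration.
notebook = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "source": ["x = 1\n", "print(x)"]},
        {"cell_type": "code", "source": "y = 2"},
    ]
}
print(cells_to_source(notebook, [1, 2]))
```

The returned string can be written to a .py file and run with %run. For a single cell, placing the %%writefile cell magic on the cell's first line is another option, since it saves that cell's contents to a file.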

Access keys of pandas dataframe when using groupby

I have the following database, and I would like to know how many times each combination of BirthDate and Zipcode is repeated throughout the data table. Now, my question is: how can I access the keys of this output? For instance, how can I get Birthdate=2000101 ZipCode=8002, for i = 0? The problem is that this is a 'Series' object, so I'm not able to use .columns or .loc here.
Category: Data Science
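
When a groupby over two columns is reduced with size(), the result is a Series whose index holds the group keys as tuples, so the keys can be read from the index. A minimal sketch with made-up rows; the column names follow the question, and the use of size() is an assumption about how the counts were produced:

```python
import pandas as pd

# Made-up rows for illustration; column names follow the question.
df = pd.DataFrame(
    {
        "BirthDate": [20000101, 20000101, 19990505],
        "ZipCode": [8002, 8002, 8001],
    }
)

counts = df.groupby(["BirthDate", "ZipCode"]).size()

# Each index entry of the Series is a (BirthDate, ZipCode) tuple.
birth_date, zip_code = counts.index[0]

# Alternatively, turn the keys back into ordinary columns.
table = counts.reset_index(name="count")
print(table)
```

After reset_index, the keys are regular columns again, so table.loc and table.columns work as on any DataFrame.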

Confusion matrix doesn't display properly

I am trying to plot a confusion matrix using logistic regression for a multi-class dataset. The problem is that when I plot the confusion matrix, it only plots a confusion matrix for binary classification. Here is where I am plotting it. %matplotlib inline import matplotlib.pyplot as plt import pandas as pd dataframe = pd.read_csv("WA_Fn-UseC_-HR-Employee-Attrition.csv") from sklearn.linear_model import LogisticRegression LRModel = LogisticRegression(C=100, max_iter=5500) LRModel.fit(X_train, y_train) predicted_values_ = LRModel.predict(X_test) from sklearn.metrics import confusion_matrix cm = confusion_matrix(y_test, predicted_values_) misclassified = (y_test != …
Category: Data Science

Python Code to find the number of hapax legomena in a Text or Words_List

In corpus linguistics, a hapax legomenon is a word that occurs only once within a context, either in the written record of an entire language, in the works of an author, or in a single text. The term is sometimes incorrectly used to describe a word that occurs in just one of an author's works, but more than once in that particular work. Hapax legomenon is a transliteration of Greek ἅπαξ λεγόμενον, meaning "(something) being said (only) once" Hapax_legomenon enter …
Category: Data Science
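
Following the definition quoted above, the hapax legomena of a word list are the words whose count is exactly one. A minimal sketch using collections.Counter; lowercasing before counting is an assumption, and whether case should be folded depends on the corpus:

```python
from collections import Counter


def hapax_legomena(words):
    """Return the words that occur exactly once, in first-seen order."""
    counts = Counter(w.lower() for w in words)
    return [w for w, n in counts.items() if n == 1]


words = "the cat saw the dog and the bird".split()
print(hapax_legomena(words))  # ['cat', 'saw', 'dog', 'and', 'bird']
```

For a raw text, the word list would first need tokenising, for example by splitting on whitespace and stripping punctuation.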

My corr() function in Python keeps resulting in a "ValueError: The truth value of a Series is ambiguous..."

I am a very inexperienced programmer, and this is my first question on the Data Science StackExchange. I am sorry if it is formatted poorly or comes across as basic. For some strange reason, in Python, whenever I try to run a correlation function on the population density & total cases per million columns of my COVID-19 DataFrame (which I imported/read into Spyder as a csv), I keep getting the same long error message, namely, "ValueError: The truth value of a Series …
Category: Data Science

How to install Polynote on Windows?

I've been searching around the Internet for a while, but I have not been able to find detailed instructions on how to install Polynote (the polyglot notebook with first-class Scala support) on Windows with support for mixing multiple languages, Python and Scala. GitHub link for Polynote. Official website. According to the official website: Polynote is currently only tested on Linux and MacOS, using the Chrome browser as a client. We hope to be testing other platforms and browsers soon. Feel free to …
Category: Data Science

How to run a PySpark application in the Windows 8 command prompt

I have a Python script written with Spark Context and I want to run it. I tried to integrate IPython with Spark, but I could not do that. So, I tried to set the Spark path [ Installation folder/bin ] as an environment variable and called the spark-submit command in the cmd prompt. I believe that it is finding the Spark context, but it produces a really big error. Can someone please help me with this issue? Environment variable path: C:/Users/Name/Spark-1.4;C:/Users/Name/Spark-1.4/bin …
Category: Data Science

Issue with IPython/Jupyter on Spark (Unrecognized alias)

I am working on setting up a set of VMs to experiment with Spark before I go out and spend money on building up a cluster with some hardware. Quick note: I am an academic with a background in applied machine learning and work quite a bit in data science. I use the tools for computing; I rarely need to set them up. I've created 3 VMs (1 master, 2 slaves) and installed Spark successfully. Everything appears to …
Category: Data Science

How to group identical values and count their frequency in Python?

Newbie to analytics with Python, so please be gentle :-) I couldn't find the answer to this question - apologies if it is already answered elsewhere in a different format. I have a dataset of transaction data for a retail outlet. Variables along with explanation are: section: the section of the store, a str; prod_name: name of the product, a str; receipt: the number of the invoice, an int; cashier: the number of the cashier, an int; cost: the cost …
Category: Data Science

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.