Has anyone succeeded in finding a good Scala/Spark kernel for Jupyter?

Question

Varun Gawande

August 5, 2021, 03:09

The ones I've tried so far

Almond: Works very well for plain Scala, but you have to import every dependency yourself, which gets tedious after a while. Unfortunately, it doesn't run when Spark uses YARN instead of local mode.

Spylon-kernel: The kernel connects, but gets stuck in the initializing stage.

Apache Toree: I would have loved this one if only it worked. It has broad language support and magics, and it is incubated by Apache. However, the kernel doesn't connect; it gets stuck at the "Kernel Connecting" stage.

Are we to never run Spark with the freedom and flexibility of Jupyter?

What curse is this?

Topic kernel jupyter scala apache-spark

Category Data Science

girip11 · Accepted Answer · March 1, 2021, 16:05

I use both Almond and Toree. For plain Scala code, I use Almond. If you want to try newer Scala features (e.g. 2.13), use Almond: Scala 2.13.4 support was added in Almond v0.11.0. Almond might start supporting Scala 3.0 once it is released.

For Spark, I use Toree, and it works fine for me. I am using Toree 0.5.0 with Spark 3.0.0; Toree uses Scala 2.12.

First install Spark 3.0.0 by following the steps provided here. After installation, check that you can launch the Spark shell (spark-shell) from the terminal.
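A quick way to script that check before installing Toree is sketched below. It assumes Spark was unpacked to /opt/spark (the path used later in this answer). The function name check_spark_home is a hypothetical helper for illustration, not something Spark provides.

```shell
# check_spark_home DIR: succeed if DIR looks like a Spark installation,
# i.e. it contains an executable bin/spark-shell.
# Hypothetical helper; DIR defaults to /opt/spark (an assumption).
check_spark_home() {
    spark_dir="${1:-/opt/spark}"
    if [ -x "$spark_dir/bin/spark-shell" ]; then
        echo "ok: found $spark_dir/bin/spark-shell"
    else
        echo "missing: $spark_dir/bin/spark-shell" >&2
        return 1
    fi
}
```

For example, `check_spark_home /opt/spark && /opt/spark/bin/spark-shell --version` should print the Spark version banner if the installation is usable.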

# Toree installation
# I used Poetry to manage dependencies in a virtual environment
pip install https://dist.apache.org/repos/dist/dev/incubator/toree/0.5.0-incubating-rc1/toree-pip/toree-0.5.0.tar.gz

# I installed/copied extracted spark binaries to /opt/spark
jupyter toree install --user --kernel_name=toree --spark_home=/opt/spark

With these steps, you should find the new kernel added under $HOME/.local/share/jupyter/kernels/. Launch Jupyter Notebook and the Toree kernel should be listed when you create a new notebook. You can also run jupyter kernelspec list to check that the new kernel is listed.
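If you prefer to check the directory directly, here is a minimal sketch. It assumes the Linux per-user location mentioned above (macOS uses a different directory). The function name list_user_kernels is a hypothetical helper. Because the directory Toree creates may not match the --kernel_name value exactly, the sketch matches on a name prefix rather than a fixed name.

```shell
# list_user_kernels [PREFIX]: print the names of per-user kernel directories
# that start with PREFIX and contain a kernel.json file.
# Hypothetical helper; assumes the Linux per-user kernels directory.
list_user_kernels() {
    kernels_root="$HOME/.local/share/jupyter/kernels"
    for kdir in "$kernels_root/${1:-}"*/; do
        # Skip directories (or an unmatched glob) without a kernel spec file.
        [ -f "${kdir}kernel.json" ] && basename "$kdir"
    done
    return 0
}
```

Running `list_user_kernels toree` should print one line per matching kernel directory; empty output means no such kernel was installed for the current user. It should agree with what jupyter kernelspec list reports.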

If you can create a new notebook with it, the kernel should initialize without any issues. Try executing println("Hello world"); it should print the message.

I have captured an installation summary here.
