How to compute the median of a date-type column in Spark (Java)

I have extracted a column from a dataset that contains date values:
+-------------------+
| Created_datetime  |
+-------------------+
|2019-10-12 17:09:18|
|2019-12-03 07:02:07|
|2020-01-16 23:10:08|
The column's type in Spark is StringType. I want to compute the median of these dates; in the case above it would be 2019-12-03 07:02:07, since that is the middle of the three dates. How can I achieve that in Spark with Java? I tried using dataset.select(org.apache.spark.sql.functions.avg(dataset.col("Created_datetime").cast("timestamp"))).first().getDouble(0) But as it is …
Category: Data Science
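
The arithmetic behind the question above can be sketched in plain Java, without Spark: parse each string, convert it to epoch seconds, sort, and take the middle value. The class name TimestampMedian and the choice of UTC for the conversion are assumptions made for illustration; they are not part of the original question.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.List;

// Hypothetical helper: median of timestamp strings formatted as "yyyy-MM-dd HH:mm:ss".
final class TimestampMedian {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    static String median(List<String> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("no timestamps given");
        }
        // Convert each string to seconds since the epoch (UTC is an assumption) and sort.
        long[] epochSeconds = values.stream()
                .mapToLong(v -> LocalDateTime.parse(v, FORMAT).toEpochSecond(ZoneOffset.UTC))
                .sorted()
                .toArray();
        int n = epochSeconds.length;
        long middle;
        if (n % 2 == 1) {
            // Odd count: the median is the middle element.
            middle = epochSeconds[n / 2];
        } else {
            // Even count: midpoint of the two central elements, rounded down.
            long lower = epochSeconds[n / 2 - 1];
            long upper = epochSeconds[n / 2];
            middle = lower + (upper - lower) / 2;
        }
        return LocalDateTime.ofEpochSecond(middle, 0, ZoneOffset.UTC).format(FORMAT);
    }

    public static void main(String[] args) {
        System.out.println(median(List.of(
                "2019-10-12 17:09:18", "2019-12-03 07:02:07", "2020-01-16 23:10:08")));
    }
}
```

Inside Spark the same idea would presumably apply to the column after converting it to a numeric epoch value, for example with the SQL functions unix_timestamp and percentile_approx; that route is only a suggestion and is not taken from the question.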

Needed: Java library to calculate text readability/complexity

In principle the same as this, but for Java (and ideally for multiple languages), e.g. Flesch reading ease, SMOG index, Flesch-Kincaid grade, Coleman-Liau index, automated readability index, Dale-Chall readability score, Linsear Write formula, Gunning fog, etc. I guess there must be plenty of libraries, but I just can't find them ...
Topic: text java nlp
Category: Data Science
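
As a rough illustration of the first metric named above, here is a minimal Java sketch of the Flesch reading ease score, 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words). The class name FleschReadingEase is hypothetical, and the syllable counter is a crude assumption (it counts groups of consecutive vowels), so the scores are only approximate.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the Flesch reading ease formula for English text.
final class FleschReadingEase {
    private static final Pattern WORD = Pattern.compile("[A-Za-z']+");
    private static final Pattern VOWEL_GROUP = Pattern.compile("[aeiouy]+");
    private static final Pattern SENTENCE_END = Pattern.compile("[.!?]+");

    private static int countMatches(Pattern pattern, String text) {
        Matcher matcher = pattern.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    // Assumption: one syllable per run of vowels, and at least one per word.
    static int syllables(String word) {
        return Math.max(1, countMatches(VOWEL_GROUP, word.toLowerCase()));
    }

    static double score(String text) {
        // Treat text without terminal punctuation as a single sentence.
        int sentences = Math.max(1, countMatches(SENTENCE_END, text));
        int words = 0;
        int syllableTotal = 0;
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            words++;
            syllableTotal += syllables(matcher.group());
        }
        if (words == 0) {
            throw new IllegalArgumentException("text contains no words");
        }
        return 206.835
                - 1.015 * ((double) words / sentences)
                - 84.6 * ((double) syllableTotal / words);
    }

    public static void main(String[] args) {
        System.out.println(score("The cat sat on the mat."));
    }
}
```

Higher scores indicate easier text. The other indices in the list use different constants and inputs (for example character counts or word lists), so they would each need their own formula.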

Lightweight execution of Spark MLLib models

I have some training data in a Hive database, which I am using to build a Spark MLLib model. I am using simple linear regression models and the PySpark API. I have code set up to train this model every day to get the most up-to-date model. (The real-world use case is that I am predicting vehicle unloading times, and my model must always be recently trained since the characteristics of the vehicles and locations change over …
Category: Data Science

Running Tensorflow MobileNet from Java

I am trying to run TensorFlow for image recognition (classification) in Java (JSE, not Android). I am using the code from here, and here. It works for Inceptionv3 models, and for models retrained from Inceptionv3. But for MobileNet models (such as those produced by following this article), it does not work: the code runs but gives the wrong results (wrong classification labels). What code/settings are required to use MobileNet from Java? The code that works for Inceptionv3 is try (Tensor image = Tensor.create(imageBytes)) …
Category: Data Science

Changing default values of ANNIE resources in GATE from Java code

In GATE, default values for ANNIE are set during initialization, but sometimes they have to be changed to meet specific requirements. My requirement: I want to extract English sentences by splitting on the full stop rather than on the newline character, which gives correct sentences. For that, I need to change the default value of transducerURL in the SentenceSplitter in ANNIE. This can be done in two ways: Using ANNIE_with_defaults.gapp - changing the initparams value in SentenceSplitter and accessing it from Java: Gate.setGateHome(new File(Configuration.GATE_HOME)); Gate.init(); // …
Topic: java nlp
Category: Data Science

How Can I Implement MOEA/D Algorithm in Java From Pseudocode?

I want to implement the MOEA/D algorithm for a specific population, but I could not figure out how to write the Java code from the pseudocode. My population size is 50 and each chromosome looks like this: [1,0,0,1,0,0,], so the individuals are made of binary genes. Is there any simple implementation of that algorithm without using any framework? The steps are not clear to me. I have also tried to convert a MATLAB implementation, but it did not work. Where can …
Category: Data Science

Android: NLP library for date recognition in string

I am currently working on an Android app which should make appointments automatically by reading the incoming messages on your mobile phone. I've managed to create a service which monitors the incoming messages, but now I need a natural language processing algorithm in order to find the date for the appointment. I've tried DialogFlow, but I found out it cannot be used offline, which defeats the purpose of the app. It should work offline too! Does anyone have …
Topic: java nlp
Category: Data Science

Help Creating an XOR Neural Network in Java?

I have been trying to create a neural network in Java, but it doesn't quite work as intended. I am using an XOR test before I move on to more advanced problems, and it doesn't seem to be learning much. I may have the algorithms wrong, but as far as I can tell, they all seem fine (I am using a tutorial on Brilliant.org - https://brilliant.org/wiki/backpropagation/). I've provided my Network and Main class below. Thank you for any help! import …
Category: Data Science

How does Stanford CRF encode NER string features?

Most features created by the NERFeatureFactory are strings, e.g. from usePrev, useNext, useNGrams etc. From my understanding, that's too many tokens to fit in a dictionary or to use embeddings. I don't see how the UNKNOWN embedding would bring any value given that most features are not known words. I've been looking at the code on GitHub but haven't figured it out yet. I love New York! > love > love-I-W-PW, love-New-W-NW, #lo#, #ov#, #ve# etc
Category: Data Science
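
To make the example in the question above concrete, here is a small Java sketch that builds the same feature strings for one token and maps each distinct string to an integer id. This is not Stanford's NERFeatureFactory code: the class StringFeatureIndex and its methods are hypothetical, and the id-per-string index is only one common way sparse string features are encoded in linear and CRF-style models, assumed here for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: string features for one token, encoded as sparse integer ids.
final class StringFeatureIndex {
    // Each distinct feature string gets the next free id, in insertion order.
    private final Map<String, Integer> ids = new LinkedHashMap<>();

    // Builds feature strings shaped like the ones quoted in the question.
    static List<String> featuresFor(String prev, String word, String next) {
        List<String> features = new ArrayList<>();
        features.add(word + "-" + prev + "-W-PW"); // word plus previous word
        features.add(word + "-" + next + "-W-NW"); // word plus next word
        // Character bigrams of the word, wrapped in '#'.
        for (int i = 0; i + 2 <= word.length(); i++) {
            features.add("#" + word.substring(i, i + 2) + "#");
        }
        return features;
    }

    // Returns the existing id of a feature, or assigns a new one.
    int idOf(String feature) {
        Integer existing = ids.get(feature);
        if (existing != null) {
            return existing;
        }
        int id = ids.size();
        ids.put(feature, id);
        return id;
    }

    // A token is then represented by the ids of its active features.
    int[] encode(List<String> features) {
        return features.stream().mapToInt(this::idOf).toArray();
    }

    int size() {
        return ids.size();
    }

    public static void main(String[] args) {
        StringFeatureIndex index = new StringFeatureIndex();
        List<String> features = featuresFor("I", "love", "New");
        System.out.println(features);
        System.out.println(java.util.Arrays.toString(index.encode(features)));
    }
}
```

Under this scheme no dictionary of words or embedding table is needed: the model would hold one weight per feature id and label, and a feature string never seen in training simply has no id and contributes nothing.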

Load Keras model in Java

What are the requirements for loading a model trained with Keras in Java? I checked that DeepLearning4J supports Keras models; the network architecture and weights can be easily loaded. The only downside is probably that we need to use the ND4J backend, or does that not matter? If a model was created using Keras and TensorFlow, what is the best way to load it in the Java ecosystem? I tried to use the frozen graph script to save the TensorFlow model, but it cannot …
Topic: keras java
Category: Data Science

Call Python script from JavaEE

I created a model using the sklearn library and I want to run this model in a JavaEE application. I have been trying Jython, but it's impossible to import some important libraries like pandas and numpy. So how can I call a Python script from a JavaEE application?
Topic: java python
Category: Data Science

Using the Datumbox Machine Learning Framework for website classification - guidelines?

A short while ago, I came across this ML framework that has implemented several different algorithms ready for use. The site also provides a handy API that you can access with an API key. I have need of the framework to solve a website classification problem where I basically need to categorize several thousand websites based on their HTML content. As I don't want to be bound to their existing API, I wanted to use the framework to implement my …
Category: Data Science

When to use deep learning for Java as opposed to Python

I have been asked to explore options for building deep learning based applications using Java, so I happened to browse a website called dl4j (https://deeplearning4j.org), which has implementations of different neural networks ranging from MLP to RNN/LSTM. But I couldn't understand the rationale for using dl4j over a Python-based implementation. So, could someone please clarify the following items: ETL; data pre-processing; making use of pre-trained models / transfer learning; distributed computing; processing large volumes of data (images, time series …
Category: Data Science

Scala vs Java if you're NOT going to use Spark?

I'm facing some indecision when choosing how to allocate my scarce learning time for the next few months between Scala and Java. I would like help objectively understanding the practical tradeoffs. The reason I am interested in Java is that I think some of my production, frequently refreshed, forecasts and analyses at work would run much faster in Java (compared to R or Python) and by becoming more proficient in Java I would enable myself to work on interesting side …
Topic: java scala
Category: Data Science

30 Years Of Excel Test Data

I am a CS intern at an industrial company that has 30 years of Excel files that need to be analyzed. Looking at the data, only a fraction of the files need to be looked at and used. After those files are identified, I need to pull out values from specific columns. The real issue is that there is no standard Excel format for the tests and each column name can be different (ex. 'Front Axial Temperature' vs 'axial temp …
Category: Data Science
