I am using the scikit-learn wrapper for CRFsuite (Conditional Random Fields). I have read about various feature selection approaches in the literature, but I cannot find any in that package or, more generally, any that are available for CRFs. Do you know of any libraries (Python preferred) or easy-to-implement algorithms for this purpose? Update: I tried using scikit-learn's feature selection library, but it does not work for 2 reasons: 1) the CRF takes as input a list of lists of dicts …
I have a somewhat more general question. My dataset consists of N sequences of events. One sequence could be, for example, [A,B,C,D,X,Y] and another [A,B,Z], where the letters represent different events. The sequences are at most 80 steps long. The idea is to predict the next letter, i.e. the next step, from the known previous events. As a very simple example, maybe B always comes after A. The next step would be measuring the time of each event, and the ultimate goal is to predict how …
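One simple baseline for the next-event prediction described above is a first-order Markov model that counts which event follows which. The sketch below is only an illustration under that assumption; the function names `fit_transitions` and `predict_next` are hypothetical, and the two toy sequences are the ones given in the question.

```python
from collections import Counter, defaultdict


def fit_transitions(sequences):
    """Count how often each event is immediately followed by each other event."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Pair every event with its direct successor.
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, prev):
    """Return the most frequent successor of `prev`, or None if `prev` was never followed by anything."""
    if prev not in counts:
        return None
    return counts[prev].most_common(1)[0][0]


# Toy data taken from the question: two event sequences.
sequences = [list("ABCDXY"), list("ABZ")]
counts = fit_transitions(sequences)
prediction_after_a = predict_next(counts, "A")
```

A model like this ignores everything except the last event, so it would likely serve only as a baseline to compare against models that use the full history.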
I want to build a [probabilistic] model that aims to infer the true value of an unknown categorical variable, $y \in \{1,2,\dots, K\}$. We have a dataset $(X,y): \mathbb{R}^d\rightarrow \{1,2,\dots, K\}$ and we can train a classifier that takes $d$-dimensional data, $X$, and estimates the output $y$. Now, suppose that the $X$s are correlated and all come from a fixed $y$. That is, we observe $X^1, X^2,\dots, X^T,\dots$ over time and we know that $y$ is fixed for all of …
I have some user data where each user has a certain pattern of being at different places for some time. I would like to create a model which will cluster/classify these users based on these patterns and the time spent at each place. So suppose user patterns are like: Place_1(60 min)- Place_2(30 min)- Place_5(45 min)- user 1 -label(1) Place_1(60 min)- Place_2(60 min)- Place_5(45 min)- user 2 -label(2) Place_1(60 min)- Place_2(60 min)- Place_5(40 min)- user 3 -label(2) Place_2(60 min)- Place_1(60 min)- …
I have a sequence of recurring events that I would like to group together to represent different operational activities of the underlying process. These events may or may not occur in a particular order. Consequently, I would like to investigate whether any relationship exists between the events. Are there any better methods than hierarchical clustering? I might want to build a model that can determine the operational activity based on the events it recognizes as belonging to …
I am looking for algorithms or models for detecting and identifying repeated patterns in a single image. For example, an arbitrary smaller image might be pasted at random locations in the image. In the situation at hand, no prior information is known about the appearance of the object or pattern. Do any algorithms/models for this exist?
I have a dataset containing a set of normal user sessions. Each session contains a sequence of ordered user requests on N system resources {R1, ..., RN}. I want to design a continuous authentication algorithm that confirms the user's identity at each request command. More precisely, I don't want to wait for the user to complete the whole session (the full command sequence) before authenticating him; instead I want to do this at each resource request command, based on his previous normal sequences in …
Which algorithms for sequential pattern mining use the sequences' timestamps, in a similar way to the SPADE algorithm? I've been looking for a Python implementation of the SPADE algorithm that isn't a wrapper (I found pycspade and spmf-py, which are both wrappers), and since I didn't find any, I wondered whether the reason is that a different, more efficient algorithm exists. (So if you know of a Python implementation of SPADE that isn't a wrapper, that would be useful as …
I have a long time series signal. This signal is usually very stable, but it changes when the sensor is stimulated, and this change is usually very short. I know a model for this can be trained with a labeled method (such as a neural network, CNN, etc.), but labeling takes a lot of time because each change is very short (about 4 seconds) and there are not many of them. So, I want to generate a number of signals similar …
I have the following user activity data, where for each user the activity type they engaged in is recorded along with the phase: User | Phase | ActivityType | Date 321 1 A 12/20/2020 15:00 321 1 B 12/20/2020 16:00 321 2 A 12/21/2020 12:00 321 1 C 12/21/2020 13:00 321 3 B 12/22/2020 11:00 322 1 A 12/20/2020 15:00 322 1 A 12/20/2020 16:00 322 2 B 12/21/2020 12:00 322 1 C 12/21/2020 13:00 322 3 D 12/22/2020 …
I am curious whether sequential pattern mining algorithms fill a unique gap, or whether the same thing can be achieved with alternative methods, for example with machine learning or something else. Do you know of any alternative methodology that can achieve the same thing? (This is relevant for me because, if there is an alternative method, I can compare the two in my thesis.)
Has anyone used (and liked) any good "frequent sequence mining" packages in Python other than the FPM in MLlib? I am looking for a stable package, preferably one that is still maintained. Thank you!
To provide some context, I am trying to do frequent pattern mining on a dataset of system error logs from servers. I organized it into transactions based on the thread ID, which results in some very long transactions (the longest one is 415). At first, I kept the items as the error messages themselves (so each transaction would be a list of strings), but since there is a fixed set of possible error messages, I created a dictionary where …
I have a time series with a timestamp and an associated event: Time Event 1 A 2 B 3 C T A I was wondering if there is a technique/method to figure out which events most often precede others in a given time window. For example: in the 10 seconds before A happens, B and C occur with 90% probability. Ideally the output should be the sequences that occur most often. So far I have thought of using some sort of clustering, though …
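The quantity asked about above (how often each event appears in a fixed window before a target event) can be computed directly by counting. The sketch below is a minimal illustration, not a full sequence-mining method; the function name `preceding_event_rates`, the toy log, and the window of 10 time units are all assumptions made for the example.

```python
from collections import Counter


def preceding_event_rates(events, target, window):
    """For each label, return the fraction of `target` occurrences that had
    that label at least once in the `window` time units just before them.

    `events` is a list of (time, label) pairs.
    """
    occurrences = [t for t, label in events if label == target]
    counts = Counter()
    for t in occurrences:
        # Distinct labels seen in the half-open interval [t - window, t).
        seen = {label for u, label in events if t - window <= u < t}
        counts.update(seen)
    n = len(occurrences)
    return {label: c / n for label, c in counts.items()} if n else {}


# Hypothetical toy log of (timestamp, event) pairs.
log = [(1, "A"), (2, "B"), (3, "C"), (12, "A"), (20, "B"), (25, "A")]
rates = preceding_event_rates(log, "A", window=10)
```

This scans the whole log once per target occurrence, which is fine for small data; for long logs a sliding window over time-sorted events would probably be a better fit.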
I'm trying to implement the DDPmine algorithm from this article as part of a project, and I do not understand where in the algorithm the class label of each transaction is used. We have transactions from 2 different groups; suppose group a has the class label "0" and group b has the class label "1", and we want to find the discriminative patterns that are frequent in each group but not in the 2 groups combined. But in which part of …
I'm looking for an actual working code implementation of the DDPMiner algorithm described in the 2008 article "Direct Discriminative Pattern Mining for Effective Classification". I'm having real trouble trying to implement it myself.
I have a dictionary of variable-length sequences: [(file_name[-10:], len(tag_is_header_list)) for file_name, tag_is_header_list in HEADER_PATTERN_DICT.items()] [('37bd1.html', 25), ('0bcce.html', 40), ('90364.html', 28), ('8f9c7.html', 24), ('d12d4.html', 73), ('46837.html', 37), ('adb92.html', 53), ('0a1e7.html', 69), ('da077.html', 43), ('9366a.html', 21), ('6ae4d.html', 37), ('f62ee.html', 19), ('73aee.html', 33), ('e090a.html', 35), ('8b093.html', 44)] These contain a label for each item as to whether or not they are a subject heading: HEADER_PATTERN_DICT[sorted([(file_name, len(tag_is_header_list)) for file_name, tag_is_header_list in HEADER_PATTERN_DICT.items()], key=lambda x: x[1])[0][0]] [(None, True), ('<div', False), ('<div', False), (None, True), (None, …
I am trying to detect the timelines of brands' histories. For my specific case, I believe it is easy because the data is already clustered. For each Wikipedia article I can spot the sentences surrounding dates. Here is an example: McDonald's Corporation is an American fast food company, founded in 1940 as a restaurant operated by Richard and Maurice McDonald, in San Bernardino, California, United States. They rechristened their business as a hamburger stand, and later turned the company into a franchise, with …