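An overview of R package metadata and collection signals.
Collection signals that need to be judged at first glance are placed first.
Backend-related packages detected in DESCRIPTION.
Basic metadata is condensed into small cards and tokens.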
Rcpp

| Package | Type | Spec |
|---|---|---|
| Rcpp CRAN · 1.2.1 · 2026-05-30 | Depends | Rcpp |
| stats CRAN · 1.2.1 · 2026-05-30 | Imports | stats |
| Rcpp CRAN · 1.2.1 · 2026-05-30 | LinkingTo | Rcpp |
| testthat CRAN · 1.2.1 · 2026-05-30 | Suggests | testthat |
| Package | Type | Spec |
|---|---|---|
| No dependency edges to display. | | |
README

The R package “jmotif” provides an implementation of:

- z-Normalization of time series data
- PAA, i.e., Piecewise Aggregate Approximation
- SAX, i.e., Symbolic Aggregate approXimation
- HOT-SAX, an algorithm for exact time series discord discovery
- VSM, i.e., Vector Space Model
- SAX-VSM, an algorithm for interpretable time series classification (and parameter optimization)
- RePair, an algorithm for grammatical inference
- Rule Density Curve, an efficient grammatical compression (i.e., Kolmogorov Complexity) based technique for variable-length approximate time series anomaly discovery
- RRA (Rare Rule Anomaly), a grammatical compression (i.e., Kolmogorov Complexity) based algorithm for variable-length exact time series anomaly discovery

Most of this functionality is also implemented in Java and some in Python as well…

Citing this work:

While RRA was proposed in [8], the code was ported to R to assist our newer development in SAX parameter optimization, GrammarViz 3.0. Please cite it: Senin, P., Lin, J., Wang, X., Oates, T., Gandhi, S., Boedihardjo, A.P., Chen, C., Frankenstein, S., GrammarViz 3.0: Interactive Discovery of Variable-Length Time Series Patterns, ACM Trans. Knowl. Discov. Data, February 2018.

Notes:

To process sets of time series of uneven length, pad the shorter ones with NA within the input data frame (list). The window-based SAX discretization procedure (sliding window, left to right) detects an NA within the right side of the sliding window and abandons any further processing of the current time series, continuing to the next.
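The padding described in the note above could be prepared along these lines. This is a minimal base R sketch: `pad_with_na` is a hypothetical helper, not a jmotif function, and the row-per-series matrix layout is an assumption borrowed from how the bundled datasets are described.

```r
# Hypothetical helper (not part of jmotif): pad every series with NA on
# the right so that all of them share the length of the longest one.
pad_with_na <- function(series_list) {
  max_len <- max(lengths(series_list))
  lapply(series_list, function(s) c(s, rep(NA_real_, max_len - length(s))))
}

series <- list(a = c(1, 2, 3, 4, 5), b = c(2, 1), c = c(0, 1, 0))
padded <- pad_with_na(series)

# Assumed layout: stack the padded series row-wise, one series per row.
padded_matrix <- do.call(rbind, padded)
print(padded_matrix)
```

Padding on the right matches the note's description of the sliding window meeting NA on its right side and then moving on to the next series.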
References:

[1] Dina Goldin and Paris Kanellakis, On similarity queries for time-series data: Constraint specification and implementation, In Principles and Practice of Constraint Programming – CP ’95, pages 137–153. (1995)

[2] Keogh, E., Chakrabarti, K., Pazzani, M., & Mehrotra, S., Dimensionality reduction for fast similarity search in large time series databases, Knowledge and information Systems, 3(3), 263-286. (2001)

[3] Lonardi, S., Lin, J., Keogh, E., & Patel, P., Finding motifs in time series, In Proc. of the 2nd Workshop on Temporal Data Mining (pp. 53-68). (2002)

[4] Salton, G., Wong, A., Yang, C., A vector space model for automatic indexing, Commun. ACM 18, 11, 613–620, 1975.

[5] Senin Pavel and Malinchik Sergey, SAX-VSM: Interpretable Time Series Classification Using SAX and Vector Space Model, Data Mining (ICDM), 2013 IEEE 13th International Conference on, pp.1175,1180, 7-10 Dec. 2013.

[6] Keogh, E., Lin, J., Fu, A., HOT SAX: Efficiently finding the most unusual time series subsequence, In Proc. ICDM (2005)

[7] N.J. Larsson and A. Moffat, Offline dictionary-based compression, In Data Compression Conference, 1999.

[8] Pavel Senin, Jessica Lin, Xing Wang, Tim Oates, Sunil Gandhi, Arnold P. Boedihardjo, Crystal Chen, Susan Frankenstein, Time series anomaly discovery with grammar-based compression, In Proc. of The International Conference on Extending Database Technology, EDBT 15.

0.0 Installation from latest sources

    install.packages("devtools")
    library(devtools)
    install_github('jMotif/jmotif-R')

To start using the library, simply load it into the R environment:

    library(jmotif)

1.0 Time series z-Normalization

z-normalization ( znorm(ts, threshold) ) is a preprocessing step common in time series pattern mining, proposed by Goldin & Kanellakis, which helps downstream analyses focus on the structural features of the time series.
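To make the role of the threshold argument concrete, here is a minimal base R sketch of z-normalization. `znorm_sketch` is a hypothetical re-implementation for illustration, not the jmotif source. Two points are assumptions, since the text does not state them: a series whose standard deviation falls below the threshold is returned unchanged, and R's sample standard deviation (`sd`) is used.

```r
# Hypothetical re-implementation for illustration; not the jmotif source.
znorm_sketch <- function(ts, threshold = 0.01) {
  ts_sd <- sd(ts)
  # Assumption: a near-constant series is returned unchanged, so that
  # noise is not amplified by dividing by a tiny standard deviation.
  if (is.na(ts_sd) || ts_sd < threshold) {
    return(ts)
  }
  (ts - mean(ts)) / ts_sd
}

set.seed(1)
x <- seq(0, pi * 4, 0.02)
y <- sin(x) * 5 + rnorm(length(x))
z <- znorm_sketch(y, 0.01)

# After normalization the series has mean ~0 and standard deviation 1.
print(c(mean = mean(z), sd = sd(z)))

# A flat series stays as it is because its deviation is below the threshold.
flat <- znorm_sketch(rep(3, 10), 0.01)
```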
    x = seq(0, pi*4, 0.02)
    y = sin(x) * 5 + rnorm(length(x))
    plot(x, y, type="l", col="blue", main="A scaled sine wave with a random noise and its z-normalization")
    lines(x, znorm(y, 0.01), type="l", col="red")
    abline(h=c(1,-1), lty=2, col="gray50")
    legend(0, -4, c("scaled sine wave","z-normalized wave"), lty=c(1,1), lwd=c(1,1), col=c("blue","red"), cex=0.8)

Figure: z-normalization of a scaled sine wave

2.0 Piecewise Aggregate Approximation (i.e., PAA)

PAA ( paa(ts, paa_num) ) is designed to reduce the input time series dimensionality by splitting it into equally-sized segments (PAA size) and averaging the values of the points within each segment. Typically, PAA is applied to z-Normalized time series. In the following example, a time series of 8 points is reduced to 3 points.

    y = c(-1, -2, -1, 0, 2, 1, 1, 0)
    plot(y, type="l", col="blue", main="8-points time series and its PAA transform into 3 points")
    points(y, pch=16, lwd=5, col="blue")
    abline(v=c(1,1+7/3,1+7/3*2,8), lty=3, lwd=2, col="gray50")
    y_paa3 = paa(y, 3)
    segments(1,y_paa3[1],1+7/3,y_paa3[1],lwd=1,col="red")
    points(x=1+7/3/2,y=y_paa3[1],col="red",pch=23,lwd=5)
    segments(1+7/3,y_paa3[2],1+7/3*2,y_paa3[2],lwd=1,col="red")
    points(x=1+7/3+7/3/2,y=y_paa3[2],col="red",pch=23,lwd=5)
    segments(1+7/3*2,y_paa3[3],8,y_paa3[3],lwd=1,col="red")
    points(x=1+7/3*2+7/3/2,y=y_paa3[3],col="red",pch=23,lwd=5)

Figure: PAA transform of an 8-points time series into 3 points

3.0 SAX transform

SAX transform ( series_to_string(ts, alphabet_size) ) is a discretization algorithm which transforms a sequence of rational values (time series points) into a sequence of discrete values, i.e., symbols taken from a finite alphabet. This procedure enables the application of numerous algorithms for discrete data analysis to continuous time series data.
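The two steps described above, segment averaging followed by symbol assignment, can be sketched in base R without the package. `paa_sketch` and `sax_sketch` are hypothetical names, not jmotif functions. The sketch assumes cut points taken from equiprobable quantiles of the standard Normal distribution and assigns a value that falls exactly on a cut to the upper symbol; the package may treat that boundary differently.

```r
# Hypothetical helpers for illustration; not the jmotif implementation.

# PAA: repeat every point paa_num times, then average consecutive blocks
# of length(ts) values. This weights points that straddle a segment
# boundary fractionally, so the length need not be divisible by paa_num.
paa_sketch <- function(ts, paa_num) {
  n <- length(ts)
  colMeans(matrix(rep(ts, each = paa_num), nrow = n))
}

# SAX: map each value to a letter using equiprobable Normal cut points.
sax_sketch <- function(values, a_size) {
  cuts <- qnorm(seq_len(a_size - 1) / a_size)
  paste(letters[findInterval(values, cuts) + 1], collapse = "")
}

y <- c(-1, -2, -1, 0, 2, 1, 1, 0)
y_paa3 <- paa_sketch(y, 3)
print(y_paa3)                  # -1.375  0.750  0.625
print(sax_sketch(y_paa3, 3))   # "acc"
```

For this 8-point series the sketch yields the word "acc", the same word that the README reports for series_to_string(y_paa3, 3).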
Typically, SAX is applied to time series whose dimensionality has been reduced with PAA, which effectively yields a low-dimensional, discrete representation of the input time series that preserves (to some extent) its structural characteristics. By employing this representation it is possible to design efficient algorithms for common time series pattern mining tasks, as one can rely on the indexing of data in symbolic space. Note that before processing with PAA and SAX, time series are z-Normalized.

The figure below illustrates the PAA+SAX procedure: at the first step, an 8-point time series is converted into a 3-point PAA representation; at the second step, the PAA values are converted into letters using a 3-letter alphabet.

    y <- seq(-2,2, length=100)
    x <- dnorm(y, mean=0, sd=1)
    lines(x,y, type="l", lwd=5, col="magenta")
    abline(h = alphabet_to_cuts(3)[2:3], lty=2, lwd=2, col="magenta")
    text(0.7,-1,"a",cex=2,col="magenta")
    text(0.7, 0,"b",cex=2,col="magenta")
    text(0.7, 1,"c",cex=2,col="magenta")

    > series_to_string(y_paa3, 3)
    [1] "acc"
    > series_to_chars(y_paa3, 3)
    [1] "a" "c" "c"

Figure: an application of SAX transform (3 letters word size and 3 letters alphabet size) to an 8 points time series

4.0 Time series SAX transform via sliding window

Another common way to use SAX is to apply the procedure to subseries extracted with a sliding window ( sax_via_window(ts, win_size, paa_size, alp_size, nr_strategy, n_threshold) ). This technique is used in SAX-VSM, where it enables the conversion of a time series into word bags. Note the use of a numerosity reduction strategy.

5.0 SAX-VSM classifier

I use one of the standard UCR time series datasets to illustrate the implemented approach. The Cylinder-Bell-Funnel dataset (Saito, N: Local feature extraction and its application using a library of bases. PhD thesis, Yale University (1994)) consists of three time series classes.
The dataset is embedded into the jmotif library:

    # load Cylinder-Bell-Funnel data
    data("CBF")

where it is wrapped into a list of four elements: train and test sets and their labels:

    > str(CBF)
    List of 4
     $ labels_train: num [1:30] 1 1 1 3 2 2 1 3 2 1 ...
     $ data_train  : num [1:30, 1:128] -0.464 -0.897 -0.465 -0.187 -1.136 ...
     $ labels_test : num [1:900] 2 2 1 2 2 3 1 3 2 3 ...
     $ data_test   : num [1:900, 1:128] -1.517 -0.703 -1.412 -0.955 -1.449 ...

5.1 Pre-processing and bags of words construction

At the first step, each class of the training data needs to be transformed

Help for package jmotif

Package {jmotif}

Contents: CBF, Gun_Point, alphabet_to_cuts, bags_to_tfidf, cosine_dist, cosine_sim, early_abandoned_dist, ecg0606, euclidean_dist, find_discords_brute_force, find_discords_hotsax, find_discords_rra, idx_to_letter, is_equal_mindist, is_equal_str, letter_to_idx, letters_to_idx, manyseries_to_wordbag, min_dist, paa, sax_by_chunking, sax_distance_matrix, sax_via_window, series_to_chars, series_to_string, series_to_wordbag, str_to_repair_grammar, subseries, znorm

- Version: 1.2.1
- Encoding: UTF-8
- Title: Time Series Analysis Toolkit Based on Symbolic Aggregate Discretization, i.e. SAX
- Description: Implements time series z-normalization, SAX, HOT-SAX, VSM, SAX-VSM, RePair, and RRA algorithms facilitating time series motif (i.e., recurrent pattern), discord (i.e., anomaly), and characteristic pattern discovery along with interpretable time series classification.
- URL: https://github.com/jMotif/jmotif-R
- BugReports: https://github.com/jMotif/jmotif-R/issues
- Depends: R (≥ 4.0.0), Rcpp
- Imports: stats
- Suggests: testthat
- LinkingTo: Rcpp
- LazyData: true
- License: GPL-2
- RoxygenNote: 7.3.3
- NeedsCompilation: yes
- Packaged: 2025-12-22 10:47:29 UTC; psenin
- Author: Pavel Senin [aut, cre]
- Maintainer: Pavel Senin <seninp@gmail.com>
- Repository: CRAN
- Date/Publication: 2025-12-22 11:00:02 UTC

A standard UCR Cylinder-Bell-Funnel dataset from http://www.cs.ucr.edu/~eamonn/time_series_data

Usage: CBF

Format: A four-elements list containing train and test data along with their labels

- labels_train: the training data labels, correspond to data matrix rows
- data_train: the training data matrix, each row is a time series instance
- labels_test: the test data labels, correspond to data matrix rows
- data_test: the test data matrix, each row is a time series instance

A standard UCR Gun Point dataset from http://www.cs.ucr.edu/~eamonn/time_series_data

Usage: Gun_Point

Format: A four-elements list containing train and test data along with their labels

- labels_train: the training data labels, correspond to data matrix rows
- data_train: the training data matrix, each row is a time series instance
- labels_test: the test data labels, correspond to data matrix rows
- data_test: the test data matrix, each row is a time series instance

Translates an alphabet size into the array of corresponding SAX cut-lines built using the Normal distribution.

Usage: alphabet_to_cuts(a_size)

Arguments:

- a_size: the alphabet size, a value between 2 and 20 (inclusive).

References: Lonardi, S., Lin, J., Keogh, E., Patel, P., Finding motifs in time series. In Proc. of the 2nd Workshop on Temporal Data Mining (pp. 53-68). (2002)

Examples:

    alphabet_to_cuts(5)

Computes TF-IDF weight vectors for a set of word bags.

Usage: bags_to_tfidf(data)

Arguments:

- data: the list containing the input word bags.

References: Senin Pavel and Malinchik Sergey, SAX-VSM: Interpretable Time Series Classification Using SAX and Vector Space Model. Data Mining (ICDM), 2013 IEEE 13th International Conference on, pp.1175,1180, 7-10 Dec. 2013. Salton, G., Wong, A., Yang, C., A vector space model for automatic indexing. Commun. ACM 18, 11, 613-620, 1975.

Examples:

    bag1 = data.frame(
      "words" = c("this", "is", "a", "sample"),
      "counts" = c(1, 1, 2, 1),
      stringsAsFactors = FALSE
    )
    bag2 = data.frame(
      "words" = c("this", "is", "another", "example"),
      "counts" = c(1, 1, 2, 3),
      stringsAsFactors = FALSE
    )
    ll = list("bag1" = bag1, "bag2" = bag2)
    tfidf = bags_to_tfidf(ll)

Computes the cosine similarity between numeric vectors

Usage: cosine_dist(m)

Arguments:

- m: the data matrix

Value: Returns the cosine similarity

Examples:

    a <- c(2, 1, 0, 2, 0, 1, 1, 1)
    b <- c(2, 1, 1, 1, 1, 0, 1, 1)
    sim <- cosine_dist(rbind(a,b))

Computes the cosine distance value between a bag of words and a set of TF-IDF weight vectors.

Usage: cosine_sim(data)

Arguments:

- data: the list containing a word-bag and the TF-IDF object.

References: Senin Pavel and Malinchik Sergey, SAX-VSM: Interpretable Time Series Classification Using SAX and Vector Space Model. Data Mining (ICDM), 2013 IEEE 13th International Conference on, pp.1175,1180, 7-10 Dec. 2013. Salton, G., Wong, A., Yang, C., A vector space model for automatic indexing. Commun. ACM 18, 11, 613-620, 1975.
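As a rough illustration of the quantities named in the three entries above, the following base R sketch computes TF-IDF weights for the two example word bags and a cosine similarity for the two example vectors. It does not call jmotif. The weighting log(1 + count) * log(N / df) is an assumption, since the help text does not state which formula bags_to_tfidf applies, and `cosine_sketch` is a hypothetical name.

```r
# Word bags taken from the bags_to_tfidf example above.
bag1 <- data.frame(words = c("this", "is", "a", "sample"),
                   counts = c(1, 1, 2, 1), stringsAsFactors = FALSE)
bag2 <- data.frame(words = c("this", "is", "another", "example"),
                   counts = c(1, 1, 2, 3), stringsAsFactors = FALSE)
bags <- list(bag1 = bag1, bag2 = bag2)

# Term-count matrix: one row per word, one column per bag.
vocab <- sort(unique(unlist(lapply(bags, function(b) b$words))))
counts <- sapply(bags, function(b) {
  v <- setNames(numeric(length(vocab)), vocab)
  v[b$words] <- b$counts
  v
})

# Assumed weighting (not confirmed by the help text):
# tf = log(1 + count), idf = log(number of bags / bags containing the word).
doc_freq <- rowSums(counts > 0)
idf <- log(ncol(counts) / doc_freq)
tfidf <- log(1 + counts) * idf
print(round(tfidf, 3))

# Cosine similarity between two numeric vectors (hypothetical helper).
cosine_sketch <- function(u, v) {
  sum(u * v) / (sqrt(sum(u^2)) * sqrt(sum(v^2)))
}

a <- c(2, 1, 0, 2, 0, 1, 1, 1)
b <- c(2, 1, 1, 1, 1, 0, 1, 1)
print(cosine_sketch(a, b))
```

Words that occur in every bag ("this", "is") receive zero weight under this scheme, so only class-specific words contribute to the similarity.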
Finds the Euclidean distance between points; if the distance is above the threshold, abandons the computation and returns NAN.

Usage: early_abandoned_dist(seq1, seq2, upper_limit)

Arguments:

- seq1: the array 1.
- seq2: the array 2.
- upper_limit: the max value after reaching which the distance computation stops and the NAN is returned.

A PHYSIONET dataset

Usage: ecg0606

Format: A vector of numeric values

Finds the Euclidean distance between points.

Usage: euclidean_dist(seq1, seq2)

Arguments:

- seq1: the array 1.
- seq2: the array 2.

Finds a discord using brute force algorithm.

Usage: find_discords_brute_force(ts, w_size, discords_num)

Arguments:

- ts: the input timeseries.
- w_size: the sliding window size.
- discords_num: the number of discords to report.

References: Keogh, E., Lin, J., Fu, A., HOT SAX: Efficiently finding the most unusual time series subsequence. Proceeding ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining

Examples:

    discords = find_discords_brute_force(ecg0606[1:600], 100, 1)
    plot(ecg0606[1:600], type = "l", col = "cornflowerblue", main = "ECG 0606")
    lines(x=c(discords[1,2]:(discords[1,2]+100)),
          y=ecg0606[discords[1,2]:(discords[1,2]+100)], col="red")

Finds a discord (i.e. time series anomaly) with HOT-SAX. Usually works best with smaller discretization parameters: PAA and Alphabet sizes.

Usage: find_discords_hotsax(ts, w_size, paa_size, a_size, n_threshold, discords_num)

Arguments:

- ts: the input timeseries.
- w_size: the sliding window size.
- paa_size: the PAA size.
- a_size: the alphabet size.
- n_threshold: the normalization threshold.
- discords_num: the number of discords to report.

References: Keogh, E., Lin, J., Fu, A., HOT SAX: Efficiently finding the most unusual time series subsequence. Proceeding ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining

Examples:

    discords = find_discords_hotsax(ecg0606, 100, 3, 3, 0.01, 1)
    plot(ecg0606, type = "l", col = "cornflowerblue", main = "ECG 0606")
    lines(x=c(discords[1,2]:(discords[1,2]+100)),
          y=ecg0606[discords[1,2]:(discords[1,2]+100)], col="red")

Finds a discord with the RRA (Rare Rule Anomaly) algorithm. Usually works best with larger discretization parameters (i.e., PAA and Alphabet sizes) than those used for HOT-SAX.

Usage: find_discords_rra(series, w_size, paa_size, a_size, nr_strategy, n_threshold, discords_num)

A standard UCR Cylinder-Bell-Funnel dataset from http://www.cs.ucr.edu/~eamonn/time_series_data
Usage: CBF

A standard UCR Gun Point dataset from http://www.cs.ucr.edu/~eamonn/time_series_data

Usage: Gun_Point

Translates an alphabet size into the array of corresponding SAX cut-lines built using the Normal distribution.

Usage: alphabet_to_cuts(a_size)

Examples:

    alphabet_to_cuts(5)

Computes TF-IDF weight vectors for a set of word bags.

Usage: bags_to_tfidf(data)

Examples:

    bag1 = data.frame(
      "words" = c("this", "is", "a", "sample"),
      "counts" = c(1, 1, 2, 1),
      stringsAsFactors = FALSE
    )
    bag2 = data.frame(
      "words" = c("this", "is", "another", "example"),
      "counts" = c(1, 1, 2, 3),
      stringsAsFactors = FALSE
    )
    ll = list("bag1" = bag1, "bag2" = bag2)
    tfidf = bags_to_tfidf(ll)

Computes the cosine similarity between numeric vectors

Usage: cosine_dist(m)

Examples:

    a <- c(2, 1, 0, 2, 0, 1, 1, 1)
    b <- c(2, 1, 1, 1, 1, 0, 1, 1)
    sim <- cosine_dist(rbind(a,b))

Computes the cosine distance value between a bag of words and a set of TF-IDF weight vectors.

Usage: cosine_sim(data)

Finds the Euclidean distance between points; if the distance is above the threshold, abandons the computation and returns NAN.

Usage: early_abandoned_dist(seq1, seq2, upper_limit)

A PHYSIONET dataset

Usage: ecg0606

Finds the Euclidean distance between points.

Usage: euclidean_dist(seq1, seq2)

Finds a discord using brute force algorithm.
Usage: find_discords_brute_force(ts, w_size, discords_num)

Examples:

    discords = find_discords_brute_force(ecg0606[1:600], 100, 1)
    plot(ecg0606[1:600], type = "l", col = "cornflowerblue", main = "ECG 0606")
    lines(x=c(discords[1,2]:(discords[1,2]+100)),
          y=ecg0606[discords[1,2]:(discords[1,2]+100)], col="red")

Finds a discord (i.e. time series anomaly) with HOT-SAX. Usually works best with smaller discretization parameters: PAA and Alphabet sizes.

Usage: find_discords_hotsax(ts, w_size, paa_size, a_size, n_threshold, discords_num)

Examples:

    discords = find_discords_hotsax(ecg0606, 100, 3, 3, 0.01, 1)
    plot(ecg0606, type = "l", col = "cornflowerblue", main = "ECG 0606")
    lines(x=c(discords[1,2]:(discords[1,2]+100)),
          y=ecg0606[discords[1,2]:(discords[1,2]+100)], col="red")

Finds a discord with the RRA (Rare Rule Anomaly) algorithm. Usually works best with larger discretization parameters (i.e., PAA and Alphabet sizes) than those used for HOT-SAX.

Usage: find_discords_rra(series, w_size, paa_size, a_size, nr_strategy, n_threshold, discords_num)

Examples:

    discords = find_discords_rra(ecg0606, 100, 4, 4, "none", 0.01, 1)
    plot(ecg0606, type = "l", col = "cornflowerblue", main = "ECG 0606")
    lines(x=c(discords[1,2]:(discords[1,2]+100)),
          y=ecg0606[discords[1,2]:(discords[1,2]+100)], col="red")

Get the ASCII letter by an index.
Usage: idx_to_letter(idx)

Examples:

    # letter 'b'
    idx_to_letter(2)

Compares two strings using mindist.

Usage: is_equal_mindist(a, b)

Examples:

    is_equal_mindist("aaa", "bbb") # true
    is_equal_mindist("aaa", "ccc") # false

Compares two strings using natural letter ordering.

Usage: is_equal_str(a, b)

Examples:

    is_equal_str("aaa", "bbb")
    is_equal_str("ccc", "ccc")

Get the index for an ASCII letter.

Usage: letter_to_idx(letter)

Examples:

    # letter 'b' translates to 2
    letter_to_idx('b')

Get an ASCII indexes sequence for a given character array.

Usage: letters_to_idx(str)

Examples:

    letters_to_idx(c('a','b','c','a'))

Converts a set of time-series into a single bag of words.

Usage: manyseries_to_wordbag(data, w_size, paa_size, a_size, nr_strategy, n_threshold)

Computes the mindist value for two strings

Usage: min_dist(str1, str2, alphabet_size, compression_ratio = 1)

Examples:

    str1 <- c('a', 'b', 'c')
    str2 <- c('c', 'b', 'a')
    min_dist(str1, str2, 3)

Computes a Piecewise Aggregate Approximation (PAA) for a time series.
Usage: paa(ts, paa_num)

Examples:

    x = c(-1, -2, -1, 0, 2, 1, 1, 0)
    x_paa3 = paa(x, 3)
    #
    plot(x, type = "l", main = c("8-points time series and its PAA transform into three points",
                                 "PAA shown schematically in blue"))
    points(x, pch = 16, lwd = 5)
    #
    paa_bounds = c(1, 1+7/3, 1+7/3*2, 8)
    abline(v = paa_bounds, lty = 3, lwd = 2, col = "cornflowerblue")
    segments(paa_bounds[1:3], x_paa3, paa_bounds[2:4], x_paa3, col = "cornflowerblue", lwd = 2)
    points(x = c(1, 1+7/3, 1+7/3*2) + (7/3)/2, y = x_paa3, pch = 15, lwd = 5, col = "cornflowerblue")

Discretize a time series with SAX using chunking (no sliding window).

Usage: sax_by_chunking(ts, paa_size, a_size, n_threshold)

Generates a SAX MinDist distance matrix (i.e. the "lookup table") for a given alphabet size.

Usage: sax_distance_matrix(a_size)

Examples:

    sax_distance_matrix(5)

Discretizes a time series with SAX via sliding window.

Usage: sax_via_window(ts, w_size, paa_size, a_size, nr_strategy, n_threshold)

Transforms a time series into the char array using SAX and the normal alphabet.

Usage: series_to_chars(ts, a_size)

Examples:

    y = c(-1, -2, -1, 0, 2, 1, 1, 0)
    y_paa3 = paa(y, 3)
    series_to_chars(y_paa3, 3)

Transforms a time series into the string.

Usage: series_to_string(ts, a_size)

Examples:

    y = c(-1, -2, -1, 0, 2, 1, 1, 0)
    y_paa3 = paa(y, 3)
    series_to_string(y_paa3, 3)

Converts a single time series into a bag of words.

Usage: series_to_wordbag(ts, w_size, paa_size, a_size, nr_strategy, n_threshold)

Runs the repair on a string.

Usage: str_to_repair_grammar(str)

Examples:

    str_to_repair_grammar("abc abc cba cba bac xxx abc abc cba cba bac")

Extracts a subseries.

Usage: subseries(ts, start, end)

Examples:

    y = c(-1, -2, -1, 0, 2, 1, 1, 0)
    subseries(y, 0, 3)

Z-normalizes a time series by subtracting its mean and dividing by the standard deviation.
Usage: znorm(ts, threshold = 0.01)

Examples:

    x = seq(0, pi*4, 0.02)
    y = sin(x) * 5 + rnorm(length(x))
    plot(x, y, type="l", col="blue")
    lines(x, znorm(y, 0.01), type="l", col="red")

| Repository | Version | Published | First seen | Last seen | Docs |
|---|---|---|---|---|---|
| CRAN | 1.2.1 | 2026-05-29 | 2026-05-30 |
No OSV data to display.
No OpenAlex data to display.