Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit



Documentation for package ‘udpipe’ version 0.2.2

Help Pages

as.data.frame.udpipe_connlu Convert the result of udpipe_annotate to a tidy data frame
as_phrasemachine Convert Parts of Speech tags to one-letter tags which can be used to identify phrases based on regular expressions
brussels_listings Brussels AirBnB address locations available at www.insideairbnb.com
brussels_reviews Reviews of AirBnB customers on Brussels address locations available at www.insideairbnb.com
brussels_reviews_anno Reviews of the AirBnB customers which are tokenised, POS tagged and lemmatised
collocation Extract collocations - a sequence of terms which follow each other
cooccurrence Create a cooccurrence data.frame
cooccurrence.character Create a cooccurrence data.frame
cooccurrence.cooccurrence Create a cooccurrence data.frame
cooccurrence.data.frame Create a cooccurrence data.frame
document_term_frequencies Aggregate a data.frame to the document/term level by calculating how many times a term occurs per document
document_term_frequencies.character Aggregate a data.frame to the document/term level by calculating how many times a term occurs per document
document_term_frequencies.data.frame Aggregate a data.frame to the document/term level by calculating how many times a term occurs per document
document_term_matrix Create a document/term matrix from a data.frame with 1 row per document/term
document_term_matrix.data.frame Create a document/term matrix from a data.frame with 1 row per document/term
document_term_matrix.DocumentTermMatrix Create a document/term matrix from a data.frame with 1 row per document/term
document_term_matrix.simple_triplet_matrix Create a document/term matrix from a data.frame with 1 row per document/term
document_term_matrix.TermDocumentMatrix Create a document/term matrix from a data.frame with 1 row per document/term
dtm_cor Pearson Correlation for Sparse Matrices
dtm_remove_lowfreq Remove terms occurring with low frequency from a Document-Term-Matrix and documents with no terms
dtm_remove_terms Remove terms from a Document-Term-Matrix and keep only documents which have at least some terms
dtm_remove_tfidf Remove terms from a Document-Term-Matrix based on their term frequency-inverse document frequency, and remove documents left with no terms
dtm_reverse Inverse operation of the document_term_matrix function
dtm_tfidf Term Frequency - Inverse Document Frequency calculation
phrases Extract phrases - a sequence of terms which follow each other based on a sequence of Parts of Speech tags
predict.LDA_Gibbs Predict method for an object of class LDA_VEM or class LDA_Gibbs
predict.LDA_VEM Predict method for an object of class LDA_VEM or class LDA_Gibbs
txt_collapse Collapse a character vector while removing missing data.
txt_freq Frequency statistics of elements in a vector
txt_highlight Highlight words in a character vector
txt_next Get the n-th next element of a vector
txt_nextgram Based on a vector with a word sequence, get n-grams
txt_previous Get the n-th previous element of a vector
txt_recode Recode text to other categories
txt_sample Boilerplate function to sample one element from a vector.
txt_show Boilerplate function to cat only 1 element of a character vector.
udpipe_annotate Tokenisation, Tagging and Dependency Parsing Annotation of raw text
udpipe_annotation_params List with training options set by the UDPipe community when building models based on the Universal Dependencies data
udpipe_download_model Download a UDPipe model provided by the UDPipe community for a specific language of choice
udpipe_load_model Load a UDPipe model
udpipe_train Train a UDPipe model
unique_identifier Create a unique identifier for each combination of fields in a data frame