Managing, Querying and Analyzing Tokenized Text


Documentation for package ‘corpustools’ version 0.4.2

Help Pages

A B C D E F G L M P R S T

-- A --

add_collocation_label Choose and add collocation strings based on collocation categories
agg_tcorpus Aggregate the tokens data
as.tcorpus Force an object to be a tCorpus class
as.tcorpus.default Force an object to be a tCorpus class
as.tcorpus.tCorpus Force an object to be a tCorpus class

-- B --

backbone_filter Extract the backbone of a network.
browse_hits View hits in a browser
browse_texts Create and view a full text browser

-- C --

calc_chi2 Vectorized computation of chi^2 statistic for a 2x2 crosstab containing the values [a, b] [c, d]
code_dictionary Dictionary lookup
code_features Code features in a tCorpus based on a search string
compare_corpus Compare tCorpus vocabulary to that of another (reference) tCorpus
compare_documents Calculate the similarity of documents
compare_subset Compare vocabulary of a subset of a tCorpus to the rest of the tCorpus
context Get a context vector
corenlp_tokens coreNLP example sentences
count_tcorpus Count results of search hits, or of a given feature in tokens
create_tcorpus Create a tCorpus
create_tcorpus.character Create a tCorpus
create_tcorpus.corpus Create a tCorpus
create_tcorpus.data.frame Create a tCorpus
create_tcorpus.factor Create a tCorpus

-- D --

deduplicate Deduplicate documents
delete_columns Delete columns from the data and meta data
delete_meta_columns Delete columns from the data and meta data
docfreq_filter Support function for subset method
dtm_compare Compare two document term matrices
dtm_wordcloud Plot a word cloud from a dtm

-- E --

ego_semnet Create an ego network
emoticon_dict Dictionary with common ASCII emoticons

-- F --

feats_to_columms Cast the "feats" column in UDpipe tokens to columns
feature_associations Get common nearby features given a query or query hits
feature_stats Feature statistics
feature_subset Filter features
freq_filter Support function for subset method

-- G --

get Access the data from a tCorpus
get_dfm Create a document term matrix.
get_dtm Create a document term matrix.
get_global_i Compute global feature positions
get_kwic Get keyword-in-context (KWIC) strings
get_meta Access the data from a tCorpus
get_stopwords Get a character vector of stopwords

-- L --

laplace Laplace (i.e. add constant) smoothing
lda_fit Estimate an LDA topic model

-- M --

melt_quanteda_dict Convert a quanteda dictionary to a long data.table format
merge_tcorpora Merge tCorpus objects

-- P --

plot.contextHits S3 plot for contextHits class
plot.featureAssociations Visualize feature associations
plot.featureHits S3 plot for featureHits class
plot.vocabularyComparison Visualize vocabularyComparison
plot_semnet Visualize a semnet network
plot_words Plot a wordcloud with words ordered and coloured according to a dimension (x)
preprocess Preprocess feature
preprocess_tokens Preprocess tokens in a character vector
print.contextHits S3 print for contextHits class
print.featureHits S3 print for featureHits class
print.tCorpus S3 print for tCorpus class

-- R --

refresh_tcorpus Refresh a tCorpus object using the current version of corpustools
replace_dictionary Replace tokens with dictionary match
require_package Check if package with given version exists

-- S --

search_contexts Search for documents or sentences using Boolean queries
search_dictionary Dictionary lookup
search_features Find tokens using a Lucene-like search query
search_recode Recode features in a tCorpus based on a search string
semnet Create a semantic network based on the co-occurrence of tokens in documents
semnet_window Create a semantic network based on the co-occurrence of tokens in token windows
set Modify the token and meta data.tables of a tCorpus
set_levels Change levels of factor columns
set_meta Modify the token and meta data.tables of a tCorpus
set_meta_levels Change levels of factor columns
set_meta_name Change column names of data and meta data
set_name Change column names of data and meta data
set_network_attributes Set some default network attributes for pretty plotting
set_special Designate columns with special meaning (token, lemma, POS, relation, parent)
sgt Simple Good Turing smoothing
show_udpipe_models Show the names of udpipe models
sotu_texts State of the Union addresses
stopwords_list Basic stopword lists
subset Subset a tCorpus
subset.tCorpus S3 subset for tCorpus class
subset_meta Subset a tCorpus
subset_query Subset tCorpus token data using a query
summary.contextHits S3 summary for contextHits class
summary.featureHits S3 summary for featureHits class
summary.tCorpus Summary of a tCorpus object

-- T --

tCorpus tCorpus: a corpus class for tokenized texts
tcorpus tCorpus: a corpus class for tokenized texts
tCorpus$code_dictionary Dictionary lookup
tCorpus$code_features Code features in a tCorpus based on a search string
tCorpus$compare_corpus Compare tCorpus vocabulary to that of another (reference) tCorpus
tCorpus$compare_documents Calculate the similarity of documents
tCorpus$compare_subset Compare vocabulary of a subset of a tCorpus to the rest of the tCorpus
tCorpus$context Get a context vector
tCorpus$deduplicate Deduplicate documents
tCorpus$delete_columns Delete columns from the data and meta data
tCorpus$delete_meta_columns Delete columns from the data and meta data
tCorpus$dfm Create a document term matrix.
tCorpus$dtm Create a document term matrix.
tCorpus$feats_to_columns Cast the "feats" column in UDpipe tokens to columns
tCorpus$feature_associations Get common nearby terms given a feature query
tCorpus$feature_stats Feature statistics
tCorpus$feature_subset Filter features
tCorpus$get Access the data from a tCorpus
tCorpus$get_meta Access the data from a tCorpus
tCorpus$kwic Get keyword-in-context (KWIC) strings
tCorpus$lda_fit Estimate an LDA topic model
tCorpus$preprocess Preprocess feature
tCorpus$replace_dictionary Replace tokens with dictionary match
tCorpus$search_contexts Search for documents or sentences using Boolean queries
tCorpus$search_features Find tokens using a Lucene-like search query
tCorpus$search_recode Recode features in a tCorpus based on a search string
tCorpus$semnet Create a semantic network based on the co-occurrence of tokens in documents
tCorpus$semnet_window Create a semantic network based on the co-occurrence of tokens in token windows
tCorpus$set Modify the token and meta data.tables of a tCorpus
tCorpus$set_levels Change levels of factor columns
tCorpus$set_meta Modify the token and meta data.tables of a tCorpus
tCorpus$set_meta_levels Change levels of factor columns
tCorpus$set_meta_name Change column names of data and meta data
tCorpus$set_name Change column names of data and meta data
tCorpus$set_special Designate columns with special meaning (token, lemma, POS, relation, parent)
tCorpus$subset Subset a tCorpus
tCorpus$subset_meta Subset a tCorpus
tCorpus$subset_query Subset tCorpus token data using a query
tCorpus$top_features Show top features
tCorpus_compare Corpus comparison
tCorpus_create Creating a tCorpus
tCorpus_data Methods and functions for viewing, modifying and subsetting tCorpus data
tCorpus_docsim Document similarity
tCorpus_features Preprocessing, subsetting and analyzing features
tCorpus_modify_by_reference Modify tCorpus by reference
tCorpus_querying Use Boolean queries to analyze the tCorpus
tCorpus_semnet Feature co-occurrence based semantic network analysis
tCorpus_topmod Topic modeling
tokens_to_tcorpus Create a tcorpus based on tokens (i.e. preprocessed texts)
tokenWindowOccurence Gives the window in which a term occurred in a matrix.
top_features Show top features