BERT

BERT (Bidirectional Encoder Representations from Transformers) is a transformer-based language model. This package supports token and sentence embeddings from pre-trained language models, using the transformers package written by HuggingFace. To compute the embeddings in shorttext:

>>> from shorttext_bert.encoder import WrappedBERTEncoder
>>> encoder = WrappedBERTEncoder()   # the default model and tokenizer are loaded
>>> sentences_embedding, tokens_embedding, tokens = encoder.encode_sentences(['The car should turn right.', 'The answer is right.'])

The third line returns the embeddings of all sentences, the embeddings of all tokens in each sentence, and the tokens themselves (including the special CLS and SEP tokens). Unlike previous embeddings, token embeddings depend on the context: in the example above, the embeddings of the two occurrences of “right” are different because the word has a different meaning in each sentence.

The default BERT model and tokenizer are bert-base-uncased. If you want to use others, refer to HuggingFace’s model list.

BERTScore

BERTScore is a family of metrics based on the BERT model. These metrics measure the similarity between sentences. To use them:

>>> from shorttext_bert.scorer import BERTScorer
>>> scorer = BERTScorer()    # using default BERT model and tokenizer
>>> scorer.recall_bertscore('The weather is cold.', 'It is freezing.')   # 0.7223385572433472
>>> scorer.precision_bertscore('The weather is cold.', 'It is freezing.')   # 0.7700849175453186
>>> scorer.f1score_bertscore('The weather is cold.', 'It is freezing.')   # 0.7454479746418043
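
The three scores are related through a token-by-token similarity matrix. The sketch below follows the plain greedy matching described by Zhang et al. (without idf weighting): recall averages, over the reference tokens, the best match among the test tokens; precision does the same in the other direction; F1 is their harmonic mean. The function names and the toy matrix are hypothetical, and whether BERTScorer performs exactly these steps internally (for example, how it treats special tokens) is an assumption.

```python
from typing import Sequence


def recall_from_matrix(sim: Sequence[Sequence[float]]) -> float:
    """Average, over reference tokens (rows), of the best match among test tokens."""
    return sum(max(row) for row in sim) / len(sim)


def precision_from_matrix(sim: Sequence[Sequence[float]]) -> float:
    """Average, over test tokens (columns), of the best match among reference tokens."""
    columns = list(zip(*sim))
    return sum(max(column) for column in columns) / len(columns)


def f1_from_scores(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical similarity matrix: 3 reference tokens (rows) x 2 test tokens (columns).
toy_sim = [
    [0.9, 0.2],
    [0.4, 0.7],
    [0.3, 0.5],
]
recall = recall_from_matrix(toy_sim)        # (0.9 + 0.7 + 0.5) / 3
precision = precision_from_matrix(toy_sim)  # (0.9 + 0.7) / 2
f1 = f1_from_scores(precision, recall)
print(recall, precision, f1)
```

The scores shown in the example above are consistent with this relation: the harmonic mean of 0.7700849175453186 and 0.7223385572433472 agrees with the reported F1 of 0.7454479746418043 to several decimal places.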

For BERT models, please refer to

class shorttext_bert.scorer.BERTScorer(model=None, tokenizer=None, max_length=48, nbencodinglayers=4, device='cpu')

This is the class that computes the BERTScores between sentences. BERTScores include recall BERTScores, precision BERTScores, and F1 BERTScores. For more information, please refer to this paper:

Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, Yoav Artzi, “BERTScore: Evaluating Text Generation with BERT,” arXiv:1904.09675 (2019).

__init__(model=None, tokenizer=None, max_length=48, nbencodinglayers=4, device='cpu')

It is the class that computes the BERTScores between sentences.

Parameters:

model (str) – BERT model (default: None, in which case the model bert-base-uncased is used)
tokenizer (str) – BERT tokenizer (default: None, in which case the tokenizer of bert-base-uncased is used)
max_length (int) – maximum number of tokens in each sentence (default: 48)
nbencodinglayers (int) – number of encoding layers (the last layers are taken to encode the sentences, default: 4)
device (str) – device on which the language model is stored (default: cpu)

compute_matrix(sentence_a, sentence_b)

Compute the table of similarities between all pairs of tokens. This is used for calculating the BERTScores.

Parameters:

sentence_a (str) – first sentence
sentence_b (str) – second sentence

Returns:

similarity matrix between the tokens of the two sentences

Return type:

numpy.ndarray
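
As an illustration of what such a table of similarities looks like, here is a minimal sketch that builds a cosine-similarity matrix from two lists of token vectors. Cosine similarity is the measure used in the BERTScore paper; that compute_matrix uses exactly this measure, and how it handles padding and special tokens, are assumptions. The helper names and the two-dimensional vectors are hypothetical stand-ins for real BERT token embeddings.

```python
import math
from typing import List, Sequence


def normalize(vector: Sequence[float]) -> List[float]:
    """Scale a nonzero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def similarity_matrix(
    tokens_a: Sequence[Sequence[float]],
    tokens_b: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Entry [i][j] is the cosine similarity between token i of A and token j of B."""
    unit_a = [normalize(vector) for vector in tokens_a]
    unit_b = [normalize(vector) for vector in tokens_b]
    return [[sum(x * y for x, y in zip(u, v)) for v in unit_b] for u in unit_a]


# Toy 2-dimensional "token embeddings"; real BERT vectors have hundreds of dimensions.
sentence_a_vectors = [[1.0, 0.0], [0.0, 2.0], [3.0, 4.0]]
sentence_b_vectors = [[2.0, 0.0], [1.0, 1.0]]

matrix = similarity_matrix(sentence_a_vectors, sentence_b_vectors)
for row in matrix:
    print([round(value, 3) for value in row])
```

The result has one row per token of the first sentence and one column per token of the second, which is the shape the recall and precision scores are read from.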

recall_bertscore(reference_sentence, test_sentence)

Compute the recall BERTScore between two sentences.

Parameters:

reference_sentence (str) – reference sentence
test_sentence (str) – test sentence

Returns:

recall BERTScore between the two sentences

Return type:

float

precision_bertscore(reference_sentence, test_sentence)

Compute the precision BERTScore between two sentences.

Parameters:

reference_sentence (str) – reference sentence
test_sentence (str) – test sentence

Returns:

precision BERTScore between the two sentences

Return type:

float

f1score_bertscore(reference_sentence, test_sentence)

Compute the F1 BERTScore between two sentences.

Parameters:

reference_sentence (str) – reference sentence
test_sentence (str) – test sentence

Returns:

F1 BERTScore between the two sentences

Return type:

float

Reference

Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” arXiv:1810.04805 (2018).

Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, Yoav Artzi, “BERTScore: Evaluating Text Generation with BERT,” arXiv:1904.09675 (2019).