Technical University of Denmark
361 files

sts_bert_microsoft-mpnet-base 20 newsgroups embeddings

dataset
Posted on 2023-05-03, 10:51; authored by Beatrix Miranda Ginn Nielsen

Text embeddings for the 20 Newsgroups dataset

The 20 Newsgroups dataset can be fetched through scikit-learn, which downloads the data from the 20 Newsgroups website (http://qwone.com/~jason/20Newsgroups).
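As a minimal sketch of fetching the raw posts, the snippet below wraps scikit-learn's `fetch_20newsgroups`. The helper names `load_20newsgroups` and `posts_per_newsgroup` are hypothetical and only illustrative; nothing here is taken from the depositor's own code, and the preprocessing used for the deposited embeddings is not stated on this page.

```python
from collections import Counter


def load_20newsgroups(subset="train"):
    """Download (or load from the local cache) one split of 20 Newsgroups."""
    # Imported lazily so the counting helper below works without scikit-learn.
    from sklearn.datasets import fetch_20newsgroups

    # subset may be "train", "test", or "all".
    return fetch_20newsgroups(subset=subset)


def posts_per_newsgroup(targets, target_names):
    """Count how many posts fall into each newsgroup (hypothetical helper)."""
    counts = Counter(int(t) for t in targets)
    return {name: counts.get(i, 0) for i, name in enumerate(target_names)}
```

A typical call would be `data = load_20newsgroups("train")` followed by `posts_per_newsgroup(data.target, data.target_names)`; the raw texts are in `data.data`. The first call needs network access, since scikit-learn downloads and caches the archive.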

The embeddings are produced by Sentence-BERT models that use microsoft/mpnet-base (https://huggingface.co/microsoft/mpnet-base) as the base model.

For more details on the models, see the model item: 10.11583/DTU.20708785
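To illustrate what a Sentence-BERT style encoding step can look like, here is a rough sketch using the sentence-transformers library. It is an assumption-laden illustration: it loads the bare `microsoft/mpnet-base` checkpoint rather than the models described in the model item, so it will not reproduce the deposited embeddings. The function name `embed_texts` is hypothetical.

```python
def embed_texts(texts, model_name="microsoft/mpnet-base"):
    """Encode each text into one fixed-size vector (illustrative only)."""
    # Imported lazily: sentence-transformers is a heavy, optional dependency.
    from sentence_transformers import SentenceTransformer

    # ASSUMPTION: this loads the plain base checkpoint, not the depositor's
    # models. sentence-transformers adds mean pooling on top of it when the
    # checkpoint ships no Sentence-BERT configuration.
    model = SentenceTransformer(model_name)
    # encode() returns a NumPy array of shape (len(texts), embedding_dim).
    return model.encode(list(texts))
```

Calling `embed_texts(["first post", "second post"])` would download the checkpoint on first use and return one row per input text.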

Funding

Danish Pioneer Centre for AI, DNRF grant number P1

    DTU Compute
