Technical University of Denmark
360 files

sts_bert_microsoft-mpnet-base Yahoo Answers small embeddings

Download all (85.3 GB)
dataset
posted on 2023-05-03, 13:26, authored by Beatrix Miranda Ginn Nielsen

Text embeddings for ten percent of the Yahoo Answers dataset.

The Yahoo Answers dataset can be downloaded from https://www.kaggle.com/datasets/yacharki/yahoo-answers-10-categories-for-nlp-csv and was created by Zhang et al. (https://doi.org/10.48550/arXiv.1509.01626).

The indexes used can be found in the code repository https://github.com/bemigini/hubness-reduction-sentence-bert.


The embeddings are made with Sentence-BERT models that use microsoft-mpnet-base (huggingface.co/microsoft/mpnet-base) as the base model.


For more details on models, see the model item: 10.11583/DTU.20708785 


The file sts_bert_microsoft-mpnet-base_cos_ORTHOGONAL_z_False_n_True_c_False_seed42_yahoo_answers_small_train.h5 is currently missing.
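The file name above appears to encode the model configuration, so with 360 files it may help to split names into fields before choosing what to download. The following is a minimal Python sketch, assuming every file follows the pattern of the one name listed here. The function name `parse_embedding_filename` and the field labels `similarity` and `variant` are hypothetical, and the meaning of the `z`, `n` and `c` flags is not stated in this item, so they are returned uninterpreted.

```python
import re

# Pattern inferred from the single file name listed in this item (an assumption).
# The group names "similarity" and "variant" are hypothetical labels.
_PATTERN = re.compile(
    r"^(?P<model>.+?)_(?P<similarity>[a-z0-9]+)_(?P<variant>[A-Z]+)"
    r"_z_(?P<z>True|False)_n_(?P<n>True|False)_c_(?P<c>True|False)"
    r"_seed(?P<seed>\d+)_(?P<dataset>.+)_(?P<split>[a-z]+)\.h5$"
)


def parse_embedding_filename(name):
    """Split an embedding file name into its underscore-separated fields."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"file name does not follow the assumed pattern: {name}")
    fields = match.groupdict()
    return {
        "model": fields["model"],
        "similarity": fields["similarity"],
        "variant": fields["variant"],
        # Boolean flags are kept as-is; their meaning is documented in the model item.
        "flags": {key: fields[key] == "True" for key in ("z", "n", "c")},
        "seed": int(fields["seed"]),
        "dataset": fields["dataset"],
        "split": fields["split"],
    }
```

Calling the function on the file name above would give the model `sts_bert_microsoft-mpnet-base`, seed 42, dataset `yahoo_answers_small` and split `train`. Names that do not fit the assumed pattern raise a `ValueError` rather than being silently misparsed.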

Funding

Danish Pioneer Centre for AI, DNRF grant number P1
