sts_bert_microsoft-mpnet-base_cos_dist_ORTHOGONAL_z_False_n_False_c_False_seed1_yahoo_answers_small_test.h5 (20.26 MB)
sts_bert_microsoft-mpnet-base_cos_dist_ORTHOGONAL_z_False_n_False_c_False_seed1_yahoo_answers_small_train.h5 (469.44 MB)
sts_bert_microsoft-mpnet-base_cos_dist_ORTHOGONAL_z_False_n_False_c_False_seed6_yahoo_answers_small_test.h5 (20.26 MB)
sts_bert_microsoft-mpnet-base_cos_dist_ORTHOGONAL_z_False_n_False_c_False_seed6_yahoo_answers_small_train.h5 (469.49 MB)
sts_bert_microsoft-mpnet-base_cos_dist_ORTHOGONAL_z_False_n_False_c_False_seed7_yahoo_answers_small_test.h5 (20.26 MB)
sts_bert_microsoft-mpnet-base_cos_dist_ORTHOGONAL_z_False_n_False_c_False_seed7_yahoo_answers_small_train.h5 (469.56 MB)
sts_bert_microsoft-mpnet-base_cos_dist_ORTHOGONAL_z_False_n_False_c_False_seed10_yahoo_answers_small_test.h5 (20.27 MB)
sts_bert_microsoft-mpnet-base_cos_dist_ORTHOGONAL_z_False_n_False_c_False_seed10_yahoo_answers_small_train.h5 (469.59 MB)
sts_bert_microsoft-mpnet-base_cos_dist_ORTHOGONAL_z_False_n_False_c_False_seed11_yahoo_answers_small_test.h5 (20.26 MB)
sts_bert_microsoft-mpnet-base_cos_dist_ORTHOGONAL_z_False_n_False_c_False_seed11_yahoo_answers_small_train.h5 (469.53 MB)
sts_bert_microsoft-mpnet-base_cos_dist_ORTHOGONAL_z_False_n_False_c_False_seed12_yahoo_answers_small_test.h5 (20.27 MB)
sts_bert_microsoft-mpnet-base_cos_dist_ORTHOGONAL_z_False_n_False_c_False_seed12_yahoo_answers_small_train.h5 (469.56 MB)
sts_bert_microsoft-mpnet-base_cos_dist_ORTHOGONAL_z_False_n_False_c_False_seed13_yahoo_answers_small_test.h5 (20.26 MB)
sts_bert_microsoft-mpnet-base_cos_dist_ORTHOGONAL_z_False_n_False_c_False_seed13_yahoo_answers_small_train.h5 (469.53 MB)
sts_bert_microsoft-mpnet-base_cos_dist_ORTHOGONAL_z_False_n_False_c_False_seed22_yahoo_answers_small_test.h5 (20.27 MB)
sts_bert_microsoft-mpnet-base_cos_dist_ORTHOGONAL_z_False_n_False_c_False_seed22_yahoo_answers_small_train.h5 (469.59 MB)
sts_bert_microsoft-mpnet-base_cos_dist_ORTHOGONAL_z_False_n_False_c_False_seed23_yahoo_answers_small_test.h5 (20.26 MB)
sts_bert_microsoft-mpnet-base_cos_dist_ORTHOGONAL_z_False_n_False_c_False_seed23_yahoo_answers_small_train.h5 (469.55 MB)
sts_bert_microsoft-mpnet-base_cos_dist_ORTHOGONAL_z_False_n_False_c_False_seed42_yahoo_answers_small_test.h5 (20.27 MB)
sts_bert_microsoft-mpnet-base_cos_dist_ORTHOGONAL_z_False_n_False_c_False_seed42_yahoo_answers_small_train.h5 (469.57 MB)
sts_bert_microsoft-mpnet-base Yahoo Answers small embeddings
dataset
Posted on 2023-05-03, 13:26, authored by Beatrix Miranda Ginn Nielsen
Text embeddings for ten percent of the Yahoo Answers dataset.
The Yahoo Answers dataset can be fetched from https://www.kaggle.com/datasets/yacharki/yahoo-answers-10-categories-for-nlp-csv and was created by Zhang et al. (https://doi.org/10.48550/arXiv.1509.01626).
Indexes used can be found in the code repository https://github.com/bemigini/hubness-reduction-sentence-bert.
Embeddings are made with sentence BERT models where microsoft-mpnet-base (huggingface.co/microsoft/mpnet-base) is used as the base model.
For more details on models, see the model item: 10.11583/DTU.20708785
The file sts_bert_microsoft-mpnet-base_cos_ORTHOGONAL_z_False_n_True_c_False_seed42_yahoo_answers_small_train.h5 is currently missing.
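The file names on this page appear to follow a regular pattern (base model, a `cos` or `cos_dist` tag, an upper-case tag such as `ORTHOGONAL`, three boolean flags `z`, `n` and `c`, a seed, the dataset name, and the split). The following is a minimal Python sketch for splitting a file name into those parts, for example to group train and test files by seed. The pattern is inferred from the listed names only and is not a documented naming scheme. The function name `parse_embedding_filename` and the field names `metric` and `variant` are hypothetical labels chosen for illustration, and the meaning of the `z`, `n` and `c` flags is not stated here, so they are kept as plain booleans.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the file names listed on this page (an assumption,
# not a documented naming scheme).
_PATTERN = re.compile(
    r"^sts_bert_(?P<base_model>.+?)_(?P<metric>cos(?:_dist)?)_(?P<variant>[A-Z]+)"
    r"_z_(?P<z>True|False)_n_(?P<n>True|False)_c_(?P<c>True|False)"
    r"_seed(?P<seed>\d+)_(?P<dataset>.+)_(?P<split>train|test)\.h5$"
)


class EmbeddingFile(NamedTuple):
    base_model: str
    metric: str    # hypothetical label for the "cos" / "cos_dist" part
    variant: str   # hypothetical label for the upper-case part, e.g. "ORTHOGONAL"
    z: bool        # meaning not documented on this page
    n: bool        # meaning not documented on this page
    c: bool        # meaning not documented on this page
    seed: int
    dataset: str
    split: str


def parse_embedding_filename(name: str) -> Optional[EmbeddingFile]:
    """Return the parts of a file name, or None if it does not fit the pattern."""
    match = _PATTERN.match(name)
    if match is None:
        return None
    g = match.groupdict()
    return EmbeddingFile(
        base_model=g["base_model"],
        metric=g["metric"],
        variant=g["variant"],
        z=g["z"] == "True",
        n=g["n"] == "True",
        c=g["c"] == "True",
        seed=int(g["seed"]),
        dataset=g["dataset"],
        split=g["split"],
    )


info = parse_embedding_filename(
    "sts_bert_microsoft-mpnet-base_cos_dist_ORTHOGONAL_z_False_n_False_c_False"
    "_seed42_yahoo_answers_small_train.h5"
)
print(info.seed, info.split)
```

The files themselves are HDF5 and can be opened with any HDF5 reader. Their internal layout is not described on this page, so it is best inspected before relying on particular dataset keys.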
Funding
Danish Pioneer Centre for AI, DNRF grant number P1