<p>Sentence embeddings for a 10% subset of the Yahoo Answers dataset.</p>
<p>Yahoo Answers dataset: https://www.kaggle.com/datasets/yacharki/yahoo-answers-10-categories-for-nlp-csv</p>
<p>The indexes used can be found in the code repository: https://github.com/bemigini/hubness-reduction-sentence-bert</p>
<p>The embeddings are made with Sentence-BERT models that use distilroberta-base (https://huggingface.co/distilroberta-base) as the base model.</p>
<p>For more details on the models, see the model item (DOI: 10.11583/DTU.20708785).</p>
Funding
Danish Pioneer Centre for AI, DNRF grant number P1