How to use tsss1/modernbert-embed-base-legal-matryoshka-2 with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("tsss1/modernbert-embed-base-legal-matryoshka-2", trust_remote_code=True)
sentences = [
"the Polaris Solicitations as currently drafted do not comply with Section 3306(c)(3). In its request \nto apply Section 3306(c)(3) to the Polaris Solicitations, GSA stated that \n \n \n \nSupplement 2, AR at 2907–08. Because GSA adopted an overly broad understanding of Section \n3306(c)(3)’s scope, GSA stated the Solicitations will include a “full range of order types,”",
"What did Al-Hamim confirm about the citations?",
"What understanding did GSA adopt regarding Section 3306(c)(3)'s scope?",
"What was the reason for denying the agency's motion without prejudice?"
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]

This is a sentence-transformers model finetuned from nomic-ai/nomic-embed-text-v2-moe on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: NomicBertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
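The module list above ends with mean pooling followed by L2 normalization. As a rough illustration of those two steps only, here is a minimal pure-Python sketch; the helper names `mean_pool` and `l2_normalize` and the toy 4-dimensional token vectors are assumptions made for this example and are not part of the library.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors, skipping padded positions (mask == 0)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    return [value / max(count, 1) for value in total]

def l2_normalize(vector):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)

# Toy input: three token vectors of width 4; the last position is padding.
tokens = [[1.0, 2.0, 0.0, 0.0], [3.0, 0.0, 0.0, 4.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]

sentence_embedding = l2_normalize(mean_pool(tokens, mask))
print(sentence_embedding)
```

Because the output is unit length, the dot product of two such embeddings equals their cosine similarity.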
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("tsss1/modernbert-embed-base-legal-matryoshka-2", trust_remote_code=True)
# Run inference
sentences = [
'against six federal agencies pursuant to the Freedom of Information Act (“FOIA”), 5 U.S.C. \n§ 552, claiming that the defendant agencies have violated the FOIA in numerous ways.1 NSC’s \nclaims run the gamut, including challenges to: the withholding of specific information; the \nadequacy of the agencies’ search efforts; the refusal to process FOIA requests; the refusal to',
'How many federal agencies is the action against?',
'Which case was quoted in Entertainment Ltd. v. U.S. Dep’t of Interior regarding the retroactivity of statutes?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
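Since the model was trained with MatryoshkaLoss at 768, 512, 256, 128 and 64 dimensions, its embeddings can in principle be shortened by keeping only the leading components and re-normalizing. The sketch below shows that idea in pure Python on hypothetical 8-dimensional vectors that stand in for real model outputs; `truncate_and_normalize` and `cosine` are illustrative helpers, not library functions. Recent versions of Sentence Transformers also accept a `truncate_dim` argument when loading a model, which may be more convenient in practice.

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components and rescale to unit length."""
    head = list(embedding[:dim])
    norm = math.sqrt(sum(v * v for v in head))
    return [v / norm for v in head] if norm > 0 else head

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 8-dim vectors standing in for 768-dim model outputs.
passage = [0.6, 0.3, -0.2, 0.1, 0.5, -0.4, 0.2, 0.1]
question = [0.5, 0.4, -0.1, 0.2, -0.3, 0.1, 0.6, -0.2]

for dim in (8, 4, 2):
    p = truncate_and_normalize(passage, dim)
    q = truncate_and_normalize(question, dim)
    print(dim, round(cosine(p, q), 4))
```

Smaller dimensions reduce storage and search cost; the evaluation results in this card indicate how much retrieval quality is given up at each size.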
Datasets: dim_768, dim_512, dim_256, dim_128 and dim_64
Evaluated with InformationRetrievalEvaluator

| Metric | dim_768 | dim_512 | dim_256 | dim_128 | dim_64 |
|---|---|---|---|---|---|
| cosine_accuracy@1 | 0.5533 | 0.5502 | 0.524 | 0.4621 | 0.3277 |
| cosine_accuracy@3 | 0.6105 | 0.5997 | 0.5703 | 0.5209 | 0.3864 |
| cosine_accuracy@5 | 0.7125 | 0.7002 | 0.6754 | 0.609 | 0.4791 |
| cosine_accuracy@10 | 0.8083 | 0.7898 | 0.7682 | 0.6862 | 0.5641 |
| cosine_precision@1 | 0.5533 | 0.5502 | 0.524 | 0.4621 | 0.3277 |
| cosine_precision@3 | 0.5276 | 0.5219 | 0.4951 | 0.4456 | 0.322 |
| cosine_precision@5 | 0.4127 | 0.4046 | 0.3889 | 0.3536 | 0.2677 |
| cosine_precision@10 | 0.2502 | 0.243 | 0.2391 | 0.213 | 0.1692 |
| cosine_recall@1 | 0.1985 | 0.1989 | 0.1883 | 0.1656 | 0.1172 |
| cosine_recall@3 | 0.5175 | 0.5138 | 0.4858 | 0.4364 | 0.3215 |
| cosine_recall@5 | 0.6555 | 0.6434 | 0.6172 | 0.5608 | 0.4338 |
| cosine_recall@10 | 0.7895 | 0.7696 | 0.7508 | 0.6692 | 0.5402 |
| cosine_ndcg@10 | 0.6787 | 0.6665 | 0.6436 | 0.5742 | 0.4412 |
| cosine_mrr@10 | 0.6103 | 0.6034 | 0.5769 | 0.5144 | 0.3815 |
| cosine_map@100 | 0.6544 | 0.6473 | 0.6222 | 0.5623 | 0.4319 |
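To make the metric names in the table concrete, the following sketch computes accuracy@k, precision@k, recall@k, MRR@k and NDCG@k for a single query with binary relevance. It is a simplified illustration under stated assumptions, not the evaluator's own code: the function name `retrieval_metrics`, the document ids and the toy ranking are all hypothetical. The reported figures are averages of such per-query values.

```python
import math

def retrieval_metrics(ranked_ids, relevant_ids, k):
    """Per-query retrieval metrics at cutoff k, assuming binary relevance."""
    top_k = ranked_ids[:k]
    hits = [1 if doc_id in relevant_ids else 0 for doc_id in top_k]

    accuracy = 1.0 if any(hits) else 0.0      # at least one relevant doc in the top k
    precision = sum(hits) / k                 # share of the top k that is relevant
    recall = sum(hits) / len(relevant_ids)    # share of relevant docs that were found

    mrr = 0.0                                 # reciprocal rank of the first hit
    for rank, hit in enumerate(hits, start=1):
        if hit:
            mrr = 1.0 / rank
            break

    dcg = sum(hit / math.log2(rank + 1) for rank, hit in enumerate(hits, start=1))
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    ndcg = dcg / idcg if idcg > 0 else 0.0

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "mrr": mrr, "ndcg": ndcg}

# Hypothetical ranking for one query with two relevant passages.
ranked = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2"}
metrics = retrieval_metrics(ranked, relevant, k=3)
print(metrics)
```

Under these definitions accuracy@1 and precision@1 always coincide, as they do in the table, while recall@1 can be much lower when a query has several relevant passages.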
Columns: positive and anchor

| | positive | anchor |
|---|---|---|
| type | string | string |

Samples:

| positive | anchor |
|---|---|
| aspect” of “substantial independent authority.” Dong v. Smithsonian Inst., 125 F.3d 877, 881 | What court circuit is mentioned in connection with the case Sweetland v. Walters? |
| the entire list of remaining PQPs shifts up one position. | What is the GSA responsible for verifying? |
| Department components], to assist with the processing of [FOIA or Privacy Act] requests for | What is the identified purpose for assisting with processing FOIA or Privacy Act requests? |
MatryoshkaLoss with these parameters:
{
"loss": "MultipleNegativesRankingLoss",
"matryoshka_dims": [
768,
512,
256,
128,
64
],
"matryoshka_weights": [
1,
1,
1,
1,
1
],
"n_dims_per_step": -1
}
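The parameters above describe an in-batch ranking loss applied to the embeddings truncated at each listed dimension, with equal weights. The sketch below is a simplified pure-Python rendering of that idea, not the library's implementation: the function names are illustrative, the toy vectors are 4-dimensional with dims `(4, 2)` instead of the real sizes, and the similarity scale of 20.0 is an assumption made for this example.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0

def in_batch_ranking_loss(anchors, positives, scale=20.0):
    """Cross-entropy where anchor i should score its own positive i highest;
    the other positives in the batch serve as negatives."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        scores = [scale * cosine(anchor, positive) for positive in positives]
        max_score = max(scores)
        log_sum_exp = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
        total += log_sum_exp - scores[i]
    return total / len(anchors)

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the ranking loss over truncated embeddings."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        truncated_anchors = [vector[:dim] for vector in anchors]
        truncated_positives = [vector[:dim] for vector in positives]
        total += weight * in_batch_ranking_loss(truncated_anchors, truncated_positives)
    return total

# Toy batch of two (anchor, positive) pairs.
anchors = [[1.0, 0.2, 0.1, 0.0], [0.1, 1.0, 0.0, 0.3]]
positives = [[0.9, 0.1, 0.2, 0.1], [0.2, 0.8, 0.1, 0.4]]

loss = matryoshka_loss(anchors, positives, dims=(4, 2), weights=(1, 1))
print(loss)
```

Summing over every dimension in each step pushes the leading components to be useful on their own, which is what allows the embeddings to be truncated at inference time.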
Non-default hyperparameters:

- eval_strategy: epoch
- per_device_train_batch_size: 4
- per_device_eval_batch_size: 2
- gradient_accumulation_steps: 4
- learning_rate: 2e-05
- num_train_epochs: 2
- lr_scheduler_type: cosine
- warmup_ratio: 0.1
- bf16: True
- tf32: False
- load_best_model_at_end: True
- optim: adamw_torch_fused
- batch_sampler: no_duplicates

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: epoch
- prediction_loss_only: True
- per_device_train_batch_size: 4
- per_device_eval_batch_size: 2
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 4
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 2
- max_steps: -1
- lr_scheduler_type: cosine
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: False
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional

Training logs:

| Epoch | Step | Training Loss | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
|---|---|---|---|---|---|---|---|
| 0.0549 | 10 | 2.6704 | - | - | - | - | - |
| 0.1099 | 20 | 1.7246 | - | - | - | - | - |
| 0.1648 | 30 | 1.3634 | - | - | - | - | - |
| 0.2198 | 40 | 1.0962 | - | - | - | - | - |
| 0.2747 | 50 | 0.8985 | - | - | - | - | - |
| 0.3297 | 60 | 0.8667 | - | - | - | - | - |
| 0.3846 | 70 | 0.7371 | - | - | - | - | - |
| 0.4396 | 80 | 1.038 | - | - | - | - | - |
| 0.4945 | 90 | 0.733 | - | - | - | - | - |
| 0.5495 | 100 | 0.9032 | - | - | - | - | - |
| 0.6044 | 110 | 0.7283 | - | - | - | - | - |
| 0.6593 | 120 | 0.6085 | - | - | - | - | - |
| 0.7143 | 130 | 0.5774 | - | - | - | - | - |
| 0.7692 | 140 | 0.6164 | - | - | - | - | - |
| 0.8242 | 150 | 0.8098 | - | - | - | - | - |
| 0.8791 | 160 | 0.6534 | - | - | - | - | - |
| 0.9341 | 170 | 0.6035 | - | - | - | - | - |
| 0.9890 | 180 | 0.5209 | - | - | - | - | - |
| 1.0 | 182 | - | 0.6911 | 0.6719 | 0.6341 | 0.5600 | 0.4203 |
| 1.0440 | 190 | 0.3718 | - | - | - | - | - |
| 1.0989 | 200 | 0.2309 | - | - | - | - | - |
| 1.1538 | 210 | 0.2128 | - | - | - | - | - |
| 1.2088 | 220 | 0.138 | - | - | - | - | - |
| 1.2637 | 230 | 0.1129 | - | - | - | - | - |
| 1.3187 | 240 | 0.0889 | - | - | - | - | - |
| 1.3736 | 250 | 0.0607 | - | - | - | - | - |
| 1.4286 | 260 | 0.1156 | - | - | - | - | - |
| 1.4835 | 270 | 0.0826 | - | - | - | - | - |
| 1.5385 | 280 | 0.098 | - | - | - | - | - |
| 1.5934 | 290 | 0.0891 | - | - | - | - | - |
| 1.6484 | 300 | 0.0451 | - | - | - | - | - |
| 1.7033 | 310 | 0.0581 | - | - | - | - | - |
| 1.7582 | 320 | 0.0722 | - | - | - | - | - |
| 1.8132 | 330 | 0.0785 | - | - | - | - | - |
| 1.8681 | 340 | 0.1407 | - | - | - | - | - |
| 1.9231 | 350 | 0.1022 | - | - | - | - | - |
| 1.9780 | 360 | 0.0771 | - | - | - | - | - |
| 2.0 | 364 | - | 0.6787 | 0.6665 | 0.6436 | 0.5742 | 0.4412 |
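The training log can be related to the listed hyperparameters with a little arithmetic. The sketch below derives the effective batch size and a cosine learning-rate curve with linear warmup from values stated in this card (batch size 4, gradient accumulation 4, learning rate 2e-05, warmup ratio 0.1, 182 steps per epoch, 2 epochs). Two points are assumptions rather than facts from the card: that training ran on a single device, and that the warmup length is the step count times the ratio, rounded up. The `lr_at` helper is illustrative and only approximates the scheduler's shape.

```python
import math

# Values taken from the hyperparameters and training log in this card.
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
learning_rate = 2e-05
warmup_ratio = 0.1
steps_per_epoch = 182
num_train_epochs = 2

num_devices = 1  # assumption: the card does not state the device count

effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
total_steps = steps_per_epoch * num_train_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)  # assumption: rounded up

def lr_at(step):
    """Linear warmup to the peak rate, then cosine decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))

print(effective_batch_size, total_steps, warmup_steps)
for step in (0, warmup_steps, steps_per_epoch, total_steps):
    print(step, lr_at(step))
```

With these assumptions, each optimizer step sees 16 anchor and positive pairs, so every anchor is contrasted against a fairly small set of in-batch negatives.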
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Base model
FacebookAI/xlm-roberta-base