<p dir="ltr">We have created a dataset of polyanionic sodium-ion cathode materials that includes structures from geometry optimizations toward the lowest energy, ab initio molecular dynamics (AIMD) trajectories sampled at 1000 K, and structures generated from ML-driven molecular dynamics simulations at 1000 K using active learning algorithms. The dataset consists of structures sampled from four sodium-ion polyanionic cathode materials, NaTMPO<sub>4</sub> (olivine), NaTMPO<sub>4</sub> (maricite), Na<sub>2</sub>TMSiO<sub>4</sub> and Na<sub>2.56</sub>TM<sub>1.72</sub>(SO<sub>4</sub>)<sub>3</sub>, along with various structures incorporating transition metal (TM) ion doping. We consider four transition metal ions (Fe, Mn, Co, Ni).</p><p dir="ltr">The dataset consists of 113,532 structures with atomic charge and 184,612 structures without atomic charge.</p><p dir="ltr">For each sampled structure, we record its crystal composition, total energy, atom-wise force vectors, and atom-wise magnetic moments derived from Mulliken analysis, as well as atomic charges obtained through Bader analysis when possible. The dataset complements existing datasets by enabling exploration of phase space and providing insight into the dynamic behavior of the materials.</p><p dir="ltr">The dataset is provided in XYZ format. It is divided into structures with a single type of transition metal ion and structures with multiple types of transition metal ions. This division is provided for each of the four cathode materials: NaTMPO<sub>4</sub> (olivine), NaTMPO<sub>4</sub> (maricite), Na<sub>2</sub>TMSiO<sub>4</sub> and Na<sub>2.56</sub>TM<sub>1.72</sub>(SO<sub>4</sub>)<sub>3</sub>. For example, Na<sub>2.56</sub>TM<sub>1.72</sub>(SO<sub>4</sub>)<sub>3</sub> structures are split into single transition metal ion types, <i>Na2M2SO4_alluadite_single_total.xyz</i>, and multiple transition metal ion types, <i>Na2M2SO4_alluadite_multiple_optimization.xyz. 
</i>The single TM ion structures are further separated by the sampling method used to gather them: structure optimization (<i>Na2M2SO4_alluadite_single_optimization.xyz</i>), AIMD simulation (<i>Na2M2SO4_alluadite_single_AIMD.xyz</i>), and ML-driven MD sampling (<i>Na2M2SO4_alluadite_single_MD_sampling.xyz</i>). This allows the user to use either the whole dataset or only the structures gathered with a specific sampling method.</p><p dir="ltr">The dataset with atomic charges is located in the <i>polyanion_cathode_dataset</i> folder, and the combined dataset, consisting of 113,532 structures, is available in <i>polyanion_cathode_dataset</i>/<i>Combined.xyz</i>.<br></p><p dir="ltr">The dataset that does not include atomic charges but retains all other physical properties is stored in the <i>polyanion_cathode_dataset_with_optimization_steps</i> folder. It is divided into single TM ion structures and multiple TM ion structures, categorized separately for each of the four cathode materials. For example, the structures without atomic charge from the structural optimization of Na<sub>2.56</sub>TM<sub>1.72</sub>(SO<sub>4</sub>)<sub>3</sub> single TM ion structures are collected in <i>polyanion_cathode_dataset_with_optimization_steps/Na2M2SO4_alluadite_single_structural_optimization.xyz</i>.</p><p dir="ltr">The complete dataset, including all structures with atomic charge as well as the structures without atomic charge obtained during structural optimization (298,144 structures), is available as <i>polyanion_cathode_dataset_with_optimization_steps/Combined.xyz</i>.</p><p dir="ltr"><br>Structural compositions and physical properties are extracted using the ase.io.read function from ASE version 3.23.0. 
An example of how to extract data and plot the physical properties is provided in https://github.com/dtu-energy/cathode-generation-workflow/tree/main/extract_data/read_data.py. The file https://github.com/dtu-energy/cathode-generation-workflow/tree/main/extract_data/utils.py contains two functions: one attaches Bader charges to an ASE Atoms object and the other combines multiple XYZ data files.<br>To cite the data, please use the DOI https://doi.org/10.11583/DTU.27202446</p><p><br></p><p dir="ltr">For the sampling, density functional theory (DFT) calculations were performed using the Vienna Ab initio Simulation Package (VASP) version 6.4. The Perdew-Burke-Ernzerhof (PBE) functional with Hubbard U corrections was used for all calculations. The U values are similar to the ones used by the Materials Project (Fe: 5.3 eV, Mn: 3.9 eV, Co: 3.32 eV, Ni: 6.2 eV). For all calculations, an energy cutoff of 520 eV was applied, with a smearing width of 0.01 eV and convergence criteria of 1e-5 eV for the energy and 0.03 eV/Å for the forces. All calculations were spin-polarized. The k-point grids were fixed for each of the four materials: NaTMPO<sub>4</sub> (olivine) and NaTMPO<sub>4</sub> (maricite) used a Gamma-centered [3,4,6] grid, Na<sub>2</sub>TMSiO<sub>4</sub> a Gamma-centered [3,4,4] grid, and Na<sub>2.56</sub>TM<sub>1.72</sub>(SO<sub>4</sub>)<sub>3</sub> a Gamma-centered [2,3,4] grid. When constructing supercells, the number of k-points in the direction of cell enlargement was halved.</p><p dir="ltr">All molecular dynamics (MD) simulations were conducted using the Langevin thermostat with a friction constant of 0.003. The temperature was maintained at 1000 K to facilitate diffusion events, and a time step of 1 fs was employed throughout the simulations. 
All simulations were run in the canonical (NVT) ensemble, and structures were sampled every 1 fs.</p><p><br></p><p dir="ltr">Versions:<br>3: The dataset consists purely of structures with atomic charge.</p><p dir="ltr">4: The dataset is divided into two parts, one containing structures with atomic charge and one containing a subset of the structures without atomic charge.<br>5: Finalized for publication. Cleaned the dataset and changed the sign of the atomic charges so that they are given in units of e instead of -e. The ReadMe file and description have also been updated.</p>
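<p dir="ltr">Because the data are split into one XYZ file per sampling method, a user may want to count the structures in a file or merge a chosen subset of files. The following is a minimal sketch of that idea using only the Python standard library. It is not the implementation in the repository's utils.py; the function names count_frames and combine_xyz are hypothetical, and the sketch assumes every file follows the usual multi-frame XYZ layout (an atom-count line, a comment line, then one line per atom).</p>

```python
from pathlib import Path


def count_frames(xyz_path):
    """Count the structures (frames) in a multi-frame XYZ file."""
    frames = 0
    with open(xyz_path, "r") as handle:
        while True:
            header = handle.readline()
            if not header.strip():
                break  # end of file or trailing blank line
            n_atoms = int(header.split()[0])
            # Skip the comment line plus one line per atom.
            for _ in range(n_atoms + 1):
                handle.readline()
            frames += 1
    return frames


def combine_xyz(inputs, output):
    """Concatenate XYZ files into `output` and return the total frame count."""
    total = 0
    with open(output, "w") as out:
        for path in inputs:
            text = Path(path).read_text()
            if text.strip():
                # Normalize the ending so frames from different files
                # are separated by exactly one newline.
                out.write(text.rstrip("\n") + "\n")
            total += count_frames(path)
    return total


# Hypothetical usage with file names from the description above:
# n = combine_xyz(
#     ["Na2M2SO4_alluadite_single_optimization.xyz",
#      "Na2M2SO4_alluadite_single_AIMD.xyz"],
#     "Na2M2SO4_alluadite_single_subset.xyz",
# )
```

<p dir="ltr">Plain concatenation is sufficient here because a multi-frame XYZ file is simply a sequence of frames; the merged file can then be read in the same way as the other files in the dataset.</p>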