The software and databases for the publication "Comparative Whole-Genome Analysis of China and Global Epidemic Pseudomonas aeruginosa High-Risk Clones"
The Comprehensive Antibiotic Resistance Database (CARD) was used for antimicrobial resistance (AMR) gene profiling. The Virulence Factor Database (VFDB; http://www.mgc.ac.cn/VFs/) was used for virulence gene detection. Serotypes were identified with PAst 1.0. Sequence typing of all strains was performed with the MLST scheme (https://pubmlst.org/paeruginosa/; last accessed August 11, 2022).
SNP calling and whole-genome SNP based phylogenetic analysis
The NUCmer (NUCleotide MUMmer) function of MUMmer 3.23 was used to align genome sequences against the reference genome PAO1. The alignment file was filtered with the delta-filter function, and SNP loci were identified with the show-snps function using default parameters. SNPs located in repetitive regions, or separated from another SNP locus by less than 100 bp, were discarded. The remaining SNPs were then concatenated into whole-genome SNP sequences. Phylogenetic analysis was conducted with PhyML using the HKY model and 1,000 bootstrap replicates. The online tool iTOL was used to display the phylogenetic tree. Population structure was inferred with hierBAPS. Clades and subclades were classified according to the phylogenetic tree, with the maximum number of populations (K) set to 100, 200, and 300.
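The SNP filtering and concatenation described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study. The function names, the input layout (SNP positions on the PAO1 coordinate system, repeat intervals as inclusive start/end pairs, per-strain allele calls), and the decision to drop both members of a pair lying less than 100 bp apart are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

MIN_SPACING = 100  # bp; SNPs closer than this to a neighbouring SNP are discarded


def in_repeat(pos: int, repeats: List[Tuple[int, int]]) -> bool:
    """Return True if pos lies inside any inclusive (start, end) repeat interval."""
    return any(start <= pos <= end for start, end in repeats)


def filter_snps(positions: List[int],
                repeats: List[Tuple[int, int]],
                min_spacing: int = MIN_SPACING) -> List[int]:
    """Keep SNP positions outside repeats and at least min_spacing bp from any other SNP.

    Assumption: spacing is evaluated against all called SNPs, and both
    members of a too-close pair are removed.
    """
    ordered = sorted(set(positions))
    kept = []
    for i, pos in enumerate(ordered):
        if in_repeat(pos, repeats):
            continue
        close_prev = i > 0 and pos - ordered[i - 1] < min_spacing
        close_next = i + 1 < len(ordered) and ordered[i + 1] - pos < min_spacing
        if close_prev or close_next:
            continue
        kept.append(pos)
    return kept


def concatenate(alleles: Dict[str, Dict[int, str]],
                reference: Dict[int, str],
                kept: List[int]) -> Dict[str, str]:
    """Build one SNP string per strain; sites without a call take the reference base."""
    return {
        strain: "".join(calls.get(pos, reference[pos]) for pos in kept)
        for strain, calls in alleles.items()
    }


if __name__ == "__main__":
    # Toy data, not taken from the study.
    kept_sites = filter_snps([100, 150, 400, 900, 1500], repeats=[(850, 950)])
    ref = {400: "C", 1500: "A"}
    calls = {"strainA": {400: "T"}, "strainB": {400: "T", 1500: "G"}}
    print(kept_sites)
    print(concatenate(calls, ref, kept_sites))
```

The resulting per-strain strings correspond to the whole-genome SNP sequences that would be passed on to tree building.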
Core-Pan genes evolution
The whole gene pool was used for Core-Pan gene analysis with the CD-HIT software, with a 50% pairwise identity threshold and a length difference cutoff of 0.7. Region-specific and common genes were extracted. UpSet plots were then used to present the gene distributions and frequencies across regions. Similarities of pan-genes between regions were shown as heatmaps.
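As an illustration of how region-specific and common genes might be extracted from clustering output, the sketch below works on a hypothetical mapping from gene cluster identifiers to the set of regions in which each cluster was observed. It assumes that "common" means present in every region and "region-specific" means present in exactly one. The Jaccard index is used only as one possible similarity measure for the heatmaps; the measure actually used is not stated here.

```python
from typing import Dict, Set, Tuple


def classify_clusters(cluster_regions: Dict[str, Set[str]],
                      all_regions: Set[str]) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Split clusters into common (seen in every region) and region-specific (seen in one)."""
    common: Set[str] = set()
    specific: Dict[str, Set[str]] = {region: set() for region in all_regions}
    for cluster_id, regions in cluster_regions.items():
        if regions == all_regions:
            common.add(cluster_id)
        elif len(regions) == 1:
            specific[next(iter(regions))].add(cluster_id)
    return common, specific


def region_gene_sets(cluster_regions: Dict[str, Set[str]],
                     all_regions: Set[str]) -> Dict[str, Set[str]]:
    """Return the pan-gene set (all observed clusters) of each region."""
    return {
        region: {cid for cid, regions in cluster_regions.items() if region in regions}
        for region in all_regions
    }


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two gene sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Toy clusters and placeholder region labels, not taken from the study.
    regions = {"A", "B", "C"}
    clusters = {"c1": {"A", "B", "C"}, "c2": {"A"}, "c3": {"A", "B"}, "c4": {"C"}}
    common, specific = classify_clusters(clusters, regions)
    pan = region_gene_sets(clusters, regions)
    print(common, specific)
    print(jaccard(pan["A"], pan["B"]))
```

A pairwise matrix of such similarities across all regions is the kind of input a heatmap would display.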
Single Copy Core Gene-based phylogenetic analysis
Based on the Core-Pan gene analysis, single-copy core genes were selected for the evolutionary analysis. The protein sequences of these single-copy core genes were aligned with MUSCLE. The phylogenetic tree was inferred using the RAxML software (v8.2.12) in three steps. Step 1: Use the parameters "-m PROTGAMMAAUTO -p 12345" to build phylogenetic tree 1. Step 2: Use the parameters "-m PROTGAMMAAUTO -p 12345 -f a -x 12345 -# 100" to build phylogenetic tree 2. Step 3: Use the parameters "-m PROTGAMMAAUTO -p 12345 -f b -t tree1 -z tree2" to combine phylogenetic tree 1 and tree 2 into the final phylogenetic tree.
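The three RAxML steps can be assembled programmatically, as in the following sketch. Only the parameters quoted above come from the text. The executable name "raxmlHPC", the run names, and the file paths are placeholders, and the standard RAxML options -s (input alignment) and -n (run name) are added because a real invocation needs them. The sketch builds the argument lists and does not execute anything.

```python
from typing import List

MODEL_ARGS = ["-m", "PROTGAMMAAUTO", "-p", "12345"]  # parameters shared by all three steps


def raxml_commands(alignment: str, tree1: str, tree2: str,
                   binary: str = "raxmlHPC") -> List[List[str]]:
    """Return argument lists for the three steps described in the text.

    tree1 and tree2 are the paths passed to -t and -z in step 3; which output
    files of steps 1 and 2 they refer to is left to the caller.
    """
    base = [binary, "-s", alignment] + MODEL_ARGS
    step1 = base + ["-n", "step1"]
    step2 = base + ["-f", "a", "-x", "12345", "-#", "100", "-n", "step2"]
    step3 = base + ["-f", "b", "-t", tree1, "-z", tree2, "-n", "final"]
    return [step1, step2, step3]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for cmd in raxml_commands("core_genes.aln", "tree1", "tree2"):
        print(" ".join(cmd))
```

Each list could be handed to subprocess.run once the paths point to real files. Keeping the commands as lists avoids shell quoting problems with the "-#" option.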
Correlation analysis between geographic regions and antibiotic resistance genes
The R package Hmisc was used to calculate Spearman's correlation coefficients. Significant correlations were defined as those with an adjusted p-value < 0.05 and an absolute correlation coefficient > 0.5.
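The significance filter can be sketched as follows, assuming that the correlation coefficients and raw p-values have already been computed (for example with Hmisc). The p-value adjustment method is not stated in the text. The Benjamini-Hochberg procedure is used here purely as an assumption, and the function names are illustrative.

```python
from typing import List, Tuple


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_pairs(results: List[Tuple[str, str, float, float]],
                      alpha: float = 0.05,
                      min_abs_rho: float = 0.5) -> List[Tuple[str, str, float, float]]:
    """Filter (region, gene, rho, raw_p) records by adjusted p-value and |rho|.

    Returns (region, gene, rho, adjusted_p) for the records that pass both cutoffs.
    """
    adjusted = bh_adjust([record[3] for record in results])
    return [
        (region, gene, rho, adj_p)
        for (region, gene, rho, _), adj_p in zip(results, adjusted)
        if adj_p < alpha and abs(rho) > min_abs_rho
    ]


if __name__ == "__main__":
    # Toy values with placeholder labels, not taken from the study.
    toy = [("region1", "geneA", 0.8, 0.01), ("region1", "geneB", 0.9, 0.04),
           ("region2", "geneA", -0.7, 0.03), ("region2", "geneB", 0.3, 0.20)]
    print(significant_pairs(toy))
```

Both cutoffs are applied jointly, so a strong correlation with a non-significant adjusted p-value is excluded, as is a significant but weak one.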
Most Recent Common Ancestor (MRCA) analysis
Bayesian analysis of divergence times was performed using BEAST (v2.4.2). All SNPs at which at least one isolate differed from the reference strain PAO1 were concatenated. BEAUti was used to estimate these SNPs and to calculate the phylogenetic distances between isolates. BEAST was then used to infer the most recent common ancestor with the following user-defined settings: a lognormal relaxed molecular clock model and a general time-reversible substitution model with gamma correction. Results were produced from one chain of 50 million steps, sampled every 1,000 steps. The first 3 million steps were discarded as burn-in. The maximum clade credibility tree was generated using the TreeAnnotator program from the BEAST package and displayed using FigTree (v1.4.2). Tree parameters were calculated with Tracer (v1.6).
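The chain settings above imply the following sample counts and burn-in fraction, worked through in a short sketch. The function name is illustrative, and the count ignores the initial state, which may also be logged.

```python
CHAIN_LENGTH = 50_000_000   # total MCMC steps
SAMPLE_EVERY = 1_000        # logging interval in steps
BURN_IN_STEPS = 3_000_000   # steps discarded as burn-in


def burn_in_summary(chain_length: int, sample_every: int, burn_in_steps: int) -> dict:
    """Return sample counts and the burn-in expressed as a percentage of the chain."""
    total_samples = chain_length // sample_every
    discarded = burn_in_steps // sample_every
    return {
        "total_samples": total_samples,
        "discarded_samples": discarded,
        "retained_samples": total_samples - discarded,
        "burn_in_percent": 100 * burn_in_steps / chain_length,
    }


if __name__ == "__main__":
    print(burn_in_summary(CHAIN_LENGTH, SAMPLE_EVERY, BURN_IN_STEPS))
```

With these settings, 50,000 samples are logged, 3,000 are discarded, and 47,000 are retained, which corresponds to a 6% burn-in.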