Genetic analysis of whole mitochondrial genome of Lateolabrax maculatus (Perciformes: Moronidae) indicates the presence of two populations along the Chinese coast

The whole mitochondrial genome of Lateolabrax maculatus (Cuvier, 1828) was used to investigate the reasons for the observed patterns of genetic differentiation among 12 populations in northern and southern China. The haplotype diversity and nucleotide diversity of L. maculatus were 0.998 and 0.00169, respectively. Pairwise FST values between populations ranged from 0.001 to 0.429, correlating positively with geographic distance. Genetic structure analysis and haplotype network analysis indicated that these populations were split into two groups, in agreement with geographic segregation and environment. Tajima’s D values, Fu’s Fs tests and Bayesian skyline plot (BSP) indicated that a demographic expansion event may have occurred in the history of L. maculatus. Through selection pressure analysis, we found evidence of significant negative selection at the ATP6, ND3, Cytb, COX3, COX2 and COX1 genes. In our hypotheses, this study implied that demographic events and selection of local environmental conditions, including temperature, are responsible for population divergence. These findings are a step forward toward the understanding of the genetic basis of differentiation and adaptation, as well as conservation of L. maculatus.


INTRODUCTION
The Chinese sea bass, Lateolabrax maculatus (Cuvier, 1828), is an endemic species in Northeast Asia, mainly distributed along the coasts of China, Japan and Korea (Niu et al. 2015). The species is associated with reefs in shallow oceanic waters that widely vary in temperature and salinity. This fish spawns on rocky reefs and juveniles may enter rivers. Because of its delicious meat, fast growth rate and short breeding period, L. maculatus is a valuable recreational and aquaculture fishery species. It is commonly reared in cages and ponds in China, with yields exceeding 10×10 4 tons from 2010 to 2014. In recent years, cage-reared L. maculatus have been imported from China into Korea and Japan for human consumption (An et al. 2014). Germplasm degradation and a reduction in disease resistance of cultured L. maculatus restricts its sustainable development , and calls for an assessment of the population structure of this fish (Barhoumi et al. 2017). The results of such study can be used to prevent overfishing and ultimately declines in the numbers of L. maculatus.
An earlier study using isozyme and RAPD analyses had revealed differences among L. maculatus Chines populations from north to south (Li et al. 2005). In addition, preliminary phylogenetic analysis demonstrated that populations of L. maculatus clustered into different clades, and suggested low genetic diversity within populations . Genetic diversity and population structure were investigated based on a single mitochondrial gene (COI) in five geographical populations . The maternally inherited mitochondrial genome, with not only low recombination frequency but also high mutation rate, is a feasible marker to evaluate the genetic relationships among individuals or populations (Dong et al. 2015, Stoljarova et al. 2016. Revealing the phenotypic and genetic differentiation of a species in distinct populations is a striking topic in evolutionary biology. Convincing evidence suggests that natural selection usually results in adapted phenotypes and genetic divergence (local adaptation) among populations inhabiting diverse environments (Hélène and Luca 2011, Hansen 2010, Kawecki and Ebert 2010. Several studies have ascertained selection in the mitochondrial genes of marine fish. The Cytochrome c oxidase (COX2) was selected to analyze tuna and billfish populations because of the high aerobic performance required of them in natural conditions (Dalziel et al. 2006). To analyze populations of Pacific salmon species, the ND2 and ND5 genes were used since they are under positive selection, illustrating the considerable role of the ND gene complex in the evolution (Garvin et al. 2011).
The whole mitochondrial genome was used to analyze the population structure and selection in the Atlantic herring and Japanese Swellshark. In contrast, a systematic study of the whole mitochondrial genome of L. maculatus has not yet been performed. A more comprehensive sequencing effort involving the mitochondrial genome may increase the resolution of not only population structure but also intraspecific selection dynamics of marine fish.
In this study, we sequenced a large number (n = 85) of whole mitochondrial genome sequences of L. maculatus from 12 locations, covering the entire Chinese coast (Fig. 1). The population structure, genetic diversity and selection pressure were analyzed applying the whole mitochondrial genome. We also compared 12 protein-coding genes in the mitochondria between northern and southern populations and to identify potential genes implicated in divergence and environmental adaptation. The results will be critical and helpful to resource conservation and fisheries management of this target species for stock enhancement in the China Sea (Chen et al. 2019).

Sampling and sequencing
A total of 85 individuals of L. maculatus were collected from 12 geographic locations including Tianjin (TJ), Yantai (YT), Wendeng (WD), Lianyungang (LY), Zhoushan (ZS), Wenzhou (WZ), Shantou (ST), Shenzhen (SZ), Zhanjiang (ZJ), Haikang (HK), Tieshan (TS), Fangcheng (FC) along Chinese coastal waters (Fig. 1). The sample numbers of every location are shown in Table 1. After eugenol anesthesia, white muscles were collected and used for DNA extraction . Total DNA extracted from muscle was used to construct a library. Sequencing libraries with 250 bp insert size were constructed according to the manufacturer's instructions (Illumina, San Diego, CA, USA) and paired-end sequencing was performed using the Illumina HiSeq2500 platform with a read length of 2 × 150 bp. All reads containing adaptor sequences were discarded before the removal of uncertain or low-quality bases using SolexaQA++ (version v.3.1.7.1) (Cox et al. 2010). The retained and trimmed reads (clean reads) were used for variants calling.
This study was approved by the Animal Care and Use committee at College of Ocean and Earth Sciences, Xiamen University. All the methods used in this study were carried out in accordance with approved guidelines.

Assembling mitochondrial genome assembly and annotation
Producing mitochondrial genome assembly from whole genome sequencing (WGS) data is an accurate and efficient approach. Data from the whole genome sequencing of L. japonicus were downloaded from NCBI SRA database (SRR6040926 and SRR6040927) for assembly. Data sets SRR6040926 and SRR6040927 were sequenced on the Illumina HiSeq2500 platform and consisted of paired end reads with a read length of 150bp. Mitochondrial genome sequence assembly based on short pair-end reads was performed using NOVOPlasty (version 2.7.2) (Dierckxsens et al. 2016) by taking L. japonicus mitochondrial genome (downloaded from NCBI, accession number NC_042503) (Lavoué et al. 2014) as a reference sequence. Subsequently, the mitochondrial reference genome of L. maculatus was annotated by DOGMA (http://dogma.ccbb.utexas.edu/index.html) (Wyman et al. 2004). Finally, the newly assembly genome was annotated by MITOS (version 999) (Bernt et al. 2013), a web server for the annotation of metazoan mitochondrial genomes.

Variants calling
The clean reads of 85 individuals were aligned to the reference mitochondrial genome of L. maculatus using BWA (version 0.7.17-r1188) (Li and Durbin 2009) with default parameters. Then GATK (version 4.0.2.1) (Mckenna et al. 2010) was applied for SNP and indel discovery and genotyping across all samples simultaneously using standard hard filtering parameters according to GATK Best Practices recommendations (Der Auwera et al. 2013). Further filtering was performed using vcftools (version 0.1.15) (Danecek et al. 2011) to remove all SNPs with minor allele counts less than 2, genotype missing counts greater than 2, or multiple alleles.

Genetic diversity, population structure and selection analysis
The number of haplotypes (h), haplotype diversity (Hd), number of variable sites (S) and nucleotide diversity (π) were calculated using DnaSP 5.0 (Librado and Rozas 2009). Fu's Fs statistics (Fu 1997) and Tajima's D (Tajima 1989) with 1000 permutation were used to test for neutrality by software Arlequin 3.0 (Excoffier et al. 2005) in historical demographic evaluation. The changes in effective population size were investigated by Bayesian skyline plot (BSP) method (Drummond et al. 2005) in BEAST 1.10 (Suchard et al. 2018) to reconstruct the demographic dynamics. We selected the HKY nucleotide substitution model, a strict clock (Soares et al. 2012), and a substitution rate of 2×10 -9 (Burridge et al. 2008). MCMC analyses were run for 2 × 10 8 generations when necessary to achieve adequate effective sample size (i.e., ≥ 200), sampling every 1,000 generations and discarding the first 10% as burn-in. BEAST outputs were visualized with the Tracer v.1.7.1 (http://tree.bio.ed.ac.uk/software/ tracer/). To explore molecular variance, the analysis of molecular variance components (AMOVA) and fixation index (F ST ) were carried out within and among different populations. Isolation by distance was assessed by correlating pairwise genetic distance (F ST ) between populations with geographical distance (calculated from longitude and latitude) using the Mantel tests (100,000 permutations) implemented by the software Arlequin 3.0 (Excoffier et al. 2005). Pairwise Kimura 2-parameter distances were calculated using Mega 5.0 (Tamura et al. 2011). Except for ND6 gene, which has a different codon usage biases (Hasegawa et al. 1998), the Ka/Ks values of the other 12 protein-coding genes in the mitochondria were evaluated independently utilizing DnaSP 5.0 (Librado and Rozas 2009). From each haplotype block generated by based "Confidence Intervals" method (Gabriel et al. 2002), one tag SNP were selected using Haploview (version 4.2) (Barrett et al. 2005). Then software FRAPPE version 1.1 (Tang et al. 2005) was used to estimate the genetic ancestry of each sample based on the genotypes of tag SNPs. The maximum iteration of EM (expectation maximization) was set as 10,000; convergence threshold was set as 10,000, and the number of populations (K) was set as two and three for each calculation.
Phylogenetic trees were constructed using Bayesian inference methods to explore the phylogenetic relationship among different individuals from 12 geographical locations. Lateolabrax japonicus (GenBank: AP006789.1) was used as an outgroup. Phylogenetic analysis was carried out with MrBayes3.1.2 (Huelsenbeck and Ronquist 2001). The program Modeltest version 3.7 (Posada and Crandall 1998) was used to choose an appropriate substitution model of sequence evolution. The (TrN+I) model was selected as the best fit model. Four independent Markov chains were run for 10,000,000 generations by sampling one tree per 1000 generations and allowing adequate time for convergence. The first 2500 trees (25%) were discarded as part of a burn-in procedure (that was determined by the likelihood of being stationary), the remaining 7500 sampling trees were used to construct a 50% majority rule consensus tree. Two independent runs were used to provide additional confirmation of the convergence of the Bayesian posterior probabilities (BPP) distribution. A median-joining network was constructed using Network 4.51 (Bandelt et al. 1999) to infer relationships among 78 haplotypes of L. maculatus.

Mitochondrial genome assembling and variants calling
The total length of the mitochondrial genome of L. maculatus was 16,601 bp, which is close to L. japonicus (16,593 bp). The mtDNA comprised 13 protein-coding genes (PCGs) with an average length of 878.46 bp as well as two rRNA genes and 22 tRNA genes (Fig. 2). About 1032 G clean bases were obtained by next generation sequencing for 85 individuals from 12 geographic locations along Chinese coast. After aligning all clean reads to the newly assembled mitochondrial genome of L. maculatus, the average mapping ratio was 0.2% while the mapping depth was approximately 1422 (Supplementary Table S1). We obtained 249 high-confidence single nucleotide polymorphism (SNP) sites and 24 indels (Fig. 2). At last, combining all SNPs, indels and the reference genome of L. maculatus, the complete mitochondrial sequences of 85 individuals were generated. The genotype data was given in Supplementary Table S2. The mtDNA sequences of 85 individuals are reliable and used for further analysis.

Population genetic structure and historic demography of L. maculatus
To evaluate the degree of differentiation among populations, pairwise F ST values and genetic distances were calculated based on mitochondrial sequences. According to SST in the sampling places, we can divide populations into two groups, South (FC, TS, HK, ZJ, SZ, ST) and North (WZ, ZS, LY, YT, TJ, WD) (Fig. 1). The pair wise F ST values ranged from 0.001 (SZ-ZS) to 0.429 (TS-YT). Moreover, most F ST values between northern and southern populations were larger than 0.05, which indicated significant genetic differentiation between them ( Table 2). The pairwise Kimura 2-parameter distances ranged from 0.019 (HK-TS) to 0.038 (YT-FC, YT-HK, YT-ST, YT-TS) ( Table 2). In our study, the test of isolation by distance (IBD) showed a significant correlation (R² = 0.6909, p = 1.196E-17) between genetic and geographical distances among populations (Fig. 3). We used hierarchical AMOVA grouping the populations according to their hypothesis (north and south). All populations were divided into two groups, north (WZ, ZS, LY, YT, TJ, WD) and south (FC, TS, HK, ZJ, SZ, ST), according to SST in the sampling places. AMOVA analysis indicates that 24.66% of the genetic variation is between northern and southern groups (p < 0.05). The genetic variance within populations is 72.39% and among populations is 2.95% (Table 3).  A clustering algorithm implemented in Structure (Pritchard et al. 2000) was used to infer the relationships among populations. To reduce the redundant information of highly linked SNPs, we identified 119 haplotype blocks from all SNPs and selected one tag SNP from each block. The number of populations, K, was set to two and three to reveal any hierarchical population structure. The most likely population structure was observed for K = 3, and discriminated the two dominating regions, i.e. southern and northern populations (Fig. 4). The phylogenetic analysis (Bayesian tree) based on whole-mitochondrial sequences was in accordance with their geographic distribution, of which WD, TJ, YT, LY, ZS and WZ clustered into group "North"; FC, TS, HK ZJ, and ST clustered into group "South" (Fig. 5). It is noteworthy that population SZ    is distributed in the southern and northern regions. To understand the phylogenetic and geographical relationships among all identified haplotypes, a network, based on the median-joining method, was constructed (Fig. 5). Consequently, a slightly star-like topology with a great number of unique haplotypes (91.8%) was obtained. Haplogroups South and North included most haplotypes from southern and northern populations, respectively, except for SZ, ZS and WZ populations, which had individuals in both haplogroups (Fig. 6). The neutrality tests of Tajima's D, Fu's Fs were carried out to detect the historic demography of L. maculatus. For all samples, the neutrality test obtained negative Tajima's D value (-1.451). Almost Tajima's D values for 12 populations were negative, and the values for TS and YT were significant (Table 1). In Fu's Fs tests, a very large negative Fu's Fs index (-26.785) was presented when all samples were pooled together. These values signified an excess of low frequency polymorphisms, indicating population size expansion and/or negative selection acting in the population. Meanwhile, the BSP analysis showed the species underwent population expansions (Fig. 7).

Mitochondrial genomic selective pressure analysis
To investigate the evolutionary rate differences between southern and northern clusters, the number of nonsynonymous substitutions per nonsynonymous site (Ka) and the number of synonymous substitutions per synonymous site (Ks) were calculated for 12 mitochondrial PCGs. The ratio of Ka/Ks is normally deemed to be an indicator of selective pressure and evolutionary relationship among species at the molecular level (Li et al. 2015, Shen et al. 2007. According to previous studies (Yang and Nielsen 2000), the Ka/Ks value > 1, Ka/Ks = 1, and Ka/ Ks < 1 suggest positive (adaptive) selection, neutral evolution and negative (purifying) selection, respectively. In our study, the Ka/Ks values of 12 PCGs varied approximately from 0.0177 (ATP6) to 0.1065 (ATP8), which indicated negative (purifying) selection (Fig. 8). Our result of the Ka/Ks ratio implied that the abundant genes evolved under negative selection, which means natural selection against deleterious mutations with negative selective factors (Yang and Bielawski 2000). The Ka/Ks values were higher in ATP8 and ND5 genes among all populations, while the values were lower in ATP6, ND3, Cytb, COX3, COX2 and COX1 genes. From this, ATP8 and ND5 genes may be under the less selective pressure while ATP6, ND3, Cytb, COX3, COX2 and COX1 genes might have undergone adaptive evolution to cope with different environment. But there were no significant genetic differences in certain mitochondrial genes between the southern and northern populations.  The genetic diversity of a population is correlated with the survival status of the species. A reduction or loss of genetic diversity is a great threat to wild populations living in a changeable environment. Haplotype diversity (Hd) and nucleotide diversity (π) are two important parameters to reflect the genetic diversity of a population (Liu et al. 2010). Both northern and southern populations of L. maculatus have high haplotype diversity. However, the nucleotide diversity of the southern population is lower than of the northern population. The F ST value is an effective index when assessing gene flow and genetic diversity between populations. The normal value, F ST of 0.05, was defined as inappreciable genetic differentiation, while greater than 0.25 means high genetic differentiation within a population (Holsinger and Weir 2009). Based on the pairwise F ST values of populations from the same major geographic location, there is no significant genetic variation (South: 0.004-0.122; North: 0.003-0.078) except SZ ( Table 2). The F ST value of SZ was closer to the northern populations than to the southern populations. This probably demonstrated that gene flow was prominent among populations along the coastline without physical barriers. Considering the aquaculture of L. maculatus in Shenzhen using specimens introduced from the northern regions, the SZ population may have spread from the northern population.
By contrast, comparing the southern and northern populations of L. maculatus from different habitats revealed signif-icant differences (0.003-0.429). The pairwise genetic distances between populations ranged from 0.010 to 0.038, which revealed a significant genetic divergence between geographically distant populations. Similar patterns were observed in the Fst value. In our research, a significant relationship between pairwise genetic distance (F ST ) and geographic distance (p < 0.05) was identified (Fig. 3). The greater the geographic distance between two populations, the higher the Fst value. When we used hierarchical AMOVA analysis between northern and southern groups, there was a 24.66% genetic variation. Adaptation to local conditions leads to a reduction in the within-population diversity as well as an increase in the among-population diversity (Storz 2010). Admixture analysis (Fig. 4), Phylogenetic trees (Fig. 5) and Median-joining network (Fig. 6) presented consistent results that individuals from different populations clustered with their geographic clade. The analysis indicated high level of genetic divergence between southern and northern geographic populations of L. maculatus, which was mainly in agreement with previous population genetic studies of L. maculatus based on markers from nuclear (Shao et al. 2009, Xu et al. 2017.

Population expansion of L. maculatus in history
The Tajima D-test (Tajima 1989) and F S test of Fu (Fu 1997) are used to examine whether the hypothesis of neutrality holds. Significant negative D and F S statistics can be interpreted as symbols of population expansion. Tajima's D statistics is more sensitive than Fu's Fs tests on population expansion (Fu 1997). A negative and significant Tajima's D value implies that the sequence contains more nucleotide changes than the neutral evolution model, which may suggest a population expansion event in history and explain the wide distribution of several shared haplotypes ). Tajima's D values for 12 populations of L. maculatus were mostly negative (Table  1). When all samples were pooled together, a highly negative Fs index (-26.785) and a significant negative D value (-1.451) were obtained, which indicated that a demographic expansion event in the past. The BSP analysis (Fig. 7) also supported the hypothesis of population expansion (Slatkin and Hudson 1991).

Negative selection in mitochondrial protein-coding genes
The effects of IBD in the genetic differentiation of populations was originally illustrated in 1948 (Wright 1943). Since then, IBD has been observed and reported in many population genetic studies, for example, the population genetic studies Nile Tilapia throughout west Africa and the coral-specialist yellowbar angelfish from the Northwestern Indian Ocean, the genetic distances between populations were clearly correlated to their geographic distance (Lind et al. 2019, Torquato et al. 2019. In this study, the IBD test performed by Mantel tests showed a significant correlation between genetic and geographical distances among L. maculatus populations. A high correlation value (R 2 = 0.6909) suggested that the gene flow in L. maculatus could be the impact of geographical distances, which limited the dispersion of wild populations. Apart from this, the biological characteristics (e.g., dispersal ability and adaptive capacity), ecological factors (e.g., complicated topography and climatic environment) and anthropogenic factors (e.g., aquaculture and agriculture cultivation) might account for the genetic isolation (Liu et al. 2019). For L. maculatus, one of the apparent selective factors is sea surface temperature (SST), which can influence enzyme kinetics, physiology and ecological fitness (Powers et al. 1991). In winter, the SST is 10 °C in the Bohai Sea Gulf and 25 °C in the Southern Chinese Sea demonstrated a steep temperature gradient from north to south along the coast of China . The lower water temperature in the Bohai Sea and Yellow Sea might limit the dispersal capabilities of L. maculatus . SST may be the principal environmental factor to adapt for specific population of L. maculatus (Strohm et al. 2015, Sun et al. 2011. Moreover, potential hydrological barriers could also result in a geographical segregation and lead to genetic divergence (Ma et al. 2010).
The rejection of the historical and equilibrium neutral models implicates natural selection. The southern population may display significant genetic differences in certain protein coding genes compared with the northern population, which is related to their respective adaptations to specific environments. To accurately detect the genomic footprints left by natural selection, we performed selection pressure analysis across the mitochondrial genome. The Ka/Ks ratios of 12 protein-coding genes were calculated but failed to detect significant evidence of selection between northern and southern populations. This could be a function of the fact that these mitochondrial genes are particularly conserved (James Bruce et al. 2008).
For comparison, the Cytb gene of wild boars was significantly different between the Siberian and the Vietnamese clades . Since Siberian wild boars can survive the cold subarctic climate, it is plausible to think that genes related to energy production might contribute to their local adaptation. Mitochondria are the powerhouses of cells, playing essential roles in both thermogenesis and thermoregulation (Gambert andRicquier 2007, Yoshihiko et al. 2008). However, the molecular mechanism of temperature adaptation may be not necessarily in the mitochondrion. For example, a comparison of the whole genomic sequences of Leuciscus waleckii inhabiting extremely alkaline water and freshwater revealed significant genetic differences in some genes and genome regions that have undergone positive selection in the alkaline water population (Xu et al. 2017).
Correlations between genetic diversity, differentiation and environmental factors were also of interest due to the strong environmental gradients found on the coast of China, as well as the potential adaptive value of mitochondrial variability. However, there is very little published information on the adaptive roles of individual mitochondrial genes in fish. Further genetic and evolutionary studies utilizing whole genome approaches are necessary for better understanding the local adaptation of L. maculatus in the coast of China.
The whole mitogenomes of L. maculatus from different latitudes have been sequenced and used for population genetic analysis. Apparent genetic differences exist between the northern and southern populations. Population expansion in L. maculatus was estimated to infer their phylogenetic status, selection pressures and historical demography. No significantly different selection pressures were identified in the mitochondrial PCGs between the northern and the southern population. Studies on the population genetics of L. maculatus is significant for the effective management of fishery genetic resources and insights into population structure. Our results afford a crucial step for future analyses of the genetics of adaptation and conservation efforts in the L. maculatus species.