Ⅰ. INTRODUCTION
Alfalfa (Medicago sativa L.) is an autotetraploid (2n = 4x = 32) perennial forage legume with a relatively large genome, ranging from approximately 800 to 1000 Mb (Blondon et al., 1994;Shen et al., 2020). It is recognized as one of the most important forage crops globally due to its high protein content, excellent forage quality, high productivity with multiple harvests, and adaptability to various environmental conditions (Shi et al., 2017;Brumă et al., 2023). Furthermore, alfalfa plays a crucial role in sustainable livestock and cropping systems by contributing to maintenance of soil fertility through symbiotic nitrogen fixation with root nodule bacteria (Elgharably and Benes, 2021).
In Korea, the demand for high-quality forage has been increasing due to improvements in livestock breeding and feeding management techniques, leading to a rise in high-performing cattle (Youm, 1991;Kim, 2025). However, a significant portion of high-quality forage, such as alfalfa hay, relies on imports, making it vulnerable to price fluctuations and supply instability. Recent external factors like exchange rate volatility due to international affairs, shipping delays, and abnormal weather conditions in major producing countries hinder the stable supply of alfalfa, imposing a significant financial burden on livestock farms (Chang et al., 2024). Under these circumstances, the development of domestically bred alfalfa cultivars suited to Korea's growing environment is crucial for ensuring stable forage supply and expanding the foundation for self-sufficiency. The National Institute of Animal Science (NIAS) of the Rural Development Administration (RDA) has been promoting the development of alfalfa cultivars with superior domestic adaptability through crossbreeding and selection using both domestic and international genetic resources, which has resulted in the development of the 'Alfaone' cultivar (Lee et al., 2025).
With the increasing distribution of Korea-bred cultivars, there is a growing need for accurate cultivar identification techniques to support cultivar protection and verify the authenticity of commercially marketed seeds (Korir et al., 2012;Yu and Chung, 2021). Alfalfa is a cross-pollinated species with high heterozygosity, and its morphological traits are significantly influenced by cultivation environment and growing conditions, thus limiting stable cultivar differentiation based solely on phenotype (Annicchiarico et al., 2025;Medina et al., 2025). Especially, traits like emergence characteristics, plant type, and yield can vary depending on the experimental year and cultivation site, making it difficult to achieve sufficient discrimination using only morphological characteristics. Therefore, a molecular marker-based discrimination system utilizing DNA-level variations is necessary to enhance the accuracy of cultivar protection and quality control.
Molecular markers, which directly detect nucleotide sequence variation in the genome, have been widely used to assess genetic differences among cultivars in crop genetic analysis and breeding programs (Park et al., 2017;Zhang et al., 2023). In alfalfa, a genetically complex autotetraploid species, molecular marker development for cultivar discrimination has also been actively pursued. Early studies used simple sequence repeat (SSR) markers to evaluate genetic diversity and identify cultivars (Flajoulot et al., 2005), and subsequent studies applied single nucleotide polymorphism (SNP) and SSR markers to analyze genetic diversity and population structure and to identify cultivarspecific variants (Li et al., 2014;Qiang et al., 2015). Among the available marker systems, SNP markers are particularly suitable for alfalfa cultivar discrimination because they are abundant throughout the genome, compatible with automated high-throughput analysis platforms, and highly reproducible across laboratories (Semagn et al., 2014;Thomson, 2014). In addition, next-generation sequencing (NGS) approaches, such as whole-genome resequencing and genotyping-by-sequencing (GBS), enable efficient discovery of polymorphisms among cultivars and facilitate the selection of candidate SNPs for marker development (Mammadov et al., 2012).
Therefore, this study aimed to develop SNP molecular markers that can accurately identify the domestically bred alfalfa cultivar 'Alfaone'. To achieve this, WGS-based data and GBS-based data were comparatively analyzed to select effective SNPs for ‘Alfaone’ discrimination, and the potential of the selected markers for cultivar differentiation was evaluated. The SNP markers developed in this study are expected to provide foundational data that can be utilized for the protection of 'Alfaone', verification of purity in distributed seeds, and enhancement of the efficiency of future domestic alfalfa breeding programs.
Ⅱ. MATERIALS AND METHODS
1. Plant materials
This study analyzed alfalfa lines and cultivars developed by the National Institute of Animal Science, Rural Development Administration in Korea. The plant materials included ‘Alfaone (MS001)’, ‘MS002’, ‘MS003’, ‘MSCB04’, ‘MSCB05’, ‘MSCB06’, and ‘Alfaking (MSCB07)’. All plant materials were grown for three weeks, after which young leaves were collected from each plant. To preserve DNA integrity, the collected leaves were immediately flash-frozen in liquid nitrogen. Frozen leaf samples were stored at -70°C in an ultra-low temperature freezer until genomic DNA extraction.
2. DNA extraction and sequencing data production
DNA was extracted from the leaf tissues of each cultivar using the CTAB method. To verify the quality of the extracted DNA, 3 μl of DNA was subjected to electrophoresis at 200V for 25 minutes, and a high-quality band was confirmed using a 1kb+ ladder.
For ‘Alfaone (MS001)’, ‘MS002’, and ‘MS003’ samples, sequencing was performed via Genotyping-by-sequencing (GBS), where DNA was fragmented using the ApeKI restriction enzyme. Barcode adapters and common adapters were diluted in 50 μM TE buffer (10mM Tris, 0.1mM EDTA), and then mixed with barcode and common adapters using 10x adapter buffer (500mM NaCl, 100mM Tris-Cl). Multiplexing PCR products were verified by agarose gel electrophoresis, purified using a QIAquick PCR Purification Kit, and then amplified restriction fragments were used to construct NGS libraries. For ‘MSCB04’, ‘MSCB05’, ‘MSCB06’, and ‘Alfaking (MSCB07)’ samples, 100 μg of gDNA was fragmented for Whole-genome-sequencing (WGS) library preparation, treated at 32°C for 15 minutes, and then fragmentation enzymes were inactivated at 65°C for 30 minutes. Adapter ligation was performed at 20°C for 15 minutes, followed by purification with a QIAquick PCR Purification Kit. PCR was conducted for 8 cycles using a GeneExplorer instrument from BIOER, and the band quality was assessed using a 100bp ladder before NGS library construction.
Finally, sequencing of the prepared libraries was performed at 2x150bp using the Illumina Hiseq X platform.
3. Sequencing data preprocessing and SNP analysis
The raw sequencing data were preprocessed using two programs: SolexaQA (version 1.13) and Trimmomatic (version 0.39) (Cox et al., 2010;Bolger et al., 2014). First, low-quality sequences with a phred score below 20 and reads shorter than 25bp were removed using DynamicTrim and LengthSort in SolexaQA to extract cleaned reads (Cox et al., 2010). Additional removal of sequences with phred quality scores below 20 was performed using Trimmomatic to obtain the final set of cleaned reads (Bolger et al., 2014). GBS data were de-multiplexed based on barcode sequences, after which barcode sequences were removed using cutadapt (version 1.8.3), and preprocessing was completed with Trimmomatic (Martin, 2011). The preprocessed cleaned reads were mapped to the Medicago sativa L. reference genome using BWA-MEM (version 0.7.17) (Li, 2013). The reference genome used was the 3.15 Gbp sequence composed of 32 chromosomes and 9,789 unscaffolded contigs, as reported by Chen et al. (2020).
SNP extraction was performed by analyzing differences with the reference genome using SAMTools (version 0.1.16), and the varFilter program within SAMTools was used for filtering (Li et al., 2009). During the SNP filtering process, key options such as a minimum mapping quality (-Q) of 30, a minimum read depth (-d) of 3, and a maximum read depth (-D) of 611 were applied. Additional settings were used to filter out nearby SNPs or variations near In/Dels. Extracted SNPs were classified as homozygous if the read rate was 90% or higher, and heterozygous if it was between 40-60%. Finally, SNP information from samples produced by GBS and WGS was aligned based on their positions in the reference genome to create an SNP matrix. Filtering criteria included a minor allele frequency (MAF) > 5%, Missing rate < 30%, and REF/ALT frequency > 25% for SNP selection.
SNP loci were annotated using the genome feature annotation tracks associated with the Medicago sativa L. reference assembly reported by Chen et al. (2020), which comprises 32 chromosomes and 9,789 unscaffolded contigs. Each SNP was represented as "a" for the reference allele, "b" for the alternative allele, "h" for heterozygous, and "-" for missing data. Candidate loci were retained where consistent homozygous types were observed across all ‘Alfaone (MS001)’ individuals, while polymorphism was observed in other cultivars. From these candidates, the minimum number of SNPs required to uniquely identify ‘Alfaone (MS001)’ was determined, and the selected SNPs were designated as cultivar-specific markers.
4. Phylogenetic tree and principal component analysis (PCA)
In this study, a phylogenetic tree analysis was performed to evaluate the genetic relationships among alfalfa cultivars. The phylogenetic tree was constructed using the neighbor-joining (NJ) method (Saitou and Nei, 1987) implemented in MEGA11 (Tamura et al., 2021). Genetic distance information calculated from 25,496 SNP loci selected through the filtering process was used for the analysis. To assess the reliability of the inferred tree, bootstrap analysis was conducted with 1,000 replicates, and the values shown at each node indicate the proportion of replicate trees in which the corresponding cluster was observed. Genetic distances among samples were calculated using the maximum composite likelihood method (Tamura et al., 2004), and branch lengths were expressed in the same units as those of the genetic distances. Based on these results, the genetic relationships and population structure among alfalfa cultivars were evaluated, and the potential for cultivar discrimination was examined.
PCA was also performed to assess the genetic relationships among cultivars. PCA was conducted using the SNPRelate package (Zheng et al., 2012) in R, based on the same set of 25,496 SNP loci selected after filtering. Genotype data from each sample were used to derive principal components, and the contribution of each principal component to the total variation was estimated by eigen decomposition. The resulting principal component scores were used to visualize the genetic similarity and population structure among samples, thereby allowing evaluation of the genetic differentiation among alfalfa cultivars.
Ⅲ. RESULTS AND DISCUSSION
1. Sequencing data production and preprocessing
For the GBS method, ‘Alfaone (MS001)’, ‘MS002’, and ‘MS003’ samples were sequenced in triplicate, and the integrated data yielded 1.30 Gb, 0.63 Gb, and 1.10 Gb, respectively (Table 1). Through WGS, 32.7 Gb, 39.2 Gb, 31.0 Gb, and 32.5 Gb of sequence data were produced from ‘MSCB04’, ‘MSCB05’,‘MSCB06’, and ‘Alfaking (MSCB07)’ samples, respectively, corresponding to 9.70 to 12.25-fold coverage relative to the reference genome published by Chen et al. (2020) (Table 2). After preprocessing, over 90% of the cleaned reads were retained for GBS data, resulting in 0.92 Gb, 0.45 Gb, and 0.79 Gb of sequences for ‘Alfaone (MS001)’, ‘MS002’, and ‘MS003’ samples, respectively. For WGS data, preprocessing refined the original data to a range of 81.28% to 81.67% cleaned reads, ultimately securing 6.36 to 8.32-fold whole-genome coverage data for ‘MSCB04’, ‘MSCB05’, ‘MSCB06’, and ‘Alfaking (MSCB07)’ samples, respectively.
2. Read mapping and SNP extraction
The cleaned GBS reads showed alignment rates to the reference genome of 92.54%, 91.53%, and 92.69% for ‘Alfaone (MS001)’, ‘MS002’, and ‘MS003’, respectively (Table 1). The cleaned WGS reads showed alignment rates to the reference genome of 98.93%, 99.20%, 99.08%, and 99.48% for ‘MSCB04’, ‘MSCB05’, ‘MSCB06’, and ‘Alfaking (MSCB07)’, respectively (Table 2).
While 1,762,346 SNPs were identified from ‘Alfaone (MS001)’, ‘MS002’, and ‘MS003’ based on GBS data, a fourfold higher total of 7,164,812 SNPs were extracted from ‘MSCB04’, ‘MSCB05’, ‘MSCB06’, and ‘Alfaking (MSCB07)’ using WGS data. A common intersection of 464,902 SNPs was also found between the two data types (WGS and GBS). In this experiment, GBS data selected only common genotypes from three repeated sequences, and since GBS data sequences only specific regions cut by restriction enzymes, the number of extracted SNPs was relatively smaller compared to WGS data based on the whole genome. Based on GBS data, 1,046, 586, and 791 SNPs were selected for ‘Alfaone (MS001)’, ‘MS002’, and ‘MS003’, respectively. From WGS data, 3,229,360, 3,228,988, 2,797,329, and 3,146,609 SNPs were identified for ‘MSCB04’, ‘MSCB05’, ‘MSCB06’, and ‘Alfaking (MSCB07)’, respectively (Table 3).
The extracted SNPs were aligned based on the reference genome positions to construct an SNP matrix, confirming 464,902 SNPs commonly found in WGS and GBS data (Table 4). Based on this, to select highly reliable distinguishing markers, 25,496 SNPs were filtered using criteria of MAF > 5% and Missing rate < 30%. Subsequently, an additional filtering of polymorphic SNPs yielded 17,982 SNPs. Finally, to facilitate downstream conversion into PCR-based markers such as qPCR and KASP assays, only SNPs with no additional sequence variation within 200 bp of the target site were retained, providing sufficient polymorphism-free flanking sequence for primer and probe design. Using this conservative filtering criterion, 1,716 ultimate SNPs were selected.
Using the final 1,716 SNP loci, the genetic relationships among the seven alfalfa lines and cultivars were evaluated by phylogenetic tree analysis and principal component analysis (PCA) (Fig. 1, 2). The first three principal components explained 44% of the total variation, with PC1, PC2, and PC3 accounting for 16%, 14%, and 14%, respectively. PCA based on the preselected SNP loci allowed relative separation among the analyzed lines and cultivars in three-dimensional space. In cultivated alfalfa, genetic differentiation among cultivars is often limited, whereas variation within cultivars can remain substantial due to its outcrossing autotetraploid nature and breeding history (Flajoulot et al., 2005;Malmberg et al., 2026). Therefore, the proportion of variance explained by the first three principal components was considered reasonable in the context of this study. In the phylogenetic tree, ‘MSCB06’ and ‘MSCB05’ were grouped as sister clades with a bootstrap support value of 100. Their close distribution in the 3D PCA also indicated high genetic similarity based on the SNP variation patterns used in this study.
3. Extraction of marker candidates for cultivar discrimination
In this study, to select SNPs that can distinguish the genetic differences between the 'Alfaone (MS001)' cultivar and other samples, SNP loci obtained from each sample were analyzed. This led to the selection of SNPs capable of distinguishing the 'Alfaone (MS001)' cultivar from the remaining six samples. A total of six barcode SNPs were selected. Using these six barcode SNPs, 'Alfaone (MS001)' could be differentiated from the six other cultivars, and the marker type used for 'Alfaone (MS001)' was designated as ‘aaabnn’ (Table 5). The average MAF of these SNPs was 19.6% (ranging from 8.2% to 35.1%), and the average Polymorphic Information Content (PIC) was 0.525 (ranging from 0.238 to 0.751) (Table 6). Since the PIC of 0.525 exceeds 0.5, it is considered a highly informative marker suitable for cultivar discrimination (Botstein et al., 1980).
Furthermore, two additional barcode SNPs were selected that could distinguish the ‘Alfaone (MS001)’ cultivar from the ‘MS002’ and ‘Alfaking (MSCB07)’ samples. The marker type used for ‘Alfaone (MS001)’ in this case was designated as ‘aa’ (Table 7). The average MAF for these SNPs was 15.3% (ranging from 5.3% to 37.0%), and the average PIC was 0.457 (ranging from 0.157 to 0.773) (Table 8). Although SNPs with lower MAF can still be useful for cultivar discrimination when they provide consistent diagnostic power for a specific cultivar, relatively higher MAF values may be advantageous for overall marker informativeness and can be considered preferentially when candidate SNPs show comparable discriminatory utility, particularly in genetically closely related cultivated alfalfa materials (Yang et al., 2022;Kanaka et al., 2023). Because a PIC value greater than 0.5 is generally regarded as indicating a highly informative marker, the use of SNP loci with relatively higher MAF and PIC values may allow cultivar discrimination with a minimal marker combination.
For instance, the combination of the SNP at position 63664495 (barcode 7) and the SNP at position 78242898 (barcode 8) both have MAFs within 30-50% and PICs greater than 0.5. Therefore, this combination can effectively distinguish ‘Alfaone (MS001)’ from ‘MS002’ and ‘Alfaking (MSCB07)’.
Ⅳ. CONCLUSIONS
This study analyzed genetic differences among alfalfa cultivars through NGS analysis and selected useful SNP markers for cultivar discrimination in seven cultivars, including ‘Alfaone (MS001)’. As a result, six barcode SNPs capable of distinguishing the ‘Alfaone (MS001)’ cultivar were selected, confirming that this marker set can differentiate ‘Alfaone (MS001)’ from the other six samples. Notably, it was found that ‘Alfaone (MS001)’ could be effectively distinguished from ‘MS002’ and ‘Alfaking (MSCB07)’ with just two SNPs. This valuable marker set for accurate identification of the ‘Alfaone (MS001)’ cultivar can serve as an important criterion for alfalfa cultivar identification. These findings can significantly contribute to the identification of domestic alfalfa cultivars and will play a crucial role in domestic cultivar protection and industry development. The barcode SNP set developed in this study can not only prevent cultivar admixture that may occur during the breeding process but also serve as an important tool for maintaining cultivar purity. Furthermore, the development of markers for alfalfa cultivar discrimination can contribute to the development of high-quality new cultivars and play an important role in preventing purity degradation due to cultivar mixing during distribution. This will enable the distribution and supply of high-quality alfalfa and contribute significantly to cultivar maintenance and management through systematic control at the genetic level.











