ISSN : 2287-5824(Print)
ISSN : 2287-5832(Online)

Journal of The Korean Society of Grassland and Forage Science Vol.46 No.1 pp.38-48
DOI : https://doi.org/10.5333/KGFS.2026.46.1.38

NGS-Based Development of SNP Barcodes for Variety Identification of ‘Alfaone’ in Alfalfa

Chang-Woo Min¹, Bo Ram Choi¹, Yowook Song¹, Hyung Soo Park¹, Jun Gyeong Choi², Kang Yun Ju², Sang-Hoon Lee¹, Ki-Won Lee¹*

¹Forages Production Systems Division, National Institute of Animal Science, RDA, Cheonan 31000, Republic of Korea
²BIOTO, 187, Techno 2-ro, Yuseong-gu, Daejeon, Republic of Korea

*Corresponding author: Ki-Won Lee, Forages Production Systems Division, National Institute of Animal Science, RDA, Cheonan 31000, Republic of Korea. Tel: +82-41-580-6757, Fax: +82-41-580-6779, E-mail: kiwon@korea.kr

Received March 12, 2026; Review March 25, 2026; Accepted March 26, 2026

Abstract

Molecular markers have been widely utilized in population genetics, diagnostic taxonomy, and genetic mapping, and can be applied to cultivar discrimination during field selection processes for alfalfa. In this study, whole-genome sequencing information was obtained for seven alfalfa lines and cultivars developed in Korea, including ‘Alfaone (MS001)’, using Next-Generation Sequencing (NGS). Single nucleotide polymorphism (SNP) analysis revealed that ‘Alfaone (MS001)’ could be distinguished from other lines and cultivars using six SNP loci. Specifically, only two SNP loci were sufficient to differentiate ‘Alfaone (MS001)’ from major lines and cultivars such as ‘MS002’ and ‘Alfaking (MSCB07)’. This set of SNP barcodes provides a reliable standard for alfalfa cultivar discrimination, contributing to domestic cultivar protection and the advancement of the Korea forage industry. Furthermore, the development of distinguishing markers across alfalfa cultivars will enhance genetic resource identification and support the breeding of high-quality new cultivars.

Key Words: Alfalfa, Variety, NGS, SNP marker

Ⅰ. INTRODUCTION

Alfalfa (Medicago sativa L.) is an autotetraploid (2n = 4x = 32) perennial forage legume with a relatively large genome of approximately 800 to 1,000 Mb (Blondon et al., 1994; Shen et al., 2020). It is recognized as one of the most important forage crops globally due to its high protein content, excellent forage quality, high productivity across multiple harvests, and adaptability to various environmental conditions (Shi et al., 2017; Brumă et al., 2023). Furthermore, alfalfa plays a crucial role in sustainable livestock and cropping systems by helping to maintain soil fertility through symbiotic nitrogen fixation with root nodule bacteria (Elgharably and Benes, 2021).

In Korea, the demand for high-quality forage has been increasing as improvements in livestock breeding and feeding management have raised the number of high-performing cattle (Youm, 1991; Kim, 2025). However, a significant portion of high-quality forage, such as alfalfa hay, is imported, leaving supply vulnerable to price fluctuations and instability. Recent external factors, including exchange rate volatility driven by international affairs, shipping delays, and abnormal weather in major producing countries, hinder the stable supply of alfalfa and impose a significant financial burden on livestock farms (Chang et al., 2024). Under these circumstances, the development of domestically bred alfalfa cultivars suited to Korea's growing environment is crucial for ensuring a stable forage supply and expanding the foundation for self-sufficiency. The National Institute of Animal Science (NIAS) of the Rural Development Administration (RDA) has been developing alfalfa cultivars with superior domestic adaptability through crossbreeding and selection using both domestic and international genetic resources, which has resulted in the 'Alfaone' cultivar (Lee et al., 2025).

With the increasing distribution of Korea-bred cultivars, there is a growing need for accurate cultivar identification techniques to support cultivar protection and verify the authenticity of commercially marketed seeds (Korir et al., 2012; Yu and Chung, 2021). Alfalfa is a cross-pollinated species with high heterozygosity, and its morphological traits are strongly influenced by cultivation environment and growing conditions, which limits stable cultivar differentiation based solely on phenotype (Annicchiarico et al., 2025; Medina et al., 2025). In particular, traits such as emergence characteristics, plant type, and yield can vary with the experimental year and cultivation site, making it difficult to achieve sufficient discrimination using morphological characteristics alone. Therefore, a molecular marker-based discrimination system utilizing DNA-level variation is necessary to enhance the accuracy of cultivar protection and quality control.

Molecular markers, which directly detect nucleotide sequence variation in the genome, have been widely used to assess genetic differences among cultivars in crop genetic analysis and breeding programs (Park et al., 2017; Zhang et al., 2023). In alfalfa, a genetically complex autotetraploid species, molecular marker development for cultivar discrimination has also been actively pursued. Early studies used simple sequence repeat (SSR) markers to evaluate genetic diversity and identify cultivars (Flajoulot et al., 2005), and subsequent studies applied single nucleotide polymorphism (SNP) and SSR markers to analyze genetic diversity and population structure and to identify cultivar-specific variants (Li et al., 2014; Qiang et al., 2015). Among the available marker systems, SNP markers are particularly suitable for alfalfa cultivar discrimination because they are abundant throughout the genome, compatible with automated high-throughput analysis platforms, and highly reproducible across laboratories (Semagn et al., 2014; Thomson, 2014). In addition, next-generation sequencing (NGS) approaches, such as whole-genome resequencing and genotyping-by-sequencing (GBS), enable efficient discovery of polymorphisms among cultivars and facilitate the selection of candidate SNPs for marker development (Mammadov et al., 2012).

Therefore, this study aimed to develop SNP molecular markers that can accurately identify the domestically bred alfalfa cultivar 'Alfaone'. To achieve this, WGS-based data and GBS-based data were comparatively analyzed to select effective SNPs for ‘Alfaone’ discrimination, and the potential of the selected markers for cultivar differentiation was evaluated. The SNP markers developed in this study are expected to provide foundational data that can be utilized for the protection of 'Alfaone', verification of purity in distributed seeds, and enhancement of the efficiency of future domestic alfalfa breeding programs.

Ⅱ. MATERIALS AND METHODS

1. Plant materials

This study analyzed alfalfa lines and cultivars developed by the National Institute of Animal Science, Rural Development Administration in Korea. The plant materials included ‘Alfaone (MS001)’, ‘MS002’, ‘MS003’, ‘MSCB04’, ‘MSCB05’, ‘MSCB06’, and ‘Alfaking (MSCB07)’. All plant materials were grown for three weeks, after which young leaves were collected from each plant. To preserve DNA integrity, the collected leaves were immediately flash-frozen in liquid nitrogen. Frozen leaf samples were stored at -70°C in an ultra-low temperature freezer until genomic DNA extraction.

2. DNA extraction and sequencing data production

DNA was extracted from the leaf tissues of each cultivar using the CTAB method. To verify the quality of the extracted DNA, 3 μl of DNA was subjected to electrophoresis at 200V for 25 minutes, and a high-quality band was confirmed using a 1kb+ ladder.

For ‘Alfaone (MS001)’, ‘MS002’, and ‘MS003’ samples, sequencing was performed via genotyping-by-sequencing (GBS), in which DNA was digested with the ApeKI restriction enzyme. Barcode adapters and common adapters were diluted to 50 μM in TE buffer (10 mM Tris, 0.1 mM EDTA) and then mixed using 10x adapter buffer (500 mM NaCl, 100 mM Tris-Cl). Multiplexed PCR products were verified by agarose gel electrophoresis and purified using a QIAquick PCR Purification Kit, and the amplified restriction fragments were then used to construct NGS libraries. For ‘MSCB04’, ‘MSCB05’, ‘MSCB06’, and ‘Alfaking (MSCB07)’ samples, 100 μg of gDNA was fragmented for whole-genome sequencing (WGS) library preparation by treatment at 32°C for 15 minutes, after which the fragmentation enzymes were inactivated at 65°C for 30 minutes. Adapter ligation was performed at 20°C for 15 minutes, followed by purification with a QIAquick PCR Purification Kit. PCR was conducted for 8 cycles using a GeneExplorer instrument (BIOER), and band quality was assessed using a 100 bp ladder before NGS library construction.

Finally, the prepared libraries were sequenced at 2 × 150 bp on the Illumina HiSeq X platform.

3. Sequencing data preprocessing and SNP analysis

The raw sequencing data were preprocessed using two programs, SolexaQA (version 1.13) and Trimmomatic (version 0.39) (Cox et al., 2010; Bolger et al., 2014). First, low-quality sequences with a Phred score below 20 and reads shorter than 25 bp were removed using DynamicTrim and LengthSort in SolexaQA (Cox et al., 2010). Sequences with Phred quality scores below 20 were then further removed using Trimmomatic to obtain the final set of cleaned reads (Bolger et al., 2014). GBS data were de-multiplexed based on barcode sequences, the barcode sequences were removed using cutadapt (version 1.8.3), and preprocessing was completed with Trimmomatic (Martin, 2011). The cleaned reads were mapped to the Medicago sativa L. reference genome using BWA-MEM (version 0.7.17) (Li, 2013). The reference genome was the 3.15 Gbp sequence composed of 32 chromosomes and 9,789 unscaffolded contigs reported by Chen et al. (2020).

SNPs were extracted by identifying differences from the reference genome using SAMtools (version 0.1.16), and the varFilter program within SAMtools was used for filtering (Li et al., 2009). The key filtering options were a minimum mapping quality (-Q) of 30, a minimum read depth (-d) of 3, and a maximum read depth (-D) of 611. Additional settings were used to filter out SNPs located close to one another or near InDels. Extracted SNPs were classified as homozygous if the read rate was 90% or higher and as heterozygous if it was between 40% and 60%. Finally, SNP information from the GBS and WGS samples was aligned by position in the reference genome to create an SNP matrix. SNPs were selected from this matrix using the criteria of minor allele frequency (MAF) > 5%, missing rate < 30%, and REF/ALT frequency > 25%.
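
The genotype classification and matrix-level filters described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study. The function names are hypothetical, the "read rate" is interpreted as the fraction of reads supporting one allele, calls falling outside the stated ranges are treated as missing, and MAF is estimated from the coded calls with two alleles counted per sample, which simplifies the autotetraploid case. The REF/ALT frequency criterion is left out because its exact definition is not given.

```python
def call_genotype(ref_reads, alt_reads, min_depth=3):
    """Code one sample at one locus from its read counts.

    Returns 'a' (homozygous reference), 'b' (homozygous alternative),
    'h' (heterozygous) or '-' (missing). Treating out-of-range ratios
    as missing is an assumption of this sketch.
    """
    depth = ref_reads + alt_reads
    if depth < min_depth:              # mirrors the minimum read depth of 3
        return "-"
    alt_fraction = alt_reads / depth
    if alt_fraction >= 0.90:           # at least 90% of reads support ALT
        return "b"
    if alt_fraction <= 0.10:           # at least 90% of reads support REF
        return "a"
    if 0.40 <= alt_fraction <= 0.60:   # balanced support: heterozygous
        return "h"
    return "-"


def passes_matrix_filter(calls, maf_min=0.05, missing_max=0.30):
    """Apply the MAF > 5% and missing rate < 30% criteria to one locus."""
    n_samples = len(calls)
    if n_samples == 0:
        return False
    missing_rate = calls.count("-") / n_samples
    if missing_rate >= missing_max:
        return False
    # Simplified allele counting: two alleles per called sample (assumption).
    ref_alleles = 2 * calls.count("a") + calls.count("h")
    alt_alleles = 2 * calls.count("b") + calls.count("h")
    total = ref_alleles + alt_alleles
    if total == 0:
        return False
    maf = min(ref_alleles, alt_alleles) / total
    return maf > maf_min


# Hypothetical (ref, alt) read counts for seven samples at a single locus.
read_counts = [(12, 0), (0, 9), (5, 6), (1, 1), (10, 0), (0, 8), (7, 3)]
locus_calls = [call_genotype(ref, alt) for ref, alt in read_counts]
locus_kept = passes_matrix_filter(locus_calls)
```

Keeping the thresholds as keyword arguments makes it easy to test how sensitive the retained SNP count is to each cutoff.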

SNP loci were annotated using the genome feature annotation tracks associated with the Medicago sativa L. reference assembly reported by Chen et al. (2020). Each SNP was coded as "a" for the reference allele, "b" for the alternative allele, "h" for heterozygous, and "-" for missing data. Candidate loci were retained where a consistent homozygous type was observed across all ‘Alfaone (MS001)’ individuals while polymorphism was observed among the other cultivars. From these candidates, the minimum number of SNPs required to uniquely identify ‘Alfaone (MS001)’ was determined, and the selected SNPs were designated as cultivar-specific markers.
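
One way to search for a minimum discriminating SNP set is an exhaustive search over locus combinations of increasing size, sketched below. The search strategy, the function names, and the toy genotype matrix are all assumptions made for illustration, since the study does not state how the minimum was determined. Only homozygous calls ("a" or "b") are counted as informative here, which is also an assumption.

```python
from itertools import combinations

INFORMATIVE = {"a", "b"}  # homozygous calls only (assumption of this sketch)


def is_candidate(target_replicate_calls):
    """True if all replicates of the target share one homozygous call."""
    unique_calls = set(target_replicate_calls)
    return len(unique_calls) == 1 and unique_calls <= INFORMATIVE


def separates(target_call, other_call):
    """A locus separates two samples if both calls are informative and differ."""
    return (target_call in INFORMATIVE
            and other_call in INFORMATIVE
            and target_call != other_call)


def minimal_marker_set(matrix, target):
    """Return the smallest tuple of locus indices that separates `target`
    from every other sample, or None if no such set exists."""
    others = [name for name in matrix if name != target]
    n_loci = len(matrix[target])
    for size in range(1, n_loci + 1):
        for loci in combinations(range(n_loci), size):
            if all(any(separates(matrix[target][i], matrix[o][i]) for i in loci)
                   for o in others):
                return loci
    return None


# Hypothetical genotype strings (one character per locus), not study data.
toy_matrix = {
    "T": "aaba",
    "X": "abbb",
    "Y": "aaab",
    "Z": "baha",
}
best = minimal_marker_set(toy_matrix, "T")
```

Exhaustive search is practical only for small candidate lists; with thousands of candidate loci a greedy set-cover heuristic would be a more realistic starting point.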

4. Phylogenetic tree and principal component analysis (PCA)

In this study, a phylogenetic tree analysis was performed to evaluate the genetic relationships among alfalfa cultivars. The phylogenetic tree was constructed using the neighbor-joining (NJ) method (Saitou and Nei, 1987) implemented in MEGA11 (Tamura et al., 2021). Genetic distance information calculated from 25,496 SNP loci selected through the filtering process was used for the analysis. To assess the reliability of the inferred tree, bootstrap analysis was conducted with 1,000 replicates, and the values shown at each node indicate the proportion of replicate trees in which the corresponding cluster was observed. Genetic distances among samples were calculated using the maximum composite likelihood method (Tamura et al., 2004), and branch lengths were expressed in the same units as those of the genetic distances. Based on these results, the genetic relationships and population structure among alfalfa cultivars were evaluated, and the potential for cultivar discrimination was examined.

PCA was also performed to assess the genetic relationships among cultivars. PCA was conducted using the SNPRelate package (Zheng et al., 2012) in R, based on the same set of 25,496 SNP loci selected after filtering. Genotype data from each sample were used to derive principal components, and the contribution of each principal component to the total variation was estimated by eigen decomposition. The resulting principal component scores were used to visualize the genetic similarity and population structure among samples, thereby allowing evaluation of the genetic differentiation among alfalfa cultivars.

Ⅲ. RESULTS AND DISCUSSION

1. Sequencing data production and preprocessing

For the GBS method, ‘Alfaone (MS001)’, ‘MS002’, and ‘MS003’ samples were sequenced in triplicate, and the combined data yielded 1.30 Gb, 0.63 Gb, and 1.10 Gb, respectively (Table 1). Through WGS, 32.7 Gb, 39.2 Gb, 31.0 Gb, and 32.5 Gb of sequence data were produced from ‘MSCB04’, ‘MSCB05’, ‘MSCB06’, and ‘Alfaking (MSCB07)’ samples, respectively, corresponding to 9.70- to 12.25-fold coverage relative to the reference genome published by Chen et al. (2020) (Table 2). After preprocessing, over 90% of the GBS reads were retained as cleaned reads, resulting in 0.92 Gb, 0.45 Gb, and 0.79 Gb of sequence for ‘Alfaone (MS001)’, ‘MS002’, and ‘MS003’ samples, respectively. For the WGS data, 81.28% to 81.67% of reads were retained as cleaned reads, ultimately securing 6.36- to 8.32-fold whole-genome coverage for the ‘MSCB04’, ‘MSCB05’, ‘MSCB06’, and ‘Alfaking (MSCB07)’ samples.

2. Read mapping and SNP extraction

The cleaned GBS reads showed alignment rates to the reference genome of 92.54%, 91.53%, and 92.69% for ‘Alfaone (MS001)’, ‘MS002’, and ‘MS003’, respectively (Table 1). The cleaned WGS reads showed alignment rates to the reference genome of 98.93%, 99.20%, 99.08%, and 99.48% for ‘MSCB04’, ‘MSCB05’, ‘MSCB06’, and ‘Alfaking (MSCB07)’, respectively (Table 2).
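
The clean-read percentages and alignment rates for the WGS samples can be recomputed from the read counts in Table 2, as in the short sketch below. The dictionary layout and the helper name `percent` are illustrative choices; the counts are copied from Table 2.

```python
# Read counts copied from Table 2 (WGS samples).
TABLE2_COUNTS = {
    "MSCB04": {"raw": 216_783_792, "clean": 176_600_138, "aligned": 174_709_188},
    "MSCB05": {"raw": 259_513_938, "clean": 211_941_880, "aligned": 210_254_996},
    "MSCB06": {"raw": 205_471_674, "clean": 167_600_038, "aligned": 166_054_802},
    "MSCB07": {"raw": 215_280_590, "clean": 174_976_686, "aligned": 174_064_040},
}


def percent(numerator, denominator):
    """Percentage rounded to two decimals, as reported in the tables."""
    return round(100 * numerator / denominator, 2)


summary = {}
for sample, counts in TABLE2_COUNTS.items():
    summary[sample] = {
        # Clean reads / raw reads (%), as defined in the table footnote.
        "clean_pct": percent(counts["clean"], counts["raw"]),
        # Aligned clean reads / clean reads (%).
        "alignment_pct": percent(counts["aligned"], counts["clean"]),
    }

for sample, values in summary.items():
    print(sample, values["clean_pct"], values["alignment_pct"])
```

The alignment rate is taken relative to the cleaned reads rather than the raw reads, which is the denominator that reproduces the reported values.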

While 1,762,346 SNPs were identified from ‘Alfaone (MS001)’, ‘MS002’, and ‘MS003’ based on GBS data, approximately fourfold more (7,164,812 SNPs) were extracted from ‘MSCB04’, ‘MSCB05’, ‘MSCB06’, and ‘Alfaking (MSCB07)’ using WGS data. A total of 464,902 SNP loci were shared between the two data types (WGS and GBS). For the GBS data, only genotypes common to the three replicate sequencing runs were selected, and because GBS sequences only the specific regions cut by the restriction enzyme, the number of extracted SNPs was smaller than that from the whole-genome WGS data. Based on GBS data, 1,046, 586, and 791 SNPs were selected for ‘Alfaone (MS001)’, ‘MS002’, and ‘MS003’, respectively. From WGS data, 3,229,360, 3,228,988, 2,797,329, and 3,146,609 SNPs were identified for ‘MSCB04’, ‘MSCB05’, ‘MSCB06’, and ‘Alfaking (MSCB07)’, respectively (Table 3).

The extracted SNPs were aligned by reference genome position to construct an SNP matrix, confirming 464,902 SNPs common to the WGS and GBS data (Table 4). From these, 25,496 SNPs were retained using the criteria of MAF > 5% and missing rate < 30% to select highly reliable distinguishing markers. Additional filtering for polymorphic SNPs yielded 17,982 SNPs. Finally, to facilitate downstream conversion into PCR-based markers such as qPCR and KASP assays, only SNPs with no additional sequence variation within 200 bp of the target site were retained, providing sufficient polymorphism-free flanking sequence for primer and probe design. This conservative criterion yielded a final set of 1,716 SNPs.
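
The flanking-sequence criterion can be illustrated with a small position-based filter. The sketch below is a hypothetical implementation, not the one used in the study: it assumes all positions lie on the same chromosome, treats the 200 bp window as inclusive on both sides, and uses made-up coordinates.

```python
from bisect import bisect_left, bisect_right


def flank_clean(target_positions, variant_positions, flank=200):
    """Keep target SNPs with no other variant within `flank` bp.

    Both arguments are 1-based positions on a single chromosome;
    `variant_positions` should include every known SNP or InDel site.
    """
    variants = sorted(variant_positions)
    kept = []
    for pos in target_positions:
        # Slice of variants falling inside [pos - flank, pos + flank].
        lo = bisect_left(variants, pos - flank)
        hi = bisect_right(variants, pos + flank)
        neighbours = [v for v in variants[lo:hi] if v != pos]
        if not neighbours:
            kept.append(pos)
    return kept


# Hypothetical coordinates for illustration only.
all_variants = [1000, 1150, 5000, 9000, 9201]
targets = [1000, 5000, 9000]
primer_ready = flank_clean(targets, all_variants)
```

Sorting once and using binary search keeps the filter fast even for millions of variant positions per chromosome.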

Using the final 1,716 SNP loci, the genetic relationships among the seven alfalfa lines and cultivars were evaluated by phylogenetic tree analysis and principal component analysis (PCA) (Figs. 1 and 2). The first three principal components explained 44% of the total variation, with PC1, PC2, and PC3 accounting for 16%, 14%, and 14%, respectively. PCA based on the preselected SNP loci allowed relative separation among the analyzed lines and cultivars in three-dimensional space. In cultivated alfalfa, genetic differentiation among cultivars is often limited, whereas variation within cultivars can remain substantial due to its outcrossing autotetraploid nature and breeding history (Flajoulot et al., 2005; Malmberg et al., 2026). Therefore, the proportion of variance explained by the first three principal components was considered reasonable in the context of this study. In the phylogenetic tree, ‘MSCB06’ and ‘MSCB05’ were grouped as sister clades with a bootstrap support value of 100. Their close distribution in the 3D PCA also indicated high genetic similarity based on the SNP variation patterns used in this study.

3. Extraction of marker candidates for cultivar discrimination

To select SNPs capable of distinguishing the ‘Alfaone (MS001)’ cultivar from the remaining six samples, the SNP loci obtained from each sample were analyzed, and a total of six barcode SNPs were selected. Using these six barcode SNPs, ‘Alfaone (MS001)’ could be differentiated from the six other cultivars, and the marker type of ‘Alfaone (MS001)’ was designated as ‘aaabnn’ (Table 5). The average MAF of these SNPs was 19.6% (ranging from 8.2% to 35.1%), and the average polymorphic information content (PIC) was 0.525 (ranging from 0.238 to 0.751) (Table 6). Because the average PIC of 0.525 exceeds 0.5, the marker set is considered highly informative and suitable for cultivar discrimination (Botstein et al., 1980).
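
The summary statistics quoted above can be reproduced from the MAF and PIC columns of Table 6. The snippet below is a simple check using the tabulated values; the variable names are illustrative.

```python
from statistics import mean

# (MAF %, PIC) pairs copied from the 12 rows of Table 6.
TABLE6_VALUES = [
    (11.0, 0.344), (23.9, 0.500), (8.2, 0.238), (9.8, 0.434),
    (34.8, 0.664), (35.1, 0.456), (11.4, 0.592), (11.3, 0.744),
    (24.4, 0.751), (14.9, 0.376), (33.7, 0.626), (16.9, 0.578),
]

maf_values = [maf for maf, _pic in TABLE6_VALUES]
pic_values = [pic for _maf, pic in TABLE6_VALUES]

avg_maf = round(mean(maf_values), 1)    # average MAF in percent
avg_pic = round(mean(pic_values), 3)    # average PIC
maf_range = (min(maf_values), max(maf_values))
pic_range = (min(pic_values), max(pic_values))

# Loci whose individual PIC exceeds the 0.5 threshold.
n_highly_informative = sum(1 for pic in pic_values if pic > 0.5)

print(avg_maf, maf_range, avg_pic, pic_range, n_highly_informative)
```

The per-locus count shows that the 0.5 threshold is met on average even though only some individual loci exceed it.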

Furthermore, two additional barcode SNPs were selected that could distinguish the ‘Alfaone (MS001)’ cultivar from the ‘MS002’ and ‘Alfaking (MSCB07)’ samples. The marker type of ‘Alfaone (MS001)’ in this case was designated as ‘aa’ (Table 7). The average MAF of these SNPs was 15.3% (ranging from 5.3% to 37.0%), and the average PIC was 0.457 (ranging from 0.157 to 0.773) (Table 8). SNPs with lower MAF can still be useful for cultivar discrimination when they provide consistent diagnostic power for a specific cultivar. However, higher MAF values may be advantageous for overall marker informativeness and can be given preference when candidate SNPs show comparable discriminatory utility, particularly in genetically closely related cultivated alfalfa materials (Yang et al., 2022; Kanaka et al., 2023). Because a PIC value greater than 0.5 is generally regarded as indicating a highly informative marker, the use of SNP loci with relatively higher MAF and PIC values may allow cultivar discrimination with a minimal marker combination.

For instance, the SNP at position 63664495 (barcode 7) and the SNP at position 78242898 (barcode 8) both have MAFs within 30-50% and PICs greater than 0.5. Therefore, this combination can effectively distinguish ‘Alfaone (MS001)’ from ‘MS002’ and ‘Alfaking (MSCB07)’.
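
The selection rule in this example (MAF within 30-50% and PIC above 0.5) can be applied to Table 8 programmatically. The sketch below uses only an excerpt of Table 8, namely the rows with MAF above 25%, and the function name is hypothetical. It lists loci that meet the thresholds; whether each alternative pair is equally diagnostic would still need to be confirmed against the genotype calls.

```python
# Excerpt of Table 8 (rows with MAF above 25% only):
# (barcode, chromosome, position, MAF %, PIC)
TABLE8_EXCERPT = [
    (7, "chr1.4", 39489702, 28.1, 0.758),
    (7, "chr1.4", 63664495, 30.6, 0.669),
    (7, "chr4.3", 42359120, 26.3, 0.521),
    (8, "chr1.4", 13393988, 27.5, 0.773),
    (8, "chr4.1", 10968750, 27.7, 0.694),
    (8, "chr4.1", 78242898, 33.3, 0.558),
    (8, "chr4.2", 21029791, 37.0, 0.662),
    (8, "chr4.3", 36504046, 26.5, 0.624),
    (8, "chr5.4", 61279014, 32.6, 0.526),
    (8, "chr7.1", 84359745, 32.0, 0.588),
]


def preferred_loci(rows, barcode, maf_range=(30.0, 50.0), pic_min=0.5):
    """Positions for one barcode with MAF inside the range and PIC above pic_min."""
    low, high = maf_range
    return [pos for b, _chrom, pos, maf, pic in rows
            if b == barcode and low <= maf <= high and pic > pic_min]


barcode7_loci = preferred_loci(TABLE8_EXCERPT, 7)
barcode8_loci = preferred_loci(TABLE8_EXCERPT, 8)

# Every barcode 7 / barcode 8 pairing that satisfies both thresholds.
candidate_pairs = [(p7, p8) for p7 in barcode7_loci for p8 in barcode8_loci]
print(candidate_pairs)
```

The pair named in the text is among the candidates, and the remaining pairs could serve as backups if assay design fails for a given locus.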

Ⅳ. CONCLUSIONS

This study analyzed genetic differences among alfalfa cultivars through NGS analysis and selected SNP markers useful for cultivar discrimination among seven lines and cultivars, including ‘Alfaone (MS001)’. Six barcode SNPs were selected that differentiate ‘Alfaone (MS001)’ from the other six samples. Notably, ‘Alfaone (MS001)’ could be distinguished from ‘MS002’ and ‘Alfaking (MSCB07)’ with just two SNPs. This marker set can serve as a criterion for alfalfa cultivar identification, contributing to domestic cultivar protection and industry development. The barcode SNP set can help prevent cultivar admixture during the breeding process and serve as a tool for maintaining cultivar purity. Furthermore, the development of markers for alfalfa cultivar discrimination can contribute to the development of high-quality new cultivars and help prevent purity degradation due to cultivar mixing during distribution. This will support the distribution and supply of high-quality alfalfa and contribute to cultivar maintenance and management through systematic control at the genetic level.

Ⅴ. ACKNOWLEDGEMENTS

This study was supported by the Cooperative Research Program for Agriculture Science & Technology Development (Project No. PJ01787801) and the 2026 Postdoctoral Fellowship Program of the National Institute of Animal Science funded by RDA, Republic of Korea.

Figure

Fig. 1.

Phylogenetic tree analysis of alfalfa varieties.

Fig. 2.

PCA of alfalfa varieties.

Table

Table 1. Statistics of genotyping-by-sequencing data after preprocessing for three replicates each of ‘Alfaone (MS001)’, ‘MS002’, and ‘MS003’

¹Clean reads/Raw reads (%): (Total number of clean reads / Total number of raw reads) × 100

List	MS001 (average)	MS001 (total)	MS002 (average)	MS002 (total)	MS003 (average)	MS003 (total)	Average	Total
Number of raw reads	2,869,441	8,608,322	1,396,277	4,188,832	2,438,965	7,316,896	2,234,894	20,114,050
Total length (bp) of raw reads	433,285,541	1,299,856,622	210,837,877	632,513,632	368,283,765	1,104,851,296	337,469,061	3,037,221,550
Number of clean reads	2,587,392	7,762,176	1,262,674	3,788,022	2,205,578	6,616,734	2,018,548	18,166,932
Total length (bp) of clean reads	307,072,160	921,216,480	151,106,977	453,320,931	264,049,819	792,149,456	240,742,985	2,166,686,867
Average length (bp) of clean reads	119	-	120	-	120	-	119	-
Clean reads/Raw reads (%)¹	90.25%	-	90.52%	-	90.53%	-	90.43%	-
Number of clean reads aligned to the reference genome	-	7,183,123	-	3,467,045	-	6,133,378	-	16,783,546
Alignment rate	-	92.54%	-	91.53%	-	92.69%	-	92.39%

Table 2. Statistics of whole-genome sequencing data after preprocessing for ‘MSCB04’, ‘MSCB05’, ‘MSCB06’, and ‘Alfaking (MSCB07)’

¹Clean reads/Raw reads (%): (Total number of clean reads / Total number of raw reads) × 100

List	MSCB04	MSCB05	MSCB06	MSCB07	Average	Total
Number of raw reads	216,783,792	259,513,938	205,471,674	215,280,590	224,262,499	897,049,994
Total length (bp) of raw reads	32,734,352,592	39,186,604,638	31,026,222,774	32,507,369,090	33,863,637,274	135,454,549,094
Number of clean reads	176,600,138	211,941,880	167,600,038	174,976,686	182,779,686	731,118,742
Total length (bp) of clean reads	22,019,410,297	26,610,538,891	20,362,229,062	21,449,780,233	22,610,489,621	90,441,958,483
Average length (bp) of clean reads	125	126	121	123	124	-
Clean reads/Raw reads (%)¹	81.46%	81.67%	81.57%	81.28%	81.50%	-
Number of clean reads aligned to the reference genome	174,709,188	210,254,996	166,054,802	174,064,040	181,270,757	725,083,026
Alignment rate	98.93%	99.20%	99.08%	99.48%	99.17%	99.17%

Table 3. SNP matrix records and filtering statistics from genotyping-by-sequencing

¹Average: mean number of SNPs across all alfalfa varieties.

²Total: sum of SNPs across all alfalfa varieties.

Alfalfa variety	Filter level
MS001-1	83	75	1	7
MS001-2	495	340	54	101
MS001-3	468	312	47	109
MS002-1	255	200	16	39
MS002-2	227	180	18	29
MS002-3	104	93	1	10
MS003-1	477	302	65	110
MS003-2	143	127	7	9
MS003-3	171	146	6	19
MSCB04	3,229,360	1,350,589	726,504	1,152,267
MSCB05	3,228,988	1,427,418	665,896	1,135,674
MSCB06	2,797,329	1,298,223	580,923	918,183
MSCB07	3,146,609	1,203,346	747,359	1,195,904
Average¹	954,208	406,258	209,300	338,651
Total²	12,404,709	5,281,351	2,720,897	4,402,461

Table 4. SNP matrix records and filtering statistics

¹MAF (minor allele frequency) > 5%: SNP loci with a MAF greater than 5% across all samples were retained.

²Missing rate < 30%: SNP loci with a missing rate lower than 30% across all samples were retained.

Filter level	Number of SNP per grade
Total SNP (9 samples from GBS)	1,762,346
Total SNP (4 samples from WGS)	7,164,812
Same locus from GBS and WGS	464,902
MAF (minor allele frequency)¹ > 5%	345,721
Missing rate² < 30%	44,343
MAF > 5% & Missing rate < 30%	25,496

Table 5. Biomarker loci for identifying each alfalfa variety

a, same nucleotide as the reference genome; b, different nucleotide from the reference genome; n, deleted.

*, Alfaone; **, Alfaking.

Barcode number	1	2	3	4	5	6
MS001*	a	a	a	b	n	n
MS002	a	b	n	n	a	a
MS003	a	b	a	a	a	b
MSCB04	b	b	a	n	b	a
MSCB05	a	b	a	a	b	b
MSCB06	a	a	b	a	b	a
MSCB07**	a	a	a	a	n	n

Table 6. Nucleotide, MAF, and PIC information for the 12 SNPs used to identify 7 alfalfa varieties

Barcode No.	Id	POS	REF	ALT	Missing rate (%)	MAF (%)	PIC
1	chr4.2	52314549	G	C	18.0	11.0	0.344
2	chr5.3	79656557	T	A	8.0	23.9	0.500
3	chr1.2	3874422	C	G	27.0	8.2	0.238
3	chr8.4	39041413	G	C	8.0	9.8	0.434
4	chr2.3	43863109	A	T	11.0	34.8	0.664
4	chr7.2	16052654	A	T	26.0	35.1	0.456
5	chr3.3	21630583	T	A	21.0	11.4	0.592
5	chr3.3	21630636	T	A	20.0	11.3	0.744
5	chr4.4	31947260	A	T	10.0	24.4	0.751
5	chr6.2	17166719	G	C	13.0	14.9	0.376
6	chr6.1	51505497	A	T	11.0	33.7	0.626
6	chr7.3	26612354	A	T	23.0	16.9	0.578

Table 7. Biomarker loci for identifying each alfalfa variety

a, same nucleotide as the reference genome; b, different nucleotide from the reference genome.

*, Alfaone; **, Alfaking.

Barcode number	7	8
MS001*	a	a
MS002	a	b
MSCB07**	b	a

Table 8. Nucleotide, MAF, and PIC information for the 51 SNPs used to identify 3 alfalfa varieties

Barcode No.	Id	POS	REF	ALT	Missing rate (%)	MAF (%)	PIC
7	chr1.2	41631359	C	T	15.0	17.6	0.457
7	chr1.4	39489702	A	G	4.0	28.1	0.758
7	chr1.4	48807658	G	A	25.0	5.3	0.170
7	chr1.4	63664495	G	A	15.0	30.6	0.669
7	chr2.2	9070567	C	T	8.0	10.9	0.567
7	chr2.4	34814709	C	T	2.0	7.1	0.434
7	chr3.1	68701753	T	C	7.0	10.8	0.425
7	chr3.4	17841554	C	T	18.0	8.5	0.211
7	chr4.1	19797130	G	A	12.0	8.0	0.297
7	chr4.1	80356121	C	T	20.0	6.3	0.157
7	chr4.1	80356185	A	G	15.0	5.9	0.333
7	chr4.3	42359120	T	G	20.0	26.3	0.521
7	chr4.3	84465117	T	C	13.0	12.6	0.509
7	chr4.3	84465172	G	A	17.0	18.1	0.482
7	chr4.3	85312793	G	A	26.0	10.8	0.223
7	chr4.4	22885029	A	T	12.0	8.0	0.258
7	chr5.1	79240838	G	C	6.0	13.8	0.638
7	chr5.3	18124303	G	T	17.0	6.0	0.259
7	chr5.3	32747507	G	A	9.0	8.8	0.303
7	chr5.3	32747531	C	T	9.0	8.8	0.425
7	chr6.3	62101338	T	C	23.0	16.9	0.452
7	chr7.1	73832277	T	C	18.0	8.5	0.394
7	chr8.1	24868820	A	C	22.0	9.0	0.286
7	chr8.1	45525191	A	G	3.0	9.3	0.535
7	chr8.2	66447346	G	T	27.0	9.6	0.370
7	chr8.2	66447385	C	T	28.0	9.7	0.329
8	chr1.4	13393988	C	T	9.0	27.5	0.773
8	chr2.1	2458318	C	T	8.0	5.4	0.274
8	chr2.4	22955433	C	T	8.0	8.7	0.454
8	chr3.2	68197827	A	T	14.0	10.5	0.293
8	chr3.3	50563233	T	C	22.0	21.8	0.650
8	chr3.3	86469339	A	C	13.0	17.2	0.562
8	chr3.4	55965489	G	A	26.0	16.2	0.305
8	chr4.1	10968750	G	T	17.0	27.7	0.694
8	chr4.1	78242898	G	A	25.0	33.3	0.558
8	chr4.2	21029791	A	C	27.0	37.0	0.662
8	chr4.3	36504046	T	C	17.0	26.5	0.624
8	chr5.1	13476469	T	C	13.0	9.2	0.256
8	chr5.1	17444131	T	C	3.0	12.4	0.508
8	chr5.2	65899739	A	C	1.0	17.2	0.771
8	chr5.2	65899835	T	C	1.0	15.2	0.553
8	chr5.3	79656557	T	A	8.0	23.9	0.500
8	chr5.3	79656591	C	T	6.0	13.8	0.558
8	chr5.3	79656611	T	A	6.0	13.8	0.625
8	chr5.4	61279014	T	C	11.0	32.6	0.526
8	chr6.3	48154551	C	T	19.0	9.9	0.412
8	chr6.4	37027211	C	A	4.0	6.3	0.429
8	chr7.1	84359745	A	C	25.0	32.0	0.588
8	chr7.3	49764384	C	T	7.0	8.6	0.367
8	chr7.3	49764453	C	T	7.0	9.7	0.179
8	chr8.1	58972505	T	A	3.0	11.3	0.704

Reference

Annicchiarico, P., Franguelli, N., Ferrari, B., Campanella, G., Gualanduzzi, S., Crosta, M., Delogu, C., Spataro, G. and Nazzicari, N. 2025. Molecular markers enhance substantially the distinctness of alfalfa varieties for registration and protection. Plant Genome. 18(1):e20556.
Blondon, F., Marie, D., Brown, S. and Kondorosi, A. 1994. Genome size and base composition in Medicago sativa and M. truncatula species. Genome. 37(2):264-270.
Bolger, A.M., Lohse, M. and Usadel, B. 2014. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics. 30(15):2114-2120.
Botstein, D., White, R.L., Skolnick, M. and Davis, R.W. 1980. Construction of a genetic linkage map in man using restriction fragment length polymorphisms. The American Journal of Human Genetics. 32(3):314-331.
Brumă, I.S., Toader, M., Popescu, G., Petcu, V. and Georgescu, E. 2023. The evolution of alfalfa, as important crop in organic farming system in Romania. Romanian Agricultural Research. 40:297-306.
Chang, H.K., Youn, G.Y. and Choi, S.C. 2024. A study on the quality recognition and attributes of forage for domestic forage. Journal of the Korea Academia-Industrial Cooperation Society. 25(5):329-340.
Chen, H., Zeng, Y., Yang, Y., Huang, L., Tang, B., Zhang, H., Hao, F., Liu, W., Li, Y., Liu, Y., Zhang, X., Zhang, R., Zhang, Y., Li, Y., Wang, K., He, H., Wang, Z., Fan, G., Yang, H., Bao, A., Shang, Z., Chen, J., Wang, W. and Qiu, Q. 2020. Allele-aware chromosome-level genome assembly and efficient transgene-free genome editing for the autotetraploid cultivated alfalfa. Nature Communications. 11:2494.
Cox, M.P., Peterson, D.A. and Biggs, P.J. 2010. SolexaQA: At-a-glance quality assessment of Illumina second-generation sequencing data. BMC Bioinformatics. 11:485.
Elgharably, A. and Benes, S. 2021. Alfalfa biomass yield and nitrogen fixation in response to applied mineral nitrogen under saline soil conditions. Journal of Soil Science and Plant Nutrition. 21(1):744-755.
Flajoulot, S., Ronfort, J., Baudouin, P., Barre, P., Huguet, T., Huyghe, C. and Julier, B. 2005. Genetic diversity among alfalfa (Medicago sativa) cultivars coming from a breeding program, using SSR markers. Theoretical and Applied Genetics. 111:1420-1429.
Kanaka, K.K., Sukhija, N., Goli, R.C., Singh, S., Ganguly, I., Dixit, S.P., Dash, A. and Malik, A.A. 2023. On the concepts and measures of diversity in the genomics era. Current Plant Biology. 33:100278.
Kim, D.H. 2025. The importance of feeding high-quality forage to growing cattle. Wolgan Hanwoo. 305:190-193. (in Korean)
Korir, N.K., Han, J., Shangguan, L., Wang, C., Kayesh, E., Zhang, Y. and Fang, J. 2012. Plant variety and cultivar identification: Advances and prospects. Critical Reviews in Biotechnology. 33(2):111-125.
Lee, K.W., Min, C.W., Choi, B.R., Park, H.S., Song, Y. and Lee, S.H. 2025. Development and evaluation of a new alfalfa cultivar, ‘Alfaone’. Journal of the Korean Society of Grassland and Forage Science. 45(3):199-205.
Li, H. 2013. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv preprint. arXiv:1303.3997.
Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., Durbin, R. and 1000 Genome Project Data Processing Subgroup. 2009. The sequence alignment/map format and SAMtools. Bioinformatics. 25(16):2078-2079.
Li, X., Wei, Y., Acharya, A., Jiang, Q., Kang, J. and Brummer, E.C. 2014. A saturated genetic linkage map of autotetraploid alfalfa (Medicago sativa L.) developed using genotyping-by-sequencing is highly syntenous with the Medicago truncatula genome. G3: Genes|Genomes|Genetics. 4(10):1971-1979.
Malmberg, M.M., Suraweera, D.D., Baillie, R.C., Smith, K.F. and Cogan, N.O.I. 2026. Genomic resources for Australian alfalfa (Medicago sativa L.) genomics: Reformatted reference genome, annotated variants, gene presence-absence and diversity analysis from genome re-sequencing. BMC Plant Biology. 26:102.
Mammadov, J., Aggarwal, R., Buyyarapu, R. and Kumpatla, S. 2012. SNP markers and their impact on plant breeding. International Journal of Plant Genomics. 2012(1):728398.
Martin, M. 2011. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal. 17(1):10-12.
Medina, C.A., Zhao, D., Lin, M., Sapkota, M., Sandercock, A.M., Beil, C.T., Sheehan, M.J., Irish, B.M., Yu, L.X., Poudel, H., Claessens, A., Moore, V., Crawford, J., Hansen, J., Viands, D., Peel, M.D., Tilhou, N., Riday, H., Brummer, E.C. and Xu, Z. 2025. Pre-breeding in alfalfa germplasm develops highly differentiated populations, as revealed by genome-wide microhaplotype markers. Scientific Reports. 15:1253.
Park, S.I., Hwangbo, K., Gil, J., Chung, H., Kim, H.B., Kim, O.T., Kim, S.C., Koo, S.C., Um, Y. and Lee, Y. 2017. Determination of the origin of Angelica roots using Angelica gigas chloroplast based SSR markers. Korean Journal of Medicinal Crop Science. 25(6):361-366.
Qiang, H., Chen, Z., Zhang, Z., Wang, X., Gao, H. and Wang, Z. 2015. Molecular diversity and population structure of a worldwide collection of cultivated tetraploid alfalfa (Medicago sativa subsp. sativa L.) germplasm as revealed by microsatellite markers. PLoS ONE. 10(4):e0124592.
Saitou, N. and Nei, M. 1987. The neighbor-joining method: A new method for reconstructing phylogenetic trees. Molecular Biology and Evolution. 4:406-425.
Semagn, K., Babu, R., Hearne, S. and Olsen, M. 2014. Single nucleotide polymorphism genotyping using Kompetitive Allele Specific PCR (KASP): Overview of the technology and its application in crop improvement. Molecular Breeding. 33:1-14.
Shen, C., Du, H., Chen, Z., Lu, H., Zhu, F., Chen, H., Meng, X., Liu, Q., Liu, P., Zheng, L., Li, X., Dong, J., Liang, C. and Wang, T. 2020. The chromosome-level genome sequence of the autotetraploid alfalfa and resequencing of core germplasms provide genomic resources for alfalfa research. Molecular Plant. 13(9):1250-1261.
Shi, S., Nan, L. and Smith, K.F. 2017. The current status, problems, and prospects of alfalfa (Medicago sativa L.) breeding in China. Agronomy. 7(1):1.
Tamura, K., Nei, M. and Kumar, S. 2004. Prospects for inferring very large phylogenies by using the neighbor-joining method. Proceedings of the National Academy of Sciences. 101(30):11030-11035.
Tamura, K., Stecher, G. and Kumar, S. 2021. MEGA11: Molecular evolutionary genetics analysis version 11. Molecular Biology and Evolution. 38(7):3022-3027.
Thomson, M.J. 2014. High-throughput SNP genotyping to accelerate crop improvement. Plant Breeding and Biotechnology. 2:195-212.
Yang, Y., Lyu, M., Liu, J., Wu, J., Wang, Q., Xie, T., Li, H., Chen, R., Sun, D., Yang, Y. and Yao, X. 2022. Construction of an SNP fingerprinting database and population genetic analysis of 329 cauliflower cultivars. BMC Plant Biology. 22:522.
Youm, W.H. 1991. Utilization of alfalfa feed. Wolgan Naknongyukwoo. 113:76-79. (in Korean)
Yu, J.K. and Chung, Y.S. 2021. Plant variety protection: Current practices and insights. Genes. 12(8):1127.
Zhang, J., Yang, J., Lv, Y., Zhang, X., Xia, C., Zhao, H. and Wen, C. 2023. Genetic diversity analysis and variety identification using SSR and SNP markers in melon. BMC Plant Biology. 23:39.
Zheng, X., Levine, D., Shen, J., Gogarten, S., Laurie, C. and Weir, B. 2012. A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics. 28(24):3326-3328.

	Filter level
Alfalfa variety	Total SNPs	Homozygous SNPs	Heterozygous SNPs	Other SNPs
MS001-1	83	75	1	7
MS001-2	495	340	54	101
MS001-3	468	312	47	109
MS002-1	255	200	16	39
MS002-2	227	180	18	29
MS002-3	104	93	1	10
MS003-1	477	302	65	110
MS003-2	143	127	7	9
MS003-3	171	146	6	19
MSCB04	3,229,360	1,350,589	726,504	1,152,267
MSCB05	3,228,988	1,427,418	665,896	1,135,674
MSCB06	2,797,329	1,298,223	580,923	918,183
MSCB07	3,146,609	1,203,346	747,359	1,195,904
Average¹	954,208	406,258	209,300	338,651
Total²	12,404,709	5,281,351	2,720,897	4,402,461
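
The summary rows of the table above can be reproduced from the per-sample counts. The following Python sketch assumes that the average is the arithmetic mean over all 13 samples, rounded to the nearest integer, and that the total is the plain column sum; both are assumptions, since the table footnotes are not reproduced here. The names `SNP_COUNTS` and `summarize` are illustrative only.

```python
# Per-sample SNP counts copied from the table:
# (total, homozygous, heterozygous, other)
SNP_COUNTS = {
    "MS001-1": (83, 75, 1, 7),
    "MS001-2": (495, 340, 54, 101),
    "MS001-3": (468, 312, 47, 109),
    "MS002-1": (255, 200, 16, 39),
    "MS002-2": (227, 180, 18, 29),
    "MS002-3": (104, 93, 1, 10),
    "MS003-1": (477, 302, 65, 110),
    "MS003-2": (143, 127, 7, 9),
    "MS003-3": (171, 146, 6, 19),
    "MSCB04": (3229360, 1350589, 726504, 1152267),
    "MSCB05": (3228988, 1427418, 665896, 1135674),
    "MSCB06": (2797329, 1298223, 580923, 918183),
    "MSCB07": (3146609, 1203346, 747359, 1195904),
}


def summarize(counts):
    """Return (column totals, rounded column means) over all samples."""
    columns = list(zip(*counts.values()))
    totals = [sum(col) for col in columns]
    # Assumption: the reported average is the mean over every sample row.
    averages = [round(t / len(counts)) for t in totals]
    return totals, averages


totals, averages = summarize(SNP_COUNTS)
print("Total:  ", totals)
print("Average:", averages)
```

Under these assumptions the computed values match the Average and Total rows, and in every row the homozygous, heterozygous, and other counts add up to the total SNP count.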

Barcode number	1	2	3	4	5	6
Number of loci per marker	1	1	2	2	4	4
MS001*	a	a	a	b	n	n
MS002	a	b	n	n	a	a
MS003	a	b	a	a	a	b
MSCB04	b	b	a	n	b	a
MSCB05	a	b	a	a	b	b
MSCB06	a	a	b	a	b	a
MSCB07**	a	a	a	a	n	n
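
To illustrate how the barcode table can be read, the following Python sketch lists, for each line or cultivar, the barcodes at which its code differs from that of MS001. It is a minimal sketch under two assumptions not stated in the table itself: that `a` and `b` denote two alternative genotype classes, and that `n` denotes a marker that was not scored, so that a comparison is counted only when both entries are `a` or `b`. The names `BARCODES` and `discriminating_barcodes` are hypothetical.

```python
# Barcode codes copied from the table, barcodes 1 to 6 from left to right.
BARCODES = {
    "MS001": "aaabnn",
    "MS002": "abnnaa",
    "MS003": "abaaab",
    "MSCB04": "bbanba",
    "MSCB05": "abaabb",
    "MSCB06": "aababa",
    "MSCB07": "aaaann",
}


def discriminating_barcodes(target, other, table=BARCODES):
    """Return the 1-based barcode numbers that separate `target` from `other`.

    Assumption: 'n' means the marker was not scored, so a barcode counts
    only when both entries are called ('a' or 'b') and the calls differ.
    """
    result = []
    for number, (x, y) in enumerate(zip(table[target], table[other]), start=1):
        if x != "n" and y != "n" and x != y:
            result.append(number)
    return result


for name in BARCODES:
    if name != "MS001":
        print(name, discriminating_barcodes("MS001", name))
```

Under these assumptions, barcode 2 separates MS001 from MS002 and barcode 4 separates it from MSCB07, while each of the remaining lines and cultivars differs from MS001 at two barcodes.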