# Data Preparation for your own BAM/SAM file to run _MethylBERT_

## Input requirements

To run _MethylBERT_, the following files are required:

1. Input bulk sample as a BAM/SAM file
2. Reference genome as a FASTA file
3. DMRs as a tab-separated .csv file
4. (Optional, only if you want to fine-tune the MethylBERT model with your own data) Pure tumour and normal samples as BAM/SAM files

#### 1. BAM/SAM File format

_MethylBERT_ currently supports only [Bismark](https://www.bioinformatics.babraham.ac.uk/projects/bismark/)-aligned samples, where read-level methylation calls are given in the `XM` tag. The `XM` tag stores methylation calls as follows:

- `x` : Unmethylated cytosine at CHG context
- `X` : Methylated cytosine at CHG context
- `h` : Unmethylated cytosine at CHH context
- `H` : Methylated cytosine at CHH context
- `z` : Unmethylated cytosine at CpG context
- `Z` : Methylated cytosine at CpG context

Each sequence read carries its methylation calls in the `XM` tag, for example:

```
SRR5390326.sra.2060072_2060072_length=150 16 chr1 3000485 42 118M * 0 0 AATTTCAACTCTAAATTTAATTATTTCCTACTATCTACTCATCTTAAATAAATTTACTTCCTTTTATTCTAAAACTTCTAAATTTACTATCAAACTACTAATATATACTCTAATTTCC JA-FFJJJFJJJJJJJJJJJJJJFJJJJJJJJJFJJJJFJJFJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJFJJFJFJFJJJFJJJJJJJFJJAJJ