# README


## The content of the directory
```
data/
├── Marpol_135_accessions
│             ├── Marpol_135_accessions.txt
│             ├── Marpol_135_accessions.vcf.gz
│             └── Marpol_135_accessions.vcf.gz.tbi
├── Marpolmon_16_accessions
│             ├── Marpolmon_16_accessions.txt
│             ├── Marpolmon_16_accessions.vcf.gz
│             ├── Marpolmon_16_accessions.vcf.gz.tbi
│             ├── genomes.tar.gz
│             ├── GFFs.tar.gz
│             └── peptides.tar.gz
├── Marpolpol_13_accessions
│             ├── Marpolpol_13_accessions.txt
│             ├── Marpolpol_13_accessions.vcf.gz
│             ├── Marpolpol_13_accessions.vcf.gz.tbi
│             ├── genomes.tar.gz
│             ├── GFFs.tar.gz
│             └── peptides.tar.gz
├── Marpolrud_104_accessions
│             ├── Marpolrud_104_accessions.vcf.gz
│             ├── Marpolrud_104_accessions.vcf.gz.tbi
│             ├── Marpolrud_104_accessions.txt
│             ├── genomes.tar.gz
│             ├── GFFs.tar.gz
│             └── peptides.tar.gz
├── reference_genome
│             ├── MpTak_v6.1.genome.fasta
│             ├── MpTak_v6.1.genome.fasta.fai
│             ├── MpTak_v6.1r2.gff.gz
│             └── MpTak_v6.1r2.gff.gz.tbi
└── VCFs
    ├── Aberdeen.vcf.gz
    ├── Aberdeen.vcf.gz.tbi
    ├── Ale-A.vcf.gz
    ...
```

- __Marpol_135_accessions__
    - Marpol_135_accessions.txt: List of the accessions included in the VCF file.
    - Marpol_135_accessions.vcf.gz: Multi-sample VCF file (bgzipped) covering all three subspecies of Marchantia polymorpha.
    - Marpol_135_accessions.vcf.gz.tbi: The tabix index file for the VCF file above.
    
- __Marpolmon_16_accessions__
    - Marpolmon_16_accessions.txt: List of the accessions included in the VCF file.
    - Marpolmon_16_accessions.vcf.gz: Multi-sample VCF file (bgzipped) for the 16 accessions of Marchantia polymorpha subsp. montivagans.
    - Marpolmon_16_accessions.vcf.gz.tbi: The tabix index file for the VCF file above.
    - genomes.tar.gz: Genomic FASTA files for the 16 subsp. montivagans accessions, containing contigs generated by de novo assembly of short reads.
    - GFFs.tar.gz: Gene annotations in GFF format for the 16 subsp. montivagans accessions.
    - peptides.tar.gz: Protein FASTA files for the 16 subsp. montivagans accessions, generated from the genomic FASTA files and the GFF files.

- __Marpolpol_13_accessions__
    Same file layout as Marpolmon_16_accessions, for the 13 accessions of M. polymorpha subsp. polymorpha.

- __Marpolrud_104_accessions__
    Same file layout as Marpolmon_16_accessions, for the 104 accessions of M. polymorpha subsp. ruderalis.

- __reference_genome__
    - MpTak_v6.1.genome.fasta: Genome sequence of MpTak_v6.1, used as the reference genome, with its faidx index file (.fai).
    - MpTak_v6.1r2.gff.gz: Gene annotation in GFF format, with its tabix index file (.tbi).

- __VCFs__
    Single-sample VCF files, one per accession, with their tabix index files (.tbi).
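
As a quick sanity check after downloading, the accession list can be compared with the sample columns of the matching VCF. The following is a minimal sketch using only the Python standard library. It assumes that each `*_accessions.txt` file holds one accession name per line, which this README does not state. The function names are illustrative and not part of the dataset.

```python
import gzip


def vcf_sample_names(vcf_path):
    """Return the sample names from the #CHROM header line of a bgzipped VCF."""
    # bgzip output is a series of gzip members, which the gzip module can read.
    with gzip.open(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # The first nine columns are fixed VCF fields; samples follow.
                return line.rstrip("\n").split("\t")[9:]
            if not line.startswith("#"):
                break  # reached the records without finding a header line
    return []


def read_accession_list(txt_path):
    """Read accession names, ASSUMING one name per line (not confirmed here)."""
    with open(txt_path) as handle:
        return [line.strip() for line in handle if line.strip()]


def missing_from_vcf(txt_path, vcf_path):
    """Return listed accessions that have no sample column in the VCF."""
    samples = set(vcf_sample_names(vcf_path))
    return [name for name in read_accession_list(txt_path) if name not in samples]


# Example call, with paths taken from the directory tree above:
# missing_from_vcf(
#     "data/Marpolmon_16_accessions/Marpolmon_16_accessions.txt",
#     "data/Marpolmon_16_accessions/Marpolmon_16_accessions.vcf.gz",
# )
```

An empty result means every listed accession has a sample column in the VCF. Region queries that use the `.tbi` index need a tabix-aware tool and are not covered by this sketch.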

## Caveat
- __Low quality data__
    The data for the following accessions are of low quality, so analytical results involving them should be interpreted with care.

    - RES36 \*1
    - RES37 \*1, \*2
    - RES38
    - RES40
    - WT-M \*1
    - Tak1-HK \*1
    - Field-1 \*1
    - NILSC10 \*1, \*2
    - NILSC17
    - NILSC24
    - NILSC20

    \*1 The genome FASTA, protein FASTA, and GFF files are missing because the de novo assembly failed for these accessions.

    \*2 These accessions are included in the VCF for all accessions (Marpol_135_accessions) but not in the per-subspecies VCF files.
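
For analyses that should exclude these accessions, the list above can be encoded once and reused. This is a minimal sketch in Python. It assumes that the sample names in the VCF files and accession lists are spelled exactly as in this README, which has not been verified. The variable and function names are illustrative.

```python
# Accessions flagged as low quality in this README.
LOW_QUALITY = {
    "RES36", "RES37", "RES38", "RES40", "WT-M", "Tak1-HK",
    "Field-1", "NILSC10", "NILSC17", "NILSC24", "NILSC20",
}

# Footnote *1: no genome FASTA, protein FASTA, or GFF (de novo assembly failed).
NO_ASSEMBLY = {"RES36", "RES37", "WT-M", "Tak1-HK", "Field-1", "NILSC10"}

# Footnote *2: present only in the Marpol_135_accessions VCF.
ONLY_IN_COMBINED_VCF = {"RES37", "NILSC10"}


def drop_low_quality(accessions):
    """Return the accessions that are not flagged as low quality, in order."""
    return [name for name in accessions if name not in LOW_QUALITY]


def has_assembly(accession):
    """Return False for accessions listed under footnote *1."""
    return accession not in NO_ASSEMBLY
```

For example, `drop_low_quality(["Aberdeen", "RES36", "Ale-A"])` keeps `Aberdeen` and `Ale-A` and drops `RES36`.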


## License

The data distributed in this directory are licensed under CC BY 4.0 (Creative Commons Attribution 4.0 International), which allows sharing and adaptation of the data provided that proper attribution is given.

> __You are free to:__
> __Share__ — copy and redistribute the material in any medium or format for any purpose, even commercially.
> __Adapt__ — remix, transform, and build upon the material for any purpose, even commercially.
> The licensor cannot revoke these freedoms as long as you follow the license terms.

> __Under the following terms:__
> __Attribution__ - You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
> __No additional restrictions__ - You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.

See https://creativecommons.org/licenses/by/4.0/

## Citation and contact
The Marchantia pangenome paper is under review.  
If you use the data distributed in this directory, please cite our bioRxiv preprint: [DOI:10.1101/2023.10.27.564390](https://www.biorxiv.org/content/10.1101/2023.10.27.564390v1)

For inquiries about the data, please contact us at the following email addresses:
- Pierre-Marc Delaux (pierre-marc.delaux @ cnrs.fr)
- Maxime Bonhomme (maxime.bonhomme @ univ-tlse3.fr)


