1971 | Born on 1971-02-10. |
1988 | Started medical studies at the medical faculty (Charité) of the Humboldt University in Berlin. |
1998 | Defended dr. med. thesis at the Humboldt University. |
1999 | Founded BioInfoBank Institute and took the position of CEO. |
Skills
science | Author of over 100 publications with over 10k citations (h-index>50). The articles cover mostly bioinformatics and related research areas. |
software | Author of several algorithms and services in science (bioinformatics) and commerce. Author of the ESC blockchain (Adshares). Author of ASIC designs (BTC mining). Languages used include C, C++, Perl, Python, Verilog, SQL, JavaScript, HTML and others. |
management | CEO of BioInfoBank Institute for over 20 years. Member of the management or supervisory boards of many other companies. |
investment | Co-founder of the VC fund BIB Seed Capital. Investor in Medicalgorithmics, Proteon Pharmaceuticals and Adshares. Early investor in crypto assets. |
mental | Highly skilled at forgetting details (lost proficiency in German and Russian) and at generalizing. Unable to learn subjects that hold no interest. Mild congenital amusia that does not affect the experience of music. |
evolution | 6 children. |
Links
scholar linkedin facebook everybodywiki pl.wiki
Genome, lost genes (pLoF mutations) SNP,INDEL
Genome, clinical annotations (only pathogenic and protective clinvar mutations) SNP,INDEL
chr: chromosome; position: position on the chromosome (Hg38 coordinates); dbsnpid: ID of the variant (mutation); found: number of genomes in this database that carry this variant; who: up to 10 genomes from this database with this variant, sorted by similarity; freq: (maximum) frequency of this variant in public databases; homozygous: variant present in both copies of the genome; gene: gene name; effect and description: description from the ClinVar database; note: marks variants with high frequency in public databases (freq) or in this database; comment: Clinical annotations will change a lot in the future! Annotation of lost genes may not change.
Circos
The outermost ring shows chromosome information. The second ring shows read coverage as a histogram; each bar is the average coverage of a 0.5 Mbp region. The third ring shows indel density as a scatter plot; each black dot is the number of indels per 1 Mbp window. The fourth ring shows SNP density as a scatter plot; each green dot is the number of SNPs per 1 Mbp window. The fifth ring shows the proportion of homozygous SNPs (orange) and heterozygous SNPs (grey) as a histogram; each bar is calculated from a 1 Mbp region. The sixth ring shows the CNV inference; red means gain and green means loss. The innermost ring shows the SV inference in exonic and splicing regions. If SVs are called with breakdancer or crest, the colours are CTX (orange), INS (green), DEL (grey), ITX (pink) and INV (blue). If SVs are called with delly, the colours are TRA (orange), INS (green), DEL (grey), DUP (pink) and INV (blue).
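The per-window densities in the indel and SNP rings can be illustrated with a small counting sketch. This is not the pipeline that produced the plot: the window alignment (non-overlapping windows starting at position 1), the function name `window_counts` and the sample positions are all assumptions made for illustration.

```python
from collections import Counter

WINDOW = 1_000_000  # 1 Mbp window, as in the ring description


def window_counts(positions, window=WINDOW):
    """Count variants per non-overlapping window.

    positions are 1-based coordinates on a single chromosome;
    the returned keys are 0-based window indices.
    """
    return Counter((pos - 1) // window for pos in positions)


# Hypothetical variant positions, for illustration only
positions = [150, 999_999, 1_000_000, 1_000_001, 2_500_000]
counts = window_counts(positions)

for idx in sorted(counts):
    start = idx * WINDOW + 1
    end = (idx + 1) * WINDOW
    print(f"{start}-{end}: {counts[idx]}")
```

Each printed count corresponds to one dot in the scatter ring for that window.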
Similar genomes (max total of 3704620 used for calculating match%)
| who (max 20) | match | total | match% |
|---|---|---|---|
| | 3019395 | 3696098 | 81.69% |
| | 2943683 | 3693179 | 79.71% |
| | 2937818 | 3701289 | 79.37% |
| | 2934307 | 3698167 | 79.34% |
| | 2932255 | 3705093 | 79.15% |
| | 2928179 | 3682672 | 79.51% |
| | 2731812 | 3404540 | 80.24% |
| | 2728420 | 3429035 | 79.57% |
| | 2681015 | 3687671 | 72.70% |
| | 2577806 | 3700891 | 69.65% |
| | 2484607 | 3689285 | 67.35% |
| | 2468181 | 3891969 | 66.62% |
| | 2445977 | 3701675 | 66.08% |
| | 2440819 | 3700057 | 65.97% |
| | 2440411 | 3699992 | 65.96% |
| | 2432402 | 3686804 | 65.98% |
| | 2431693 | 3704046 | 65.65% |
| | 2430955 | 3689762 | 65.88% |
| | 2427856 | 3693003 | 65.74% |
| | 2426821 | 3682592 | 65.90% |
who: up to 20 most similar genomes in this database, sorted by number of common SNP variants; match: number of common SNP variants; total: total number of SNP variants in the similar genome; match%: similarity = (common variants) / min(total variants in the tested genome, total variants in the similar genome);
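The match% formula can be checked against the table with a minimal sketch. It assumes that the 3704620 in the section heading is the SNP total of the tested genome, which is consistent with the listed rows; the function name `match_percent` is hypothetical.

```python
def match_percent(common, total_tested, total_similar):
    """Similarity = common variants / the smaller of the two SNP totals, in percent."""
    return 100.0 * common / min(total_tested, total_similar)


# Assumption: the heading's "max total" is the tested genome's SNP total
TESTED_TOTAL = 3_704_620

# (match, total) pairs taken from two rows of the table
rows = [
    (3_019_395, 3_696_098),  # similar genome has fewer SNPs than the tested one
    (2_468_181, 3_891_969),  # similar genome has more SNPs than the tested one
]

for common, total in rows:
    print(f"{match_percent(common, TESTED_TOTAL, total):.2f}%")
```

The two rows exercise both branches of the `min`: the first divides by the similar genome's total, the second by the tested genome's total.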
Statistics
sequencing quality | |
---|---|
Raw reads | 84080839 |
Raw data(G) | 99.86 |
Effective(%) | 0.04 |
Error(%) | 95.17 |
Q20(%) | 87.89 |
Q30(%) | 43.17 |
GC(%) | 43.77 |
mapping, coverage and depth | |
Total | 926015828 (100%) |
Duplicate | 93523564 (13.89%) |
Mapped | 673512843 (72.73%) |
Properly mapped | 659688462 (71.24%) |
PE mapped | 672039004 (72.57%) |
SE mapped | 2947678 (0.32%) |
With mate on different chr | 8797482 (0.95%) |
With mate on different chr (mapQ>=5) | 5281845 (0.57%) |
Average_sequencing_depth | 33.68 |
Coverage | 99.81% |
Coverage_at_least_4X | 99.53% |
Coverage_at_least_10X | 98.68% |
Coverage_at_least_20X | 92.10% |
number of SNPs | |
CDS | 23908 |
synonymous_SNP | 12039 |
missense_SNP | 11498 |
stopgain | 92 |
stoploss | 12 |
unknown | 282 |
intronic | 1282778 |
UTR3 | 27207 |
UTR5 | 6227 |
splicing | 78 |
ncRNA_exonic | 14355 |
ncRNA_intronic | 228079 |
ncRNA_splicing | 65 |
upstream | 23546 |
downstream | 25252 |
intergenic | 2191568 |
Total | 3824170 |
feature of SNPs | |
Total | 3824170 |
Het | 2356303 |
Hom | 1467867 |
transition | 2552861 |
transversion | 1271309 |
ts/tv | 2.01 |
dbSNP percentage | 3692215 (96.55%) |
novel | 131955 |
novel ts | 72463 |
novel tv | 59492 |
novel ts/tv | 1.22 |
number of InDels | |
CDS | 702 |
frameshift_deletion | 163 |
frameshift_insertion | 112 |
nonframeshift_deletion | 210 |
nonframeshift_insertion | 192 |
stopgain | 7 |
stoploss | 1 |
unknown | 24 |
intronic | 304561 |
UTR3 | 6958 |
UTR5 | 1018 |
splicing | 42 |
ncRNA_exonic | 2045 |
ncRNA_intronic | 50769 |
ncRNA_splicing | 16 |
upstream | 5748 |
downstream | 6409 |
intergenic | 454721 |
Total | 833267 |
feature of InDels | |
Total | 833267 |
Het | 526358 |
Hom | 306909 |
dbSNP percentage | 732380 (87.89%) |
novel | 100887 |
structural variants | |
DUP | 1509 |
INV | 791 |
INS | 76 |
DEL | 4388 |
BND | 1479 |
copy number variants | |
gain_count | 111 |
gain_size | 1667000 |
loss_count | 171 |
loss_size | 2203000 |
total_count | 282 |
total_size | 3870000 |
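Several derived values in the SNP rows above (ts/tv, dbSNP percentage, novel counts) follow directly from the raw counts. The sketch below recomputes them as a consistency check; the variable names are illustrative, and this is not the report's own code.

```python
# Raw counts copied from the "feature of SNPs" rows
snp_total = 3_824_170
het, hom = 2_356_303, 1_467_867
transitions, transversions = 2_552_861, 1_271_309
in_dbsnp = 3_692_215
novel_ts, novel_tv = 72_463, 59_492

# Derived values
ts_tv = transitions / transversions        # transition/transversion ratio
dbsnp_pct = 100.0 * in_dbsnp / snp_total   # share of SNPs already in dbSNP
novel = snp_total - in_dbsnp               # SNPs not found in dbSNP
novel_ts_tv = novel_ts / novel_tv          # ts/tv among novel SNPs only

# Heterozygous + homozygous and ts + tv should each add up to the total
assert het + hom == snp_total
assert transitions + transversions == snp_total

print(f"ts/tv: {ts_tv:.2f}")
print(f"dbSNP percentage: {dbsnp_pct:.2f}%")
print(f"novel: {novel}")
print(f"novel ts/tv: {novel_ts_tv:.2f}")
```

The same pattern applies to the InDel rows, where Het + Hom and dbSNP + novel each add up to the InDel total.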