BioTuring
Cell Type Prediction
a.k.a. HaiTam Algorithm

Automatic cell type prediction built on a database of 80,574,317 cells
• Extensive cleaning and curation of millions of single cells
• State-of-the-art neural network
• Advanced combinatorial algorithms on HPC for enumerating billions of possibilities

We can now classify 54 cell types and 183 subtypes. And this is not all: new subtypes are continuously added.

What cell types can we predict?

Human
Mouse
B cell
Early pro-B cell
Follicular B cell
Germinal center B cell
Immature B cell
Mature B cell
Memory B cell
Naive B cell
Precursor B cell
Pro-B cell
Regulatory B cell

Plasma cell
IgA plasma cell
IgG plasma cell
IgM plasma cell
Plasmablast
Gamma-delta T cell
Naive gamma-delta T cell
Central memory gamma-delta T cell
Effector memory gamma-delta T cell
Effector gamma-delta T cell
Exhausted gamma-delta T cell
Mucosal associated invariant gamma-delta T cell
Macrophage
Hofbauer cell
Kupffer cell
Langerhans cell
Alveolar macrophage
Microglial cell
Perivascular macrophage
Pleural macrophage
Tissue-resident macrophage
Epithelial cell
Brush cell
Cholangiocyte
Cortical thymic epithelial cell
Epithelial cell of thymus
Hepatocyte
Luminal cell of prostate epithelium
Luminal epithelial cell of mammary gland
Medullary thymic epithelial cell
Myoepithelial cell
Pancreatic ductal cell
Simple columnar epithelial cell
Stratified epithelial cell
Urothelial cell
Epithelial cell of lung
Interstitial cell of cajal
Corneal epithelial cell
Foveolar cell of stomach
Glandular epithelial cell
Acinar cell
Eccrine cell
Squamous epithelial cell
Sertoli cell
Keratinocyte
Mesothelial cell
Peritubular myoid cell
Retinal cell
Mueller cell
Off-bipolar cell
On-bipolar cell
Cone retinal bipolar cell
Lens fiber cell
Photoreceptor cell
Retina horizontal cell
Retinal bipolar neuron
Retinal cone cell
Retinal ganglion cell
Retinal pigment epithelial cell
Retinal progenitor cell
Retinal rod cell
Rod bipolar cell
Muscle cell
Cardiac muscle cell
Cell of skeletal muscle
Skeletal muscle myoblast
Skeletal muscle satellite cell
Smooth muscle cell
Vascular associated smooth muscle cell

Osteoclast

Fat cell
Preadipocyte
Germ line cell
Decidual cell
Extravillous trophoblast
Placental villous trophoblast
Primordial germ cell
Syncytiotrophoblast cell
Trophoblast cell
Trophoblast giant cell
Extraembryonic cell
Trophectodermal cell
Hematopoietic stem cell
Common dendritic progenitor
Common lymphoid progenitor
Common myeloid progenitor
Erythroid progenitor cell
Granulocyte monocyte progenitor cell
Hematopoietic oligopotent progenitor cell
Megakaryocyte-erythroid progenitor cell
Myeloblast
Megakaryocyte progenitor cell
Hematopoietic multipotent progenitor cell

Blood cell
Erythroblast
Erythrocyte
Megakaryocyte
Platelet
Innate lymphoid cell
Innate lymphoid cell type 1
Innate lymphoid cell type 2
Innate lymphoid cell type 3
Natural killer cell
Lymphoid tissue-inducer cell

Myeloid suppressor cell

Natural killer T cell


Kidney epithelial cell
Epithelial cell of distal tubule
Epithelial cell of nephron
Epithelial cell of proximal tubule
Glomerular visceral epithelial cell
Juxtaglomerular complex cell
Kidney collecting duct epithelial cell
Kidney interstitial cell
Kidney loop of henle epithelial cell
Parietal epithelial cell
Renal alpha-intercalated cell
Renal beta-intercalated cell
Renal intercalated cell
Renal principal cell
Kidney connecting tubule epithelial cell
Kidney loop of henle ascending limb epithelial cell
Kidney pelvis urothelial cell

Intestinal epithelial cell
Enterocyte
Paneth cell
Ionocyte




Endothelial cell
Capillary endothelial cell
Endocardial cell
Endothelial cell of artery
Endothelial cell of high endothelial venule
Endothelial cell of lymphatic vessel
Endothelial cell of sinusoid
Endothelial cell of vascular tree
Endothelial stalk cell
Glomerular endothelial cell
Vein endothelial cell
Gut endothelial cell
Corneal endothelial cell


Fibroblast
Hepatic stellate cell
Myofibroblast cell
Pancreatic stellate cell
Reticular cell
Keratocyte


Chondroblast

Stromal cell
Chondrocyte
Sperm
Spermatid
Spermatocyte
Spermatogonium

Oocyte
CD4+ T cell
CD4+, alpha-beta cytotoxic T cell
T follicular helper cell
T-helper 1 cell
T-helper 17 cell
T-helper 2 cell
Central memory CD4+, alpha-beta T cell
Effector memory CD4+, alpha-beta T cell
Effector memory CD4+, alpha-beta T cell, terminally differentiated
Naive thymus-derived CD4+, alpha-beta T cell
Regulatory T cell
Monocyte
Classical monocyte
Non-classical monocyte
Intermediate monocyte
Monoblast
Dendritic cell
Plasmacytoid dendritic cell
Conventional type 1 dendritic cell
Conventional type 2 dendritic cell
Mature conventional dendritic cell
Monocyte-derived dendritic cell

Mast cell

Pro-T cell


Endocrine cell
Chromaffin cell
Cortical cell of adrenal gland
Enteroendocrine cell
Granulosa cell
Neuroendocrine cell
Type A pancreatic cell
Type B pancreatic cell
Type D pancreatic cell
Pancreatic PP cell
Pancreatic centro-acinar cell
Pancreatic epsilon cell






Ciliated cell
Ependymal cell
Multi-ciliated epithelial cell
Melanocyte




Glial cell
Schwann cell
Astrocyte
Macroglial cell
Oligodendrocyte
Oligodendrocyte precursor cell
Radial glial cell
Schwann cell precursor







Mural cell
Mesangial cell
Pericyte cell





Odontoblast

Mesodermal cell
Intermediate mesodermal cell
Endodermal cell
Mesenchymal cell
Transit amplifying cell

CD8+ T cell
CD8+, alpha-beta cytotoxic T cell
Central memory CD8+, alpha-beta T cell
Effector CD8+, alpha-beta T cell
Effector memory CD8+, alpha-beta T cell
Effector memory CD8+, alpha-beta T cell, terminally differentiated
Naive thymus-derived CD8+, alpha-beta T cell




Intraepithelial lymphocyte
Alpha-beta intraepithelial T cell



Granulocyte
Basophil
Eosinophil
Immature neutrophil
Neutrophil
Neutrophilic myelocyte

Mucosal invariant T cell








Secretory cell
Leydig cell
Club cell
Exocrine cell
Gastrin secreting cell
Goblet cell
Mucus secreting cell
Peptic cell
Serous secreting cell
Thyroid follicular cell








Pneumocyte
Type I pneumocyte
Type II pneumocyte
Basal cell




Neural cell
Cajal-retzius cell
Gabaergic interneuron
Gabaergic neuron
Purkinje cell
Amacrine cell
Dopaminergic neuron
Enteric neuron
Excitatory neuron
Glutamatergic neuron
Glycinergic neuron
Granule cell
Inhibitory interneuron
Inhibitory neuron
Interneuron
Leptomeningeal cell
Motor neuron
Neural crest cell
Neuroblast (sensu vertebrata)
Neuronal brush cell
Neuronal stem cell
Pyramidal neuron
Sensory neuron
Serotonergic neuron
IT projecting neuron
ET projecting neuron
CT projecting neuron






Notochordal cell

Intestinal crypt stem cell
Hepatoblast
Tip cell


B cell
Follicular B cell
Germinal center B cell
Early pro-B cell
Pro-B cell
Precursor B cell
Mature B cell
Naive B cell
Memory B cell
Regulatory B cell
Immature B cell

Mast cell

Plasma cell
IgA plasma cell
IgG plasma cell
IgM plasma cell
Plasmablast
Macrophage
Langerhans cell
Kupffer cell
Alveolar macrophage
Microglial cell
Tissue-resident macrophage
Perivascular macrophage
Hofbauer cell
Pleural macrophage
Choroid-plexus macrophage
Inflammatory macrophage
Alternatively activated macrophage

Epithelial cell
Hepatocyte
Ionocyte
Luminal epithelial cell of mammary gland
Myoepithelial cell
Brush cell
Cholangiocyte
Hepatoblast
Urothelial cell
Pancreatic centro-acinar cell
Pancreatic ductal cell
Epithelial cell of thymus
Medullary thymic epithelial cell
Cortical thymic epithelial cell
Simple columnar epithelial cell
Luminal cell of prostate epithelium
Neuronal brush cell
Stratified epithelial cell
Corneal epithelial cell
Choroid plexus epithelial cell
Macula densa epithelial cell
Epiblast cell
Duct epithelial cell
Olfactory epithelial cell

Squamous epithelial cell
Keratinocyte
Mesothelial cell
Peritubular myoid cell
Neuron
Excitatory neuron
Amacrine cell
Inhibitory neuron
Gabaergic neuron
Dopaminergic neuron
Glutamatergic neuron
Granule cell
Serotonergic neuron
Neuroblast (sensu vertebrata)
Motor neuron
Interneuron
Glycinergic neuron
Cajal-retzius cell
Pyramidal neuron
Sensory neuron
Purkinje cell
Inhibitory interneuron
Enteric neuron
Gabaergic interneuron
Inhibitory motor neuron
Cholinergic neuron
Cerebellar golgi cell
Noradrenergic neuron
Retinal cell
Retinal bipolar neuron
Mueller cell
Photoreceptor cell
Retinal progenitor cell
Retinal rod cell
Retina horizontal cell
Retinal ganglion cell
Retinal cone cell
Retinal pigment epithelial cell
Renal intercalated cell
Lens fiber cell
Off-bipolar cell
Rod bipolar cell
On-bipolar cell
Cone retinal bipolar cell
Hematopoietic stem cell
Erythroid progenitor cell
Common dendritic progenitor
Megakaryocyte-erythroid progenitor cell
Common myeloid progenitor
Granulocyte monocyte progenitor cell
Myeloblast
Common lymphoid progenitor
Hematopoietic oligopotent progenitor cell
Hematopoietic precursor cell
Macrophage dendritic cell progenitor

Gamma-delta T cell

Blood cell
Erythroblast
Erythrocyte
Megakaryocyte
Platelet
Granulocyte
Neutrophil
Eosinophil
Neutrophilic myelocyte
Basophil
Immature neutrophil
Mature neutrophil
Double-negative thymocyte

Kidney epithelial cell
Epithelial cell of proximal tubule
Glomerular visceral epithelial cell
Renal alpha-intercalated cell
Renal beta-intercalated cell
Renal principal cell
Epithelial cell of distal tubule
Parietal epithelial cell
Juxtaglomerular complex cell
Kidney loop of henle epithelial cell
Epithelial cell of nephron
Kidney interstitial cell
Kidney loop of henle ascending limb epithelial cell
Kidney collecting duct epithelial cell
Kidney connecting tubule epithelial cell
Kidney pelvis urothelial cell
Kidney loop of henle thick ascending limb epithelial cell
Kidney distal convoluted tubule epithelial cell
Kidney collecting duct principal cell
Kidney collecting duct intercalated cell
Brush border cell of the proximal tubule
Kidney cortex artery cell
Kidney proximal convoluted tubule epithelial cell
Kidney proximal straight tubule epithelial cell
Intestinal epithelial cell
Enterocyte
Paneth cell
Brush cell of epithelium proper of large intestine
Glial cell
Astrocyte
Oligodendrocyte precursor cell
Macroglial cell
Oligodendrocyte
Schwann cell
Radial glial cell
Bergmann glial cell
Tanycyte
Olfactory ensheathing cell
Leptomeningeal cell
Vascular leptomeningeal cell
Fat cell
Preadipocyte


Germ cell
Syncytiotrophoblast cell
Extravillous trophoblast
Placental villous trophoblast
Trophoblast cell
Trophoblast giant cell
Sertoli cell
Spermatocyte
Spermatogonium
Spermatid
Primordial germ cell
Sperm
Oocyte
Endodermal cell
Spongiotrophoblast cell
CD4+ T cell
Naive thymus-derived CD4+, alpha-beta T cell
T-helper 17 cell
Effector memory CD4+, alpha-beta T cell, terminally differentiated
T follicular helper cell
T-helper 1 cell
Regulatory T cell
CD4+, alpha-beta cytotoxic T cell
Effector memory CD4+, alpha-beta T cell
Central memory CD4+, alpha-beta T cell
T-helper 2 cell
Natural killer T cell

Monocyte
Classical monocyte
Non-classical monocyte
Intermediate monocyte
Monoblast
Innate lymphoid cell
Innate lymphoid cell type 1
Innate lymphoid cell type 2
Innate lymphoid cell type 3
Natural killer cell
Lymphoid tissue-inducer cell

Myeloid suppressor cell

Endocrine cell
Type B pancreatic cell
Pancreatic A cell
Pancreatic PP cell
Pancreatic epsilon cell
Pancreatic D cell
Enteroendocrine cell
Cortical cell of adrenal gland
Chromaffin cell
Neuroendocrine cell
Granulosa cell
Type EC enteroendocrine cell
Gip cell
Type G enteroendocrine cell
Large luteal cell
Small luteal cell
Luteal cell


Melanocyte

Ciliated cell
Ependymal cell
Multi-ciliated epithelial cell

Muscle cell
Skeletal muscle myoblast
Smooth muscle cell
Vascular associated smooth muscle cell
Cardiac muscle cell
Cell of skeletal muscle
Skeletal muscle satellite cell
Interstitial cell of cajal
Skeletal muscle fiber
Myoblast
Mural cell

Tip cell



Mesodermal cell
Intermediate mesodermal cell
Ectodermal cell

Mesenchymal cell

Transit amplifying cell

CD8+ T cell
Naive thymus-derived CD8+, alpha-beta T cell
Central memory CD8+, alpha-beta T cell
Effector memory CD8+, alpha-beta T cell, terminally differentiated
Effector memory CD8+, alpha-beta T cell
Effector CD8+, alpha-beta T cell
CD8+, alpha-beta cytotoxic T cell




Mucosal invariant T cell

Dendritic cell
Plasmacytoid dendritic cell
Conventional dendritic cell
Follicular dendritic cell
Pre-conventional dendritic cell
Intraepithelial lymphocyte
Alpha-beta intraepithelial T cell











Secretory cell
Mucus secreting cell
Serous secreting cell
Goblet cell
Club cell
Exocrine cell
Leydig cell
Gastrin secreting cell
Glandular epithelial cell
Acinar cell
Eccrine cell
Peptic cell
Thyroid follicular cell
Parietal cell
Pancreatic acinar cell
Basal cell

Pneumocyte
Type I pneumocyte
Type II pneumocyte

Fibroblast
Hepatic stellate cell
Myofibroblast cell
Pancreatic stellate cell
Reticular cell
Keratocyte
Tendon cell
Fibrocyte


Endothelial cell

Stromal cell
Decidual cell
Chondrocyte

Notochordal cell

Chondroblast

Osteoclast

Odontoblast

Your cell subtypes are not on this list? Contact us at talk2data@bioturing.com

Benchmarks

We benchmarked BioTuring Cell Type Prediction against Seurat v4 on several datasets. Below are some highlights; full benchmarks will follow shortly with our manuscript.
Landscape of infiltrating T cells in liver cancer revealed by single-cell sequencing
Chunhong Zheng, Liangtao Zheng, Jae-Kwang Yoo, Huahu Guo, Yuanyuan Zhang, Xinyi Guo, Boxi Kang, Ruozhen Hu, Julie Y. Huang, Qiming Zhang, Zhouzerui Liu, Minghui Dong, Xueda Hu, Wenjun Ouyang, Jirun Peng, Zemin Zhang
BioTuring accurately detected more subtypes, including exhausted CD4+ T cells, exhausted CD8+ T cells, and NKT-like CD8+ cells. Seurat v4 mislabeled effector memory CD8+ T cells as cytotoxic CD4+ T cells and NK cells.

BioTuring and Seurat v4 yielded nearly identical results on naive CD4+ T cell, central memory CD4+ T cell (CD4 TCM), effector memory CD4+ T cell (CD4 TEM), regulatory CD4+ T cell (CD4 Treg), naive CD8+ T cell, mucosal associated invariant CD8+ T cell (CD8 MAIT).
Landscape and dynamics of single immune cells in hepatocellular carcinoma
Zhang Q, He Y, Luo N, Patel SJ, Han Y, Gao R, Modak M, Carotta S, Haslinger C, Kind D, Peet GW, Zhong G, Lu S, Zhu W, Mao Y, Xiao M, Bergmann M, Hu X, Kerkar SP, Vogt AB, Pflanz S, Liu K, Peng J, Ren X, Zhang Z
In this dataset, Seurat v4 misidentified most cell types: macrophages were incorrectly identified as CD14 monocytes, and mast cells were mislabeled as erythrocytes. BioTuring correctly identified all of these cell types because its model was built from much larger training data with more cell types.

Label your data now

Submit data
Running status
Submit your raw expression profile to our server and we will send the cell type labels to your email as soon as the process is finished.
Input data format

Zip:

a zipped folder containing 3 files:
• barcodes.(tsv|csv|gz|tar|tar.gz)
• features.(tsv|csv|gz|tar|tar.gz) or genes.(tsv|csv|gz|tar|tar.gz)
• matrix.(mtx|gz|tar|tar.gz)
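
As a rough illustration of the Zip layout, the Python sketch below writes a tiny sparse matrix as barcodes.tsv, features.tsv, and matrix.mtx and packs them into an archive. The helper name `write_zip_input`, the toy genes and barcodes, the one-column features.tsv, the genes-by-cells orientation, and the placement of the files at the top level of the archive are all assumptions for illustration, not documented requirements of the service.

```python
import io
import zipfile


def write_zip_input(target, genes, barcodes, entries):
    """Write barcodes.tsv, features.tsv and matrix.mtx into a zip archive.

    `target` is a file path or a binary file-like object.
    `entries` is a list of (gene_index, cell_index, count), 0-based.
    Assumption: matrix.mtx is stored genes x cells.
    """
    # Matrix Market coordinate format: header, size line, then 1-based triplets.
    lines = [
        "%%MatrixMarket matrix coordinate integer general",
        f"{len(genes)} {len(barcodes)} {len(entries)}",
    ]
    lines += [f"{g + 1} {c + 1} {v}" for g, c, v in entries]

    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("barcodes.tsv", "\n".join(barcodes) + "\n")
        zf.writestr("features.tsv", "\n".join(genes) + "\n")
        zf.writestr("matrix.mtx", "\n".join(lines) + "\n")
    return target


# Toy data (hypothetical): 2 genes, 3 cells, 2 non-zero counts.
buffer = io.BytesIO()  # pass a path such as "input.zip" to write to disk instead
write_zip_input(
    buffer,
    genes=["CD3E", "MS4A1"],
    barcodes=["AAAC-1", "AAAG-1", "AACT-1"],
    entries=[(0, 0, 5), (1, 2, 3)],
)
names = zipfile.ZipFile(buffer).namelist()
print(names)
```

With this orientation the file would be submitted with the shape set to genesxcells.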

Hdf5:

an HDF5 file containing 5 keys:
• barcodes
• genes or features
• data
• indices
• indptr
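
The data, indices, and indptr keys use the same names as the three arrays of a compressed sparse matrix (for example, scipy.sparse CSR and CSC matrices). The pure-Python sketch below shows how such arrays encode a small count matrix row by row. Whether the service expects row-compressed or column-compressed storage is not stated here, so the row-wise choice and the helper names `to_compressed_rows` and `from_compressed_rows` are assumptions.

```python
def to_compressed_rows(dense):
    """Encode a dense matrix (list of rows) as data / indices / indptr."""
    data, indices, indptr = [], [], [0]
    for row in dense:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)    # non-zero values, row after row
                indices.append(col)   # column index of each stored value
        indptr.append(len(data))      # where the next row starts in `data`
    return data, indices, indptr


def from_compressed_rows(data, indices, indptr, n_cols):
    """Rebuild the dense matrix to check that the encoding is lossless."""
    dense = []
    for start, end in zip(indptr, indptr[1:]):
        row = [0] * n_cols
        for value, col in zip(data[start:end], indices[start:end]):
            row[col] = value
        dense.append(row)
    return dense


# Toy 3 x 3 count matrix (hypothetical values).
counts = [[5, 0, 0], [0, 0, 3], [0, 2, 0]]
data, indices, indptr = to_compressed_rows(counts)
restored = from_compressed_rows(data, indices, indptr, n_cols=3)
print(data, indices, indptr)
```

Only the non-zero counts are stored, which is why this layout suits single-cell matrices where most entries are zero.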

Text:

a full matrix text file separated by tab or comma

*Note: you should only submit one batch at a time.
Get the current status of a submitted project.
Input a project ID to start

Command Line API

Python tool
HTTPS POST
Request a token for API calls. The token will be sent to your email.
Download our command line tool here, then from the terminal, run:
$ python3 get_prediction.py -h
usage: get_prediction.py [-h] [--species SPECIES] [--version VERSION] [--file FILE] [--type TYPE] [--shape SHAPE] [--token TOKEN]
[--output OUTPUT] [--project_id PROJECT_ID]
BioTuring's cell type prediction API.
-h, --help show this help message and exit
--species SPECIES Species (human | mouse)
--version VERSION Prediction version: human (1 | 2), mouse (1)
--file FILE [.zip] Zipped folder contains 3 files: matrix.mtx, barcodes.tsv, genes.tsv/features.tsv
[.hdf5] HDF5 object contains 5 keys: data, indices, indptr, barcodes, genes/features
[.tsv] Full expression
--type TYPE File type (zip | hdf5 | tsv)
--shape SHAPE [genesxcells | cellsxgenes]
--token TOKEN Authenticated token
--project_id PROJECT_ID If you have already submitted the data, adding this argument to get your result

For example, to submit your data, run:

$ python3 get_prediction.py --token your_token --species human --version 2 --file path/to/your/file.zip --type zip --shape genesxcells --output path/to/your/result/file.tsv

GSM4005491.zip: 4.23MMB [00:00, 57.3MMB/s]
[2021-07-25 23:11:48] Success to submit data. Project id: 9ef104fd-c277-443f-8cf1-eb534f56f632.
[2021-07-25 23:11:48] Waiting in the queue...
[2021-07-25 23:13:15] Extracting data...
[2021-07-25 23:13:17] Loaded: 2611 cells.
[2021-07-25 23:13:29] Preprocessing data...
[2021-07-25 23:13:31] Running dimensional reduction...
[2021-07-25 23:14:11] Clustering...
[2021-07-25 23:14:13] Training...
[2021-07-25 23:14:33] Removing ambiguous labels...
[2021-07-25 23:17:48] Completed!
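
The project ID printed in the log is what the --project_id argument takes later. As a minimal sketch, assuming the log has been captured as text and keeps the "Project id: ..." wording shown above, the ID can be pulled out with a regular expression. The function name `extract_project_id` is hypothetical.

```python
import re

# Matches the UUID-style ID that follows "Project id: " in the sample log.
PROJECT_ID_RE = re.compile(
    r"Project id: ([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})"
)


def extract_project_id(log_text):
    """Return the first project ID found in the log text, or None."""
    match = PROJECT_ID_RE.search(log_text)
    return match.group(1) if match else None


log = (
    "[2021-07-25 23:11:48] Success to submit data. "
    "Project id: 9ef104fd-c277-443f-8cf1-eb534f56f632.\n"
    "[2021-07-25 23:11:48] Waiting in the queue...\n"
)
project_id = extract_project_id(log)
print(project_id)
```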

You can always retrieve the result of a submitted project using:

$ python3 get_prediction.py --token your_token --project_id submitted_project_id --output path/to/your/result/file.tsv

To list all your submitted projects, run:

$ curl -X POST https://talk2data.bioturing.com/predict/get_info --form token='your_token'
{ "projects":[
    {
        "created_date":"date_time",
        "email":"your email",
        "file_name":"submitted_file_name",
        "file_shape":"genesxcells",
        "file_type":"zip",
        "project_id":"project_id",
        "status":"Completed"
    }, {
        "created_date":"date_time",
        "email":"your email",
        "file_name":"submitted_file_name",
        "file_shape":"cellxgenes",
        "file_type":"tsv",
        "project_id":"project_id",
        "status":"Running"
    }
]}
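
A response of this shape can be summarised in a few lines of Python. The sketch below groups project IDs by their status field; it assumes the JSON layout matches the sample output above, and the function name `projects_by_status` and the placeholder project IDs are hypothetical.

```python
import json


def projects_by_status(response_text):
    """Group project IDs from a get_info response by their status field."""
    grouped = {}
    for project in json.loads(response_text).get("projects", []):
        grouped.setdefault(project["status"], []).append(project["project_id"])
    return grouped


# Hypothetical response modelled on the sample output above.
sample = json.dumps({"projects": [
    {"project_id": "project-a", "status": "Completed"},
    {"project_id": "project-b", "status": "Running"},
    {"project_id": "project-c", "status": "Completed"},
]})
grouped = projects_by_status(sample)
print(grouped)
```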

We provide two POST endpoints for submitting data and retrieving results. For example, you can submit data with curl:

$ curl -X POST https://talk2data.bioturing.com/predict/submit --form token='your_token' --form species='human' --form version='2' --form type='zip' --form shape='genesxcells' --form exp_matrix='@path/to/your/file.zip'

{
    "status":200,
    "message":"Successfully submitted the data!",
    "project_id":"0f16115c-8f54-4380-932b-ae2ad26f0c13"
}
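
The same submission can be sketched in Python with only the standard library. The example below builds a multipart/form-data request that mirrors the --form fields of the curl call (token, species, version, type, shape, exp_matrix) but does not send it. The function name `build_submit_request`, the default values, and the placeholder payload are hypothetical, and how the server treats the file part's content type is an assumption.

```python
import urllib.request
import uuid

SUBMIT_URL = "https://talk2data.bioturing.com/predict/submit"


def build_submit_request(token, file_name, payload, species="human",
                         version="2", file_type="zip", shape="genesxcells"):
    """Build (but do not send) a multipart POST mirroring the curl call."""
    boundary = uuid.uuid4().hex
    fields = {"token": token, "species": species, "version": version,
              "type": file_type, "shape": shape}

    parts = []
    for name, value in fields.items():
        # One plain-text part per form field.
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode()
        )
    # File part, equivalent to --form exp_matrix='@path/to/your/file.zip'.
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="exp_matrix"; '
         f'filename="{file_name}"\r\n'
         f"Content-Type: application/octet-stream\r\n\r\n").encode()
        + payload + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())

    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(
        SUBMIT_URL, data=b"".join(parts), headers=headers, method="POST"
    )


# Placeholder bytes; in practice read them from the zip file on disk.
request = build_submit_request("your_token", "file.zip", b"zip bytes go here")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; the JSON reply should then carry the project_id shown above.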

You can always retrieve the result of a submitted project using:

$ curl -X POST https://talk2data.bioturing.com/predict/get_result --form token='your_token' --form project_id='0f16115c-8f54-4380-932b-ae2ad26f0c13'

If the process has not finished, you will receive the current running status:

{
    "status":400,
    "is_running":true,
    "running_status":"[2021-07-27 09:53:45] Waiting in the queue...\n[2021-07-27 09:53:45] Extracting data...\n"
}

If the process has completed, you will receive the result as a TSV string:

{
    "status":200,
    "data":"Barcodes\tPredited cell type\nbarcode1\tcell type\n…"
}
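
The two response shapes above can be told apart in code. The following sketch, assuming the JSON fields match the samples shown, returns either the running log or a barcode-to-cell-type mapping parsed from the TSV string. The function name `interpret_result` and the barcodes and labels in the demo are hypothetical.

```python
import json


def interpret_result(response_text):
    """Return ("running", log_text) or ("done", {barcode: cell_type})."""
    response = json.loads(response_text)
    if response.get("is_running"):
        return "running", response.get("running_status", "")
    rows = response["data"].strip().split("\n")
    # Skip the header row, then split each line into barcode and label.
    labels = dict(row.split("\t", 1) for row in rows[1:] if row)
    return "done", labels


# Hypothetical responses modelled on the two samples above.
running = json.dumps({
    "status": 400,
    "is_running": True,
    "running_status": "[2021-07-27 09:53:45] Waiting in the queue...\n",
})
finished = json.dumps({
    "status": 200,
    "data": "Barcodes\tPredited cell type\nbarcode1\tB cell\nbarcode2\tMast cell\n",
})
state_running, log_text = interpret_result(running)
state_done, labels = interpret_result(finished)
print(state_running, state_done, labels)
```

A client could call get_result periodically and stop once the state is "done".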

To list all your submitted projects, run:

$ curl -X POST https://talk2data.bioturing.com/predict/get_info --form token='your_token'
{ "projects": [
    {
        "created_date":"date_time",
        "email":"your email",
        "file_name":"submitted_file_name",
        "file_shape":"genesxcells",
        "file_type":"zip",
        "project_id":"project_id",
        "status":"Completed"
    }, {
        "created_date":"date_time",
        "email":"your email",
        "file_name":"submitted_file_name",
        "file_shape":"cellxgenes",
        "file_type":"tsv",
        "project_id":"project_id",
        "status":"Running"
    }
]}