Talk2data

BioTuring
Cell Type Prediction

a.k.a. HaiTam Algorithm

Automatic cell type prediction built on a database of 80,574,317 cells
• Extensive effort in cleaning and curating millions of single cells
• State-of-the-art neural network
• Advanced combinatorial algorithms on HPC for enumerating billions of possibilities

We can now classify
54 cell types and 183 subtypes. This is not all: new subtypes are added continuously.

What cell types can we predict?

Human

Mouse

B cell

Early pro-B cell
Follicular B cell
Germinal center B cell
Immature B cell
Mature B cell
Memory B cell
Naive B cell
Precursor B cell
Pro-B cell
Regulatory B cell

Plasma cell

IgA plasma cell
IgG plasma cell
IgM plasma cell
Plasmablast

Gamma-delta T cell

Naive gamma-delta T cell
Central memory gamma-delta T cell
Effector memory gamma-delta T cell
Effector gamma-delta T cell
Exhausted gamma-delta T cell
Mucosal associated invariant gamma-delta T cell

Macrophage

Hofbauer cell
Kupffer cell
Langerhans cell
Alveolar macrophage
Microglial cell
Perivascular macrophage
Pleural macrophage
Tissue-resident macrophage

Epithelial cell

Brush cell
Cholangiocyte
Cortical thymic epithelial cell
Epithelial cell of thymus
Hepatocyte
Luminal cell of prostate epithelium
Luminal epithelial cell of mammary gland
Medullary thymic epithelial cell
Myoepithelial cell
Pancreatic ductal cell
Simple columnar epithelial cell
Stratified epithelial cell
Urothelial cell
Epithelial cell of lung
Interstitial cell of Cajal
Corneal epithelial cell
Foveolar cell of stomach

Glandular epithelial cell

Acinar cell
Eccrine cell

Squamous epithelial cell

Sertoli cell
Keratinocyte
Mesothelial cell
Peritubular myoid cell

Retinal cell

Mueller cell
Off-bipolar cell
On-bipolar cell
Cone retinal bipolar cell
Lens fiber cell
Photoreceptor cell
Retina horizontal cell
Retinal bipolar neuron
Retinal cone cell
Retinal ganglion cell
Retinal pigment epithelial cell
Retinal progenitor cell
Retinal rod cell
Rod bipolar cell

Muscle cell

Cardiac muscle cell
Cell of skeletal muscle
Skeletal muscle myoblast
Skeletal muscle satellite cell
Smooth muscle cell
Vascular associated smooth muscle cell

Osteoclast

Fat cell

Preadipocyte

Germ line cell

Decidual cell
Extravillous trophoblast
Placental villous trophoblast
Primordial germ cell
Syncytiotrophoblast cell
Trophoblast cell
Trophoblast giant cell
Extraembryonic cell
Trophectodermal cell

Hematopoietic stem cell

Common dendritic progenitor
Common lymphoid progenitor
Common myeloid progenitor
Erythroid progenitor cell
Granulocyte monocyte progenitor cell
Hematopoietic oligopotent progenitor cell
Megakaryocyte-erythroid progenitor cell
Myeloblast
Megakaryocyte progenitor cell
Hematopoietic multipotent progenitor cell

Blood cell

Erythroblast
Erythrocyte
Megakaryocyte
Platelet

Innate lymphoid cell

Innate lymphoid cell type 1
Innate lymphoid cell type 2
Innate lymphoid cell type 3
Natural killer cell
Lymphoid tissue-inducer cell

Myeloid suppressor cell

Natural killer T cell

Kidney epithelial cell

Epithelial cell of distal tubule
Epithelial cell of nephron
Epithelial cell of proximal tubule
Glomerular visceral epithelial cell
Juxtaglomerular complex cell
Kidney collecting duct epithelial cell
Kidney interstitial cell
Kidney loop of Henle epithelial cell
Parietal epithelial cell
Renal alpha-intercalated cell
Renal beta-intercalated cell
Renal intercalated cell
Renal principal cell
Kidney connecting tubule epithelial cell
Kidney loop of Henle ascending limb epithelial cell
Kidney pelvis urothelial cell

Intestinal epithelial cell

Enterocyte
Paneth cell

Ionocyte

Endothelial cell

Capillary endothelial cell
Endocardial cell
Endothelial cell of artery
Endothelial cell of high endothelial venule
Endothelial cell of lymphatic vessel
Endothelial cell of sinusoid
Endothelial cell of vascular tree
Endothelial stalk cell
Glomerular endothelial cell
Vein endothelial cell
Gut endothelial cell
Corneal endothelial cell

Fibroblast

Hepatic stellate cell
Myofibroblast cell
Pancreatic stellate cell
Reticular cell
Keratocyte

Chondroblast

Stromal cell

Chondrocyte

Sperm

Spermatid
Spermatocyte
Spermatogonium

Oocyte

CD4+ T cell

CD4+, alpha-beta cytotoxic T cell
T follicular helper cell
T-helper 1 cell
T-helper 17 cell
T-helper 2 cell
Central memory CD4+, alpha-beta T cell
Effector memory CD4+, alpha-beta T cell
Effector memory CD4+, alpha-beta T cell, terminally differentiated
Naive thymus-derived CD4+, alpha-beta T cell
Regulatory T cell

Monocyte

Classical monocyte
Non-classical monocyte
Intermediate monocyte
Monoblast

Dendritic cell

Plasmacytoid dendritic cell
Conventional type 1 dendritic cell
Conventional type 2 dendritic cell
Mature conventional dendritic cell
Monocyte-derived dendritic cell

Mast cell

Pro-T cell

Endocrine cell

Chromaffin cell
Cortical cell of adrenal gland
Enteroendocrine cell
Granulosa cell
Neuroendocrine cell
Type A pancreatic cell
Type B pancreatic cell
Type D pancreatic cell
Pancreatic PP cell
Pancreatic centro-acinar cell
Pancreatic epsilon cell

Ciliated cell

Ependymal cell
Multi-ciliated epithelial cell

Melanocyte

Glial cell

Schwann cell
Astrocyte
Macroglial cell
Oligodendrocyte
Oligodendrocyte precursor cell
Radial glial cell
Schwann cell precursor

Mural cell

Mesangial cell
Pericyte cell

Odontoblast

Mesodermal cell

Intermediate mesodermal cell

Endodermal cell

Mesenchymal cell

Transit amplifying cell

CD8+ T cell

CD8+, alpha-beta cytotoxic T cell
Central memory CD8+, alpha-beta T cell
Effector CD8+, alpha-beta T cell
Effector memory CD8+, alpha-beta T cell
Effector memory CD8+, alpha-beta T cell, terminally differentiated
Naive thymus-derived CD8+, alpha-beta T cell

Intraepithelial lymphocyte

Alpha-beta intraepithelial T cell

Granulocyte

Basophil
Eosinophil
Immature neutrophil
Neutrophil
Neutrophilic myelocyte

Mucosal invariant T cell

Secretory cell

Leydig cell
Club cell
Exocrine cell
Gastrin secreting cell
Goblet cell
Mucus secreting cell
Peptic cell
Serous secreting cell
Thyroid follicular cell

Pneumocyte

Type I pneumocyte
Type II pneumocyte

Basal cell

Neural cell

Cajal-Retzius cell
GABAergic interneuron
GABAergic neuron
Purkinje cell
Amacrine cell
Dopaminergic neuron
Enteric neuron
Excitatory neuron
Glutamatergic neuron
Glycinergic neuron
Granule cell
Inhibitory interneuron
Inhibitory neuron
Interneuron
Leptomeningeal cell
Motor neuron
Neural crest cell
Neuroblast (sensu vertebrata)
Neuronal brush cell
Neuronal stem cell
Pyramidal neuron
Sensory neuron
Serotonergic neuron
IT projecting neuron
ET projecting neuron
CT projecting neuron

Notochordal cell

Intestinal crypt stem cell

Hepatoblast

Tip cell

B cell

Follicular B cell
Germinal center B cell
Early pro-B cell
Pro-B cell
Precursor B cell
Mature B cell
Naive B cell
Memory B cell
Regulatory B cell
Immature B cell

Mast cell

Plasma cell

IgA plasma cell
IgG plasma cell
IgM plasma cell
Plasmablast

Macrophage

Langerhans cell
Kupffer cell
Alveolar macrophage
Microglial cell
Tissue-resident macrophage
Perivascular macrophage
Hofbauer cell
Pleural macrophage
Choroid-plexus macrophage
Inflammatory macrophage
Alternatively activated macrophage

Epithelial cell

Hepatocyte
Ionocyte
Luminal epithelial cell of mammary gland
Myoepithelial cell
Brush cell
Cholangiocyte
Hepatoblast
Urothelial cell
Pancreatic centro-acinar cell
Pancreatic ductal cell
Epithelial cell of thymus
Medullary thymic epithelial cell
Cortical thymic epithelial cell
Simple columnar epithelial cell
Luminal cell of prostate epithelium
Neuronal brush cell
Stratified epithelial cell
Corneal epithelial cell
Choroid plexus epithelial cell
Macula densa epithelial cell
Epiblast cell
Duct epithelial cell
Olfactory epithelial cell

Squamous epithelial cell

Keratinocyte
Mesothelial cell
Peritubular myoid cell

Neuron

Excitatory neuron
Amacrine cell
Inhibitory neuron
GABAergic neuron
Dopaminergic neuron
Glutamatergic neuron
Granule cell
Serotonergic neuron
Neuroblast (sensu vertebrata)
Motor neuron
Interneuron
Glycinergic neuron
Cajal-Retzius cell
Pyramidal neuron
Sensory neuron
Purkinje cell
Inhibitory interneuron
Enteric neuron
GABAergic interneuron
Inhibitory motor neuron
Cholinergic neuron
Cerebellar Golgi cell
Noradrenergic neuron

Retinal cell

Retinal bipolar neuron
Mueller cell
Photoreceptor cell
Retinal progenitor cell
Retinal rod cell
Retina horizontal cell
Retinal ganglion cell
Retinal cone cell
Retinal pigment epithelial cell
Renal intercalated cell
Lens fiber cell
Off-bipolar cell
Rod bipolar cell
On-bipolar cell
Cone retinal bipolar cell

Hematopoietic stem cell

Erythroid progenitor cell
Common dendritic progenitor
Megakaryocyte-erythroid progenitor cell
Common myeloid progenitor
Granulocyte monocyte progenitor cell
Myeloblast
Common lymphoid progenitor
Hematopoietic oligopotent progenitor cell
Hematopoietic precursor cell
Macrophage dendritic cell progenitor

Gamma-delta T cell

Blood cell

Erythroblast
Erythrocyte
Megakaryocyte
Platelet

Granulocyte

Neutrophil
Eosinophil
Neutrophilic myelocyte
Basophil
Immature neutrophil
Mature neutrophil

Double-negative thymocyte

Kidney epithelial cell

Epithelial cell of proximal tubule
Glomerular visceral epithelial cell
Renal alpha-intercalated cell
Renal beta-intercalated cell
Renal principal cell
Epithelial cell of distal tubule
Parietal epithelial cell
Juxtaglomerular complex cell
Kidney loop of Henle epithelial cell
Epithelial cell of nephron
Kidney interstitial cell
Kidney loop of Henle ascending limb epithelial cell
Kidney collecting duct epithelial cell
Kidney connecting tubule epithelial cell
Kidney pelvis urothelial cell
Kidney loop of Henle thick ascending limb epithelial cell
Kidney distal convoluted tubule epithelial cell
Kidney collecting duct principal cell
Kidney collecting duct intercalated cell
Brush border cell of the proximal tubule
Kidney cortex artery cell
Kidney proximal convoluted tubule epithelial cell
Kidney proximal straight tubule epithelial cell

Intestinal epithelial cell

Enterocyte
Paneth cell
Brush cell of epithelium proper of large intestine

Glial cell

Astrocyte
Oligodendrocyte precursor cell
Macroglial cell
Oligodendrocyte
Schwann cell
Radial glial cell
Bergmann glial cell
Tanycyte
Olfactory ensheathing cell

Leptomeningeal cell

Vascular leptomeningeal cell

Fat cell

Preadipocyte

Germ cell

Syncytiotrophoblast cell
Extravillous trophoblast
Placental villous trophoblast
Trophoblast cell
Trophoblast giant cell
Sertoli cell
Spermatocyte
Spermatogonium
Spermatid
Primordial germ cell
Sperm
Oocyte
Endodermal cell
Spongiotrophoblast cell

CD4+ T cell

Naive thymus-derived CD4+, alpha-beta T cell
T-helper 17 cell
Effector memory CD4+, alpha-beta T cell, terminally differentiated
T follicular helper cell
T-helper 1 cell
Regulatory T cell
CD4+, alpha-beta cytotoxic T cell
Effector memory CD4+, alpha-beta T cell
Central memory CD4+, alpha-beta T cell
T-helper 2 cell

Natural killer T cell

Monocyte

Classical monocyte
Non-classical monocyte
Intermediate monocyte
Monoblast

Innate lymphoid cell

Innate lymphoid cell type 1
Innate lymphoid cell type 2
Innate lymphoid cell type 3
Natural killer cell
Lymphoid tissue-inducer cell

Myeloid suppressor cell

Endocrine cell

Type B pancreatic cell
Pancreatic A cell
Pancreatic PP cell
Pancreatic epsilon cell
Pancreatic D cell
Enteroendocrine cell
Cortical cell of adrenal gland
Chromaffin cell
Neuroendocrine cell
Granulosa cell
Type EC enteroendocrine cell
GIP cell
Type G enteroendocrine cell
Large luteal cell
Small luteal cell
Luteal cell

Melanocyte

Ciliated cell

Ependymal cell
Multi-ciliated epithelial cell

Muscle cell

Skeletal muscle myoblast
Smooth muscle cell
Vascular associated smooth muscle cell
Cardiac muscle cell
Cell of skeletal muscle
Skeletal muscle satellite cell
Interstitial cell of Cajal
Skeletal muscle fiber
Myoblast

Mural cell

Tip cell

Mesodermal cell

Intermediate mesodermal cell

Ectodermal cell

Mesenchymal cell

Transit amplifying cell

CD8+ T cell

Naive thymus-derived CD8+, alpha-beta T cell
Central memory CD8+, alpha-beta T cell
Effector memory CD8+, alpha-beta T cell, terminally differentiated
Effector memory CD8+, alpha-beta T cell
Effector CD8+, alpha-beta T cell
CD8+, alpha-beta cytotoxic T cell

Mucosal invariant T cell

Dendritic cell

Plasmacytoid dendritic cell
Conventional dendritic cell
Follicular dendritic cell
Pre-conventional dendritic cell

Intraepithelial lymphocyte

Alpha-beta intraepithelial T cell

Secretory cell

Mucus secreting cell
Serous secreting cell
Goblet cell
Club cell
Exocrine cell
Leydig cell
Gastrin secreting cell

Glandular epithelial cell

Acinar cell
Eccrine cell
Peptic cell
Thyroid follicular cell
Parietal cell
Pancreatic acinar cell

Basal cell

Pneumocyte

Type I pneumocyte
Type II pneumocyte

Fibroblast

Hepatic stellate cell
Myofibroblast cell
Pancreatic stellate cell
Reticular cell
Keratocyte
Tendon cell
Fibrocyte

Endothelial cell

Stromal cell

Decidual cell
Chondrocyte

Notochordal cell

Chondroblast

Osteoclast

Odontoblast

Your cell subtypes are not on this list? Contact us at talk2data@bioturing.com

Benchmarks

We benchmarked BioTuring Cell Type Prediction against Seurat v4 on several datasets. Below are some highlights; full benchmarks will follow shortly with our manuscript.

Landscape of infiltrating T cells in liver cancer revealed by single-cell sequencing

Chunhong Zheng, Liangtao Zheng, Jae-Kwang Yoo, Huahu Guo, Yuanyuan Zhang, Xinyi Guo, Boxi Kang, Ruozhen Hu, Julie Y. Huang, Qiming Zhang, Zhouzerui Liu, Minghui Dong, Xueda Hu, Wenjun Ouyang, Jirun Peng, Zemin Zhang

BioTuring accurately detected more subtypes, including exhausted CD4+ T cells, exhausted CD8+ T cells, and NKT-like CD8+ cells. Seurat v4 mislabeled effector memory CD8+ T cells as cytotoxic CD4+ T cells and NK cells.

BioTuring and Seurat v4 yielded nearly identical results on naive CD4+ T cell, central memory CD4+ T cell (CD4 TCM), effector memory CD4+ T cell (CD4 TEM), regulatory CD4+ T cell (CD4 Treg), naive CD8+ T cell, mucosal associated invariant CD8+ T cell (CD8 MAIT).

Landscape and dynamics of single immune cells in hepatocellular carcinoma

Zhang Q, He Y, Luo N, Patel SJ, Han Y, Gao R, Modak M, Carotta S, Haslinger C, Kind D, Peet GW, Zhong G, Lu S, Zhu W, Mao Y, Xiao M, Bergmann M, Hu X, Kerkar SP, Vogt AB, Pflanz S, Liu K, Peng J, Ren X, Zhang Z

In this dataset, Seurat v4 misidentified most cell types: macrophages were incorrectly identified as CD14 monocytes, and mast cells were mislabeled as erythrocytes. BioTuring correctly identified all these cell types, as its model was built from much larger training data with more cell types.

Label your data now

Submit data

Running status

Submit your raw expression profile to our server and we will send the cell type labels to your email as soon as the process is finished.

Input data format

Zip:

a zipped folder containing 3 files:
• barcodes.(tsv|csv|gz|tar|tar.gz)
• features.(tsv|csv|gz|tar|tar.gz) or genes.(tsv|csv|gz|tar|tar.gz)
• matrix.(mtx|gz|tar|tar.gz)

Hdf5:

• barcodes
• genes or features
• data
• indices
• indptr

Text:

a full matrix text file separated by tab or comma

*Note: you should only submit one batch at a time.
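
As an illustration of the Zip layout, here is a minimal Python sketch that packs a tiny count matrix into the three expected files. It is a sketch under assumptions, not the official packaging procedure: it assumes the files sit at the root of the archive, that genes.tsv and barcodes.tsv hold one name per line, and that the matrix is written genes x cells in Matrix Market coordinate format. The helper name `build_submission_zip` and the sample gene and barcode names are hypothetical.

```python
import io
import zipfile


def build_submission_zip(counts, genes, barcodes):
    """Return zip bytes holding matrix.mtx, genes.tsv and barcodes.tsv.

    counts is a genes x cells list of lists of non-negative integers.
    Assumption: one name per line in the two .tsv files.
    """
    if len(counts) != len(genes) or any(len(row) != len(barcodes) for row in counts):
        raise ValueError("counts must have one row per gene and one column per barcode")

    # Matrix Market coordinate format stores only non-zero entries, 1-based.
    entries = [
        (i + 1, j + 1, value)
        for i, row in enumerate(counts)
        for j, value in enumerate(row)
        if value != 0
    ]
    mtx_lines = [
        "%%MatrixMarket matrix coordinate integer general",
        f"{len(genes)} {len(barcodes)} {len(entries)}",
    ]
    mtx_lines += [f"{i} {j} {v}" for i, j, v in entries]

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("matrix.mtx", "\n".join(mtx_lines) + "\n")
        archive.writestr("genes.tsv", "\n".join(genes) + "\n")
        archive.writestr("barcodes.tsv", "\n".join(barcodes) + "\n")
    return buffer.getvalue()


# Hypothetical toy data: 3 genes x 2 cells.
genes = ["CD3E", "CD79A", "LYZ"]
barcodes = ["AAAC-1", "AAAG-1"]
counts = [[2, 0], [0, 5], [1, 1]]
payload = build_submission_zip(counts, genes, barcodes)
```

Writing `payload` to a file such as file.zip would give an archive to submit with the genesxcells shape, provided the assumptions above hold for the server.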

Get the current status of a submitted project.

Input a project ID to start

Command Line API

Python tool

HTTPS POST

Request token for API call. Token will be sent to your email

Download our command line tool here, then from a terminal, run:

$ python3 get_prediction.py -h
usage: get_prediction.py	[-h] [--species SPECIES] [--version VERSION] [--file FILE] [--type TYPE] [--shape SHAPE] [--token TOKEN] [--output OUTPUT] [--project_id PROJECT_ID]
BioTuring's cell type prediction API.
-h, --help	show this help message and exit
--species SPECIES	Species (human | mouse)
--version VERSION	Prediction version: human (1 | 2), mouse (1)
--file FILE	[.zip] Zipped folder contains 3 files: matrix.mtx, barcodes.tsv, genes.tsv/features.tsv [.hdf5] HDF5 object contains 5 keys: data, indices, indptr, barcodes, genes/features [.tsv] Full expression
--type TYPE	File type (zip | hdf5 | tsv)
--shape SHAPE	[genesxcells | cellsxgenes]
--token TOKEN	Authenticated token
--project_id PROJECT_ID	If you have already submitted the data, adding this argument to get your result

For example, to submit your data, run:

$ python3 get_prediction.py --token your_token --species human --version 2 --file path/to/your/file.zip --type zip --shape genesxcells --output path/to/your/result/file.tsv

GSM4005491.zip: 4.23MMB [00:00, 57.3MMB/s]
[2021-07-25 23:11:48] Success to submit data. Project id: 9ef104fd-c277-443f-8cf1-eb534f56f632.
[2021-07-25 23:11:48] Waiting in the queue...
[2021-07-25 23:13:15] Extracting data...
[2021-07-25 23:13:17] Loaded: 2611 cells.
[2021-07-25 23:13:29] Preprocessing data...
[2021-07-25 23:13:31] Running dimensional reduction...
[2021-07-25 23:14:11] Clustering...
[2021-07-25 23:14:13] Training...
[2021-07-25 23:14:33] Removing ambiguous labels...
[2021-07-25 23:17:48] Completed!
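
The project ID printed in the log is what a later retrieval needs, so it can be worth capturing. The following Python sketch pulls it out of captured log text; it assumes the log line keeps the "Project id: <UUID>" wording shown above, and the function name `extract_project_id` is hypothetical.

```python
import re

# Assumption: the ID is a 36-character hexadecimal UUID, as in the log above.
PROJECT_ID_PATTERN = re.compile(r"Project id: ([0-9a-f-]{36})")


def extract_project_id(log_text):
    """Return the first project ID found in the log text, or None."""
    match = PROJECT_ID_PATTERN.search(log_text)
    return match.group(1) if match else None


log = (
    "[2021-07-25 23:11:48] Success to submit data. "
    "Project id: 9ef104fd-c277-443f-8cf1-eb534f56f632.\n"
    "[2021-07-25 23:11:48] Waiting in the queue...\n"
)
project_id = extract_project_id(log)
```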

You can always retrieve the result of your submitted data using:

$ python3 get_prediction.py --token your_token --project_id submitted_project_id --output path/to/your/result/file.tsv

To list all your submitted projects, run:

$ curl -X POST https://talk2data.bioturing.com/predict/get_info --form token='your_token'
{ "projects":[
    {
        "created_date":"date_time",
        "email":"your email",
        "file_name":"submitted_file_name",
        "file_shape":"genesxcells",
        "file_type":"zip",
        "project_id":"project_id",
        "status":"Completed"
    }, {
        "created_date":"date_time",
        "email":"your email",
        "file_name":"submitted_file_name",
        "file_shape":"cellxgenes",
        "file_type":"tsv",
        "project_id":"project_id",
        "status":"Running"
    }
]}
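
A response of this shape is easy to filter in a script. The Python sketch below selects project IDs by status; it assumes the response body is the JSON object shown above, and the helper name `projects_with_status` and the placeholder IDs are hypothetical.

```python
import json


def projects_with_status(response_text, status):
    """Return the project IDs whose "status" field equals the given status."""
    payload = json.loads(response_text)
    return [
        project["project_id"]
        for project in payload.get("projects", [])
        if project.get("status") == status
    ]


# Placeholder response mirroring the fields listed above.
response_text = json.dumps({
    "projects": [
        {"project_id": "project_id_1", "file_type": "zip", "status": "Completed"},
        {"project_id": "project_id_2", "file_type": "tsv", "status": "Running"},
    ]
})
completed = projects_with_status(response_text, "Completed")
```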

We provide 2 POST APIs for you to submit data and retrieve the result. For example, you can submit data with curl:

$ curl -X POST https://talk2data.bioturing.com/predict/submit --form token='your_token' --form species='human' --form version='2' --form type='zip' --form shape='genesxcells' --form exp_matrix='@path/to/your/file.zip'

{
    "status":200,
    "message":"Successfully submitted the data!",
    "project_id":"0f16115c-8f54-4380-932b-ae2ad26f0c13"
}
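
A misspelled form value is easy to miss on the command line, so it can help to check the fields before sending them. The Python sketch below validates the form fields against the options listed in the command line help; it assumes the POST API accepts the same values as the Python tool, and the helper name `build_submit_fields` is hypothetical. It only builds the field dictionary and does not send a request.

```python
# Allowed values taken from the get_prediction.py help text.
# Assumption: the POST API accepts the same values.
ALLOWED_VERSIONS = {"human": {"1", "2"}, "mouse": {"1"}}
ALLOWED_TYPES = {"zip", "hdf5", "tsv"}
ALLOWED_SHAPES = {"genesxcells", "cellsxgenes"}


def build_submit_fields(token, species, version, file_type, shape):
    """Return the form fields for a submission, or raise ValueError."""
    if species not in ALLOWED_VERSIONS:
        raise ValueError(f"unknown species: {species!r}")
    if str(version) not in ALLOWED_VERSIONS[species]:
        raise ValueError(f"version {version!r} is not listed for {species}")
    if file_type not in ALLOWED_TYPES:
        raise ValueError(f"unknown file type: {file_type!r}")
    if shape not in ALLOWED_SHAPES:
        raise ValueError(f"unknown shape: {shape!r}")
    return {
        "token": token,
        "species": species,
        "version": str(version),
        "type": file_type,
        "shape": shape,
    }


fields = build_submit_fields("your_token", "human", 2, "zip", "genesxcells")
```

The returned dictionary maps one-to-one onto the `--form` arguments of the curl call; the expression file itself would still be attached as `exp_matrix`.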

You can always retrieve your submitted data result using:

$ curl -X POST https://talk2data.bioturing.com/predict/get_result --form token='your_token' --form project_id='0f16115c-8f54-4380-932b-ae2ad26f0c13'

If the process has not finished, you will receive the current running status:

{
    "status":400,
    "is_running":true,
    "running_status":"[2021-07-27 09:53:45] Waiting in the queue...\n[2021-07-27 09:53:45] Extracting data...\n"
}

If the process has completed, you will receive the result as a TSV string:

{
    "status":200,
    "data":"Barcodes\tPredited cell type\nbarcode1\tcell type\n…"
}
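
The two response shapes can be handled by one small function. The Python sketch below distinguishes a running job from a finished one and turns the TSV string into a barcode-to-label mapping; it assumes the fields shown above ("is_running", "running_status", "data") and a header row followed by two-column rows. The function name `parse_result` and the sample barcodes and labels are hypothetical.

```python
import csv
import io
import json


def parse_result(response_text):
    """Interpret a get_result response body.

    Returns {"done": False, "log": [...]} while the job is running, and
    {"done": True, "labels": {barcode: cell_type}} once it has completed.
    """
    payload = json.loads(response_text)
    if payload.get("is_running"):
        return {"done": False, "log": payload.get("running_status", "").splitlines()}

    # The "data" field is a TSV string; the first row is the header.
    rows = list(csv.reader(io.StringIO(payload["data"]), delimiter="\t"))
    labels = {row[0]: row[1] for row in rows[1:] if len(row) >= 2}
    return {"done": True, "labels": labels}


# Hypothetical responses mirroring the two shapes shown above.
running = json.dumps({
    "status": 400,
    "is_running": True,
    "running_status": "[2021-07-27 09:53:45] Waiting in the queue...\n",
})
finished = json.dumps({
    "status": 200,
    "data": "Barcodes\tPredited cell type\nAAAC-1\tB cell\nAAAG-1\tMonocyte\n",
})
running_view = parse_result(running)
finished_view = parse_result(finished)
```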

To list all your submitted projects, run:

$ curl -X POST https://talk2data.bioturing.com/predict/get_info --form token='your_token'
{ "projects": [
    {
        "created_date":"date_time",
        "email":"your email",
        "file_name":"submitted_file_name",
        "file_shape":"genesxcells",
        "file_type":"zip",
        "project_id":"project_id",
        "status":"Completed"
    }, {
        "created_date":"date_time",
        "email":"your email",
        "file_name":"submitted_file_name",
        "file_shape":"cellxgenes",
        "file_type":"tsv",
        "project_id":"project_id",
        "status":"Running"
    }
]}