BioTuring
Cell Type Prediction
a.k.a. HaiTam Algorithm

Automatic cell type prediction built on a database of 77,305,607 cells
• Extensive cleaning and curation of millions of single cells
• State-of-the-art neural network
• Advanced combinatorial algorithms on HPC for enumerating billions of possibilities

We can now classify 45 human immune sub-types.
And this is not all: new sub-types are continuously added.

What cell types can we predict?

B cell
Follicular B cell
Memory B cell
Marginal zone B cell
Naive B cell
Germinal center B cell
Plasma cell
CD4+ T cell
Naive CD4+ T cell
Central memory CD4+ T cell
Effector memory CD4+ T cell
Effector CD4+ T cell
CD4+ T helper 1 cell
CD4+ T helper 17 cell
CD4+ T follicular helper cell
CD4+ T regulatory cell
Cytotoxic CD4+ T cell
Exhausted CD4+ T cell
Dendritic cell
Plasmacytoid dendritic cell
Conventional type 1 dendritic cell
Conventional type 2 dendritic cell
Mature conventional dendritic cell
Monocyte-derived dendritic cell

CD8+ T cell
Naive CD8+ T cell
Central memory CD8+ T cell
Effector memory CD8+ T cell
Effector CD8+ T cell
CD8+ T cytotoxic 1 cell
CD8+ T cytotoxic 17 cell
CD8+ NKT-like cell
CD8+ T regulatory cell
Exhausted CD8+ T cell
Mucosal associated invariant CD8+ T cell
Monocyte
Classical monocyte
Non-classical monocyte
Intermediate monocyte



Gamma-delta T cell
Naive gamma-delta T cell
Central memory gamma-delta T cell
Effector memory gamma-delta T cell
Effector gamma-delta T cell
Exhausted gamma-delta T cell
Mucosal associated invariant gamma-delta T cell
Myeloid leukocyte
Mast cell
Macrophage
Basophil
Neutrophil


Innate lymphoid cell
Innate lymphoid cell type 1
Innate lymphoid cell type 2
Innate lymphoid cell type 3
Natural killer cell




Natural killer T cell
Are your cell subtypes not on this list? Contact us at talk2datat@bioturing.com

Benchmarks

We benchmarked BioTuring Cell Type Prediction against Seurat v4 on several datasets. Below are some highlights; full benchmarks will follow shortly with our manuscript.
Landscape of infiltrating T cells in liver cancer revealed by single-cell sequencing
Chunhong Zheng, Liangtao Zheng, Jae-Kwang Yoo, Huahu Guo, Yuanyuan Zhang, Xinyi Guo, Boxi Kang, Ruozhen Hu, Julie Y. Huang, Qiming Zhang, Zhouzerui Liu, Minghui Dong, Xueda Hu, Wenjun Ouyang, Jirun Peng, Zemin Zhang
BioTuring accurately detected more subtypes, including exhausted CD4+ T cells, exhausted CD8+ T cells, and CD8+ NKT-like cells. Seurat v4 confused effector memory CD8+ T cells with cytotoxic CD4+ T cells and NK cells.

BioTuring and Seurat v4 yielded nearly identical results on naive CD4+ T cells, central memory CD4+ T cells (CD4 TCM), effector memory CD4+ T cells (CD4 TEM), regulatory CD4+ T cells (CD4 Treg), naive CD8+ T cells, and mucosal associated invariant CD8+ T cells (CD8 MAIT).
Landscape and dynamics of single immune cells in hepatocellular carcinoma
Zhang Q, He Y, Luo N, Patel SJ, Han Y, Gao R, Modak M, Carotta S, Haslinger C, Kind D, Peet GW, Zhong G, Lu S, Zhu W, Mao Y, Xiao M, Bergmann M, Hu X, Kerkar SP, Vogt AB, Pflanz S, Liu K, Peng J, Ren X, Zhang Z
In this dataset, Seurat v4 misidentified most cell types: macrophages were incorrectly identified as CD14 monocytes, and mast cells were mislabeled as erythrocytes. BioTuring correctly identified all these cell types, as its model was built from much larger training data with more cell types.

Label your data now

Submit data
Running status
Submit your raw expression profile to our server and we will send the cell type labels to your email as soon as the process is finished.
Input data format

Zip:

a zipped folder containing 3 files:
• barcodes.(tsv|csv|gz|tar|tar.gz)
• features.(tsv|csv|gz|tar|tar.gz) or genes.(tsv|csv|gz|tar|tar.gz)
• matrix.(mtx|gz|tar|tar.gz)

Hdf5:

an HDF5 file containing 5 keys:
• barcodes
• genes or features
• data
• indices
• indptr

Text:

a full expression matrix in a text file, with values separated by tabs or commas

*Note: you should only submit one batch at a time.
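
As a rough illustration of the Zip layout above, the following sketch packs a small count matrix into a zipped folder using only the Python standard library. The helper name `write_submission_zip`, the folder name `sample`, and the one-identifier-per-line layout of barcodes.tsv and genes.tsv are assumptions made for the example; this page does not specify the column layout of those files. The matrix is written in Matrix Market coordinate format (1-based indices) in genes x cells orientation.

```python
import zipfile


def write_submission_zip(zip_path, genes, barcodes, counts, folder="sample"):
    """Write matrix.mtx, barcodes.tsv and genes.tsv into one zipped folder.

    counts maps (gene_index, cell_index) -> integer count, both 0-based.
    The folder name and single-column TSV layout are assumptions.
    """
    lines = [
        "%%MatrixMarket matrix coordinate integer general",
        # Size line: number of rows (genes), columns (cells), non-zero entries.
        f"{len(genes)} {len(barcodes)} {len(counts)}",
    ]
    # Matrix Market indices are 1-based, so shift both coordinates by one.
    for (gene_idx, cell_idx), value in sorted(counts.items()):
        lines.append(f"{gene_idx + 1} {cell_idx + 1} {value}")

    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(f"{folder}/matrix.mtx", "\n".join(lines) + "\n")
        zf.writestr(f"{folder}/barcodes.tsv", "\n".join(barcodes) + "\n")
        zf.writestr(f"{folder}/genes.tsv", "\n".join(genes) + "\n")
    return zip_path
```

If your pipeline already produced these three files, zipping that existing folder should be enough; the sketch is only meant to make the expected contents concrete.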
Get the current status of a submitted project.
Input a project ID to start

Command Line API

Python tool
HTTPS POST
Request token for API call. Token will be sent to your email
Download our command-line tool here, then run from a terminal:
$ python3 get_prediction.py -h
usage: get_prediction.py [-h] [--file FILE] [--type TYPE] [--shape SHAPE] [--token TOKEN] [--output OUTPUT] [--project_id PROJECT_ID]
BioTuring's cell type prediction API.
-h, --help show this help message and exit
--file FILE [.zip] Zipped folder contains 3 files: matrix.mtx, barcodes.tsv, genes.tsv/features.tsv
[.hdf5] HDF5 object contains 5 keys: data, indices, indptr, barcodes, genes/features
[.tsv] Full expression
--type TYPE File type (zip | hdf5 | tsv)
--shape SHAPE [genesxcells | cellsxgenes]
--token TOKEN Authenticated token
--project_id PROJECT_ID If you have already submitted the data, adding this argument to get your result

For example, to submit your data, run:

$ python3 get_prediction.py --token your_token --file path/to/your/file.zip --type zip --shape genesxcells --output path/to/your/result/file.tsv

GSM4005491.zip: 4.23MMB [00:00, 57.3MMB/s]
[2021-07-25 23:11:48] Success to submit data. Project id: 9ef104fd-c277-443f-8cf1-eb534f56f632.
[2021-07-25 23:11:48] Waiting in the queue...
[2021-07-25 23:13:15] Extracting data...
[2021-07-25 23:13:17] Loaded: 2611 cells.
[2021-07-25 23:13:29] Preprocessing data...
[2021-07-25 23:13:31] Running dimensional reduction...
[2021-07-25 23:14:11] Clustering...
[2021-07-25 23:14:13] Training...
[2021-07-25 23:14:33] Removing ambiguous labels...
[2021-07-25 23:17:48] Completed!
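
The project ID printed at submission time is the value that --project_id expects later. Below is a minimal sketch for pulling it out of captured tool output. It assumes the log line keeps the "Project id: <id>." form shown above and that IDs stay UUID-shaped as in the example; `extract_project_id` is a hypothetical helper, not part of get_prediction.py.

```python
import re

# Matches the 36-character, UUID-shaped ID seen in the example log.
PROJECT_ID_RE = re.compile(r"Project id: ([0-9a-f-]{36})")


def extract_project_id(log_text):
    """Return the first project ID found in the log text, or None."""
    match = PROJECT_ID_RE.search(log_text)
    return match.group(1) if match else None
```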

You can always retrieve the result of a submitted project using:

$ python3 get_prediction.py --token your_token --project_id submitted_project_id --output path/to/your/result/file.tsv

To list all your submitted projects, run:

$ curl -X POST https://talk2data.bioturing.com/predict/get_info --form token='your_token'
{ "projects":[
    {
        "created_date":"date_time",
        "email":"your email",
        "file_name":"submitted_file_name",
        "file_shape":"genesxcells",
        "file_type":"zip",
        "project_id":"project_id",
        "status":"Completed"
    }, {
        "created_date":"date_time",
        "email":"your email",
        "file_name":"submitted_file_name",
        "file_shape":"cellsxgenes",
        "file_type":"tsv",
        "project_id":"project_id",
        "status":"Running"
    }
]}
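
Once the project list comes back, a small helper can group project IDs by status, for example to see which submissions are still running. This is a sketch that assumes the response has exactly the shape shown above (a top-level "projects" array whose items carry "project_id" and "status"). Only the "Completed" and "Running" status values appear on this page, so any others are not covered here. `projects_by_status` is a hypothetical name.

```python
import json
from collections import defaultdict


def projects_by_status(get_info_json):
    """Group project IDs by their status string from a get_info response."""
    grouped = defaultdict(list)
    for project in json.loads(get_info_json)["projects"]:
        grouped[project["status"]].append(project["project_id"])
    return dict(grouped)
```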

We provide 2 POST APIs for you to submit data and retrieve the result. For example, you can submit data with curl:

$ curl -X POST https://talk2data.bioturing.com/predict/submit --form token='your_token' --form type='zip' --form shape='genesxcells' --form exp_matrix='@path/to/your/file.zip'

{
    "status":200,
    "message":"Successfully submitted the data!",
    "project_id":"0f16115c-8f54-4380-932b-ae2ad26f0c13"
}
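
The same submission can be sketched in Python with only the standard library. The endpoint URL and the form field names (token, type, shape, exp_matrix) are taken from the curl example above. The helper names `build_multipart` and `build_submit_request`, and the `application/octet-stream` content type for the file part, are assumptions for illustration. The code only builds the request; sending it is left to the caller.

```python
import uuid
from pathlib import Path
from urllib import request

SUBMIT_URL = "https://talk2data.bioturing.com/predict/submit"  # from the curl example


def build_multipart(fields, file_field, file_name, file_bytes, boundary=None):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = boundary or uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        part = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n'
            "\r\n"
            f"{value}\r\n"
        )
        chunks.append(part.encode("utf-8"))
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n"  # assumed content type
        "\r\n"
    )
    chunks.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing delimiter
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def build_submit_request(token, path, file_type="zip", shape="genesxcells"):
    """Build (but do not send) the POST request for a submission."""
    path = Path(path)
    body, content_type = build_multipart(
        {"token": token, "type": file_type, "shape": shape},
        "exp_matrix",
        path.name,
        path.read_bytes(),
    )
    return request.Request(
        SUBMIT_URL, data=body, headers={"Content-Type": content_type}, method="POST"
    )
```

Passing the returned object to `urllib.request.urlopen` and decoding the body as JSON should yield a response like the one shown above, provided the server accepts this encoding the same way it accepts curl's.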

You can always retrieve your submitted data result using:

$ curl -X POST https://talk2data.bioturing.com/predict/get_result --form token='your_token' --form project_id='0f16115c-8f54-4380-932b-ae2ad26f0c13'

If the process has not finished, you will receive the current running status:

{
    "status":400,
    "is_running":true,
    "running_status":"[2021-07-27 09:53:45] Waiting in the queue...\n[2021-07-27 09:53:45] Extracting data...\n"
}

If the process has completed, you will receive the result as a TSV string:

{
    "status":200,
    "data":"Barcodes\tPredited cell type\nbarcode1\tcell type\n…"
}
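
The two response shapes above suggest a simple polling pattern: keep asking until "is_running" is no longer set, then split the TSV string into barcode and label pairs. The sketch below assumes responses look exactly like the two examples. `parse_result`, `wait_for_result`, and the `fetch` callable (anything that returns the decoded JSON response of get_result as a dict) are hypothetical names. Responses for failed projects are not described on this page, so they are treated as errors here.

```python
import csv
import io
import time


def parse_result(response):
    """Return {barcode: cell_type} when finished, or None while still running."""
    if response.get("is_running"):
        return None
    if "data" not in response:
        raise ValueError(f"unexpected response: {response!r}")
    rows = csv.reader(io.StringIO(response["data"]), delimiter="\t")
    next(rows, None)  # skip the header row
    return {row[0]: row[1] for row in rows if len(row) >= 2}


def wait_for_result(fetch, interval_s=30.0, max_tries=20):
    """Call fetch() until a result is available, sleeping between attempts."""
    for _ in range(max_tries):
        labels = parse_result(fetch())
        if labels is not None:
            return labels
        time.sleep(interval_s)
    raise TimeoutError("prediction did not finish within the allotted tries")
```

The header row is skipped by position rather than by name, so the parser does not depend on the exact column titles.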
