
Query individual files#

Here, we'll query individual files and inspect their metadata.

You can skip this guide if you are only interested in working with the dataset as a whole.

import lamindb as ln
import lnschema_bionty as lb
import anndata as ad
💡 loaded instance: testuser1/test-scrna (lamindb 0.55.2)
ln.track()
💡 notebook imports: anndata==0.9.2 lamindb==0.55.2 lnschema_bionty==0.31.2
💡 Transform(id='agayZTonayqAz8', name='Query individual files', short_name='scrna3', version='0', type=notebook, updated_at=2023-10-10 15:43:41, created_by_id='DzTjkKse')
💡 Run(id='5HcAjKbU7y8j3rJUe3rO', run_at=2023-10-10 15:43:41, transform_id='agayZTonayqAz8', created_by_id='DzTjkKse')

Access #

Query files by provenance metadata#

users = ln.User.lookup()
ln.Transform.filter(created_by=users.testuser1).search("scrna")
                            id              __ratio__
name
scRNA-seq                   Nv48yAceNSh8z8  90.0
Append a new batch of data  ManDYgmftZ8Cz8  36.0
Query individual files      agayZTonayqAz8  36.0
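
The `__ratio__` column appears to be a fuzzy-match score between the search string and each transform name, with higher values meaning a closer match. As a rough illustration only, the sketch below ranks the same three names with Python's standard-library `difflib`. This is a stand-in chosen for the example, not the scorer lamindb uses, so the scores will differ from those above. The helper `rank_by_similarity` is hypothetical.

```python
from difflib import SequenceMatcher


def rank_by_similarity(query, names):
    """Return (name, score) pairs sorted from best to worst match.

    Hypothetical helper: scores are difflib ratios scaled to 0-100,
    not the values lamindb reports in `__ratio__`.
    """
    scored = [
        (name, round(100 * SequenceMatcher(None, query.lower(), name.lower()).ratio(), 1))
        for name in names
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


names = ["scRNA-seq", "Append a new batch of data", "Query individual files"]
ranking = rank_by_similarity("scrna", names)
for name, score in ranking:
    print(f"{name}: {score}")
```

As in the search result above, the name that contains the query string ranks first.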
transform = ln.Transform.filter(id="Nv48yAceNSh8z8").one()
ln.File.filter(transform=transform).df()
                      storage_id  key   suffix  accessor  description  version  size      hash                    hash_type  transform_id    run_id                initial_version_id  updated_at           created_by_id
id
oYS8NR45oBcEPgCISfCm  YpFBBtr4    None  .h5ad   AnnData   Conde22      None     57615999  6Hu1BywwK6bfIU2Dpku2xZ  sha1-fl    Nv48yAceNSh8z8  RNgU2xx7TeUrL3d83b4F  None                2023-10-10 15:42:41  DzTjkKse

Query files based on biological metadata#

assays = lb.ExperimentalFactor.lookup()
species = lb.Species.lookup()
cell_types = lb.CellType.lookup()
query = ln.File.filter(
    experimental_factors=assays.single_cell_rna_sequencing,
    species=species.human,
    cell_types=cell_types.gamma_delta_t_cell,
)
query.df()
                      storage_id  key   suffix  accessor  description  version  size      hash                    hash_type  transform_id    run_id                initial_version_id  updated_at           created_by_id
id
oYS8NR45oBcEPgCISfCm  YpFBBtr4    None  .h5ad   AnnData   Conde22      None     57615999  6Hu1BywwK6bfIU2Dpku2xZ  sha1-fl    Nv48yAceNSh8z8  RNgU2xx7TeUrL3d83b4F  None                2023-10-10 15:42:41  DzTjkKse
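
Passing several keyword arguments to `filter()` narrows the result to files that match all of them. A minimal plain-Python sketch of that conjunction follows. The records, field names, and the helper `filter_records` are hypothetical in-memory stand-ins, not lamindb objects or registry contents.

```python
# Hypothetical toy records standing in for files and their labels.
records = [
    {"description": "file A", "species": {"human"}, "cell_types": {"gamma-delta T cell", "mast cell"}},
    {"description": "file B", "species": {"human"}, "cell_types": {"monocyte"}},
    {"description": "file C", "species": {"mouse"}, "cell_types": {"gamma-delta T cell"}},
]


def filter_records(items, **required):
    """Keep only the items whose label sets contain every required label."""
    return [
        item
        for item in items
        if all(label in item[field] for field, label in required.items())
    ]


matches = filter_records(records, species="human", cell_types="gamma-delta T cell")
print([item["description"] for item in matches])
```

Only the record that satisfies both conditions is returned; a record matching a single condition is excluded.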

Transform #

Compare gene sets#

Get file objects:

query = ln.File.filter()
file1, file2 = query.list()
file1.describe()
File(id='oYS8NR45oBcEPgCISfCm', suffix='.h5ad', accessor='AnnData', description='Conde22', size=57615999, hash='6Hu1BywwK6bfIU2Dpku2xZ', hash_type='sha1-fl', updated_at=2023-10-10 15:42:41)

Provenance:
  ๐Ÿ—ƒ๏ธ storage: Storage(id='YpFBBtr4', root='/home/runner/work/lamin-usecases/lamin-usecases/docs/test-scrna', type='local', updated_at=2023-10-10 15:41:26, created_by_id='DzTjkKse')
  ๐Ÿ“” transform: Transform(id='Nv48yAceNSh8z8', name='scRNA-seq', short_name='scrna', version='0', type='notebook', updated_at=2023-10-10 15:41:34, created_by_id='DzTjkKse')
  ๐Ÿ‘ฃ run: Run(id='RNgU2xx7TeUrL3d83b4F', run_at=2023-10-10 15:41:34, transform_id='Nv48yAceNSh8z8', created_by_id='DzTjkKse')
  ๐Ÿ‘ค created_by: User(id='DzTjkKse', handle='testuser1', email='testuser1@lamin.ai', name='Test User1', updated_at=2023-10-10 15:41:26)
  โฌ‡๏ธ input_of (core.Run): ['2023-10-10 15:42:50']
Features:
  var: FeatureSet(id='XGD5bALjzlxe3dYm2tcM', n=36503, type='number', registry='bionty.Gene', hash='dnRexHCtxtmOU81_EpoJ', updated_at=2023-10-10 15:42:29, modality_id='nG6MZ3aj', created_by_id='DzTjkKse')
    'TAF11L2', 'PGAP6', 'None', 'PTBP2', 'C5orf34-AS1', 'B4GALNT4', 'LINC02958', 'DMD', 'LINC00706', 'EEF1AKMT1', 'None', 'None', 'METRNL', 'MPND', 'NOBOX', 'LINC02706', 'TRIM50', 'IGKV6D-21', 'ZFHX4', 'AHCYL1', ...
  obs: FeatureSet(id='KrYPEOnuTBTRi4WqoelO', n=4, registry='core.Feature', hash='Z0BvFRBSIr9xpTLjV1nb', updated_at=2023-10-10 15:42:35, modality_id='NIjDnou1', created_by_id='DzTjkKse')
    🔗 donor (12, core.ULabel): '582C', 'A36', 'D503', 'A37', 'A29', '640C', 'D496', 'A52', 'A35', 'A31', ...
    🔗 tissue (17, bionty.Tissue): 'jejunal epithelium', 'caecum', 'ileum', 'lamina propria', 'thymus', 'duodenum', 'thoracic lymph node', 'skeletal muscle tissue', 'omentum', 'mesenteric lymph node', ...
    🔗 assay (4, bionty.ExperimentalFactor): 'single-cell RNA sequencing', '10x 3' v3', '10x 5' v2', '10x 5' v1'
    🔗 cell_type (32, bionty.CellType): 'naive thymus-derived CD8-positive, alpha-beta T cell', 'dendritic cell, human', 'non-classical monocyte', 'effector memory CD4-positive, alpha-beta T cell', 'megakaryocyte', 'naive thymus-derived CD4-positive, alpha-beta T cell', 'germinal center B cell', 'mast cell', 'alveolar macrophage', 'T follicular helper cell', ...
Labels:
  🏷️ species (1, bionty.Species): 'human'
  🏷️ tissues (17, bionty.Tissue): 'jejunal epithelium', 'caecum', 'ileum', 'lamina propria', 'thymus', 'duodenum', 'thoracic lymph node', 'skeletal muscle tissue', 'omentum', 'mesenteric lymph node', ...
  🏷️ cell_types (32, bionty.CellType): 'naive thymus-derived CD8-positive, alpha-beta T cell', 'dendritic cell, human', 'non-classical monocyte', 'effector memory CD4-positive, alpha-beta T cell', 'megakaryocyte', 'naive thymus-derived CD4-positive, alpha-beta T cell', 'germinal center B cell', 'mast cell', 'alveolar macrophage', 'T follicular helper cell', ...
  🏷️ experimental_factors (4, bionty.ExperimentalFactor): 'single-cell RNA sequencing', '10x 3' v3', '10x 5' v2', '10x 5' v1'
  🏷️ ulabels (12, core.ULabel): '582C', 'A36', 'D503', 'A37', 'A29', '640C', 'D496', 'A52', 'A35', 'A31', ...
file1.view_flow()
https://d33wubrfki0l68.cloudfront.net/6244de7d1739c5fc0cd0d9a544e6d3ced81626dd/a5ddc/_images/a013845db0835627bb151c09dc10769cf00db78259aad1bc8b9e0395dcf42ead.svg
file2.describe()
File(id='wRXv3wXHrtOF3OihYYya', suffix='.h5ad', accessor='AnnData', description='10x reference adata', size=857752, hash='j6o6e27xPdqHQyT7Em_7MQ', hash_type='md5', updated_at=2023-10-10 15:43:24)

Provenance:
  ๐Ÿ—ƒ๏ธ storage: Storage(id='YpFBBtr4', root='/home/runner/work/lamin-usecases/lamin-usecases/docs/test-scrna', type='local', updated_at=2023-10-10 15:41:26, created_by_id='DzTjkKse')
  ๐Ÿ“” transform: Transform(id='ManDYgmftZ8Cz8', name='Append a new batch of data', short_name='scrna2', version='0', type='notebook', updated_at=2023-10-10 15:42:50, created_by_id='DzTjkKse')
  ๐Ÿ‘ฃ run: Run(id='JRscXJ0ZxufwqGUoGmIJ', run_at=2023-10-10 15:42:50, transform_id='ManDYgmftZ8Cz8', created_by_id='DzTjkKse')
  ๐Ÿ‘ค created_by: User(id='DzTjkKse', handle='testuser1', email='testuser1@lamin.ai', name='Test User1', updated_at=2023-10-10 15:41:26)
Features:
  var: FeatureSet(id='PlJKWLv2xNZtMob20tDy', n=754, type='number', registry='bionty.Gene', hash='WMDxN7253SdzGwmznV5d', updated_at=2023-10-10 15:43:23, modality_id='nG6MZ3aj', created_by_id='DzTjkKse')
    'NUCB2', 'CD37', 'HPCAL1', 'ZNF22', 'ADISSP', 'COA1', 'DAAM1', 'ADSL', 'BID', 'GZMA', 'SUMO2', 'HLA-DRA', 'LAT', 'LCK', 'NT5C3B', 'JCHAIN', 'PRF1', 'ABHD17A', 'SPCS2', 'CCT6A', ...
  obs: FeatureSet(id='nAOCw869x66BnmaHJlOE', n=1, registry='core.Feature', hash='J_-ageYakMRSB70Itj9F', updated_at=2023-10-10 15:43:24, modality_id='NIjDnou1', created_by_id='DzTjkKse')
    🔗 cell_type (9, bionty.CellType): 'B cell, CD19-positive', 'dendritic cell', 'CD16-positive, CD56-dim natural killer cell, human', 'CD8-positive, alpha-beta memory T cell', 'central memory CD8-positive, alpha-beta T cell', 'CD8-positive, CD25-positive, alpha-beta regulatory T cell', 'monocyte', 'mature T cell', 'Cd4-negative, CD8_alpha-negative, CD11b-positive dendritic cell'
  external: FeatureSet(id='4Wdwh7kBHsamYc1CH642', n=2, registry='core.Feature', hash='gmtVslHb3x-nqoqNZS2_', updated_at=2023-10-10 15:43:24, modality_id='NIjDnou1', created_by_id='DzTjkKse')
    🔗 species (1, bionty.Species): 'human'
    🔗 assay (1, bionty.ExperimentalFactor): 'single-cell RNA sequencing'
Labels:
  🏷️ species (1, bionty.Species): 'human'
  🏷️ cell_types (9, bionty.CellType): 'B cell, CD19-positive', 'dendritic cell', 'CD16-positive, CD56-dim natural killer cell, human', 'CD8-positive, alpha-beta memory T cell', 'central memory CD8-positive, alpha-beta T cell', 'CD8-positive, CD25-positive, alpha-beta regulatory T cell', 'monocyte', 'mature T cell', 'Cd4-negative, CD8_alpha-negative, CD11b-positive dendritic cell'
  🏷️ experimental_factors (1, bionty.ExperimentalFactor): 'single-cell RNA sequencing'
file2.view_flow()
https://d33wubrfki0l68.cloudfront.net/93fa3b1f3b5e02d8199f27622e548701d1cf3bbf/3da6d/_images/54e7aebf582249ccd8af20bffc39101e41d94cb1763fb5839507d5d11405107a.svg

Load files into memory:

file1_adata = file1.load()
file2_adata = file2.load()

The shared genes can be computed directly from the registered feature sets, without loading the files:

file1_genes = file1.features["var"]
file2_genes = file2.features["var"]

shared_genes = file1_genes & file2_genes
len(shared_genes)
749
shared_genes.list("symbol")[:10]
['PGAM1',
 'SMARCB1',
 'TIMM10',
 'DAP3',
 'APMAP',
 'SMIM24',
 'NDUFB11',
 'GNAI2',
 'FCER1G',
 'RAB7A']
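
The `&` operator keeps only the records present in both gene sets. The same idea can be sketched with plain Python sets. The two toy sets below use made-up placeholder names and are assumptions for illustration, not the files' actual gene lists.

```python
# Hypothetical placeholder gene names; not taken from file1 or file2.
genes_a = {"GENE1", "GENE2", "GENE3", "GENE4"}
genes_b = {"GENE2", "GENE3", "GENE5"}

shared = genes_a & genes_b      # present in both sets
only_in_b = genes_b - genes_a   # present in the second set only

print(sorted(shared))
print(sorted(only_in_b))
```

The intersection can never be larger than the smaller of the two sets, which is consistent with 749 shared genes against the 754 genes registered for the second file.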

Compare cell types#

file1_celltypes = file1.cell_types.all()
file2_celltypes = file2.cell_types.all()

shared_celltypes = file1_celltypes & file2_celltypes
shared_celltypes_names = shared_celltypes.list("name")
shared_celltypes_names
['CD16-positive, CD56-dim natural killer cell, human',
 'CD8-positive, alpha-beta memory T cell']

We can now subset the two datasets by shared cell types:

file1_adata_subset = file1_adata[
    file1_adata.obs["cell_type"].isin(shared_celltypes_names)
]

file2_adata_subset = file2_adata[
    file2_adata.obs["cell_type"].isin(shared_celltypes_names)
]

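Concatenate the subsetted datasets. `ad.concat` uses an inner join on variables by default, so only genes present in both objects are kept: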

adata_concat = ad.concat(
    [file1_adata_subset, file2_adata_subset],
    label="file",
    keys=[file1.description, file2.description],
)
adata_concat
AnnData object with n_obs × n_vars = 244 × 749
    obs: 'cell_type', 'file'
    obsm: 'X_umap'
adata_concat.obs.value_counts()
cell_type                                           file               
CD8-positive, alpha-beta memory T cell              Conde22                120
CD16-positive, CD56-dim natural killer cell, human  Conde22                114
CD8-positive, alpha-beta memory T cell              10x reference adata      7
CD16-positive, CD56-dim natural killer cell, human  10x reference adata      3
dtype: int64
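
The subset, tag, and count steps above can be sketched in plain Python to make the logic explicit. This is a simplified illustration with hypothetical toy labels and a hypothetical helper `subset`; it uses lists and `collections.Counter` rather than AnnData objects or pandas.

```python
from collections import Counter

# Hypothetical toy cell-type annotations, one entry per cell.
shared_types = {"NK cell", "memory T cell"}
dataset_a = ["NK cell", "memory T cell", "mast cell", "NK cell"]
dataset_b = ["memory T cell", "monocyte", "NK cell"]


def subset(cell_types, keep):
    """Keep only the cells whose annotation is in `keep`."""
    return [cell_type for cell_type in cell_types if cell_type in keep]


# Tag each retained cell with its source, mirroring label="file" and keys=[...].
tagged = [(cell_type, "A") for cell_type in subset(dataset_a, shared_types)]
tagged += [(cell_type, "B") for cell_type in subset(dataset_b, shared_types)]

# Count cells per (cell_type, source) pair, as value_counts() does on obs.
counts = Counter(tagged)
for (cell_type, source), n in counts.most_common():
    print(cell_type, source, n)
```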