TMO data analysis dev¶

19/11/20

Paul Hockett, https://github.com/phockett/tmo-dev

Pretty rough so far, with the aim of developing some routine methods & visualisation/data exploration routines. Inital work based on examples in /reg/data/ana15/tmo/tmolw0618/results/tmo_ana/templates/.

Implemented:

Basic class for preprocessed data IO from h5 files (based on the template read_preproc.ipynb).
General data filter.
Interactive plots with Holoviews and Bokeh backend.
1D & 2D histrograms with Holoviews.
Basic electron image display.

To do:

Pull run metadata from elog.
Improve filtering routine.
More analysis routines!
- Subtraction & ref data.
- Electron image processing.
- Correlations/covariance routines (currently just using basic filtering).
Further development of Holoviews routines.
- Implement datashader?
- Additional plot methods/types.

This doc shows some examples so far. You'll just need the file tmoDataBase.py (available at https://github.com/phockett/tmo-dev), and to have Holoviews installed. (For getting setup with an installable env, see the SLAC notes on Anaconda and installing packages.)

Setup¶

import numpy as np

from h5py import File

from pathlib import Path

# HV imports
import holoviews as hv
from holoviews import opts
hv.extension('bokeh', 'matplotlib')

# Memory profiler - OPTIONAL for testing
# https://timothymonteath.com/articles/monitoring_memory_usage/
# %load_ext memory_profiler
# %memit

# Import class - this requires pointing to the `tmoDataBase.py` file.
# See https://github.com/phockett/tmo-dev

# Import class from file
import sys
sys.path.append(r'/cds/home/p/phockett/dev/tmo-dev')
import tmoDataBase as tb

tb.setPlotDefaults()  # Set plot defaults (optional)

Import data¶

# Set main data path
baseDir = Path('/reg/data/ana15/tmo/tmolw0618/scratch/preproc/v2')

# Setup class object and which runs to read.
data = tb.tmoDataBase(fileBase = baseDir, runList = range(89,97+1), fileSchema='_preproc_elecv2')

# The class has a bunch of dictionaries holding info on the data
data.runs['files']

{89: PosixPath('/reg/data/ana15/tmo/tmolw0618/scratch/preproc/v2/run89_preproc_elecv2.h5'),
 90: PosixPath('/reg/data/ana15/tmo/tmolw0618/scratch/preproc/v2/run90_preproc_elecv2.h5'),
 91: PosixPath('/reg/data/ana15/tmo/tmolw0618/scratch/preproc/v2/run91_preproc_elecv2.h5'),
 92: PosixPath('/reg/data/ana15/tmo/tmolw0618/scratch/preproc/v2/run92_preproc_elecv2.h5'),
 93: PosixPath('/reg/data/ana15/tmo/tmolw0618/scratch/preproc/v2/run93_preproc_elecv2.h5'),
 94: PosixPath('/reg/data/ana15/tmo/tmolw0618/scratch/preproc/v2/run94_preproc_elecv2.h5'),
 95: PosixPath('/reg/data/ana15/tmo/tmolw0618/scratch/preproc/v2/run95_preproc_elecv2.h5'),
 96: PosixPath('/reg/data/ana15/tmo/tmolw0618/scratch/preproc/v2/run96_preproc_elecv2.h5'),
 97: PosixPath('/reg/data/ana15/tmo/tmolw0618/scratch/preproc/v2/run97_preproc_elecv2.h5')}

# Read in the files with h5py
# There is very basic checking implemented currently, just to see if 'energies' is present.
data.readFiles()

*** WARNING: key 90 missing energies data, will be skipped.
*** WARNING: key 92 missing energies data, will be skipped.
*** WARNING: key 94 missing energies data, will be skipped.
*** WARNING: key 95 missing energies data, will be skipped.
*** WARNING: key 96 missing energies data, will be skipped.
Read 9 files.
Good datasets: [89, 91, 93, 97]
Invalid datasets: [90, 92, 94, 95, 96]

# Once read in, data is stored in a new dictionary, keyed by run #
print(data.data.keys())
print(data.data[89].keys())

for key in data.data[89].keys():
    print(f'\n{key}: {data.data[89][key]}')

dict_keys([89, 90, 91, 92, 93, 94, 95, 96, 97])
dict_keys(['raw', 'items', 'dims'])

raw: <HDF5 file "run89_preproc_elecv2.h5" (mode r)>

items: <KeysViewHDF5 ['bc2s', 'energies', 'epics', 'epics_strs', 'gas', 'intensities', 'ktofIpk', 'ktofslice', 'ktofslice_ts', 'ktoftpk', 'l3s', 'photonEs', 'pv', 'timestamp', 'ts', 'xc', 'xenergies', 'yc']>

dims: {'bc2s': (72318,), 'energies': (72318,), 'epics': (72318, 158), 'epics_strs': (158,), 'gas': (72318,), 'intensities': (72318, 5), 'ktofIpk': (72318, 1000), 'ktofslice': (72318, 1784), 'ktofslice_ts': (1784,), 'ktoftpk': (72318, 1000), 'l3s': (72318,), 'photonEs': (72318,), 'pv': (72318, 1000), 'timestamp': (72318,), 'ts': (5,), 'xc': (72318, 1000), 'xenergies': (72318,), 'yc': (72318, 1000)}

Basic Histograms¶

# The method .hist(dim) sets (1D) histograms for the given dimension
dim = 'intensities'
data.hist(dim)

Set self.data[key]['curve'] for dim=intensities.

# To display for a single run, call the relevant dictionary item for a specific run
run = 89
pType = 'curve'
data.data[run][pType][dim]

Note the plot above is interactive via the tools at the edge, and will also display values on rollover. Clicking the legend will turn data series on/off. The interactivity will remain upon HTML export.

# For more output, try the histOverlay method
data.histOverlay(dim)

Set self.ndoverlay, self.hmap and self.layout for dim=intensities.

# The ndoverlay plot shows all the data
data.ndoverlay

# This is a bit busy, so use the layout instead
# Note that these plots are linked, so zooming + panning is consistent over all plots.
data.layout

Correlation plots¶

These currently build and display 2D histograms using hv.HexTiles. The code needs some work, but for now use the holomap object generated to throw out some plots.

# Pass two dimensions here, note the 1st must be 1D
data.hist2d(dim=['energies','intensities'])

Set self.data[key]['hist2d'] for dim=['energies', 'intensities'].

# This is currently set as a dict of HexTiles plots per run, and these are stacked as a HoloMap object.
data.data[89]['hist2d']

{(89, 0): :HexTiles   [energies,intensities],
 (89, 1): :HexTiles   [energies,intensities],
 (89, 2): :HexTiles   [energies,intensities],
 (89, 3): :HexTiles   [energies,intensities],
 (89, 4): :HexTiles   [energies,intensities],
 'hmap': :HoloMap   [Run,Channel]
    :HexTiles   [energies,intensities]}

# Display HoloMap - this currently messes up with autoranges, not sure why!
data.data[89]['hist2d']['hmap']

# Display as a layout (needs some work!)
data.data[89]['hist2d']['hmap'].layout().cols(1)

Filtering¶

This is currently set for single-valued or inclusive ranges (passed as lists), and will set self.data[key]['mask'] for each dataset. This will be automatically applied to subsequent plots.

data.filterData(filterOptions={'gas':[True], 'energies':[0.02, 0.06]})

# Recalc correlation plots with filtered data
data.hist2d(dim=['energies','intensities'])

Set self.data[key]['hist2d'] for dim=['energies', 'intensities'].

# Display as a layout (needs some work!)
data.data[89]['hist2d']['hmap'].layout().cols(1)

# Display as a layout (needs some work!)
# With smaller tiled plots
data.data[89]['hist2d']['hmap'].opts(width=300, height=300).layout().cols(3)