TMO data analysis dev

19/11/20

Paul Hockett, https://github.com/phockett/tmo-dev

Pretty rough so far, with the aim of developing some routine methods & visualisation/data exploration routines. Inital work based on examples in /reg/data/ana15/tmo/tmolw0618/results/tmo_ana/templates/.

Implemented:

  • Basic class for preprocessed data IO from h5 files (based on the template read_preproc.ipynb).
  • General data filter.
  • Interactive plots with Holoviews and Bokeh backend.
  • 1D & 2D histrograms with Holoviews.
  • Basic electron image display.

To do:

  • Pull run metadata from elog.
  • Improve filtering routine.
  • More analysis routines!
    • Subtraction & ref data.
    • Electron image processing.
    • Correlations/covariance routines (currently just using basic filtering).
  • Further development of Holoviews routines.
    • Implement datashader?
    • Additional plot methods/types.

This doc shows some examples so far. You'll just need the file tmoDataBase.py (available at https://github.com/phockett/tmo-dev), and to have Holoviews installed. (For getting setup with an installable env, see the SLAC notes on Anaconda and installing packages.)

Setup

In [1]:
import numpy as np

from h5py import File

from pathlib import Path

# HV imports
import holoviews as hv
from holoviews import opts
hv.extension('bokeh', 'matplotlib')
In [2]:
# Memory profiler - OPTIONAL for testing
# https://timothymonteath.com/articles/monitoring_memory_usage/
# %load_ext memory_profiler
# %memit
In [3]:
# Import class - this requires pointing to the `tmoDataBase.py` file.
# See https://github.com/phockett/tmo-dev

# Import class from file
import sys
sys.path.append(r'/cds/home/p/phockett/dev/tmo-dev')
import tmoDataBase as tb

tb.setPlotDefaults()  # Set plot defaults (optional)

Import data

In [4]:
# Set main data path
baseDir = Path('/reg/data/ana15/tmo/tmolw0618/scratch/preproc/v2')
In [5]:
# Setup class object and which runs to read.
data = tb.tmoDataBase(fileBase = baseDir, runList = range(89,97+1), fileSchema='_preproc_elecv2')
In [6]:
# The class has a bunch of dictionaries holding info on the data
data.runs['files']
Out[6]:
{89: PosixPath('/reg/data/ana15/tmo/tmolw0618/scratch/preproc/v2/run89_preproc_elecv2.h5'),
 90: PosixPath('/reg/data/ana15/tmo/tmolw0618/scratch/preproc/v2/run90_preproc_elecv2.h5'),
 91: PosixPath('/reg/data/ana15/tmo/tmolw0618/scratch/preproc/v2/run91_preproc_elecv2.h5'),
 92: PosixPath('/reg/data/ana15/tmo/tmolw0618/scratch/preproc/v2/run92_preproc_elecv2.h5'),
 93: PosixPath('/reg/data/ana15/tmo/tmolw0618/scratch/preproc/v2/run93_preproc_elecv2.h5'),
 94: PosixPath('/reg/data/ana15/tmo/tmolw0618/scratch/preproc/v2/run94_preproc_elecv2.h5'),
 95: PosixPath('/reg/data/ana15/tmo/tmolw0618/scratch/preproc/v2/run95_preproc_elecv2.h5'),
 96: PosixPath('/reg/data/ana15/tmo/tmolw0618/scratch/preproc/v2/run96_preproc_elecv2.h5'),
 97: PosixPath('/reg/data/ana15/tmo/tmolw0618/scratch/preproc/v2/run97_preproc_elecv2.h5')}
In [7]:
# Read in the files with h5py
# There is very basic checking implemented currently, just to see if 'energies' is present.
data.readFiles()
*** WARNING: key 90 missing energies data, will be skipped.
*** WARNING: key 92 missing energies data, will be skipped.
*** WARNING: key 94 missing energies data, will be skipped.
*** WARNING: key 95 missing energies data, will be skipped.
*** WARNING: key 96 missing energies data, will be skipped.
Read 9 files.
Good datasets: [89, 91, 93, 97]
Invalid datasets: [90, 92, 94, 95, 96]
In [8]:
# Once read in, data is stored in a new dictionary, keyed by run #
print(data.data.keys())
print(data.data[89].keys())

for key in data.data[89].keys():
    print(f'\n{key}: {data.data[89][key]}')
dict_keys([89, 90, 91, 92, 93, 94, 95, 96, 97])
dict_keys(['raw', 'items', 'dims'])

raw: <HDF5 file "run89_preproc_elecv2.h5" (mode r)>

items: <KeysViewHDF5 ['bc2s', 'energies', 'epics', 'epics_strs', 'gas', 'intensities', 'ktofIpk', 'ktofslice', 'ktofslice_ts', 'ktoftpk', 'l3s', 'photonEs', 'pv', 'timestamp', 'ts', 'xc', 'xenergies', 'yc']>

dims: {'bc2s': (72318,), 'energies': (72318,), 'epics': (72318, 158), 'epics_strs': (158,), 'gas': (72318,), 'intensities': (72318, 5), 'ktofIpk': (72318, 1000), 'ktofslice': (72318, 1784), 'ktofslice_ts': (1784,), 'ktoftpk': (72318, 1000), 'l3s': (72318,), 'photonEs': (72318,), 'pv': (72318, 1000), 'timestamp': (72318,), 'ts': (5,), 'xc': (72318, 1000), 'xenergies': (72318,), 'yc': (72318, 1000)}

Basic Histograms

In [9]:
# The method .hist(dim) sets (1D) histograms for the given dimension
dim = 'intensities'
data.hist(dim)
Set self.data[key]['curve'] for dim=intensities.
In [10]:
# To display for a single run, call the relevant dictionary item for a specific run
run = 89
pType = 'curve'
data.data[run][pType][dim]
Out[10]:

Note the plot above is interactive via the tools at the edge, and will also display values on rollover. Clicking the legend will turn data series on/off. The interactivity will remain upon HTML export.

In [11]:
# For more output, try the histOverlay method
data.histOverlay(dim)
Set self.ndoverlay, self.hmap and self.layout for dim=intensities.
In [12]:
# The ndoverlay plot shows all the data
data.ndoverlay
Out[12]:
In [13]:
# This is a bit busy, so use the layout instead
# Note that these plots are linked, so zooming + panning is consistent over all plots.
data.layout
Out[13]:

Correlation plots

These currently build and display 2D histograms using hv.HexTiles. The code needs some work, but for now use the holomap object generated to throw out some plots.

In [14]:
# Pass two dimensions here, note the 1st must be 1D
data.hist2d(dim=['energies','intensities'])
Set self.data[key]['hist2d'] for dim=['energies', 'intensities'].
In [15]:
# This is currently set as a dict of HexTiles plots per run, and these are stacked as a HoloMap object.
data.data[89]['hist2d']
Out[15]:
{(89, 0): :HexTiles   [energies,intensities],
 (89, 1): :HexTiles   [energies,intensities],
 (89, 2): :HexTiles   [energies,intensities],
 (89, 3): :HexTiles   [energies,intensities],
 (89, 4): :HexTiles   [energies,intensities],
 'hmap': :HoloMap   [Run,Channel]
    :HexTiles   [energies,intensities]}
In [16]:
# Display HoloMap - this currently messes up with autoranges, not sure why!
data.data[89]['hist2d']['hmap']
Out[16]:
In [17]:
# Display as a layout (needs some work!)
data.data[89]['hist2d']['hmap'].layout().cols(1)
Out[17]:

Filtering

This is currently set for single-valued or inclusive ranges (passed as lists), and will set self.data[key]['mask'] for each dataset. This will be automatically applied to subsequent plots.

In [18]:
data.filterData(filterOptions={'gas':[True], 'energies':[0.02, 0.06]})
In [19]:
# Recalc correlation plots with filtered data
data.hist2d(dim=['energies','intensities'])
Set self.data[key]['hist2d'] for dim=['energies', 'intensities'].
In [20]:
# Display as a layout (needs some work!)
data.data[89]['hist2d']['hmap'].layout().cols(1)
Out[20]:
In [21]:
# Display as a layout (needs some work!)
# With smaller tiled plots
data.data[89]['hist2d']['hmap'].opts(width=300, height=300).layout().cols(3)
Out[21]: