Note: this example is also available as a Jupyter Notebook (ipynb).
Using dask¶
dask is a Python package built upon the scientific stack to enable scaling of Python from interactive sessions to multi-core and multi-node processing.
Of particular relevance to SEGY-SAK is that xarray.Dataset loads naturally into dask.
Imports and Setup¶
Here we import the plotting tools and numpy, and set up the dask.Client, which will auto-start a LocalCluster. Printing the client returns details about the dashboard link and resources.
[2]:
import numpy as np
from segysak import open_seisnc, segy
import matplotlib.pyplot as plt
%matplotlib inline
[3]:
from dask.distributed import Client
client = Client()
client
[3]:
(Client and LocalCluster summary widget)
We can also scale the cluster to be a bit smaller.
[4]:
client.cluster.scale(2, memory='0.5gb')
client
[4]:
(Client and LocalCluster summary widget, now scaled down)
Lazy loading from SEISNC using chunking¶
If your data is in SEG-Y format it must be converted to SEISNC before it can be used with dask. If you do this with the CLI it need only happen once.
[5]:
segy_file = 'data/volve10r12-full-twt-sub3d.sgy'
seisnc_file = 'data/volve10r12-full-twt-sub3d.seisnc'
segy.segy_converter(segy_file, seisnc_file, iline=189, xline=193, cdpx=181, cdpy=185, silent=True)
Fast direction is CROSSLINE_3D
By specifying the chunks argument to the open_seisnc command we can ask dask to fetch the data in chunks of size n. In this example the iline dimension will be chunked in groups of 100. The valid arguments to chunks depend on the dataset, but any dimension can be used.
Even though the size of the dataset is about 40 MB, it hasn't yet been loaded into memory, nor will dask load it entirely unless the operation demands it.
[6]:
seisnc = open_seisnc('data/volve10r12-full-twt-sub3d.seisnc', chunks={'iline':100})
seisnc.seis.humanbytes
[6]:
'40.06 MB'
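To make the chunked, lazy behaviour concrete, here is a sketch using a plain dask array of the same shape as this cube (61 ilines, 202 xlines, 850 samples). The array and its chunk sizes are synthetic stand-ins for illustration, not part of SEGY-SAK:

```python
import dask.array as da

# A synthetic stand-in with the same shape as the Volve sub-cube,
# chunked along the first (iline) axis as chunks={'iline': 100} would be
arr = da.zeros((61, 202, 850), dtype="float32", chunks=(100, 202, 850))

print(arr.chunks)  # ((61,), (202,), (850,)) - only 61 ilines, so a single chunk
print(arr.nbytes)  # 41894800 bytes, roughly the 40 MB reported by seis.humanbytes
```

No element of `arr` exists in memory yet; dask only records the shape, dtype and chunking.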
Let's see what our dataset looks like. Note that the variables are dask.array. This means they are references to the data on disk. The dimensions must be loaded so dask knows how to manage your dataset.
[7]:
seisnc
[7]:
<xarray.Dataset>
Dimensions:  (iline: 61, twt: 850, xline: 202)
Coordinates:
    cdp_x    (iline, xline) float32 dask.array<chunksize=(61, 202), meta=np.ndarray>
    cdp_y    (iline, xline) float32 dask.array<chunksize=(61, 202), meta=np.ndarray>
  * iline    (iline) int64 10090 10091 10092 10093 ... 10147 10148 10149 10150
  * twt      (twt) float64 4.0 8.0 12.0 16.0 ... 3.392e+03 3.396e+03 3.4e+03
  * xline    (xline) int64 2150 2151 2152 2153 2154 ... 2347 2348 2349 2350 2351
Data variables:
    data     (iline, xline, twt) float32 dask.array<chunksize=(61, 202, 850), meta=np.ndarray>
Attributes:
    coord_scalar:        -100.0
    measurement_system:  m
    percentiles:         [-6.97198262e+00 -6.52054033e+00 -1.49142619e+00 -5....
    sample_rate:         4.0
    source_file:         volve10r12-full-twt-sub3d.sgy
    text:                C 1 SEGY OUTPUT FROM Petrel 2017.2 Saturday, June 06...
    ns:                  None
    d3_domain:           None
    epsg:                None
    corner_points:       None
    corner_points_xy:    None
    srd:                 None
    datatype:            None
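The split between eagerly loaded dimensions and lazy data variables can be sketched with a tiny synthetic Dataset (the names and sizes here are made up for illustration; only the pattern matters):

```python
import numpy as np
import xarray as xr

# Hypothetical miniature cube: after .chunk(), data variables become
# lazy dask arrays while dimension coordinates stay as eager numpy arrays
ds = xr.Dataset(
    {"data": (("iline", "xline"), np.zeros((4, 6), dtype="float32"))},
    coords={"iline": np.arange(4), "xline": np.arange(6)},
).chunk({"iline": 2})

print(type(ds["data"].data))   # dask.array.core.Array - a lazy reference
print(type(ds["iline"].data))  # numpy.ndarray - dimensions are loaded
```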
Operations on SEISNC using dask¶
In this simple example we calculate the mean of the entire cube. If you check the dashboard (when running this example yourself) you can see the task graph and task stream execution.
[8]:
mean = seisnc.data.mean()
mean
[8]:
<xarray.DataArray 'data' ()> dask.array<mean_agg-aggregate, shape=(), dtype=float32, chunksize=(), chunktype=numpy.ndarray>
Whoa-oh, the mean is what? Yeah, dask won't calculate anything until you ask it to. This means you can string computations together into a task graph for lazy evaluation. To get the mean, try this:
[9]:
mean.compute().values
[9]:
array(-7.317369e-05, dtype=float32)
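The same deferred build-then-compute pattern applies to any chain of operations. A minimal sketch on a plain dask array (unrelated to the seismic data) makes the two steps explicit:

```python
import dask.array as da

# Build a task graph: squaring and summing are recorded, not executed
x = da.arange(8, chunks=4)
result = (x ** 2).sum()

print(result)            # a lazy dask scalar - no value has been computed yet
print(result.compute())  # 140 - the graph runs only now
```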
Plotting with dask¶
The lazy loading of data means we can plot what we want using xarray-style slicing, and dask will fetch only the data we need.
[10]:
fig, axs = plt.subplots(nrows=2, ncols=3, figsize=(20, 10))
iline = seisnc.sel(iline = 10100).transpose('twt', 'xline').data
xline = seisnc.sel(xline = 2349).transpose('twt', 'iline').data
zslice = seisnc.sel(twt = 2900, method='nearest').transpose('iline', 'xline').data
q = iline.quantile([0, 0.001, 0.5, 0.999, 1]).values
rq = np.max(np.abs([q[1], q[-2]]))
iline.plot(robust=True, ax=axs[0, 0], yincrease=False)
xline.plot(robust=True, ax=axs[0, 1], yincrease=False)
zslice.plot(robust=True, ax=axs[0, 2])
imshow_kwargs = dict(
cmap='seismic', aspect='auto', vmin=-rq, vmax=rq, interpolation='bicubic'
)
axs[1, 0].imshow(iline.values, **imshow_kwargs)
axs[1, 0].set_title('iline')
axs[1, 1].imshow(xline.values, **imshow_kwargs)
axs[1, 1].set_title('xline')
axs[1, 2].imshow(zslice.values, origin='lower', **imshow_kwargs)
axs[1, 2].set_title('twt')
[10]:
Text(0.5, 1.0, 'twt')
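That dask fetches only the data we need can be checked on a synthetic array of the same shape as this cube (the chunk sizes below are illustrative assumptions):

```python
import dask.array as da

# Chunked along the first axis; selecting one "inline" touches one chunk
vol = da.random.random((61, 202, 850), chunks=(10, 202, 850))
line = vol[25]

print(line.shape)         # (202, 850)
print(line.npartitions)   # 1 - only a single chunk is involved
line_np = line.compute()  # materializes just that slice, not the whole cube
```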