Note

Get this example as a Jupyter Notebook ipynb

[ ]:

SEG-Y to Vector DataFrames and Back

The connection of segysak to xarray greatly simplifies the process of vectorising segy 3D data and returning it to SEGY. To do this, one can use the close relationship between pandas and xarray.

Loading Data

We start by loading data normally using the segy_loader utility. For this example we will use the Volve example sub-cube.

[1]:
import pathlib
from IPython.display import display
from segysak.segy import segy_loader, well_known_byte_locs, segy_writer

volve_3d_path = pathlib.Path("data/volve10r12-full-twt-sub3d.sgy")
print("3D", volve_3d_path.exists())

volve_3d = segy_loader(volve_3d_path, **well_known_byte_locs("petrel_3d"))
3D True
100%|██████████| 12.3k/12.3k [00:01<00:00, 11.2k traces/s]
Loading as 3D
Fast direction is TRACE_SEQUENCE_FILE
Converting SEGY: 100%|██████████| 12.3k/12.3k [00:03<00:00, 3.78k traces/s]

Vectorisation

Once the data is loaded it can be converted to a pandas.DataFrame directly from the loaded Dataset. The Dataframe is multi-index and contains columns for each variable in the originally loaded dataset. This includes the seismic amplitude as data and the cdp_x and cdp_y locations. If you require smaller volumes from the input data, you can use xarray selection methods prior to conversion to a DataFrame.

[2]:
volve_3d_df = volve_3d.to_dataframe()
display(volve_3d_df)
data cdp_x cdp_y
iline xline twt
10090 2150 4.0 0.020575 436400.500 6477447.0
8.0 0.022041 436400.500 6477447.0
12.0 0.019659 436400.500 6477447.0
16.0 0.025421 436400.500 6477447.0
20.0 0.025436 436400.500 6477447.0
... ... ... ... ... ...
10150 2351 3384.0 0.000000 434144.125 6478782.5
3388.0 0.000000 434144.125 6478782.5
3392.0 0.000000 434144.125 6478782.5
3396.0 0.000000 434144.125 6478782.5
3400.0 0.000000 434144.125 6478782.5

10473700 rows × 3 columns

We can remove the multi-index by resetting the index of the DataFrame. Vectorized workflows such as machine learning can then be easily applied to the DataFrame.

[3]:
volve_3d_df_reindex = volve_3d_df.reset_index()
display(volve_3d_df_reindex)
iline xline twt data cdp_x cdp_y
0 10090 2150 4.0 0.020575 436400.500 6477447.0
1 10090 2150 8.0 0.022041 436400.500 6477447.0
2 10090 2150 12.0 0.019659 436400.500 6477447.0
3 10090 2150 16.0 0.025421 436400.500 6477447.0
4 10090 2150 20.0 0.025436 436400.500 6477447.0
... ... ... ... ... ... ...
10473695 10150 2351 3384.0 0.000000 434144.125 6478782.5
10473696 10150 2351 3388.0 0.000000 434144.125 6478782.5
10473697 10150 2351 3392.0 0.000000 434144.125 6478782.5
10473698 10150 2351 3396.0 0.000000 434144.125 6478782.5
10473699 10150 2351 3400.0 0.000000 434144.125 6478782.5

10473700 rows × 6 columns

Return to Xarray

It is possible to return the DataFrame to the Dataset for output to SEGY. To do this the multi-index must be reset. Afterward, pandas provides the to_xarray method.

[4]:
volve_3d_df_multi = volve_3d_df_reindex.set_index(["iline", "xline", "twt"])
display(volve_3d_df_multi)
volve_3d_ds = volve_3d_df_multi.to_xarray()
display(volve_3d_ds)
data cdp_x cdp_y
iline xline twt
10090 2150 4.0 0.020575 436400.500 6477447.0
8.0 0.022041 436400.500 6477447.0
12.0 0.019659 436400.500 6477447.0
16.0 0.025421 436400.500 6477447.0
20.0 0.025436 436400.500 6477447.0
... ... ... ... ... ...
10150 2351 3384.0 0.000000 434144.125 6478782.5
3388.0 0.000000 434144.125 6478782.5
3392.0 0.000000 434144.125 6478782.5
3396.0 0.000000 434144.125 6478782.5
3400.0 0.000000 434144.125 6478782.5

10473700 rows × 3 columns

<xarray.Dataset>
Dimensions:  (iline: 61, xline: 202, twt: 850)
Coordinates:
  * iline    (iline) uint64 10090 10091 10092 10093 ... 10147 10148 10149 10150
  * xline    (xline) uint64 2150 2151 2152 2153 2154 ... 2348 2349 2350 2351
  * twt      (twt) float64 4.0 8.0 12.0 16.0 ... 3.392e+03 3.396e+03 3.4e+03
Data variables:
    data     (iline, xline, twt) float32 0.02057 0.02204 0.01966 ... 0.0 0.0 0.0
    cdp_x    (iline, xline, twt) float32 4.364e+05 4.364e+05 ... 4.341e+05
    cdp_y    (iline, xline, twt) float32 6.477e+06 6.477e+06 ... 6.479e+06

The resulting dataset requires some changes to make it compatible again for export to SEGY. Firstly, the attributes need to be set. The simplest way is to copy these from the original SEG-Y input. Otherwise they can be set manually. segysak specifically needs the sample_rate and the coord_scalar attributes.

[5]:
volve_3d_ds.attrs = volve_3d.attrs
display(volve_3d_ds.attrs)
{'ns': None,
 'sample_rate': 4.0,
 'text': Text HeaderC 1 SEGY OUTPUT FROM Petrel 2017.2 Saturday, June 06 2020 10:15:00
 C 2 Name: ST10010ZDC12-PZ-PSDM-KIRCH-FULL-T.MIG_FIN.POST_STACK.3D.JS-017534
 ÝCroC 3
 C 4 First inline: 10090  Last inline: 10150
 C 5 First xline:  2150   Last xline:  2351
 C 6 CRS: ED50-UTM31 ("MENTOR:ED50-UTM31:European 1950 Based UTM, Zone 31 North,
 C 7 X min: 433955.09 max: 436589.56 delta: 2634.47
 C 8 Y min: 6477439.46 max: 6478790.23 delta: 1350.77
 C 9 Time min: -3402.00 max: -2.00 delta: 3400.00
 C10 Lat min: 58.25'52.8804"N max: 58.26'37.9493"N delta: 0.00'45.0689"
 C11 Long min: 1.52'7.1906"E max: 1.54'50.9616"E delta: 0.02'43.7710"
 C12 Trace min: -3400.00 max: -4.00 delta: 3396.00
 C13 Seismic (template) min: -58.55 max: 54.55 delta: 113.10
 C14 Amplitude (data) min: -58.55 max: 54.55 delta: 113.10
 C15 Trace sample format: IEEE floating point
 C16 Coordinate scale factor: 100.00000
 C17
 C18 Binary header locations:
 C19 Sample interval             : bytes 17-18
 C20 Number of samples per trace : bytes 21-22
 C21 Trace date format           : bytes 25-26
 C22
 C23 Trace header locations:
 C24 Inline number               : bytes 5-8
 C25 Xline number                : bytes 21-24
 C26 Coordinate scale factor     : bytes 71-72
 C27 X coordinate                : bytes 73-76
 C28 Y coordinate                : bytes 77-80
 C29 Trace start time/depth      : bytes 109-110
 C30 Number of samples per trace : bytes 115-116
 C31 Sample interval             : bytes 117-118
 C32
 C33
 C34
 C35
 C36
 C37
 C38
 C39
 C40 END EBCDIC
 ,
 'measurement_system': 'm',
 'd3_domain': None,
 'epsg': None,
 'corner_points': None,
 'corner_points_xy': None,
 'source_file': 'volve10r12-full-twt-sub3d.sgy',
 'srd': None,
 'datatype': None,
 'percentiles': [-6.595060190682801,
  -6.114936243337226,
  -1.5039999629831111,
  -0.0003667792810293612,
  1.4698263440097095,
  6.1379671676876475,
  6.6378843598175346],
 'coord_scalar': -100.0}

The cdp_x and cdp_y positions must be reduced to 2D along the vertical axis “twt” and set as coordinates.

[6]:
volve_3d_ds["cdp_x"] = volve_3d_ds["cdp_x"].mean(dim=["twt"])
volve_3d_ds["cdp_y"] = volve_3d_ds["cdp_y"].mean(dim=["twt"])
volve_3d_ds = volve_3d_ds.set_coords(["cdp_x", "cdp_y"])
volve_3d_ds
[6]:
<xarray.Dataset>
Dimensions:  (iline: 61, xline: 202, twt: 850)
Coordinates:
  * iline    (iline) uint64 10090 10091 10092 10093 ... 10147 10148 10149 10150
  * xline    (xline) uint64 2150 2151 2152 2153 2154 ... 2348 2349 2350 2351
  * twt      (twt) float64 4.0 8.0 12.0 16.0 ... 3.392e+03 3.396e+03 3.4e+03
    cdp_x    (iline, xline) float32 4.364e+05 4.364e+05 ... 4.342e+05 4.341e+05
    cdp_y    (iline, xline) float32 6.477e+06 6.477e+06 ... 6.479e+06 6.479e+06
Data variables:
    data     (iline, xline, twt) float32 0.02057 0.02204 0.01966 ... 0.0 0.0 0.0
Attributes: (12/13)
    ns:                  None
    sample_rate:         4.0
    text:                C 1 SEGY OUTPUT FROM Petrel 2017.2 Saturday, June 06...
    measurement_system:  m
    d3_domain:           None
    epsg:                None
    ...                  ...
    corner_points_xy:    None
    source_file:         volve10r12-full-twt-sub3d.sgy
    srd:                 None
    datatype:            None
    percentiles:         [-6.595060190682801, -6.114936243337226, -1.50399996...
    coord_scalar:        -100.0

Afterwards, use the segy_writer utility as normal to return to SEGY.

[7]:
segy_writer(volve_3d_ds, "test.segy")
Writing to SEG-Y: 100%|██████████| 12322/12322 [00:00<00:00, 25621.19 traces/s]