Note

Get this example as a Jupyter Notebook ipynb

[ ]:

SEG-Y to Vector DataFrames and Back

The connection of segysak to xarray greatly simplifies the process of vectorising segy 3D data and returning it to SEGY. To do this, one can use the close relationship between pandas and xarray.

Loading Data

We start by loading data normally using the segy_loader utility. For this example we will use the Volve example sub-cube.

[1]:

import pathlib
from IPython.display import display
from segysak.segy import segy_loader, well_known_byte_locs, segy_writer

volve_3d_path = pathlib.Path("data/volve10r12-full-twt-sub3d.sgy")
print("3D", volve_3d_path.exists())

volve_3d = segy_loader(volve_3d_path, **well_known_byte_locs("petrel_3d"))

3D True

100%|██████████| 12.3k/12.3k [00:01<00:00, 11.2k traces/s]

Loading as 3D
Fast direction is TRACE_SEQUENCE_FILE

Converting SEGY: 100%|██████████| 12.3k/12.3k [00:03<00:00, 3.78k traces/s]

Vectorisation

Once the data is loaded it can be converted to a pandas.DataFrame directly from the loaded Dataset. The Dataframe is multi-index and contains columns for each variable in the originally loaded dataset. This includes the seismic amplitude as data and the cdp_x and cdp_y locations. If you require smaller volumes from the input data, you can use xarray selection methods prior to conversion to a DataFrame.

[2]:

volve_3d_df = volve_3d.to_dataframe()
display(volve_3d_df)

			data	cdp_x	cdp_y
iline	xline	twt
10090	2150	4.0	0.020575	436400.500	6477447.0
		8.0	0.022041	436400.500	6477447.0
		12.0	0.019659	436400.500	6477447.0
		16.0	0.025421	436400.500	6477447.0
		20.0	0.025436	436400.500	6477447.0
...	...	...	...	...	...
10150	2351	3384.0	0.000000	434144.125	6478782.5
		3388.0	0.000000	434144.125	6478782.5
		3392.0	0.000000	434144.125	6478782.5
		3396.0	0.000000	434144.125	6478782.5
		3400.0	0.000000	434144.125	6478782.5

10473700 rows × 3 columns

We can remove the multi-index by resetting the index of the DataFrame. Vectorized workflows such as machine learning can then be easily applied to the DataFrame.

[3]:

volve_3d_df_reindex = volve_3d_df.reset_index()
display(volve_3d_df_reindex)

	iline	xline	twt	data	cdp_x	cdp_y
0	10090	2150	4.0	0.020575	436400.500	6477447.0
1	10090	2150	8.0	0.022041	436400.500	6477447.0
2	10090	2150	12.0	0.019659	436400.500	6477447.0
3	10090	2150	16.0	0.025421	436400.500	6477447.0
4	10090	2150	20.0	0.025436	436400.500	6477447.0
...	...	...	...	...	...	...
10473695	10150	2351	3384.0	0.000000	434144.125	6478782.5
10473696	10150	2351	3388.0	0.000000	434144.125	6478782.5
10473697	10150	2351	3392.0	0.000000	434144.125	6478782.5
10473698	10150	2351	3396.0	0.000000	434144.125	6478782.5
10473699	10150	2351	3400.0	0.000000	434144.125	6478782.5

10473700 rows × 6 columns

Return to Xarray

It is possible to return the DataFrame to the Dataset for output to SEGY. To do this the multi-index must be reset. Afterward, pandas provides the to_xarray method.

[4]:

volve_3d_df_multi = volve_3d_df_reindex.set_index(["iline", "xline", "twt"])
display(volve_3d_df_multi)
volve_3d_ds = volve_3d_df_multi.to_xarray()
display(volve_3d_ds)

			data	cdp_x	cdp_y
iline	xline	twt
10090	2150	4.0	0.020575	436400.500	6477447.0
		8.0	0.022041	436400.500	6477447.0
		12.0	0.019659	436400.500	6477447.0
		16.0	0.025421	436400.500	6477447.0
		20.0	0.025436	436400.500	6477447.0
...	...	...	...	...	...
10150	2351	3384.0	0.000000	434144.125	6478782.5
		3388.0	0.000000	434144.125	6478782.5
		3392.0	0.000000	434144.125	6478782.5
		3396.0	0.000000	434144.125	6478782.5
		3400.0	0.000000	434144.125	6478782.5

10473700 rows × 3 columns

The resulting dataset requires some changes to make it compatible again for export to SEGY. Firstly, the attributes need to be set. The simplest way is to copy these from the original SEG-Y input. Otherwise they can be set manually. segysak specifically needs the sample_rate and the coord_scalar attributes.

[5]:

volve_3d_ds.attrs = volve_3d.attrs
display(volve_3d_ds.attrs)

{'ns': None,
 'sample_rate': 4.0,
 'text': Text HeaderC 1 SEGY OUTPUT FROM Petrel 2017.2 Saturday, June 06 2020 10:15:00
 C 2 Name: ST10010ZDC12-PZ-PSDM-KIRCH-FULL-T.MIG_FIN.POST_STACK.3D.JS-017534
 ÝCroC 3
 C 4 First inline: 10090  Last inline: 10150
 C 5 First xline:  2150   Last xline:  2351
 C 6 CRS: ED50-UTM31 ("MENTOR:ED50-UTM31:European 1950 Based UTM, Zone 31 North,
 C 7 X min: 433955.09 max: 436589.56 delta: 2634.47
 C 8 Y min: 6477439.46 max: 6478790.23 delta: 1350.77
 C 9 Time min: -3402.00 max: -2.00 delta: 3400.00
 C10 Lat min: 58.25'52.8804"N max: 58.26'37.9493"N delta: 0.00'45.0689"
 C11 Long min: 1.52'7.1906"E max: 1.54'50.9616"E delta: 0.02'43.7710"
 C12 Trace min: -3400.00 max: -4.00 delta: 3396.00
 C13 Seismic (template) min: -58.55 max: 54.55 delta: 113.10
 C14 Amplitude (data) min: -58.55 max: 54.55 delta: 113.10
 C15 Trace sample format: IEEE floating point
 C16 Coordinate scale factor: 100.00000
 C17
 C18 Binary header locations:
 C19 Sample interval             : bytes 17-18
 C20 Number of samples per trace : bytes 21-22
 C21 Trace date format           : bytes 25-26
 C22
 C23 Trace header locations:
 C24 Inline number               : bytes 5-8
 C25 Xline number                : bytes 21-24
 C26 Coordinate scale factor     : bytes 71-72
 C27 X coordinate                : bytes 73-76
 C28 Y coordinate                : bytes 77-80
 C29 Trace start time/depth      : bytes 109-110
 C30 Number of samples per trace : bytes 115-116
 C31 Sample interval             : bytes 117-118
 C32
 C33
 C34
 C35
 C36
 C37
 C38
 C39
 C40 END EBCDIC
 ,
 'measurement_system': 'm',
 'd3_domain': None,
 'epsg': None,
 'corner_points': None,
 'corner_points_xy': None,
 'source_file': 'volve10r12-full-twt-sub3d.sgy',
 'srd': None,
 'datatype': None,
 'percentiles': [-6.595060190682801,
  -6.114936243337226,
  -1.5039999629831111,
  -0.0003667792810293612,
  1.4698263440097095,
  6.1379671676876475,
  6.6378843598175346],
 'coord_scalar': -100.0}

The cdp_x and cdp_y positions must be reduced to 2D along the vertical axis “twt” and set as coordinates.

[6]:

volve_3d_ds["cdp_x"] = volve_3d_ds["cdp_x"].mean(dim=["twt"])
volve_3d_ds["cdp_y"] = volve_3d_ds["cdp_y"].mean(dim=["twt"])
volve_3d_ds = volve_3d_ds.set_coords(["cdp_x", "cdp_y"])
volve_3d_ds

Afterwards, use the segy_writer utility as normal to return to SEGY.

[7]:

segy_writer(volve_3d_ds, "test.segy")

Writing to SEG-Y: 100%|██████████| 12322/12322 [00:00<00:00, 25621.19 traces/s]