Note
Get this example as a Jupyter Notebook ipynb
[ ]:
SEG-Y to Vector DataFrames and Back
The connection of segysak to xarray
greatly simplifies the process of vectorising segy 3D data and returning it to SEGY. To do this, one can use the close relationship between pandas
and xarray
.
Loading Data
We start by loading data normally using the segy_loader
utility. For this example we will use the Volve example sub-cube.
[1]:
import pathlib
from IPython.display import display
from segysak.segy import segy_loader, well_known_byte_locs, segy_writer
volve_3d_path = pathlib.Path("data/volve10r12-full-twt-sub3d.sgy")
print("3D", volve_3d_path.exists())
volve_3d = segy_loader(volve_3d_path, **well_known_byte_locs("petrel_3d"))
3D True
100%|██████████| 12.3k/12.3k [00:01<00:00, 11.2k traces/s]
Loading as 3D
Fast direction is TRACE_SEQUENCE_FILE
Converting SEGY: 100%|██████████| 12.3k/12.3k [00:03<00:00, 3.78k traces/s]
Vectorisation
Once the data is loaded it can be converted to a pandas.DataFrame
directly from the loaded Dataset
. The Dataframe is multi-index and contains columns for each variable in the originally loaded dataset. This includes the seismic amplitude as data
and the cdp_x
and cdp_y
locations. If you require smaller volumes from the input data, you can use xarray selection methods prior to conversion to a DataFrame.
[2]:
volve_3d_df = volve_3d.to_dataframe()
display(volve_3d_df)
data | cdp_x | cdp_y | |||
---|---|---|---|---|---|
iline | xline | twt | |||
10090 | 2150 | 4.0 | 0.020575 | 436400.500 | 6477447.0 |
8.0 | 0.022041 | 436400.500 | 6477447.0 | ||
12.0 | 0.019659 | 436400.500 | 6477447.0 | ||
16.0 | 0.025421 | 436400.500 | 6477447.0 | ||
20.0 | 0.025436 | 436400.500 | 6477447.0 | ||
... | ... | ... | ... | ... | ... |
10150 | 2351 | 3384.0 | 0.000000 | 434144.125 | 6478782.5 |
3388.0 | 0.000000 | 434144.125 | 6478782.5 | ||
3392.0 | 0.000000 | 434144.125 | 6478782.5 | ||
3396.0 | 0.000000 | 434144.125 | 6478782.5 | ||
3400.0 | 0.000000 | 434144.125 | 6478782.5 |
10473700 rows × 3 columns
We can remove the multi-index by resetting the index of the DataFrame. Vectorized workflows such as machine learning can then be easily applied to the DataFrame.
[3]:
volve_3d_df_reindex = volve_3d_df.reset_index()
display(volve_3d_df_reindex)
iline | xline | twt | data | cdp_x | cdp_y | |
---|---|---|---|---|---|---|
0 | 10090 | 2150 | 4.0 | 0.020575 | 436400.500 | 6477447.0 |
1 | 10090 | 2150 | 8.0 | 0.022041 | 436400.500 | 6477447.0 |
2 | 10090 | 2150 | 12.0 | 0.019659 | 436400.500 | 6477447.0 |
3 | 10090 | 2150 | 16.0 | 0.025421 | 436400.500 | 6477447.0 |
4 | 10090 | 2150 | 20.0 | 0.025436 | 436400.500 | 6477447.0 |
... | ... | ... | ... | ... | ... | ... |
10473695 | 10150 | 2351 | 3384.0 | 0.000000 | 434144.125 | 6478782.5 |
10473696 | 10150 | 2351 | 3388.0 | 0.000000 | 434144.125 | 6478782.5 |
10473697 | 10150 | 2351 | 3392.0 | 0.000000 | 434144.125 | 6478782.5 |
10473698 | 10150 | 2351 | 3396.0 | 0.000000 | 434144.125 | 6478782.5 |
10473699 | 10150 | 2351 | 3400.0 | 0.000000 | 434144.125 | 6478782.5 |
10473700 rows × 6 columns
Return to Xarray
It is possible to return the DataFrame to the Dataset for output to SEGY. To do this the multi-index must be reset. Afterward, pandas
provides the to_xarray
method.
[4]:
volve_3d_df_multi = volve_3d_df_reindex.set_index(["iline", "xline", "twt"])
display(volve_3d_df_multi)
volve_3d_ds = volve_3d_df_multi.to_xarray()
display(volve_3d_ds)
data | cdp_x | cdp_y | |||
---|---|---|---|---|---|
iline | xline | twt | |||
10090 | 2150 | 4.0 | 0.020575 | 436400.500 | 6477447.0 |
8.0 | 0.022041 | 436400.500 | 6477447.0 | ||
12.0 | 0.019659 | 436400.500 | 6477447.0 | ||
16.0 | 0.025421 | 436400.500 | 6477447.0 | ||
20.0 | 0.025436 | 436400.500 | 6477447.0 | ||
... | ... | ... | ... | ... | ... |
10150 | 2351 | 3384.0 | 0.000000 | 434144.125 | 6478782.5 |
3388.0 | 0.000000 | 434144.125 | 6478782.5 | ||
3392.0 | 0.000000 | 434144.125 | 6478782.5 | ||
3396.0 | 0.000000 | 434144.125 | 6478782.5 | ||
3400.0 | 0.000000 | 434144.125 | 6478782.5 |
10473700 rows × 3 columns
<xarray.Dataset> Dimensions: (iline: 61, xline: 202, twt: 850) Coordinates: * iline (iline) uint64 10090 10091 10092 10093 ... 10147 10148 10149 10150 * xline (xline) uint64 2150 2151 2152 2153 2154 ... 2348 2349 2350 2351 * twt (twt) float64 4.0 8.0 12.0 16.0 ... 3.392e+03 3.396e+03 3.4e+03 Data variables: data (iline, xline, twt) float32 0.02057 0.02204 0.01966 ... 0.0 0.0 0.0 cdp_x (iline, xline, twt) float32 4.364e+05 4.364e+05 ... 4.341e+05 cdp_y (iline, xline, twt) float32 6.477e+06 6.477e+06 ... 6.479e+06
The resulting dataset requires some changes to make it compatible again for export to SEGY. Firstly, the attributes need to be set. The simplest way is to copy these from the original SEG-Y input. Otherwise they can be set manually. segysak
specifically needs the sample_rate
and the coord_scalar
attributes.
[5]:
volve_3d_ds.attrs = volve_3d.attrs
display(volve_3d_ds.attrs)
{'ns': None,
'sample_rate': 4.0,
'text': Text HeaderC 1 SEGY OUTPUT FROM Petrel 2017.2 Saturday, June 06 2020 10:15:00
C 2 Name: ST10010ZDC12-PZ-PSDM-KIRCH-FULL-T.MIG_FIN.POST_STACK.3D.JS-017534
ÝCroC 3
C 4 First inline: 10090 Last inline: 10150
C 5 First xline: 2150 Last xline: 2351
C 6 CRS: ED50-UTM31 ("MENTOR:ED50-UTM31:European 1950 Based UTM, Zone 31 North,
C 7 X min: 433955.09 max: 436589.56 delta: 2634.47
C 8 Y min: 6477439.46 max: 6478790.23 delta: 1350.77
C 9 Time min: -3402.00 max: -2.00 delta: 3400.00
C10 Lat min: 58.25'52.8804"N max: 58.26'37.9493"N delta: 0.00'45.0689"
C11 Long min: 1.52'7.1906"E max: 1.54'50.9616"E delta: 0.02'43.7710"
C12 Trace min: -3400.00 max: -4.00 delta: 3396.00
C13 Seismic (template) min: -58.55 max: 54.55 delta: 113.10
C14 Amplitude (data) min: -58.55 max: 54.55 delta: 113.10
C15 Trace sample format: IEEE floating point
C16 Coordinate scale factor: 100.00000
C17
C18 Binary header locations:
C19 Sample interval : bytes 17-18
C20 Number of samples per trace : bytes 21-22
C21 Trace date format : bytes 25-26
C22
C23 Trace header locations:
C24 Inline number : bytes 5-8
C25 Xline number : bytes 21-24
C26 Coordinate scale factor : bytes 71-72
C27 X coordinate : bytes 73-76
C28 Y coordinate : bytes 77-80
C29 Trace start time/depth : bytes 109-110
C30 Number of samples per trace : bytes 115-116
C31 Sample interval : bytes 117-118
C32
C33
C34
C35
C36
C37
C38
C39
C40 END EBCDIC
,
'measurement_system': 'm',
'd3_domain': None,
'epsg': None,
'corner_points': None,
'corner_points_xy': None,
'source_file': 'volve10r12-full-twt-sub3d.sgy',
'srd': None,
'datatype': None,
'percentiles': [-6.595060190682801,
-6.114936243337226,
-1.5039999629831111,
-0.0003667792810293612,
1.4698263440097095,
6.1379671676876475,
6.6378843598175346],
'coord_scalar': -100.0}
The cdp_x
and cdp_y
positions must be reduced to 2D along the vertical axis “twt” and set as coordinates.
[6]:
volve_3d_ds["cdp_x"] = volve_3d_ds["cdp_x"].mean(dim=["twt"])
volve_3d_ds["cdp_y"] = volve_3d_ds["cdp_y"].mean(dim=["twt"])
volve_3d_ds = volve_3d_ds.set_coords(["cdp_x", "cdp_y"])
volve_3d_ds
[6]:
<xarray.Dataset> Dimensions: (iline: 61, xline: 202, twt: 850) Coordinates: * iline (iline) uint64 10090 10091 10092 10093 ... 10147 10148 10149 10150 * xline (xline) uint64 2150 2151 2152 2153 2154 ... 2348 2349 2350 2351 * twt (twt) float64 4.0 8.0 12.0 16.0 ... 3.392e+03 3.396e+03 3.4e+03 cdp_x (iline, xline) float32 4.364e+05 4.364e+05 ... 4.342e+05 4.341e+05 cdp_y (iline, xline) float32 6.477e+06 6.477e+06 ... 6.479e+06 6.479e+06 Data variables: data (iline, xline, twt) float32 0.02057 0.02204 0.01966 ... 0.0 0.0 0.0 Attributes: (12/13) ns: None sample_rate: 4.0 text: C 1 SEGY OUTPUT FROM Petrel 2017.2 Saturday, June 06... measurement_system: m d3_domain: None epsg: None ... ... corner_points_xy: None source_file: volve10r12-full-twt-sub3d.sgy srd: None datatype: None percentiles: [-6.595060190682801, -6.114936243337226, -1.50399996... coord_scalar: -100.0
Afterwards, use the segy_writer
utility as normal to return to SEGY.
[7]:
segy_writer(volve_3d_ds, "test.segy")
Writing to SEG-Y: 100%|██████████| 12322/12322 [00:00<00:00, 25621.19 traces/s]