Pyce Tools Modules
pyce_tools.pyce_tools module
-
pyce_tools.pyce_tools.calculate_raw(blank_source, type_, location, process, sample_name, collection_date, analysis_date, issues, num_tubes, vol_tube=0.2, rinse_vol=20, size=None, flow_start=None, flow_stop=None, sample_stop_time=None)
Calculates blank-corrected INP data from raw LINDA data.
Creates an XLSX spreadsheet of blank-corrected, calculated INP data for a sample using the given arguments. The resulting spreadsheet has separate tabs for unheated and heated samples, each with its respective metadata. Saves the output to the interim calculated folder: data/interim/IN/calculated/[seawater or aerosol].
- Parameters
blank_source (str) – Path to the calculated blank data file. Currently only one file is accepted; averaging across multiple blank files is not yet handled.
type_ (str) – The sample type. [seawater, aerosol]
location (str) – Where the sample was collected. [uway, wboatsml, wboatssw, bubbler, coriolis]
process (str) – Whether the sample was left unfiltered (uf) or filtered (f). Unheated and heated processes are already included in the file. [uf, f]
sample_name (str) – Sample source name as written on the vial.
collection_date (str) – Sample collection date in NZST. [DDMMYYYY HHhMM]
analysis_date (str) – LINDA analysis date in NZST. [DD/MM/YYYY]
issues (str) – Issues noted in the LINDA analysis log. Required; the examples below pass 'None.' when no issues were logged.
num_tubes (int) – Number of tubes per heated/unheated analysis, typically 26.
vol_tube (float) – Volume in mL of sample solution per tube. [DEFAULT = 0.2]
rinse_vol (int) – Volume in mL of mq water used for rinsing filters (aerosol sample types only). [DEFAULT = 20]
size (str) – Size of particles for filter samples, if applicable. Only used for aerosol samples. [super, sub] [DEFAULT = None]
flow_start (float) – Flow rate in liters per minute (LPM) at the start of sampling. Only used for aerosol samples.
flow_stop (float) – Flow rate in LPM at the end of sampling. Only used for aerosol samples.
sample_stop_time (str) – Time in NZST at which sample collection was halted. Only valid for aerosol collections. [DDMMYYYY HHhMM]
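For aerosol samples, flow_start, flow_stop, collection_date, sample_stop_time, and rinse_vol together determine how much air was sampled and how INP counted in the rinse water map to INP per liter of air. The sketch below is illustrative only: sampled_air_volume_l and inp_per_l_air are hypothetical helpers, not part of pyce_tools, and it assumes the flow rate changed linearly during sampling (so the mean of the two flows applies) and that all INP in the rinse volume came from the sampled air.

```python
from datetime import datetime

# The [DDMMYYYY HHhMM] format used by collection_date and sample_stop_time.
DATE_FMT = "%d%m%Y %Hh%M"


def sampled_air_volume_l(flow_start, flow_stop, collection_date, sample_stop_time):
    """Air volume in liters, assuming a linear change from flow_start to flow_stop (LPM)."""
    start = datetime.strptime(collection_date, DATE_FMT)
    stop = datetime.strptime(sample_stop_time, DATE_FMT)
    minutes = (stop - start).total_seconds() / 60.0
    if minutes <= 0:
        raise ValueError("sample_stop_time must be later than collection_date")
    # Mean flow (LPM) multiplied by sampling duration (min) gives liters of air.
    return 0.5 * (flow_start + flow_stop) * minutes


def inp_per_l_air(inp_per_ml_rinse, rinse_vol_ml, air_volume_l):
    """Scale INP per mL of rinse water to INP per liter of sampled air."""
    return inp_per_ml_rinse * rinse_vol_ml / air_volume_l
```

With the bubbler example's values (10.50 and 9.35 LPM from 25032020 11h45 to 26032020 08h52, i.e. 1267 minutes), sampled_air_volume_l returns about 12,575 L.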
Notes
raw input data: [PROJECT_ROOT]\data\raw\IN\[SAMPLE_TYPE]\[FILE]
calculated output file: [PROJECT_ROOT]\data\interim\IN\calculated\[SAMPLE_TYPE]\[SAMPLE_LOCATION]\[FILE]
Examples
Below is an example for a seawater INP sample collected from the workboat ssw.
>>> issues = 'Exact timing of sample collection on workboat is unknown; only that it was around 8-10 am.'
>>> # raw string (r'...') so '\b' in '\blank' is not read as a backspace escape
>>> blank_source = r'..\data\interim\IN\calculated\blank\mq_wboat_blank_uf_260320_calculated.xlsx'
>>> type_ = 'seawater'
>>> location = 'wboatssw'
>>> process = 'uf'
>>> sample_name = 'TAN2003 IN 8b'
>>> collection_date = '23032020 08h00'
>>> analysis_date = '08/12/2020'
>>> num_tubes = 26
>>> vol_tube = 0.2
>>> raw = pt.calculate_raw(blank_source, type_, location, process, sample_name,
...                        collection_date, analysis_date, issues, num_tubes, vol_tube)
...IN data calculated! Calculated report file saved to ..\data\interim\IN\calculated\seawater\wboat_ssw\seawater_wboatssw_uf_230320_0800_calculated.xlsx.
Below is an example for INP data from the bubbler.
>>> issues = 'None.'
>>> blank_source = r'..\data\interim\IN\calculated\blank\bubbler_blank_uf_super_day10_day11_calculated_avg.xlsx'
>>> type_ = 'aerosol'
>>> size = 'super'
>>> location = 'bubbler'
>>> process = 'uf'
>>> sample_name = 'Day 09 bubbler 3 stage PC'
>>> collection_date = '25032020 11h45'
>>> sample_stop_time = '26032020 08h52'
>>> analysis_date = '31/07/2020'
>>> num_tubes = 26
>>> vol_tube = 0.2
>>> flow_start = 10.50
>>> flow_stop = 9.35
>>> rinse_vol = 20
>>> raw = pt.calculate_raw(blank_source, type_, location, process, sample_name,
...                        collection_date, analysis_date, issues, num_tubes, vol_tube,
...                        rinse_vol, size, flow_start, flow_stop, sample_stop_time)
...IN data calculated! Calculated report file saved to ..\data\interim\IN\calculated\aerosol\bubbler\aerosol_bubbler_uf_super_250320_1145_calculated.xlsx.
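The output file names in both examples follow a visible pattern: type, location, process, optional size, then the collection date and time as DDMMYY_HHMM. The hypothetical helper below reproduces that pattern from the arguments; it is inferred from the two examples only and is not the pyce_tools code.

```python
from datetime import datetime


def calculated_filename(type_, location, process, collection_date, size=None):
    """Rebuild the calculated-report file name seen in the examples (inferred pattern)."""
    # collection_date uses [DDMMYYYY HHhMM]; the file name uses DDMMYY_HHMM.
    stamp = datetime.strptime(collection_date, "%d%m%Y %Hh%M").strftime("%d%m%y_%H%M")
    parts = [type_, location, process]
    if size:  # only size-resolved aerosol samples carry a size tag
        parts.append(size)
    parts += [stamp, "calculated"]
    return "_".join(parts) + ".xlsx"
```

The seawater example saves into a wboat_ssw folder, so folder names do not always match the location argument exactly.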
-
pyce_tools.pyce_tools.calculate_raw_blank(type_, process, location, sample_name, collection_date, analysis_date, issues, num_tubes, vol_tube=0.2, rinse_vol=20, size=None)
Loads raw data from LINDA blank experiments and creates a 'calculated' INP data file using the given arguments. Saves the output as an XLSX file that can later be used as the blank in sample calculations of LINDA experiments.
- Parameters
type_ (str) – The sample type for which this blank was collected. Use mq for blanks of any seawater sample, whether from the underway or the workboat. [aerosol, mq]
process (str) – Whether the sample was left unfiltered (uf) or filtered (f). Unheated (UH) and heated (H) processes are already included in the file. [uf, f]
location (str) – Where the blank was collected. [bubbler, coriolis, mq, mq_wboat]
sample_name (str) – Sample source name as written on the vial. It is only stored as metadata, so no rule is enforced, but ideally names follow a consistent convention.
collection_date (str) – Sample collection date. Note your timezone, as the code does not assume one. This is used to help locate the raw data file. [DDMMYYYY HHhMM]
analysis_date (str) – Date of LINDA analysis in NZST. [DD/MM/YY]
issues (str) – Issues noted in the LINDA analysis log. Required.
num_tubes (int) – Number of tubes per heated/unheated analysis, typically 26.
vol_tube (float) – Volume in mL of sample solution per tube. [DEFAULT = 0.2]
rinse_vol (int) – Volume in mL of mq water used for rinsing filters, if the sample type uses a filter. [DEFAULT = 20]
size (str) – Size of particles for filter samples if sample was size resolved. [super, sub]
- Returns
A spreadsheet of calculated blank data.
- Return type
xlsx
Notes
raw input data: [PROJECT_ROOT]\data\raw\IN\blank\[FILE]
calculated output file: [PROJECT_ROOT]\data\interim\IN\calculated\blank\[FILE]
-
pyce_tools.pyce_tools.calculate_wilson_errors(project, location, type_, n=26)
Takes calculated report files and creates a CSV of error bars, saved in the same location as the cleaned combined time series data file. Lower and upper bounds for the blank-subtracted frozen fraction of tubes are calculated using subfunctions (lowerBound and upperBound, respectively). These fractions are converted to numbers of blank-subtracted frozen tubes (lower_N-BLNK and upper_N-BLNK, respectively), then to INP/tube lower and upper bounds, and then to IN/mL and IN/L lower and upper bounds. Finally, the difference between each bound and the original observed value gives the size of each error bar.
- Parameters
project (str) – The project. This needs to be defined since projects before Sea2Cloud were saved in different formats. [me3, nz2020]
location (str) – Where sample was collected. [uway, ASIT, wkbtsml, wkbtssw, bubbler, coriolis]
type_ (str) – The sample type. [seawater, aerosol]
n (int) – Number of tubes per analysis. [DEFAULT = 26]
Notes
raw input data: [PROJECT_ROOT]\data\interim\IN\calculated\[SAMPLE_TYPE]\[SAMPLE_LOCATION]\[FILE]
cleaned output file: [PROJECT_ROOT]\data\interim\IN\cleaned\combinedtimeseries\[SAMPLE_TYPE]\[FILE]
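The conversion chain above (frozen fraction, then number of frozen tubes, then INP/tube, then INP/mL, then error bar size) can be sketched as follows. This assumes the widely used Poisson-based relation from Vali (1971), INP/tube = -ln(1 - N_frozen / n), and takes already blank-subtracted fractions as input; the helper names are hypothetical and the pyce_tools code may differ in detail.

```python
import math


def inp_per_tube(n_frozen, n=26):
    """INP per tube from the number of (blank-subtracted) frozen tubes."""
    fraction = n_frozen / n
    if not 0.0 <= fraction < 1.0:
        # With every tube frozen, -ln(0) is undefined: no finite upper limit.
        raise ValueError("frozen fraction must be in [0, 1)")
    return -math.log(1.0 - fraction)


def inp_per_ml(n_frozen, n=26, vol_tube_ml=0.2):
    """INP per mL of sample solution (each tube holds vol_tube_ml)."""
    return inp_per_tube(n_frozen, n) / vol_tube_ml


def error_bar_sizes(f_obs, f_lower, f_upper, n=26, vol_tube_ml=0.2):
    """(minus, plus) error bar lengths in INP/mL around the observed value."""
    # Fractions are converted to numbers of frozen tubes, then to INP/mL.
    obs = inp_per_ml(f_obs * n, n, vol_tube_ml)
    lower = inp_per_ml(f_lower * n, n, vol_tube_ml)
    upper = inp_per_ml(f_upper * n, n, vol_tube_ml)
    return obs - lower, upper - obs
```

Because of the logarithm, the error bars are asymmetric: the plus side grows faster as the frozen fraction approaches 1. For a liquid sample, IN/L of solution is simply INP/mL multiplied by 1000.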
-
pyce_tools.pyce_tools.clean_aqualog(instr, outpath)
Loads all raw aqualog data files for a given instrument (aqlog1 or aqlog2), cleans them up, and saves the cleaned dataset to the chosen outpath.
- The steps of the cleaning process are as follows:
open all data files in the data/raw folder path
append all data files into one df
remove bad chars in column names
create a timeString column in UTC time
save df to csv in specified folder
- Parameters
instr (string) – aqualog1 or aqualog2
outpath (string) – location where cleaned csv file is saved.
- Returns
dfBig (df) – the df that was just saved to a folder
outName (string) – string of the start and end datetimes
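The five cleaning steps listed above can be sketched with pandas. This is a minimal sketch, not the pyce_tools implementation: the function name clean_instrument_files, the time_col and source_tz parameters, the *.csv glob, the definition of "bad characters" (anything other than letters, digits, and underscores), and the start_end form of outName are all assumptions.

```python
import glob
import os
import re

import pandas as pd


def clean_instrument_files(raw_dir, time_col, source_tz, outpath):
    """Combine raw CSVs, tidy column names, add a UTC timeString, and save."""
    # 1. Open all data files in the raw folder.
    paths = sorted(glob.glob(os.path.join(raw_dir, "*.csv")))
    if not paths:
        raise FileNotFoundError(f"no CSV files found in {raw_dir}")
    # 2. Append all data files into one DataFrame.
    df_big = pd.concat((pd.read_csv(p) for p in paths), ignore_index=True)
    # 3. Remove bad characters from column names (keep letters, digits, underscores).
    df_big.columns = [re.sub(r"\W+", "", str(c)) for c in df_big.columns]
    clean_time_col = re.sub(r"\W+", "", time_col)
    # 4. Create a timeString column in UTC.
    times = pd.to_datetime(df_big[clean_time_col])
    times = times.dt.tz_localize(source_tz).dt.tz_convert("UTC")
    df_big["timeString"] = times.dt.strftime("%Y-%m-%d %H:%M:%S")
    # 5. Save to CSV, naming the file after the start and end datetimes.
    start = times.min().strftime("%Y%m%d%H%M%S")
    end = times.max().strftime("%Y%m%d%H%M%S")
    out_name = f"{start}_{end}"
    df_big.to_csv(os.path.join(outpath, out_name + ".csv"), index=False)
    return df_big, out_name
```

Returning both the DataFrame and the name mirrors the documented dfBig and outName return values, so the result can be used immediately without re-reading the CSV.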
-
pyce_tools.pyce_tools.clean_calculated_in(type_, location)
Creates an XLSX spreadsheet of cleaned data ready for analysis.
- Parameters
type_ (str) – The sample type. [seawater, aerosol]
location (str) – Where sample was collected. [uway, ASIT, wkbtsml, wkbtssw, bubbler, coriolis]
Notes
raw input data: [PROJECT_ROOT]\data\interim\IN\calculated\[SAMPLE_TYPE]\[SAMPLE_LOCATION]\[FILE]
cleaned output file: [PROJECT_ROOT]\data\interim\IN\cleaned\combinedtimeseries\[SAMPLE_TYPE]\[FILE]
-
pyce_tools.pyce_tools.clean_inverted(inpath, nbins, outpath)
Accepts inverted scanotron data files from a given folder, appends them into one dataframe, and saves the result to the /interim/scanotron/combinedtimeseries/ folder. Also returns the combined dataframe for immediate use, along with the file name and dLogDp value.
- Parameters
inpath (str) – Path to inverted scanotron data files. [example: '..\data\interim\' + instr + '\inverted\pro\BHS']
nbins (int) – Number of diameter bins for scanotron.
outpath (str) – Desired location for the combined time series CSV file. [example: '..\data\interim\' + instr + '\combinedtimeseries\BHS']
- Returns
dfBig – Dataframe of the combined scanotron time series. Rows are indexed by timestring; columns include time, the diameters, and year, month, day, hour, minute, second, pex, tex, rhsh, tgrad, nb, dbeg, dend, conctotal.
outName – Name of the file that is saved to the computer.
dLogDp – The dLogDp value (logarithmic width of each diameter bin).
Notes
inverted scanotron input data folder path: ..\data\interim\[instr]\inverted\pro\BHS
calculated output file: ..\data\interim\[instr]\combinedtimeseries\BHS\[FILE]
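dLogDp is the width of each diameter bin in log10 space, and dividing bin counts dN by it gives dN/dlogDp (the dNdLogDp frame returned by load_scano_data). The sketch below assumes nbins bins evenly spaced in log diameter, with dbeg and dend as the outer edges of the whole range; if dbeg and dend are instead the first and last bin midpoints, divide by nbins - 1. Both helper names are hypothetical.

```python
import math


def dlogdp(dbeg_nm, dend_nm, nbins):
    """Log10 width of each bin for nbins log-evenly spaced bins spanning dbeg..dend."""
    if dbeg_nm <= 0 or dend_nm <= dbeg_nm or nbins <= 0:
        raise ValueError("need 0 < dbeg < dend and nbins > 0")
    return math.log10(dend_nm / dbeg_nm) / nbins


def to_dndlogdp(dn, dlogdp_value):
    """Normalize bin counts dN by the log bin width to get dN/dlogDp."""
    return [count / dlogdp_value for count in dn]
```

For example, 20 bins spanning 10 nm to 1000 nm (two decades) give a dLogDp of 0.1.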
-
pyce_tools.pyce_tools.clean_magic(inpath, outpath, timezone)
Loads all raw magic CPC data files, cleans them up, and appends them into one file. Saves the cleaned dataset to the chosen outpath as a CSV file.
- The steps of the cleaning process are as follows:
open all data files in the data/raw folder path
append all data files into one df
remove bad chars in column names
create a timeString column in UTC time
save df to csv in specified folder
- Parameters
inpath (str) – location where the raw CSV files are found.
outpath (str) – location where cleaned csv file is saved.
- Returns
dfBig (df) – the df that was just saved to a folder
outName (str) – string of the start and end datetimes
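clean_magic takes a timezone argument, and step 4 converts timestamps to UTC. Assuming timezone is an IANA zone name such as 'Pacific/Auckland' (the expected format is not documented here), the per-timestamp conversion could look like this standard-library sketch; to_utc_timestring and the timestamp format are hypothetical.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def to_utc_timestring(local_time, tz_name):
    """Convert a naive local timestamp string to a UTC timeString."""
    naive = datetime.strptime(local_time, "%Y-%m-%d %H:%M:%S")
    # Attach the instrument's local zone; ZoneInfo applies the correct DST offset.
    aware = naive.replace(tzinfo=ZoneInfo(tz_name))
    return aware.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
```

Using a named zone rather than a fixed offset matters for New Zealand data: in late March 2020 local time was NZDT (UTC+13), while in July it was NZST (UTC+12).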
-
class pyce_tools.pyce_tools.inp(inp_type, inp_location, cyto_location, cyto_data, uway_bio_data, inp_data)
Bases: object
-
corr(data, regime, temp)
-
correlations(temps, process, inp_units, dfs=None, size=None)
-
plot_corr_scatter(temp, units, processes, row_num)
-
plot_ins_inp()
-
sa_normalize(dA_total)
Takes dA_total and the INP data and calculates surface-area-normalized INP in units of INP/cm².
INPUT
dA_total: total surface area integrated across all particle sizes for each day. The index should be a day number, and surface area should be in units of µm²/cm³. If the index has no name, the function names it.
inp: INP per liter of air. Columns should only be temperatures; any other identifying information should be in the index, which must include the day.
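Surface-area normalization divides INP per liter of air by the particle surface area contained in that same liter. A minimal sketch of the unit handling, assuming sa_normalize follows this standard approach (sa_normalized_inp is a hypothetical name): 1 L = 1000 cm³ and 1 µm² = 1e-8 cm², so dA_total in µm²/cm³ times 1e-5 gives cm² of surface per liter of air.

```python
def sa_normalized_inp(inp_per_l_air, da_total_um2_per_cm3):
    """INP per cm^2 of particle surface from INP/L of air and dA_total in um^2/cm^3."""
    cm3_per_l = 1000.0
    cm2_per_um2 = 1e-8
    # Surface area per liter of air, in cm^2.
    sa_cm2_per_l = da_total_um2_per_cm3 * cm3_per_l * cm2_per_um2
    return inp_per_l_air / sa_cm2_per_l
```

The arithmetic is elementwise, so it also works on pandas objects; for an INP DataFrame with temperature columns and a dA_total Series sharing the day index, use inp.div(sa_cm2_per_l, axis=0) so each day's row is divided by that day's surface area.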
-
var_names_uway = {'SB21_SB21sal': 'salinity', 'TRIPLET_TripletBeta660': 'Triplet Beta', 'TRIPLET_TripletCDOM': 'CDOM', 'TRIPLET_TripletChl': 'Chl-a', 'chla_cdom': 'chla/cdom', 'nanophyto2-20um': 'Nanophytoplankton', 'picophyto<2um': 'Picophytoplankton', 'prokaryoticpico-syne': 'Prok. Pico. Syn.'}
-
-
pyce_tools.pyce_tools.load_scano_data(date_string, instrument)
Loads interim scanotron data that has already been pre-processed with the clean_inverted function. Returns two dataframes, dN and dNdLogDp, where rows are time and columns are diameters.
- Parameters
date_string (str) – Dates of the requested scanotron data. [YYYY-MM-DD_YYYY-MM-DD]
instrument (str) – Instrument that is being loaded. [scanotron]
- Returns
dN – A dataframe of particle counts where rows are dates and columns are diameters.
dNdLogDp – A dataframe of particle counts normalized by the logarithmic bin width (dN/dlogDp), where rows are dates and columns are diameters.
Notes
raw input data: [PROJECT_ROOT]\data\interim\scanotron\combinedtimeseries\[FILE]
-
pyce_tools.pyce_tools.load_wilson_errors()
Loads and cleans up the spreadsheets of error bars. See IN_Analysis_V1.ipynb.
-
pyce_tools.pyce_tools.plot_number_dist(smps_daily_mean_df, smps_daily_std_df)
Creates a plot of scanotron data.
- Parameters
smps_daily_mean_df (pandas dataframe) – Size distribution data where rows are size bins and columns are the mean of daily (or other timespan) data.
smps_daily_std_df (pandas dataframe) – Size distribution standard deviation data where rows are size bins and columns are the standard deviation of daily (or other timespan) data.
- Returns
fig – A plotly graph object.
- Return type
object
-
pyce_tools.pyce_tools.plot_sml_inp(inp_df)
Accepts a dataframe of sea surface microlayer (SML) INP values and their uncertainties, and plots heated vs. unheated values alongside literature values.
-
pyce_tools.pyce_tools.plot_ssw_inp(inp_df)
Accepts a dataframe of subsurface water (SSW) INP values and their uncertainties, and plots heated vs. unheated values alongside literature values.
-
pyce_tools.pyce_tools.plot_surface_dist(dAdLogDp, dAdLogDp_std)
-
pyce_tools.pyce_tools.surface_area(smps_daily_mean_df, smps_daily_std_df, nbins)
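surface_area is undocumented. Given its inputs and the units sa_normalize expects for dA_total (µm²/cm³), a plausible sketch assumes spherical particles, so each bin contributes dA/dlogDp = π·Dp²·dN/dlogDp, and the total is the sum over bins times dLogDp. The helper names and the nm input units are assumptions, not the module's code.

```python
import math


def surface_distribution(diameters_nm, dndlogdp_per_cm3):
    """dA/dlogDp in um^2/cm^3 for spheres, from dN/dlogDp in #/cm^3."""
    # Convert diameters from nm to um before squaring: sphere surface = pi * Dp^2.
    return [math.pi * (d / 1000.0) ** 2 * n for d, n in zip(diameters_nm, dndlogdp_per_cm3)]


def total_surface_area(dadlogdp, dlogdp_value):
    """Integrate dA/dlogDp over all bins to get dA_total in um^2/cm^3."""
    return sum(dadlogdp) * dlogdp_value
```

For one bin at 1000 nm (1 µm) with dN/dlogDp = 100 cm⁻³, dA/dlogDp is 100π µm²/cm³, and with dLogDp = 0.1 the total is 10π µm²/cm³.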
-
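pyce_tools.pyce_tools.wilsonLower(p, n=26, z=1.96)
Lower bound of the Wilson score interval for a frozen fraction. p is the frozen fraction, n is the number of tubes, and z is the standard-normal critical value for the chosen confidence level (1.96 for 95%).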
-
pyce_tools.pyce_tools.wilsonUpper(p, n=26, z=1.96)
Upper bound counterpart of wilsonLower; takes the same arguments.
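Judging by their names, wilsonLower and wilsonUpper compute the two ends of the Wilson score interval for a binomial proportion. A minimal sketch of that interval using the standard formula; wilson_bounds is a hypothetical name, not the module's function.

```python
import math


def wilson_bounds(p, n=26, z=1.96):
    """Wilson score interval (lower, upper) for frozen fraction p over n tubes.

    z is the standard-normal critical value (1.96 for about 95 % confidence).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a fraction between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half_width = (z / denom) * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n))
    # Clamp only to absorb floating-point round-off near p = 0 or p = 1.
    return max(0.0, center - half_width), min(1.0, center + half_width)
```

Unlike the simple normal approximation, the Wilson interval stays within [0, 1] and does not collapse to zero width when no tubes (or all tubes) freeze; for p = 0.5 and n = 26 it gives roughly (0.321, 0.679).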