Review of Unsupervised Learning for Predicting Machine Failures for Aircraft Engine Run-to-Failure Simulation

Abstract— This paper examines how damage propagation can be modelled inside aircraft gas turbine engines. Response surfaces of all sensors are generated by means of a thermodynamic simulation model of the engine, as a function of variations in the flow and efficiency of the modules of interest. An exponential rate of change for flow and efficiency loss was imposed on each data set, beginning at a randomly chosen initial deterioration set point. The rate of change of flow and efficiency indicates an otherwise unspecified fault whose effect grows steadily worse. The rates of change of the faults were constrained to an upper threshold but were otherwise chosen randomly. Damage propagation was permitted to proceed until failure, and a health index was defined that decreases until it reaches zero at failure. The output of the model is the time series (in cycles) of the sensor measurements normally available from aircraft gas turbine engines. The analysis below was carried out with widely available open-source tools.

Keywords — Damage modelling, Classification, Linear Regression, Performance Evaluation, Data Modelling, Unsupervised Learning.


Can one anticipate when an engine or device will break down? This appears to be an engineering question, but these days it is also a data science question. More concretely, it is a critical question wherever engines and devices use data to direct upkeep, for example in aircraft engines (1), windmill engines (2), and rotating machinery (3). With respect to human safety and logistic planning, it is not a smart idea to simply wait until an engine breaks down. It is essential to plan maintenance in advance to avoid costly breakdowns.


Proactive planning is enabled by various sensor technologies that provide rich information about the condition of machines and devices. Sensors record data such as temperature, fan and core speed, static pressure, and so on. Can we use this information to predict, within certain margins, how long an aircraft engine will operate without failure? And if so, how? This is the question behind the concept of remaining useful life (RUL): estimating the remaining time an item, component, or system can operate according to its intended purpose before warranting replacement. The present research demonstrates how to use unsupervised learning in R, combining PCA with Hotelling's T-square method, a Gaussian mixture model, and a one-class SVM, to estimate the RUL. It is intended as an example case study, not a comprehensive and definitive solution. There is a lack of real data to answer this question, but simulations have been made available as a unique resource. One such fascinating simulation is provided by the C-MAPSS data [1]. It provides training data consisting of sensor-based time series that run until the time point at which the engine breaks down. In contrast, the test data consist of sensor-based time series that end a "random" time before the breakdown point. The key task is to estimate the RUL of the test set, that is, how much time is left after the last recorded time point.

I.     Literature Review

II.   Data Analysis

A.      Data Description and Preparation

The raw data was collected from NASA's free data sets accompanying "Damage Propagation Modelling for Aircraft Engine Run-to-Failure Simulation," International Conference on Prognostics and Health Management (2008). It contains four training data sets, four corresponding test data sets, and their respective RUL data sets.

The data set incorporates a time series for every engine. All engines are of a similar type, but every engine begins with a different degree of initial wear and manufacturing variation, which is not known to the user. There are three operational settings that can be used to change the performance of each machine. Every engine has 21 sensors gathering distinct measurements related to the engine state at runtime. The gathered data is contaminated with sensor noise.

Over time, every engine develops a fault, which can be seen through the sensor readings. The time series ends some time before the failure. The data includes the unit (engine) number, time stamps, the three settings, and readings for the 21 sensors.

Table. C-MAPSS outputs used to simulate different degradation situations in any of the five rotating components of the simulated engine. Each record contains:

Unit (engine) number

Recording time, in cycles

Operational setting 1

Operational setting 2

Operational setting 3

Sensor measurements 1 through 21 (one column per sensor, recorded over time)

B.      What to Predict?

In other words, which variable is our target? Since the engine degrades after each operational cycle, we need to predict how many cycles are left before the engine breaks down. This brings us to the Remaining Useful Life, or RUL, which measures how many life cycles remain before failure.

1.)     So, what is RUL?

The Remaining Useful Life (RUL) is the estimated remaining period that an item, component, or system can operate according to its intended purpose before warranting replacement. The RUL is estimated based on observations, on average assessments of comparable items, components, or systems, or on a mix thereof. For instance, the remaining useful life of a roof with a PVC membrane that was installed around seven years ago and was poorly maintained might be roughly ten years. The remaining useful life of building components and systems is noted in a Property Condition Assessment and is used to help estimate the expected short- and long-term capital costs required to maintain a property.

2.)     Technical issues with the Data

The training data does not contain an RUL variable.

The only RUL value provided corresponds to the last cycle of each engine.

In other words, the training data is unlabelled and the test data is only partially labelled.

3.)     Solutions to the issues:

Assume that the last cycle of each engine is the point of failure.

That means that at the last cycle of each engine, RUL equals zero.

So, in each preceding cycle, RUL decreases by one until it reaches 0.
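As an illustration, this labelling rule takes only a few lines of pandas; the frame and the column names (`unit`, `cycle`) are hypothetical stand-ins for the C-MAPSS training file:

```python
import pandas as pd

# Hypothetical slice of the training file: engine unit number and cycle.
train = pd.DataFrame({"unit":  [1, 1, 1, 2, 2],
                      "cycle": [1, 2, 3, 1, 2]})

# The last recorded cycle of each unit is assumed to be the failure point,
# so RUL = (max cycle of that unit) - (current cycle); zero at failure.
train["RUL"] = (train.groupby("unit")["cycle"].transform("max")
                - train["cycle"])
print(train["RUL"].tolist())  # [2, 1, 0, 1, 0]
```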

C.     Experimental Setting

1)     Environment

Several tools were used to produce the results for this research paper, all of them open source.

2)     Model Selection Process

When the sensor readings were first modelled and graphed, it became apparent that many sensor measurements were correlated with each other and many others were constant. The sensors s2, s8, and s9 were therefore selected for the final model.
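A minimal sketch of this screening step on a toy frame with hypothetical sensor columns (the values and the 0.95 correlation threshold are illustrative choices, not the ones used in the study):

```python
import pandas as pd

# Toy sensor table: s1 is constant, s3 is exactly 2*s2, s8 varies freely.
df = pd.DataFrame({
    "s1": [518.67] * 5,
    "s2": [641.8, 642.1, 642.5, 643.0, 643.4],
    "s3": [1283.6, 1284.2, 1285.0, 1286.0, 1286.8],
    "s8": [2388.0, 2388.4, 2388.1, 2388.6, 2388.2],
})

# 1) Drop constant sensors (zero variance carries no information).
df = df.loc[:, df.std() > 0]

# 2) Drop one sensor of each highly correlated pair (|r| > 0.95).
corr = df.corr().abs()
drop = [c for i, c in enumerate(corr.columns)
        if any(corr.iloc[j, i] > 0.95 for j in range(i))]
df = df.drop(columns=drop)
print(list(df.columns))  # ['s2', 's8']
```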

3)     Modelling Approach

a.)     Data Pre-Processing (Training set):  The following steps are applied:

       Setting variable names

       Extracting the relevant sensors

       Labelling conditions into 4 categories based on RUL.

b.)     Data Pre-Processing (Test set):

The categories are defined as follows:

0~50 cycles: urgent

51~125 cycles: short

126~200 cycles: medium

201~ cycles: long
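These bins can be applied with a small helper function; the bin edges are the ones listed above, while the function name is our own:

```python
# Map RUL (cycles remaining) to the four condition classes defined above.
def label_rul(rul: int) -> str:
    if rul <= 50:
        return "urgent"
    if rul <= 125:
        return "short"
    if rul <= 200:
        return "medium"
    return "long"

print([label_rul(r) for r in (10, 80, 150, 250)])
# ['urgent', 'short', 'medium', 'long']
```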

4)     Visualizing Data


1.)     Dimensionality reduction by PCA

Principal Component Analysis (PCA) is applied to put the data in a standard form for training. By plotting the first two principal components, we can confirm that the "long" class data forms clusters and that the value of the first component increases with the number of cycles.
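The step can be sketched with scikit-learn on synthetic data standing in for the selected sensors (the data, seed, and variance check are illustrative assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for the selected sensor columns: two readings share
# a degradation trend over 100 cycles, the third is pure noise.
t = np.linspace(0.0, 1.0, 100)
X = np.column_stack([t + 0.05 * rng.standard_normal(100),
                     2.0 * t + 0.05 * rng.standard_normal(100),
                     rng.standard_normal(100)])

# Standardize, then project onto the first two principal components.
X_std = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
scores = pca.fit_transform(X_std)

print(scores.shape)                               # (100, 2)
print(pca.explained_variance_ratio_.sum() > 0.8)  # True
```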

Hotelling's T-square method:

Among the many statistical anomaly detection techniques, Hotelling's T-square method, a multivariate statistical analysis procedure, has been one of the most conventional. The method makes the crucial assumption that the data follow a unimodal distribution. Based on this assumption, it computes the squared Mahalanobis distance of every data point in multi-dimensional space and judges the top x percent of the data set to be anomalies.
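A minimal NumPy sketch of the idea, on synthetic data with one injected outlier (the data and the 1 percent cutoff are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
# 200 "healthy" 2-D sensor scores plus one injected anomaly at the end.
X = np.vstack([rng.standard_normal((200, 2)), [[8.0, 8.0]]])

# Hotelling's T^2 per sample = squared Mahalanobis distance to the mean.
mean = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
diff = X - mean
t2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

# Judge the top 1 percent of T^2 values as anomalies.
threshold = np.quantile(t2, 0.99)
print(int(np.argmax(t2)), bool(t2[-1] > threshold))  # 200 True
```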

Training Data Model:

Testing Data Model:

The same procedure applies to the GMM model:

Although Hotelling's T-square method is applicable to many multi-dimensional data sets, it assumes that the data follow a unimodal distribution. When the data follow a multimodal distribution, other predictive-maintenance methods should therefore be applied. A Gaussian mixture model is a probabilistic model that assumes all data points are generated from a mixture of several Gaussian distributions. By estimating the mean and variance of each Gaussian distribution from the observed data, the top x percent of anomalies can be identified. To compute the maximum likelihood estimate of the Gaussian mixture model, the Expectation-Maximization (EM) algorithm is typically used.
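A sketch of the approach with scikit-learn, on synthetic bimodal data (the regime centres and test points are illustrative assumptions):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
# Bimodal "healthy" training data: two operating regimes.
X_train = np.vstack([rng.normal(0.0, 0.5, (100, 2)),
                     rng.normal(5.0, 0.5, (100, 2))])

# Fit a two-component mixture (scikit-learn runs EM internally).
gmm = GaussianMixture(n_components=2, random_state=0).fit(X_train)

# Score a typical point and a far-off one; low log-density = anomaly.
X_test = np.array([[0.2, -0.1], [20.0, 20.0]])
log_density = gmm.score_samples(X_test)
print(log_density[0] > log_density[1])  # True: the far-off point is flagged
```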

Training Data Model

Testing Data Model

The same applies for the one-class SVM:

The previous methods, Hotelling's T-square and the Gaussian mixture model, use parametric models based on Gaussian distributions. In practical situations, however, the data distribution sometimes has no explicit clusters or, in more severe cases, cannot be grasped at all, for example because of a large number of dimensions. In such cases a non-parametric model can be applicable. Here we show how a one-class SVM, a typical non-parametric classification method, can detect the top x percent of outliers in a given data collection.
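A minimal scikit-learn sketch of this non-parametric detector (the training cloud and the nu value are illustrative assumptions):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(3)
# Train on "healthy" cycles only; no distributional assumption is made.
X_train = rng.standard_normal((200, 2))

# nu is roughly the fraction of training points allowed outside the boundary.
ocsvm = OneClassSVM(nu=0.05, kernel="rbf", gamma="scale").fit(X_train)

X_test = np.array([[0.1, 0.0],   # typical point
                   [6.0, 6.0]])  # far outside the training support
print(ocsvm.predict(X_test))     # 1 = inlier, -1 = anomaly
```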

Training Data Model

Testing Data Model:


The primary target of predictive maintenance is to predict when hardware failure may happen, and then to prevent that failure by taking appropriate action. A Predictive Maintenance System (PMS) monitors for future failures and schedules maintenance in advance.

This can help us in:

Reducing maintenance frequency

Cost saving

Reducing machine failures

Predictive maintenance can use various regression and classification algorithms. These techniques require a large amount of training data that includes failure readings. Since failures do not happen frequently, data collection can take a long time; this remains a significant issue in predictive maintenance. Such data therefore often needs to be handled differently, in other words by framing the task as a classification problem rather than a regression problem.

III. References

  A. Saxena, K. Goebel, D. Simon and N. Eklund, “Damage Propagation Modelling for Aircraft Engine Run-to-Failure Simulation,” International Conference on Prognostics and Health Management, (2008).

Turbofan Engine Degradation Simulation Data Set

Mathworks (2018). Examples of Data Analytics for Predictive Maintenance

Roshan Alwis, Srinath Perera, Srini Penchikala (2015). Machine Learning Techniques for Predictive Maintenance

Ruthger Righart (2018). Sensor Time Series of Aircraft Engines

Hank Roark (2016). Machine Learning for Sensored Internet of Things

R4DS Online Learning Community (2018). Exploratory Data Analysis of NASA Turbo Engine Degradation Data

Research Gate (2014). Predictive Maintenance Data Sets

University of South Florida Scholar Commons (2015). Ensemble Learning Method on Machine Maintenance Data

Ye Xing (2017). Aircraft Predictive Maintenance Project

R.B. Jolly, S.O.T. Ogaji (2016). Gas Turbine Diagnostics Using Artificial Neural Networks for a High Bypass Ratio Military Turbo Engine. Science Direct

Predicting Wind and Solar Generation from Weather Data Using Linear Regression


This project is concerned with predicting wind and solar generation from weather data using a simple linear regression method, so in practice we will exercise many data science concepts and algorithms.

To implement this project, we first looked for sources where the weather data sets are clean and ready for a machine learning workflow.

Second, the wind generation, solar generation, and weather data come in hourly resolution, meaning that the solar and wind conditions of the area of interest were recorded every hour. The project therefore lets us deal with time series in a real-world context.

Furthermore, although we are not using sophisticated models or algorithms to predict the weather system, we are using real, authentic data previously observed in the specific areas for which we want to predict solar and wind generation.

It is also a good exercise to work with data connected to renewable energy: one can imagine how important it is to predict the future generation of renewables such as wind and solar.

Main Body

The data used in this analysis of solar and wind generation comes from a free source named Open Power System Data, which is organized by country and as time series.

The platform contains data for roughly 37 to 40 countries, by year or by month. In this project we will use only the data for Germany, and we will concentrate on two resources:

Time series with load, wind and solar, and prices in hourly resolution: a CSV file with data for all 37 countries from 2006 to 2017.


Weather Data

A large amount of data is provided, with many parameters such as wind speed, temperature, solar radiation, precipitation, and many more measurements.


Wind and solar generation

We now have a CSV file with time series for about 37 European countries over 12 years, from 2006 to 2017, but as discussed earlier we will only use the data for Germany.

production = pd.read_csv("data/time_series_60min_singleindex.csv",
                         usecols=(lambda s: s.startswith('utc') |
                                  s.startswith('DE')),  # assumed filter: keep the timestamp and the Germany columns
                         parse_dates=[0], index_col=0)

In this code, the options parse_dates=[0] and index_col=0 guarantee that the date-time column is parsed and stored as a DatetimeIndex.

Finally, we filter the rows of year 2016 for Germany and obtain a data frame with 8784 entries and about 48 columns, where each column relates to a different quantity such as wind capacity or solar capacity.

As is clear from the name and goal of our project, we are only concerned with two columns: wind generation and solar generation.



Fortunately, there are no missing values in the data we obtained, so we can plot the wind and solar generation directly.

From the plotted pattern of wind generation in Germany, no clear, easily interpretable pattern emerges, so this plot alone does not tell us much.

Below, another plot has been generated using the solar generation data for Germany, in MW.

From that plot, one main thing can be observed: solar generation is much greater in the middle of the year, that is, during the summer months.

Weather data

weather = pd.read_csv("data/weather_data_GER_2016.csv",
                      parse_dates=[0], index_col=0)

If we check the info attribute of the weather DataFrame, we obtain:

DatetimeIndex: 2248704 entries, 2016-01-01 00:00:00 to

2016-12-31 23:00:00

Data columns (total 14 columns):

cumulated hours    int64

lat                float64

lon                float64

v1                 float64

v2                 float64

v_50m              float64

h1                 int64

h2                 int64

z0                 float64

SWTDN              float64

SWGDN              float64

T                  float64

rho                float64

p                  float64

dtypes: float64(11), int64(3)

memory usage: 257.3 MB

The columns are as follows:

Parameters of wind:

v1: velocity [m/s] at height h1 (2 meters above displacement height);

v2: velocity [m/s] at height h2 (10 meters above displacement height);

v_50m: velocity [m/s] at 50 meters above ground;

h1: height above ground [m] (h1 = displacement height +2m);

h2: height above ground [m] (h2 = displacement height +10m);

z0: roughness length [m];

Solar parameters:

SWTDN: total top-of-the-atmosphere horizontal radiation [W/m²];

SWGDN: total ground horizontal radiation [W/m²];

Temperature data:

T: Temperature [K] at 2 meters above displacement height (see h1);

Air data:

rho: air density [kg/m³] at surface;

p: air pressure [Pa] at surface.

Since we do not know the locations of the wind and solar installations in Germany, which is a key limitation of this analysis, we simply group the weather data by hour and take the average over all grid points.

First, let us check the average weather behaviour in Germany in 2016 with the help of plots.
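The grouping step can be sketched as follows; the toy frame and its two grid points per timestamp are illustrative, while the column names (v1, T) mirror those listed above:

```python
import pandas as pd

# Toy weather frame: two grid points (rows) per hourly timestamp.
idx = pd.to_datetime(["2016-01-01 00:00", "2016-01-01 00:00",
                      "2016-01-01 01:00", "2016-01-01 01:00"])
weather = pd.DataFrame({"v1": [4.0, 6.0, 3.0, 5.0],
                        "T":  [280.0, 282.0, 279.0, 281.0]}, index=idx)

# Panel locations are unknown, so average all grid points within each hour.
hourly = weather.groupby(weather.index).mean()
print(hourly["v1"].tolist())  # [5.0, 4.0]
```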

From the plot of wind velocity above, we see that the wind velocity does not follow a specific pattern over the year, although it was larger in February, November, and December.

In the summer months, such as June, July, and August, the horizontal radiation at ground level is expected to be larger than in the rest of the year.

From all the plots above, some relationships can be observed, such as the correlation between the weather variables and the wind and solar generation.

Further observations can be obtained from the plots below, which show wind and solar generation as a function of some weather parameters.

As discussed earlier, there should be a relationship between generation and certain weather quantities. The plots make clear that wind generation correlates with the wind velocities v1, v2, and v_50m, which are described in detail in the column list above.

From the plots it can also be observed that there is an approximately linear relation between the top-of-the-atmosphere and ground radiation and the solar generation.

With all of these observations in hand, we now build a linear regression model to predict solar and wind generation from the quantities above.

Predicting the wind and solar generation using linear regression

Linear regression is a linear approach to modelling the relationship between a dependent variable and independent variables.

The resulted output of a linear regression algorithm is a linear function of the input.

The equation of linear regression is:

ŷ = w0 + w1·x1 + w2·x2 + … + wn·xn

where ŷ is the prediction, the xi are the input features, and the wi are the parameters to be learned.

The objective is to find the parameters w which minimize the mean squared error over the m training samples:

MSE = (1/m) Σi (yi − ŷi)²

This can be achieved using LinearRegression from the scikit-learn library.


Wind generation

To predict the wind generation, a feature matrix X_wind with the features v1, v2, and v_50m was created, together with a target vector y_wind containing the actual wind generation.

import numpy as np

from sklearn.linear_model import LinearRegression

from sklearn.model_selection import cross_val_score

lr = LinearRegression()

scores_wind = cross_val_score(lr, X_wind, y_wind, cv=5)

print(scores_wind, "\naverage =", np.mean(scores_wind))

Let us analyse what is going on in this code.

First of all, we import LinearRegression from sklearn.linear_model, which performs ordinary least squares regression.

After fitting the regression model, the next task is evaluating its performance. For this purpose we split the data using a procedure called cross-validation (CV). In k-fold cross-validation, the data is divided into k small sets; the model is trained on k−1 of them and validated on the remaining part, as can be seen in the picture below.

The performance measure provided by CV is then the average of the performance measures computed in each experiment. In the code above, we use cross_val_score from sklearn.model_selection, with the number of folds cv=5.

Example of 5-fold CV. Source: https://www.kaggle.com/dansbecker/cross-validation

The performance measure that LinearRegression gives by default is the coefficient of determination R² of the prediction. It measures how well the predictions approximate the true values; a value close to 1 means that the regression makes predictions which are close to the true values. It is formally computed using the formula:

R² = 1 − Σi (yi − ŷi)² / Σi (yi − ȳ)²

The output of the code above for our case is:

[0.88261401 0.88886305 0.83623262 0.88974363 0.85338174]

average = 0.870167010172279

Let us analyse this output.

The first line contains the R² score for each of the five folds, and the second line contains their average.

Solar generation

Here again, a feature matrix X_solar with the features SWTDN, SWGDN, and T was constructed, together with a target vector y_solar containing the actual solar generation, before running the same algorithm:

scores_solar = cross_val_score(lr, X_solar, y_solar, cv=5)

print(scores_solar, "\naverage =", np.mean(scores_solar))

The output is:

[0.8901974  0.95027431 0.95982151 0.95090201 0.8715077 ]

average = 0.9245405855731855

As before, the first line contains the R² score for each of the five folds, and the second line contains their average.

In conclusion, we did a decent job using real-world weather data and very simple algorithms to predict wind and solar generation. More sophisticated, better-performing algorithms and analyses are of course possible, but they are beyond our scope.


Through this code we predicted wind and solar generation using the machine-learning technique of linear regression. The resulting R² scores are quite good, coming out close to 1.

We found that it is quite easy to predict generation using this simple regression technique: we obtained an average R² of about 0.87 for wind generation and 0.92 for solar generation.

However, more complex models could improve the accuracy even further.


[1] https://towardsdatascience.com/introduction-to-machine-learning-algorithms-linear-regression-14c4e325882a?fbclid=IwAR3RCZ9w0QZJwNgRgGAPsfCfLNdlIwNIZphzuEqS_l8rfy_uwe0MevNwi-Y

[2] https://medium.com/hugo-ferreiras-blog/predicting-wind-and-solar-generation-from-weather-data-using-machine-learning-998d7db8415e

[3] https://towardsdatascience.com/introduction-to-machine-learning-algorithms-linear-regression-14c4e325882a

[4] https://www.quora.com/What-are-the-advantages-and-disadvantages-of-linear-regression

[5] https://datascience.stackexchange.com/questions/30465/tips-to-improve-linear-regression-model

Variables Predicting Physical Attacks on Public School Teachers in the United States with Data from the Schools and Staffing Survey: A Brief Report
In this brief report, data from the 2011-2012 Schools and Staffing Survey Teacher Questionnaire was analyzed through multiple regression to determine variables that were statistically significantly associated with physical attacks on public school teachers in the United States.  It was found that special education teachers were statistically significantly more likely to be physically attacked than general education teachers.  In addition, the number of students with Individualized Education Programs, the number of students with limited English proficiency, and the number of threats received were statistically significant predictors of the number of physical attacks on teachers.  The four variables accounted for 46 percent of the variance for teachers who reported being physically attacked in the preceding 12 months.
Key Words: Schools and Staffing Survey, School and Staffing Survey Teacher Questionnaire, student-on-teacher attacks, special education, school violence
This brief report presents results of a preliminary investigation regarding physical attacks on public school teachers in the United States. It is not the intent of this paper to provide an in-depth review of the literature on student-on-teacher violence, and a detailed analysis of the literature concerning attacks on school teachers is not presented here. Thorough analyses of the literature concerning school violence can be found in Espelage, Anderman, Brown, Jones, Lane, McMahon, Reddy, and Reynolds (2013); McMahon, Martinez, Espelage, Rose, Reddy, Lane, Anderman, Reynolds, and Brown (2014); and Robers, Zhang, Morgan, and Musu-Gillette (2015).


The purpose of this study was to examine variables related to student-on-teacher violence using data from a nationally representative sample of teachers in the United States. As Espelage et al. (2013) stated, contextual information about student-on-teacher assaults in the United States is lacking. The Indicators of School Crime and Safety Report showed that public school teachers were more likely to report being threatened and physically attacked than their private school peers (Robers et al., 2015). McMahon et al. (2014) found that a majority of teachers surveyed in the United States reported at least one type of harassment or victimization experience in the past year, while others have suggested these experiences are under-reported (Levin et al., 2006).
Williams, Billingsley, and Banks (2018) found that special education teachers were statistically significantly more likely to be physically attacked than 11 other categories of public school teachers listed on the 2011-2012 Schools and Staffing Survey Teacher Questionnaire (SASS TQ). Williams and Ernst (2016) presented a descriptive analysis of demographic characteristics from the SASS TQ that were associated with higher levels of reported physical attacks on public school teachers. However, there have been no statistical analysis utilizing SASS TQ variables that predict student-on-teacher violence.  This investigation was launched to determine if there were statistically significant variables that may predict physical attacks on public school teachers in the United States based on data from the SASS TQ. 
This study examined measures of physical attacks on public school teachers in the United States using the following research question to guide the analysis:

Which SASS TQ variables are statistically significant predictors of public school teachers who reported being physically attacked by students?

The Schools and Staffing Survey was conducted by the National Center for Education Statistics and administered by the Institute for Education Sciences (IES) on behalf of the U.S. Department of Education to collect extensive data on American elementary and secondary schools. This study analyzed data from the SASS TQ. The purpose of the SASS TQ was to obtain information about teachers, such as education and training, teaching assignment, certification, workload, and perceptions and attitudes about teaching to present a comprehensive picture of elementary and secondary education in the United States. The SASS TQ was designed to produce national, regional, and state estimates and is an excellent resource for analysis and reporting on elementary and secondary educational issues. (Tourkin, Thomas, Swaim, Cox, Parmer, Jackson, Cole, & Zhang, 2010, p. 1).
This study examined data from full and part-time public school teachers who completed the SASS TQ and responded affirmatively to the question asking whether they were physically attacked in the past 12 months.  This resulted in an unweighted sample size of 1,550 teachers and 197,360 teachers when weighted. 
Variables Analyzed
Table 1 includes the question or the variable name from the SASS TQ and coding for each variable incorporated in the study. The dependent variable in this study was physical attacks. The independent variables were threats, main teaching assignment, gender, race, students with individualized education plans (IEP) caseload, students with limited English proficiency (LEP), education level, years teaching experience,  certification route, certification status, school type and poverty level. 

Table 1. Variables used in the analysis.


SASS TQ Question*

Variable Type and Coding

Dependent Variable


Physical Attacks

In the past 12 months, how many times has a student FROM THIS SCHOOL physically attacked you? (SASS TQ, p. 38 )


Independent Variables



In the past 12 months, how many times has a student FROM THIS SCHOOL threatened to injure you? (SASS TQ, p. 38)


Main Teaching Assignment

The IES created ASSIGN03 to categorized teachers into their main teaching assignments.  ASSIGN03 consisted of 12 categories: 1) Early Childhood or General Elementary; 2) Special Education; 3) Arts or Music; 4) English and Language Arts; 5) ESL or Bilingual Education; 6) Foreign Language; 7) Health or Physical Education; 8) Mathematics; 9) Natural Science; 10) Social Science; 11) Vocational Career, or Technical Education, and 12) All Others.

Special Education/General Education


Are you male or female? (SASS TQ, p. 42)



What is your race? The SASS TQ provided five choices for race: White, Black or African-American, Asian, Native Hawaiian or Other Pacific Islander, or American Indian or Alaska Native. (SASS TQ, p. 43 )

White/Teachers of Color

IEP Caseload

Of all the students you teach at this school, how many have an Individualized Education Program (IEP) because they have disabilities or are special education students? (SASS TQ, p. 10 )


LEP Caseload

Of all the students you teach at this school, how many are of limited-English proficiency or are English-language learners (ELLs)? (Students of limited-English proficiency [LEP] or English-language learners [ELLs] are those whose native or dominant language is other than English and who have sufficient difficulty speaking, reading, writing, or understanding the English language as to deny them the opportunity to learn successfully in an English-speaking-only classroom.) (SASS TQ, p. 10 )


School Poverty

The IES created variable NSLAPP_S asked of schools that participated in the National School Lunch Program (NSLP), the percentage of their K–12 enrollment that was approved for free or reduced-price lunches.

High/All other levels
75%  or greater free or reduced lunch was deemed high poverty

Certification Type

Which of the following describes the teaching certificate you currently hold that certifies you to teach in THIS state? (SASS TQ, p. 22 )

Fully Certified/Not Fully Certified

Certification Route

Did you enter teaching through an alternative certification program?  (SASS TQ, p. 27 )


Education Level

The IES-created variable HIDEGR was used to determine the highest degree obtained.

Master’s Degree or greater/Bachelor’s degree or less

Teacher Age

The IES-created variable AGE_T was used to determine teacher age.


Teaching Experience

The IES-created variable TOTYREXP was used to determine years of teaching experience.


School Level

The IES-created variable TLEV2_03 grouped teachers into either elementary or secondary as the instructional level.


Note: * A detailed explanation of the variables in the SASS TQ can be found in Cox, Parmer, Strizek & Thomas (2016).

This study consisted of a secondary analysis of the 2011-2012 SASS TQ restricted-use dataset. Specific NCES reporting protocols were followed and all findings were submitted to the IES for approval and authorization for release. The findings met reporting protocols for release to the general public.
Data were analyzed using STATA 13. Data were weighted using the Teacher Final Sampling Weight (TFNLWGT) variable and the 88 replicate weight variables supplied with the SASS TQ (TREPWT1–TREPWT88). The analysis utilized a balanced repeated replication procedure, as required by the IES. Multiple regression was used to examine the relationship between physical attacks and the selected teacher, school, and student demographic variables. Table 2 shows the values for the variables employed. Probability levels of .05 or less were deemed statistically significant. The IES required that all degrees of freedom be rounded to the nearest 10. The results presented are based on weighted data.
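As a sketch of how balanced repeated replication works, the weighted statistic is recomputed under each set of replicate weights, and the spread of those replicate estimates yields the variance. The data and the four replicate weight sets below are invented for illustration; the real analysis uses the 88 SASS replicate weights and STATA's BRR routine.

```python
import random

# Toy balanced repeated replication (BRR): recompute the weighted mean
# under each replicate weight set; the variance of the replicate
# estimates around the full-sample estimate is the BRR variance.
random.seed(1)
attacks = [random.randint(0, 3) for _ in range(200)]   # invented responses
full_wt = [1.0] * 200
# 4 invented replicate weight sets (a real SASS analysis has 88)
rep_wts = [[2.0 if random.random() < 0.5 else 0.0 for _ in range(200)]
           for _ in range(4)]

def weighted_mean(y, w):
    return sum(yi * wi for yi, wi in zip(y, w)) / sum(w)

theta = weighted_mean(attacks, full_wt)                # full-sample estimate
reps = [weighted_mean(attacks, w) for w in rep_wts]    # replicate estimates
variance = sum((r - theta) ** 2 for r in reps) / len(reps)
print(round(theta, 3), round(variance, 5))
```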

Table 2. Descriptive statistics for regression variables.

[The table body was not preserved in this copy; only the row labels remain: Physical Attacks, IEP Students, LEP Students, High Poverty, Certification Route, Certification Status, Special or General Education, School Level, and Teaching Experience.]

The number of students with IEPs, the number of students with LEP, the number of threats received, and being a special education teacher were statistically significant predictors of the number of physical attacks on teachers in public schools in the United States. These variables explained a statistically significant proportion of the variance in physical attacks, R2 = .465, F(10, 80) = 9.090. Table 3 presents the regression coefficients, standard errors, t values, and p values.

Table 3. Regression coefficients, standard errors, t-values, and p-values

[The table body was not preserved in this copy; only the row labels remain: IEP Students, LEP Students, High Poverty, Certification Route, Certification Status, Special or General Education, School Level, and Teaching Experience.]
Note. BRR is balanced repeated replication. SE is standard error.

Violence and aggression toward public school teachers in the United States is a problematic issue. Our findings indicated that special education teachers reported a statistically significantly higher number of physical attacks by students than their general education counterparts in the United States. These findings were consistent with prior research (Duhart, 2001; Wei et al., 2013; Williams, Billingsley, & Banks, 2018). It was interesting to note that the lower the number of students with IEPs, the more likely the teacher was to be assaulted. One possible explanation could be that these teachers work with students in more restrictive environments, such as self-contained classrooms, where student-to-teacher ratios are typically low and the students typically have more severe behavioral issues. A higher number of students with LEP was a statistically significant indicator of being physically attacked, as was the number of times that a teacher was threatened. None of the other variables examined in this study produced statistically significant findings.
It is important to acknowledge the limitations of these findings. The results of this study are exploratory in nature and represented a set of variables that Williams and Ernst (2016) found to be associated with physical attacks in their descriptive analysis of teachers who reported being physically attacked within the preceding 12 months on the SASS TQ.  As the dependent variable was based on teacher self-reports, it is subject to errors of recall and bias. To reduce errors of recall, we only selected participants who indicated the frequency of physical attacks experienced from students in the previous 12 months. The variables examined in this analysis were not exhaustive in nature and other variables not included in this analysis might be statistically significant as well.  In addition, no interaction effects were examined in this study. Therefore, results should be interpreted with caution.
The results are also limited inasmuch as the SASS TQ data set does not provide information on the severity of students' threats or attacks or the nature of the attacks. As stated in Williams, Billingsley, and Banks (2018), the SASS TQ offers no information about whether physical attacks were perpetrated by a single student or whether multiple students were involved. Since the SASS TQ is self-reported data, teacher perceptions of what constitutes a physical attack likely vary across teachers and environments. We believe that these findings, despite the limitations, provide a sound basis for continued investigation into student-on-teacher physical attacks using nationally representative data sets such as the SASS TQ to gain broader insight into issues related to school violence.

Cox, S., Parmer, R., Strizek, G., and Thomas, T. (2016). Documentation for the 2011–12 Schools and Staffing Survey (NCES 2016-817). U.S. Department of Education. Washington, DC: National Center for Education Statistics. Retrieved December 1, 2018 from http://nces.ed.gov/pubsearch.
Duhart, D. T. (2001). Violence in the Workplace, Bureau of Justice Statistics, 1993-99.
Espelage, D., Anderman, E. M., Brown, V. E., Jones, A., Lane, K. L., McMahon, S. D., . . . Reynolds, C. R. (2013). Understanding and preventing violence directed against teachers: Recommendations for a national research, practice, and policy agenda. American Psychologist, 68(2), 75-87. doi:10.1037/a0031307
Levin, P. F., Martinez, M. Q., Walcott-McQuigg, J., Chen, S. P., Amman, M., & Guenette, C. (2006). Injuries associated with teacher assaults. AAOHN Journal, 54(5), 210-216.
McMahon, S. D., Martinez, A., Espelage, D. L., Rose, C., Reddy, L. A., Lane, K., … Brown, V. (2014). Violence directed against teachers: Results from a national survey. Psychology in the Schools, 51(7), 753-766. https://doi.org/10.1002/pits.21777
Robers, S., Zhang, A., Morgan, R.E., & Musu-Gillette, L. (2015). Indicators of School Crime and Safety: 2014 (NCES 2015-072/NCJ 248036). National Center for Education Statistics, U.S. Department of Education, and Bureau of Justice Statistics, Office of Justice Programs, U.S. Department of Justice. Washington, DC. 
Tourkin, S., Thomas, T., Swaim, N., Cox, S., Parmer, R., Jackson, B., Cole, C., & Zhang, B. (2010). Documentation for the 2007–08 Schools and Staffing Survey (NCES 2010-332). U.S. Department of Education. Washington, DC: National Center for Education Statistics. Retrieved December 17, 2015 from http://nces.ed.gov/pubsearch
Wei, C., Gerberich, S. G., Alexander, B. H., Ryan, A. D., Nachreiner, N. M., & Mongin, S. J. (2013). Work-related violence against educators in Minnesota: Rates and risks based on hours exposed. Journal of Safety Research, 44(1), 73–85. http://doi.org/10.1016/j.jsr.2012.12.005
Williams, T. O., Billingsley, B. B., & Banks, A. (2018). Student on teacher threats and assaults: A comparison of general and special education teachers. Journal of Special Education Leadership, 31(1).
Williams, T., Jr., & Ernst, J. (2016). Physical attacks: An analysis of teacher characteristics using the Schools and Staffing Survey. Contemporary Issues in Education Research (CIER), 9(3), 129-136. doi:10.19030/cier.v9i3.9708


Methods for Predicting the Future of a Business

How to Predict the Future for Your Business.

There are those who say, “If it ain’t broke, don’t fix it,” but there is always improvement to be made to increase a business’s profits. While some co-workers may think that you can tell the future, you know that it is simply a program that everyone has on his or her computer at home or at the office. Using just one forward-looking method (probability, distribution, uncertainty, sampling, statistical inference, regression analysis, time series, forecasting methods, optimization, or decision tree modeling) can take a business that is experiencing a loss and turn it around so that it sees a profit later in its fiscal year, if not sooner.


 All businesses need to be able to see what is coming before it hits so that they can prepare for whatever the future has in store for them. By using any of the listed models (probability, distribution, uncertainty, sampling, statistical inference, regression analysis, time series, forecasting methods, optimization, or decision tree modeling), a company will be in a better position to avert a coming issue or take advantage of it.


Using probabilities is something that everybody does every day. There is a chance that someone will slip and fall in the shower, burn themselves with coffee on the way to work, work out too heavily at the gym and tear a tendon, or any number of other events. Probability helps the owners of a business predict whether an event will or will not happen (Test Prep Toolkit, n.d.). If an event has a zero probability, that event cannot happen; if the event has anything greater than zero, there is a chance for that event to happen (Albright & Winston, 2017). It is like playing 21 in a casino at the one-deck table. The player’s odds of hitting a 21 are slim, but if they continue to play with the same deck, and the player knows how to count cards, the odds increase as they play. The odds for the player may be something like twenty percent at the beginning of the game but will improve, perhaps to closer to ninety percent, the nearer to the end of the deck they get.
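The card example can be made concrete with a quick simulation. This sketch estimates the probability that two cards drawn from a fresh single deck total exactly 21 (an ace plus a ten-valued card); the simulation and its figures are illustrative, not taken from the sources above.

```python
import random

# Estimate a probability empirically: chance of a two-card 21
# from a fresh 52-card deck (ace counted as 1, face cards as 10).
random.seed(0)
deck = [min(v, 10) for v in range(1, 14)] * 4   # 1 = ace, J/Q/K = 10

def is_blackjack(cards):
    return sorted(cards) == [1, 10]             # ace + ten-valued card

trials = 100_000
hits = sum(is_blackjack(random.sample(deck, 2)) for _ in range(trials))
print(hits / trials)   # close to the exact value 64/1326, about 0.048
```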


There are many types of distributions that can be used for many things. The text discusses only four, and only two in depth: normal and binomial. The normal distribution, or the bell curve as it is commonly known, can be used for anything from comparing the heights of people to where someone’s IQ falls compared to others in a group. For example, distributions can be used in a manufacturing plant to determine whether the parts that a machine produces fall within the parameters of the blueprints for that part. “Once you know how your data is distributed, you can plan the appropriate type of analysis going forward” (SAS Institute Inc., 2018, para. 2). The quality assurance worker pulls the specified number of samples off the line and takes them back to the lab. The worker measures each of the samples carefully and enters the data into a computer. If the computer determines that the parts are too far out of specification, the worker will see that and can stop the machine making the part to determine what the problem is and what needs to be done to correct the issue and bring the parts back to specification.
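The quality-assurance check described above amounts to a few lines of code. The target size, tolerance, and measurements below are invented numbers for illustration.

```python
import statistics

# Flag parts whose measurements drift outside the blueprint tolerance.
target, tolerance = 25.0, 0.5                    # nominal size and allowed +/-
sample = [25.1, 24.9, 25.0, 25.2, 24.8, 25.6, 25.1, 24.95]

mean = statistics.mean(sample)                   # center of the distribution
sd = statistics.stdev(sample)                    # spread of the distribution
out_of_spec = [x for x in sample if abs(x - target) > tolerance]

print(f"mean={mean:.3f} sd={sd:.3f} out of spec: {out_of_spec}")
```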

The binomial distribution is a distribution that can be used “when sampling from a population that only has two types of members, and when performing a sequence of identical experiments, each of which has only two possible outcomes” (Albright & Winston, 2017, p. 190). An example of this is repeatedly throwing a dart at a dartboard: each throw is an identical trial, and each throw either hits the target ring or misses it, so there are only two possible outcomes per trial.
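The binomial distribution has a simple closed form: the probability of exactly k successes in n identical two-outcome trials, each with success probability p.

```python
from math import comb

# Binomial probability mass function: P(k successes in n trials).
def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# e.g. probability of exactly 3 heads in 10 fair coin flips
print(binom_pmf(3, 10, 0.5))   # 0.1171875
```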


“Uncertainty is the measurement of the goodness of a result” (NIST/SEMATECH, 2013, Section 2.5). There are four basic steps in using uncertainty. The first step is to “specify the target parameter of interest and an equation for its estimator” (Jurek, Maldonado, Greenland, & Church, 2007, para. 1). Next, it is necessary to “specify the equation for random and bias effects on the estimator” (Jurek et al., 2007, para. 1). For the third step, one must “specify prior probability distributions for the bias parameters” (Jurek et al., 2007, para. 1). The final step is to use “Monte-Carlo or another analytic technique to propagate the uncertainty about the bias parameters through the equation, to obtain an approximate posterior probability distribution for the parameter of interest” (Jurek et al., 2007, para. 1). If there were not something that could check the results, we could not judge the results for making decisions relating to scientific excellence (NIST/SEMATECH, 2013). An example of this would be checking to make sure that the results of a prior test, like a probability, are a good fit. There can be more than one answer to uncertainty, depending on the prior test that was completed.
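The final step, Monte-Carlo propagation, can be sketched as follows: a hypothetical survey proportion is corrected for an uncertain misclassification bias drawn from a prior. Every number here is invented for illustration.

```python
import random

# Monte-Carlo propagation of uncertainty in a bias parameter:
# draw the bias from its prior, apply the correction, and read off
# an approximate posterior distribution for the corrected estimate.
random.seed(42)
p_obs = 0.30                                # observed survey proportion
draws = []
for _ in range(10_000):
    bias = random.gauss(0.02, 0.01)         # prior on the bias parameter
    draws.append(p_obs - bias)              # bias-corrected estimate

draws.sort()
lo, hi = draws[250], draws[9750]            # approximate 95% interval
print(f"corrected p = {sum(draws)/len(draws):.3f} ({lo:.3f}, {hi:.3f})")
```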


 The easiest to understand and simplest sampling methodology is simple random sampling. A simple random sample is a subset of a population where each object has the same chance as any other object in that population to be picked (“Simple Random Sample,” 2018). A disadvantage to using simple random sampling is that the examiner may create a sampling error. One example of a sampling error is when an analyst does not pick a sample that represents the entire population (“Sampling error,” 2018).

An example of this would be playing Bingo. There is a population of numbered balls in a spinner. The caller stops the spinner and withdraws one of the balls. The spinner is the original population, and the balls that the caller pulls out is the sample. Each member of the original population has the same chance as the other members of the population to be picked.
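The bingo example maps directly onto code: each ball in the population has the same chance of being drawn, and the draw is made without replacement.

```python
import random

# Simple random sampling: every member of the population has an equal
# chance of being selected; sampling is without replacement, like
# pulling balls from the bingo spinner.
random.seed(7)
population = list(range(1, 76))          # standard bingo balls 1-75
sample = random.sample(population, 5)    # the caller's five draws
print(sample)
```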

Statistical Inference

There are “three forms of statistical inference …each one representing a different way of using the information obtained in the sample to draw conclusions about the population” (“Unit 4A: Introduction to statistical inference,” 2018, para. 15), but here we will only discuss statistical inference in general.

“The general idea that underlies statistical inference is the comparison of particular statistics from an observational dataset (i.e., the mean, the standard deviation, the differences among the means of subsets of the data), with an appropriate reference distribution in order to judge the significance of those statistics” (Bartlein, 2018, para. 1).

An example of this would be people taking exit polls during an election. The poll takers ask certain questions to attempt to determine how the election will turn out. Sometimes they are wrong and sometimes they are right.
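The exit-poll example can be framed as exactly the comparison Bartlein describes: a sample statistic judged against a reference distribution. Here is a one-sample z-test on an invented poll result.

```python
import math

# One-sample z-test for a proportion: compare the polled share p_hat
# to the null share p0 using the standard error under the null.
n, p_hat, p0 = 1000, 0.53, 0.50          # poll size, poll share, null share
se = math.sqrt(p0 * (1 - p0) / n)        # standard error under the null
z = (p_hat - p0) / se
print(f"z = {z:.2f}")                    # |z| > 1.96 would be significant at .05
```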

Regression Analysis

Regression analysis is a way of using math to sort out whether a variable, independent or dependent, actually has an impact (Gallo, 2015). For example, you are the manager of a fast food restaurant, and you want to know if it is better to stay open all night or to close the restaurant at 10:00 p.m. You gather two years’ worth of payroll data for the workers, along with the paychecks of the workers who have worked the overnight shift. After that it is a simple matter of inputting the right numbers into the regression analysis table, and you will find your answer.

Inputting the numbers and producing a table of information, then adding the chart and trendline, the line equation, and the R^2 value, is the simple part of the problem; the confusing part is making sense of the numbers that are created when you add your raw data to the regression chart.
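Underneath that regression chart is an ordinary least-squares calculation like the following. The hours and revenue figures are invented stand-ins for the restaurant example.

```python
# Ordinary least squares for one predictor: slope = Sxy / Sxx,
# intercept = mean(y) - slope * mean(x).
hours = [0, 2, 4, 6, 8]                   # invented: hours open past 10 p.m.
revenue = [120, 180, 260, 300, 390]       # invented: nightly revenue

n = len(hours)
mx = sum(hours) / n
my = sum(revenue) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(hours, revenue))
         / sum((x - mx) ** 2 for x in hours))
intercept = my - slope * mx
print(f"revenue = {intercept:.1f} + {slope:.1f} * hours")
```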

Time Series

Time series analysis is a way to look at one or two variables, observed at regular intervals over some elapsed period of time, and make predictions. There is one caveat to using time series: the observations have to be taken at uniform time intervals. It is like being a lawyer. One day you are told that there is an inmate who would like to see you. When you see them, you already know the case; everybody already knows the basics of the case. The lawyer has two options: take the case pro bono and lose the money they would have made taking other cases, or take the case and use the publicity to improve their business reputation in the hope that it will draw in more and better clients. To decide, they use a time series analysis, drawing on information about their income, as reported on their tax returns, from the time they started taking pro bono cases. If the time series shows that they can take on the case without losing money, then they can defend the client; if it shows that they will lose money by taking the case, then they do not have to take it.
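A minimal version of the lawyer's analysis is a moving-average forecast over uniformly spaced observations; the yearly income figures below are invented.

```python
# Simple time-series forecast: a 3-period moving average of yearly
# income, observed at uniform (annual) intervals.
income = [40, 42, 45, 43, 47, 50, 52]    # invented, one observation per year

window = 3
forecast = sum(income[-window:]) / window   # average of the last 3 years
print(f"next-year forecast: {forecast:.1f}")
```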

Forecasting Methods

“Forecasting is about predicting the future as accurately as possible, given all the information available, including historical data and knowledge of any future events that might impact the forecasts” (Hyndman & Athanasopoulos, 2018, Section 1.2). To forecast, you start with five simple steps: define the problem, gather the needed information, do an exploratory analysis, choose and fit models, and finally use and evaluate the forecasting model (Hyndman & Athanasopoulos, 2018).

An example of this would be a call center wanting to know how many employees it needs for the next week. The analysts pull up information on how many calls were received during the last four years, as well as the number of employees staffed over the same period. Once they have all the information, they run both the call counts and the staffing numbers through one of several models and, if the input data are correct, get the answers they need.
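One of the "several models" the call center might run is simple exponential smoothing, which weights recent observations more heavily. The call volumes and smoothing factor below are invented.

```python
# Simple exponential smoothing: each new observation pulls the level
# toward it by a fraction alpha; the final level is the forecast.
calls = [520, 540, 510, 560, 580, 600]    # invented weekly call volumes
alpha = 0.5                               # smoothing factor, a tunable choice

level = calls[0]
for c in calls[1:]:
    level = alpha * c + (1 - alpha) * level
print(f"forecast for next week: {level:.1f}")
```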


Optimization means that you methodically choose values of the decision variables to make the objective as large or as small as possible while ensuring all the constraints are satisfied (Albright & Winston, 2017). Once this is understood, you move into the model development step. It is here that you decide what the decision variables will be, what your objective will be, which constraints you will use, and how everything comes together (Albright & Winston, 2017).

Joe is a farmer getting ready to plant the hay that he needs to get his cattle through the summer and through the winter. He takes the amount of hay that he has produced over the last ten years and the number of cattle he winters, and uses the optimization software that he bought but never used. He finds that for the last four years he has had to purchase 40 more large rolls of hay to make it until the next spring. That means he has to plant an additional ten acres, for sixty-four large round bales, to make sure his livestock makes it through the summer, fall, and winter. Now he needs to find out which of his other fields he can cut back without losing a lot of money or shorting the farm somewhere down the road. He uses the optimization software again and finds two fields that will have little or no effect on his income.
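Joe's problem can be sketched as a brute-force search over the decision variable (extra acres of hay), with the constraint that the herd must be fed. All the prices and yields below are invented.

```python
# Brute-force optimization: minimize total cost over the decision
# variable (extra acres planted), subject to the feed constraint.
BALES_NEEDED = 64            # extra bales the herd requires (invented)
BALES_PER_ACRE = 6.4         # invented yield
PROFIT_OTHER_CROP = 90       # $/acre given up by planting hay instead
COST_TO_BUY_BALE = 45        # $/bale if hay is bought instead of grown

best = None
for acres in range(0, 21):                     # candidate decisions
    grown = acres * BALES_PER_ACRE
    bought = max(0.0, BALES_NEEDED - grown)    # constraint: herd is fed
    cost = acres * PROFIT_OTHER_CROP + bought * COST_TO_BUY_BALE
    if best is None or cost < best[1]:
        best = (acres, cost)
print(f"plant {best[0]} extra acres, total cost ${best[1]:.0f}")
```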

Decision Tree Modeling

A decision tree is a graphical representation of a decision: every outcome of that decision is grafted onto the tree, and when every branch comes to a natural end, the tree has laid out the possible solutions to the problem. Decision trees give everyone who uses them an easy way to understand the options of their decision and all of the possible outcomes.

In a business, there is a manager who has a dilemma. He needs to be present when a new project comes through so he can make sure that everyone knows what to do and how to do it, but his boss wants to take him to lunch today, a lunch that has been scheduled for a month, and he needs to discuss some things with the boss before they get out of hand. He decides to use a decision tree on his problem, something that was just taught to him yesterday. After running through the decision tree twice, his path is clear. He will go to his boss’s office and ask to reschedule the lunch appointment because his crew is starting a new project and he wants to be there to make sure that things go well, and if they do not go well, he is there to take the brunt of the criticism that will follow.
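The manager's tree reduces to an expected-value comparison: each branch carries a probability and a payoff, and the chosen action is the branch with the better expectation. The probabilities and payoffs below are invented for illustration.

```python
# Decision tree as expected values: each option's branches are
# (probability, payoff) pairs; pick the option with the highest
# expected payoff.
options = {
    "reschedule lunch, attend kickoff": [(0.8, 100), (0.2, -20)],
    "keep lunch, skip kickoff":         [(0.5, 60),  (0.5, -80)],
}

def expected_value(branches):
    return sum(p * payoff for p, payoff in branches)

best = max(options, key=lambda k: expected_value(options[k]))
print(best, expected_value(options[best]))
```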


This paper has discussed ten different approaches you can use to find significant information buried in data. With a little practice you will have no trouble deciding which to use to get the information you need at the time. Any business that uses these methodologies will be prepared for the future, good or bad, and able to take advantage of it. These ten models (probability, distribution, uncertainty, sampling, statistical inference, regression analysis, time series, forecasting methods, optimization, and decision tree modeling) help a company increase its profits or avoid an issue in a new product or any other decision that needs to be made.


Albright, S. C., & Winston, W. L. (2017). Business analytics: Data analysis and decision making (6th ed.). Retrieved from https://platform.virdocs.com/app/v5/doc/351675/pg/1/toc

Bartlein, P. (2018). Statistical inference. Retrieved November 15, 2018, from http://geog.uoregon.edu/bartlein/courses/geog495/lec10.html

Gallo, A. (2015). A refresher on regression analysis. Retrieved from https://hbr.org/2015/11/a-refresher-on-regression-analysis

Hyndman, R. J., & Athanasopoulos, G. (2018). Forecasting: Principles and practice. Retrieved November 15, 2018, from https://otexts.org/fpp2/index.html

Jurek, A. M., Maldonado, G., Greenland, S., & Church, T. R. (2007). Uncertainty analysis: an example of its application to estimating a survey proportion. Retrieved from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2465740/

NIST/SEMATECH. (2013). e-Handbook of Statistical Methods. Retrieved November15, 2018, from https://www.itl.nist.gov/div898/handbook/index.htm

Sampling error. (2018). Retrieved from https://www.investopedia.com/terms/s/samplingerror.asp

SAS Institute Inc. (2018). Distributions: Using the distribution platform. Retrieved from https://www.jmp.com/support/help/14/distributions.shtml

Simple Random Sample. (2018). Retrieved from https://www.investopedia.com/terms/s/simple-random-sample.asp

Test Prep Toolkit. (n.d.). Data analysis, statistics, and probability. Retrieved from https://www.testpreptoolkit.com/data-analysis-statistics-probability/

Unit 4A: Introduction to statistical inference. (2018). Retrieved November 15, 2018, from https://bolt.mph.ufl.edu/6050-6052/unit-4/


Predicting the Occurrence of a Deep Aquifer

Unaweep Canyon (western Colorado, USA) bisects the northwestward-trending Uncompahgre Plateau on the northern Colorado Plateau and is the only major canyon in the Colorado River drainage not occupied by a river (Soreghan et al., 2015). The Colorado and Uncompahgre Plateaus form part of the greater Rocky Mountain orogenic plateau, a large region of high elevation in the United States (McMillan et al., 2006). Studies of this area have mainly focused on the history and drainage evolution of Unaweep Canyon; however, the hydrochemical characterization of the area remains underdeveloped. My research aims to provide more insight into the chemical characteristics of the various aquifer systems present at Unaweep Canyon. A shallow fluvial aquifer system has already been identified by domestic well construction activities (as seen in Figure 1), but the possibility of a deeper lacustrine aquifer has not been examined. I therefore hypothesize that if Unaweep Canyon hosts a deeper aquifer, then the hydrochemical characteristics of the lacustrine deposits should match those of the seeps and springs found at bedrock fractures and faults. The significance of this research is to identify an alternative source of water for irrigation and for human drinking supply. Usage of deeper aquifers for domestic water consumption appears to be sustainable, since the recharge rate of deeper aquifers calculated from residence times is of comparable magnitude (Zheng et al., 2005). Proving the presence of a deeper aquifer can also assist with understanding the interaction between groundwater and bedrock faults in mineral deposits. For example, uranium fixed onto mineral grain boundaries or present in less-resistant minerals such as biotite or hornblende can be readily leached by groundwater (Gascoyne et al., 2002).


Water resources are closely linked to the wellbeing of humankind (Huang et al., 2019). Groundwater is a crucial aspect of life, essential as a source of drinking water, a supply for irrigation, and a support for ecosystems. However, before it can be qualified as safe for use and consumption, water quality analyses must be carried out. Hydrochemical indices are commonly used to ascertain aquifer characteristics, salinity problems, anthropogenic inputs, and resource management, among others (Litaor et al., 2010). Hydrochemistry is an interdisciplinary field combining hydrology, the study of the earth’s water, with chemistry, which identifies the chemical composition of water. In my thesis, water samples from domestic wells, seeps, and springs will be collected and analyzed to reveal their hydrochemical properties and to identify any differences or similarities among the sampling locations, which will then be interpreted as accurately as possible using other analytical methods. Dissolved organic matter (DOM) concentrations and characterization, a hydrochemical measurement, will also be determined for each sample using a TOC analyzer and a spectrofluorometer (as seen in Figure 2), respectively, and can then be used as a tracer. Given its fluorescence properties, DOM can be used as a natural tracer in place of artificial dyes if its characteristics change with either space or time (Baker & Lamont-Black, 2001). If the DOM constituents differ between the wells and the seeps and springs, that would highlight the probability of there being two different sources of water for the two outlets.
The use of stable isotope analysis, oxygen and hydrogen isotopes in particular, is a handy tool for locating deep aquifers. Generally, waters originating from deep aquifers should have a lighter isotopic signature compared to surface water as a result of their reduced interaction with meteoric water. Thus, the source and evolution process of groundwater in a specific location can be determined based on the groundwater’s relationship with hydrogen and oxygen isotopes in meteoric water (Yeh & Lee, 2018). The presence of seeps and springs originating from the Precambrian rock at Unaweep Canyon provides an excellent source of groundwater that can be analyzed for its isotopic composition using a mass spectrometer and then compared to the isotopic composition of the domestic wells drilled into the shallow aquifer. If there is a considerable difference in the δ18O and δ2H values between the two waters, it hints at the presence of a confined aquifer that forces water to the surface of the earth, perhaps a deeper aquifer system. If this deeper aquifer is proven, an additional step would be obtaining its age using 3H values. The measurement of the relationship between helium-3 (the daughter isotope produced by the decay of tritium) and tritium (the parent isotope) can be used to calculate the groundwater age using the following equation (Moran & Hudson, 2006):
Groundwater Age (years) = 17.8 × ln(1 + 3Hetrit/3H)
The groundwater age obtained would highlight the last time the water was in contact with the atmosphere, as meteoric water or as surface water, before it entered the subsurface. The calculation of the groundwater age is only feasible if He is present and can be proven to be the daughter isotope of tritium. Therefore, performing this step is still being considered, dependent on cost and on the results obtained from an XRD analysis showing the occurrence of He.
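The age equation can be written directly as code. The coefficient 17.8 years is the tritium half-life (12.32 yr) divided by ln 2, written here with a positive sign so that ages come out positive; the ratio used in the example is an invented value, not a Unaweep Canyon measurement.

```python
import math

# Tritium/helium-3 apparent groundwater age from the ratio of
# tritiogenic 3He to remaining 3H (Moran & Hudson, 2006 style).
def groundwater_age(he3_trit_over_h3):
    """Apparent age in years; 17.8 yr = tritium half-life / ln 2."""
    return 17.8 * math.log(1 + he3_trit_over_h3)

# A ratio of 1 (equal parts daughter and parent) gives one half-life.
print(round(groundwater_age(1.0), 1))
```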
Resistivity, a geophysical method, is another approach that can be used to identify groundwater by measuring the resistivity/conductivity of the subsurface. The resistivity methods, and especially the vertical electrical sounding (VES) method, have been used successfully for investigating groundwater quality in different lithological settings because the instrumentation is simple, field logistics are easy, and the analysis of data is straightforward compared to other methods (Zohdy et al., 1974). At Unaweep Canyon, a resistivity investigation has been carried out (as seen in Figure 3) and then inverted to translate raw geophysical measurements into spatial patterns of the geophysical parameter (Binley et al., 2015). Examining this inverted data has shown possible areas of low resistivity (high conductivity) that might indicate water in the earth’s subsurface. However, because a medium’s resistivity depends on porosity, water content, and the concentration of salts, the information obtained from the VES survey needs to be correlated with the chemistry of the groundwater as well as lithological studies done in the area of interest. By correlating the inverted geophysical data with hydrochemical analyses, I will be able to examine the existence of a deeper aquifer system.
My thesis research at Unaweep Canyon, western Colorado, will involve mostly hydrochemical and isotope analyses to try to predict the occurrence of a deep aquifer. The age and source of the deeper groundwater will be inferred, as well as its water quality, possibly providing an alternative supply for the drinking and irrigation needs of people.

Baker, A., & Lamont-Black, J. (2001). Fluorescence of dissolved organic matter as a natural tracer of groundwater. Ground Water, 39(5), 745-750. Retrieved from https://search-proquest-com.ezproxy.lib.ou.edu/docview/236850756?accountid=12964
Behm, M. (2019). Geophysics for Applications in Hydrology [Powerpoint slides]. Retrieved from: https://canvas.ou.edu/courses/163663/files/folder/Geophysics?preview=17678699
Binley, A., Hubbard, S. S., Huisman, J. A., Revil, A., Robinson, D. A., Singha, K., & Slater, L. D. (2015). The emergence of hydrogeophysics for improved understanding of subsurface processes over multiple scales: The Emergence of Hydrogeophysics. Water Resources Research, 51(6), 3837–3866. https://doi.org/10.1002/2015WR017016
Gascoyne, M., Miller, N. H., & Neymark, L. A. (2002). Uranium-series disequilibrium in tuffs from Yucca Mountain, Nevada, as evidence of pore-fluid flow over the last million years. Applied Geochemistry, 17(6), 781–792. https://doi.org/10.1016/S0883-2927(02)00038-0
Huang, F., Zhang, Y., Zhang, D., & Chen, X. (2019). Environmental Groundwater Depth for Groundwater-Dependent Terrestrial Ecosystems in Arid/Semiarid Regions: A Review. International journal of environmental research and public health, 16(5), 763. doi:10.3390/ijerph16050763
Litaor, M. I., Brielmann, H., Reichmann, O., & Shenker, M. (2010). Hydrochemical analysis of groundwater using a tree-based model. Journal of Hydrology, 387(3–4), 273–282. https://doi.org/10.1016/j.jhydrol.2010.04.017
McMillan, M. E., Heller, P. L., & Wing, L. W. (2006). History and causes of post-Laramide relief in the Rocky Mountain orogenic plateau. Geological Society of America Bulletin, 118, 393–405. doi:10.1130/B25712.1
Moran, J.E., Hudson, G.B. (2006). Using groundwater age and other isotopic signatures to delineate groundwater flow and stratification. In: Russell, H.A.J., Berg, R.C., Thorleifson, L.H. (Eds), Three-Dimensional Geological Mapping for Groundwater Applications: Workshop Extended Abstracts. Geol. Surv. Canada Open-File Rep. 5048, 53-56.
Soreghan, G. S., Sweet, D. E., Thomson, S. N., Kaplan, S. A., Marra, K. R., Balco, G., & Eccles, T. M. (2015). Geology of Unaweep Canyon and its role in the drainage evolution of the northern Colorado Plateau. Geosphere, 11(2), 320–341. https://doi.org/10.1130/GES01112.1
Yeh, H.-F., & Lee, J.-W. (2018). Stable Hydrogen and Oxygen Isotopes for Groundwater Sources of Penghu Islands, Taiwan. Geosciences, 8(3), 84. https://doi.org/10.3390/geosciences8030084
Zheng, Y., van Geen, A., Stute, M., Dhar, R., Mo, Z., Cheng, Z., … Ahmed, K. M. (2005). Geochemical and hydrogeological contrasts between shallow and deeper aquifers in two villages of Araihazar, Bangladesh: Implications for deeper aquifers as drinking water sources. Geochimica et Cosmochimica Acta, 69(22), 5203–5218. https://doi.org/10.1016/j.gca.2005.06.001
Zohdy, A., Eaton, G. P., & Mabey, D. R. (1974). Application of surface geophysics to groundwater investigations. Techniques of Water-Resources Investigations of the United States Geological Survey, Book 2, Chapter D1, 116 p.


Figure 2. Digital elevation model of research area and surroundings. The mouth of Unaweep Canyon that hosts the Cutler rocks lies near Gateway along the Dolores River (Soreghan et al., 2015)

Figure 3. Resistivity profile of Unaweep Canyon, Colorado (Behm M., 2019, slide 17)