The post Tamba Publish World-First Report appeared first on Select Statistical Consultants.

We helped Tamba to analyse the data collected from each unit, to understand whether there were any associations between actions taken by the units in improving their adherence to the NICE guidelines and changes in patient outcomes.

Tamba estimated that the NHS could save £8 million within one year of implementing the guidelines if all units in England achieved similar improvements in care practice, as there would be substantial reductions in neonatal admissions and emergency c-sections.


The post Developing the Next Generation of Data Scientists appeared first on Select Statistical Consultants.

]]>Students from Exeter College and Exeter Maths School who were nearing the end of their secondary education and had an interest in data science and developing their analytical skills in R were invited to attend the event, following completion of an intensive one-week summer course at the University. The Data Science Education for Teens project, led by Dr ZhiMin Xiao and funded by the HEFCE Catalyst project, aimed to foster the students’ interest in the field and develop key skills in areas including data processing, statistical thinking, inference, prediction, and model testing.

Speaking about the event, Sarah said, “I was delighted to be involved in this project, which encourages the teaching of data skills earlier in students’ schooling. Not only will these analytical skills be of great value in their future studies, I hope that opportunities such as this will encourage students to consider a career in statistics and help grow the next generation of data scientists.” She added, “I was very impressed with how engaged the students were with the project and with their enthusiasm to learn about applying statistics in the real world. It was a real pleasure to have the opportunity to speak with young people who are interested in where further study in maths and statistics could take them in their careers.”


The post Assessing Questionnaire Validity appeared first on Select Statistical Consultants.


As constructs are intangible and complex human behaviours or characteristics, they are not well measured by any single question. They are better measured by asking a series of related questions covering different aspects of the construct of interest. The responses to these individual but related questions can then be combined to form a score or scale measure along a continuum.

As with scientific measurement instruments, two important qualities of surveys are consistency and accuracy. These are assessed by considering the survey’s *reliability* and *validity*.

Following on from our previous blog which looked at approaches to assessing reliability, this blog focusses on ways to assess a survey’s validity.

As with reliability, the validity of a survey can be assessed in a number of different ways and the methods to choose will depend on the survey design and purpose. Often it is desirable to use more than one to facilitate a more rounded judgement of validity.

Validity is the extent to which an instrument, in this case a survey, measures what it is supposed to measure: validity is an assessment of its accuracy.

*Face validity* and *content validity* are two forms of validity that are usually assessed qualitatively. A survey has **face validity** if, in the view of the respondents, the questions measure what they are intended to measure. A survey has **content validity** if, in the view of experts in the field, the questions cover all the relevant aspects of the construct it is intended to measure.

Face and content validity are based on the subjective opinions of non-experts and experts respectively. Face validity is often seen as the weakest form of validity, and it is usually desirable to establish that your survey has other forms of validity in addition to face and content validity.

**Criterion validity** is the extent to which the measures derived from the survey relate to other external criteria. These external criteria can either be concurrent or predictive.

*Concurrent validity* criteria are measured at the same time as the survey, either with questions embedded within the survey, or measures obtained from other sources. It could be how well the measures derived from the survey correlate with another established, validated survey which measures the same construct, or how well a survey measuring affluence correlates with salary or household income.

Often the purpose of a survey is to make an assessment about a situation in the future, say the suitability of a candidate for a job or the likelihood of a student progressing to a higher level of education. *Predictive validity* criteria are gathered at some point in time after the survey and, for example, workplace performance measures or end of year exam scores are correlated with or regressed on the measures derived from the survey.

If the external criterion is categorical (for example, how well a survey measuring political opinion distinguishes between Conservative and Labour voters), this is still criterion validity, but how well a survey distinguishes between different groups of respondents is referred to as **known-group validity**. This could be assessed by comparing the average scores of the different groups of respondents using t-tests or analysis of variance (ANOVA).
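As an illustrative sketch of a known-group comparison, the snippet below implements a Welch two-sample t-test from scratch with only the Python standard library; the groups and scale scores are made up for the example, and in practice you would use your own survey data and a statistical package.

```python
# Hypothetical known-group validity check: compare mean scale scores for two
# groups the survey should distinguish, using a Welch two-sample t-test
# implemented with only the standard library.
import math
import statistics

def welch_t(group_a, group_b):
    """Return the Welch t-statistic and its approximate degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    mean_a, mean_b = statistics.fmean(group_a), statistics.fmean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    se2 = var_a / na + var_b / nb
    t = (mean_a - mean_b) / math.sqrt(se2)
    df = se2**2 / ((var_a / na) ** 2 / (na - 1) + (var_b / nb) ** 2 / (nb - 1))
    return t, df

# Made-up scale scores for two groups of respondents.
group_1 = [72, 68, 75, 80, 71, 77, 69, 74]
group_2 = [58, 63, 55, 60, 62, 57, 61, 59]

t, df = welch_t(group_1, group_2)
print(f"t = {t:.2f} on {df:.1f} df")  # a large |t| supports known-group validity
```

A large t-statistic (relative to its degrees of freedom) indicates that the survey separates the two groups well; with more than two groups, an ANOVA plays the same role.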

**Construct validity** is the extent to which the survey measures the theoretical construct it is intended to measure, and as such encompasses many, if not all, validity concepts rather than being viewed as a separate definition.

Confirmatory factor analysis (CFA) is a technique used to assess construct validity. With CFA we state how we believe the questionnaire items are correlated by specifying a theoretical model. Our theoretical model may be based on an earlier exploratory factor analysis (EFA), on previous research or from our own a priori theory. We calculate the statistical likelihood that the data from the questionnaire items fit with this model, thus confirming our theory.

This blog doesn’t provide an introduction to factor analysis; we’ll post an article on that topic in the future. Here we explain how factor analysis is used in the context of validity.

Below is a diagram representing a simple theoretical model.

Here there are five questionnaire items (labelled Q1 to Q5 in the diagram above), each of which is measured with a component of error or uncertainty (labelled e_{1} to e_{5}). The model hypothesises that an individual’s responses to each of the survey questions are influenced by the underlying latent construct, the factor. The construct is “latent” because it is not directly observed; it is measured through responses to questions related to the construct. For example, respondents with higher levels of self-confidence will be more likely to endorse statements such as “*I am happy being the centre of attention*” or “*I feel comfortable talking to people I don’t know*”, whereas respondents with lower levels of self-confidence are more likely to disagree with these statements. So respondents’ questionnaire responses are driven by, and correlated with, their underlying characteristic.

This kind of model is known as a factor analysis model. It shows how the correlations between the questionnaire items can be explained by correlations between each questionnaire item and an underlying latent construct, the factor. These correlations are known as factor loadings and are represented by arrows between the latent factor and the questionnaire items.

By fitting the model we can estimate these factor loadings. We then compare the estimates of the factor loadings with their standard errors and calculate the likelihood that these are different from zero, and therefore how much statistical evidence there is to support our hypothesis (that the theoretical factor analysis model fits the data). We can also compare the fit of the model overall with goodness of fit statistics such as the model Chi-squared, the Comparative Fit Index and the Root Mean Square Error of Approximation.

Using confirmatory factor analysis we test the extent to which the data from our survey is a good representation of our theoretical understanding of the construct; it tests the extent to which the questionnaire survey measures what it is intended to measure.

While reliability and validity are two different qualities, they are closely related and interconnected. A survey can have high reliability but poor validity: like a watch that runs 10 minutes fast, a survey, or any measurement instrument, can consistently measure the wrong thing. However, for a survey, or measurement instrument, to have good validity it must also have high reliability. Without good reliability a survey is not validly measuring what it is intended to measure: it is measuring something else (other constructs or noise). Reliability is a necessary but not sufficient condition for validity.


The post Embracing Uncertainty appeared first on Select Statistical Consultants.

Suppose we want to know whether a die is biased. If there is no way to deduce with certainty the die’s fairness using physical tests, we might turn to statistical methods to infer whether it is fair or not. We roll the die a few times and check the results. Say 40% of the rolls land on a 5. Does this mean the die is biased? Answering this question is not straightforward, because no matter how many times we roll the die we will never know with certainty whether it is biased – we might just be seeing an unusual pattern of rolls for a fair die. Nevertheless, we might think that our assessment of the die should differ if we roll it 500 times rather than just 5 times. How do we navigate between unjustified certainty and falsely modest claims of ignorance?

A wrong approach to the dilemma, but an approach that is conceptually similar to determining statistical significance, would be to set a given number of rolls as the threshold for certainty. Let us say 500 rolls is the cut-off. So, if 40% of the die’s rolls land on a 5 after 500 rolls we determine that the die is biased, but if 40% of the die’s rolls land on a 5 after only 499 rolls we have no basis on which to claim that the die is not fair. Dismissing the evidence if it does not reach a given threshold, and ignoring the room for error if it does, makes little statistical or logical sense. However, in a statistical significance test, treating a p-value of less than 0.05 as “king” and declaring the result ‘statistically significant’ falls into this exact trap – it misguidedly interprets the reaching of an essentially arbitrary threshold as something approaching stone-cold proof.

The imposed diametric choice between certainty and cluelessness, reflected in the binary ‘statistically significant’ or ‘not statistically significant’ paradigm, makes little sense because it misrepresents the nature of statistical evidence. Statistical evidence grows gradually stronger as we gather more data or roll the die more times in our example above. Assuming we are sensible, this means that our confidence in our hypothesis should grow gradually stronger as we roll more times, all other things being equal. There is no statistical reason to venture into the realms of a ‘no evidence’ and ‘complete evidence’ dichotomy.
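To make this gradual strengthening concrete, here is a small sketch (in Python, purely for illustration, with made-up roll counts): the exact binomial probability of a fair die showing a five on at least 40% of rolls shrinks smoothly as the number of rolls grows, with nothing special happening at any particular threshold.

```python
# Illustrative sketch: the exact probability that a fair die (P(five) = 1/6)
# shows a five on at least 40% of rolls, for increasing numbers of rolls.
# The evidence against fairness strengthens gradually with more data.
import math

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

for n in (5, 50, 500):
    k = math.ceil(0.4 * n)  # at least 40% of the rolls land on a five
    print(f"{n:>3} rolls: P(>= 40% fives | fair die) = {binom_tail(n, k, 1/6):.3g}")
```

With 5 rolls the result is unremarkable for a fair die (a probability of roughly 0.2), while with 500 rolls it is vanishingly unlikely; the evidence changes by degrees, not in a single jump.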

The argument that it makes practical sense might proceed as: “Decisions must be one or the other. There is no room for degrees when faced with a choice, so we should set standards for evidence that mirror this.” There is an important question to be asked about the amount of evidence that should weigh in favour of or against a decision; but to blur this question with the definition of the evidence itself is a misguided approach. There is logical room to say that we do not know for certain that something is the case but that, on balance, the decision to act as if the hypothesis is true is justified. Wrong decisions are more likely to be made if we treat something as certain that is not.

At Select, we use statistics to give the best possible view of the inferential evidence in the data. This means embracing the uncertainty of statistical results and giving a range of likely scenarios which the data support with varying degrees. The decision-maker can then combine this evidence with the other factors that inevitably weigh into a decision such as expert knowledge, economic constraints, and the utilities of each of the possible outcomes. When statistical significance takes too prominent a place in the statistical results and p-values are over-interpreted there is more danger of data making decisions rather than informing them. We prefer to think about the valuable perspective data provide as one voice among many justifying a decision.

The great thing about embracing uncertainty is that once you start to think in these terms more possibilities present themselves.

Professional statisticians across the world are recognising and revolting against the damage that refusal to accept uncertainty can cause. Prominent in the discussion is a desire to move away from narrow-minded statistics where decisions are automated based on arbitrary thresholds and move towards a more thoughtful and holistic approach where data is used to inform decisions, not to make them. The great thing about embracing uncertainty is that once you start to think in these terms more possibilities present themselves. Maybe you can implement a probabilistic strategy rather than a dichotomous one. Maybe there are more, higher quality data you can collect to get closer to the truth. Maybe the data throws up a relationship that you are surprised by and decide to investigate further. Statistics and data are about much more than testing hypotheses, after all.


The post Women in Data UK Retail Analytics Event appeared first on Select Statistical Consultants.

Louise said, “It was great to see so many people at the event. Data science is a rapidly growing industry, which offers exciting opportunities and challenges in retail analytics. Lots of the discussion centred on the topic of collaboration, both with fellow data scientists and with non-technical staff.

Women are well represented at Select, so it was interesting to hear first-hand that this is not always the case in the data sector. Given the relative youth of the industry, I believe that data science can be an exception to the rule of STEM (Science, Technology, Engineering and Mathematics) subjects and see women well-represented at all levels of the profession.”


The post Louise is Awarded Graduate Statistician Status appeared first on Select Statistical Consultants.

As well as being a valuable and hard-earned professional award in its own right, becoming a Graduate Statistician is an important stepping-stone towards achieving Chartered Statistician status. Louise will now start working towards becoming a Chartered Statistician by gaining more experience as a professional statistician at Select, abiding by the Society’s Code of Conduct, and undertaking plenty of Continuing Professional Development. A minimum of 5 years’ professional experience is required before a statistician becomes eligible for the Chartered Statistician award, making it a significant step from early-career practitioner to fully-fledged professional statistician.


The post Data Visualisation with R’s ggplot2 Package appeared first on Select Statistical Consultants.

The *tidyverse* is a collection of R packages designed with data science in mind.

At Select, our go-to analysis tool is the statistical software package R. Not only was R specifically designed for statistical computing, it was also developed with a focus on graphics. As well as the standard plotting functions available in base R, additional functionality is available through add-on packages of code. The *ggplot2* package was developed by Hadley Wickham as part of the *tidyverse* (a collection of packages designed with data science in mind) and is considered one of the best tools for plotting graphs. It combines high levels of customisation with clean and visually pleasing graphics, often with minimal effort on the part of the programmer.

At Select we’ve been putting this to the test, using *ggplot2* to develop data visualisations such as those in our EU blog series (for example, see our article ‘How Are EU Migrants Represented Across the UK Workforce?’).

Many large corporations, including Google, Pfizer, Lloyds of London and Shell use R to analyse and present their data. In particular, data journalists on the BBC News’ Visual and Data Journalism team have been using R for complex and reproducible data analysis and to build prototypes for some time. For example, R was used to extract, wrangle, clean and explore data from hundreds of spreadsheets on NHS targets, for their award-winning NHS tracker project.

More recently, the BBC have fundamentally changed how they produce graphics for publication on the BBC News website. In their article, ‘How the BBC Visual and Data Journalism team works with graphics in R’, they describe their journey in moving to using R’s *ggplot2* package to create production-ready charts. They talk in-depth about how they have documented their progress and code, sharing what they have learned along the way via their BBC R graphics cookbook.

With *ggplot2*, it is possible to build up style frameworks which can be applied to any future plots, giving all your visualisations a consistent aesthetic theme. The BBC have made theirs public via the *bbplot* package. At Select we’ve similarly been setting up a ‘Select theme’ to create clear and consistent plots for all of our projects.

We can also go one step further, creating dynamic web applications with which viewers can interact, using the R package *Shiny*. This can further enhance the communication of analyses, as discussed in our previous article ‘Interacting with Your Data’. *Shiny* allows users to interact with statistical code and outputs: to re-run analyses and explore changing inputs, facilitating scenario planning. You can find lots of other interesting examples of *Shiny* apps via the *Shiny* Gallery and User Showcase.

R is a flexible and powerful data processing and analysis tool; combined with the power of its visualisation and web-app tools (including *ggplot2* and *Shiny*), it allows us to help our clients to fully explore and understand their data.


The post Select Enjoys Research Students’ Conference appeared first on Select Statistical Consultants.

Since 1980, this annual students’ conference has been run by and for post-graduate students working in statistics and probability. It is a great opportunity for post-graduate students to hear about cutting-edge research topics in statistics, to network with potential future colleagues, and to gain experience presenting their research.

Sally and Louise headed down to the conference to enjoy plenary talks by Leeds-based professor, John Paul Gosling, and president of the Royal Statistical Society, Deborah Ashby, as well as the students’ poster session.

The plenary talks recounted the speakers’ careers in statistics to date and highlighted the broad range of interesting projects that professional statisticians can and do get involved in. Louise says, “From assessing the severity of criminal sentencing, to exploring the effectiveness of alternatives to animal testing, to directing national drug regulation, it was inspiring to hear about the speakers’ work.”

The poster session showcased the research being conducted as part of statistics PhD programmes. Again, the range of statistical techniques being explored and potential applications being championed by the students was considerable. Louise commented that, “It was great to talk with students about their varied projects and exciting plans for the future.”


The post Novel Educational Research using Longitudinal Survey Data appeared first on Select Statistical Consultants.

The research, published by the Institute for Social and Economic Research, used longitudinal survey data collected in Understanding Society, the UK Household Longitudinal Study (UKHLS), which asked parents how often they helped their child with their homework. The study selected data from households where the child’s school had been inspected by Ofsted in the same year that the survey took place. The researchers exploited a natural feature of the survey and of Ofsted inspections: that either could occur at any point throughout the year. Households were divided into two groups: one where the Ofsted inspection occurred before the survey interview, and the other where the Ofsted inspection occurred after the survey interview.

Researchers exploited a natural feature of the survey… [for] random assignment. In education, often randomisation is not possible, is impractical, or sometimes considered unethical.

This meant that one group of parents knew the outcome of the Ofsted inspection while the other group did not. The authors say that “*This is as good as random assignment.*” Random assignment is rare in studies which analyse education data, as well as in many other areas of social science. Most studies are based on observational data, often because randomisation is not possible, is impractical, or sometimes considered unethical. Random assignment is preferred over observational data as it avoids underlying differences between the groups, in both observed and unobserved characteristics, which might bias the results.

The researchers linked data published by Ofsted with data published by the DfE and used a statistical model to predict the likely outcome of Ofsted inspections given schools’ performance data and background characteristics. The results of this model were used to identify households where the Ofsted inspection was better than would be expected, and parents received good news, and other households where the Ofsted inspection was worse than expected. At Select, we regularly combine our clients’ data with publicly available data in this way. For example, we recently incorporated school demographic data into our analysis of PG Online’s teaching materials.

A difference-in-differences model, a technique often used in the field of econometrics, was applied to explore whether the amount of help parents gave their child with homework had changed from the previous survey wave, while taking into account a number of associated factors. An alternative approach, from the more traditional statistical toolkit, may have been to apply a mixed effects model to the data.
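The core of the difference-in-differences idea can be sketched numerically; all of the figures below are made up for illustration and are not from the study.

```python
# Toy difference-in-differences calculation with made-up numbers: the change
# in a mean "homework help" score for parents who learned the inspection
# outcome, minus the change for parents who had not yet learned it.
informed_before, informed_after = 3.2, 2.6  # informed group means
control_before, control_after = 3.1, 3.0    # not-yet-informed group means

did = (informed_after - informed_before) - (control_after - control_before)
print(f"difference-in-differences estimate = {did:.2f}")
```

By subtracting the control group's change, the estimate strips out any background trend common to both groups, isolating the effect attributable to learning the inspection outcome.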

Most attention is focussed at the school level… this study explores potential changes in parents’ behaviour and the support a child receives at home.

While controlling for differences in child, parent, household and school characteristics, the research specifically focussed on whether any change was the same or different for parents who knew the outcome of the Ofsted inspection compared to parents who didn’t know the inspection outcome, and on whether the parents who knew the outcome had received positive or negative news. The study found that parents who received positive news about the Ofsted inspection of their child’s school tended to reduce the amount of time they helped their child with homework. While it is widely known that there are often unintended consequences of accountability systems, including testing and inspections in schools, most attention is focussed at the school level. This study is unusual in that it explores potential changes in parents’ behaviour and the support a child receives at home.

There are many national surveys, conducted by a variety of organisations, which make data available for research via services like the UK Data Service. Select analysed one such survey, a longitudinal survey of household finances, using a statistical model to explore the relationship between the level of cash savings and the avoidance of problem debt in our project for debt charity StepChange. While such national surveys and administrative data collections play a role in producing national statistics and population estimates and in monitoring trends, their value is truly realised when they are further used in novel research such as the above.


The post Assessing Questionnaire Reliability appeared first on Select Statistical Consultants.

While some things we want to measure are simple and can be asked in a single question, for example “*How do you intend to vote?*” or “*Do you plan to go to university?*”, we may actually wish to consider a broader perspective, for example political views or future aspirations. Perhaps what we are interested in measuring cannot be summarised in a single question: a healthy lifestyle or wellbeing, for example. Surveys are often employed to measure such complex and multifaceted human behaviours or characteristics, known as constructs. Being complex and multifaceted, these are better measured by asking a series of related questions covering different aspects of the construct of interest. The responses to these individual questions can then be combined to form a score or scale measure along a continuum.

Two important qualities of surveys, as with all measurement instruments, are consistency and accuracy. These are assessed by considering the survey’s **reliability** and **validity**.

There are a number of different statistics we can use to estimate reliability and to make an assessment of validity. The choice of which statistics to consider will depend on the survey design and purpose. Some statistics may be more suitable in certain situations, and different statistics will give different results, reflecting different aspects of the survey’s performance. Reliability and validity are also not fixed qualities; they may change over time. Consequently, it is desirable to use a number of alternative statistics to get a rounded assessment of a survey’s qualities.

In this blog, we focus on approaches to assessing reliability. We will discuss how to assess a survey’s validity in a future blog.

Reliability is the extent to which an instrument would give the same results if the measurement were to be taken again under the same conditions: its consistency.

One estimate of reliability is **test-retest reliability**. This involves administering the survey with a group of respondents and repeating the survey with the same group at a later point in time. We then compare the responses at the two timepoints.

For categorical variables we can cross-tabulate and determine the percentage of agreement between the test and retest results, or calculate Cohen’s kappa^{1}.

For continuous variables, or where individual questions are combined to construct a score on a scale, we can compare the values at the two timepoints with a correlation.
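The two test-retest statistics above can be sketched as follows; the responses and scale scores are made up for illustration, and in practice a statistical package would do these calculations for you.

```python
# Made-up test-retest data: percent agreement and Cohen's kappa for a
# categorical question, and a Pearson correlation for a continuous scale.
import statistics

def cohens_kappa(ratings_1, ratings_2):
    """Cohen's kappa for two paired sets of categorical responses."""
    n = len(ratings_1)
    observed = sum(a == b for a, b in zip(ratings_1, ratings_2)) / n
    categories = set(ratings_1) | set(ratings_2)
    expected = sum(
        (ratings_1.count(c) / n) * (ratings_2.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical yes/no responses at the two timepoints.
test =   ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "yes", "yes"]
retest = ["yes", "yes", "no", "yes", "yes", "no", "yes", "no", "no", "yes"]
print("kappa =", round(cohens_kappa(test, retest), 2))

# Hypothetical continuous scale scores at the two timepoints.
scores_t1 = [12.0, 15.5, 9.0, 20.0, 17.5, 11.0]
scores_t2 = [13.0, 15.0, 10.0, 19.0, 18.5, 10.5]
print("r =", round(pearson(scores_t1, scores_t2), 2))
```

Kappa improves on raw percent agreement by subtracting the agreement expected by chance, which is why it is often preferred for categorical test-retest comparisons.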

One immediately obvious drawback of test-retest reliability is memory effects. The test and the retest are not happening under the same conditions. If people respond to the survey questions the second time in the same way they remember responding the first time, this will give an artificially good impression of reliability. Increasing the time between test and retest (to reduce the memory effects) introduces the prospect of genuine changes over time.

If the survey is to be used to make judgements or observations of another subject, for example clinicians assessing patients with pain or mental health issues, or teachers rating different aspects of children’s writing, we can compare different raters’ responses for the same subject: **inter-rater reliability**. Here we would use the same statistics as for test-retest reliability. As with test-retest reliability, the two measurements are not taken under the same conditions: the raters are different, and one may be systematically “harsher” than the other.

**Parallel-form reliability** involves developing two equivalent, parallel forms of the survey (form A and form B, say), both measuring the same underlying construct, but with different questions in each. Respondents are asked to complete both surveys; some taking form A followed by form B, others taking form B first then form A. As the questions differ in each survey, the questions within each are combined to form separate scales. Based on the assumption that the parallel forms are indeed interchangeable, the correlation of the scale scores across the two forms is an estimate of their reliability. The disadvantage of this approach is that it is expensive: potentially double the cost of developing one survey.

An alternative is **split-half reliability**. Here we divide the survey arbitrarily into two halves (odd and even question numbers, for example), and calculate the correlation of the scores on the scales from the two halves. Reliability is also a function of the number of questions in the scale, and we have effectively halved the number of questions, so we adjust the calculated correlation to estimate the reliability of a scale that is twice the length, using the Spearman-Brown formula.
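As a sketch of the calculation (with made-up item responses; a real analysis would use dedicated survey software), the items are split into odd- and even-numbered halves, the half scores are correlated, and the Spearman-Brown formula steps the correlation up to the full-length scale.

```python
# Made-up illustration of split-half reliability with the Spearman-Brown
# correction, using only the standard library.
import statistics

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman_brown(r_half):
    """Step a split-half correlation up to the reliability of the full scale."""
    return 2 * r_half / (1 + r_half)

# Rows are respondents, columns are six questionnaire items (1-5 responses).
responses = [
    [4, 5, 4, 4, 5, 4],
    [2, 1, 2, 2, 1, 2],
    [3, 3, 4, 3, 3, 3],
    [5, 4, 5, 5, 5, 4],
    [1, 2, 1, 2, 1, 1],
    [4, 4, 3, 4, 4, 4],
]
odd_half = [sum(row[0::2]) for row in responses]   # items 1, 3 and 5
even_half = [sum(row[1::2]) for row in responses]  # items 2, 4 and 6

r_half = pearson(odd_half, even_half)
print(f"split-half r = {r_half:.2f}")
print(f"Spearman-Brown reliability = {spearman_brown(r_half):.2f}")
```

Note that the corrected value is always at least as large as the raw half-scale correlation, reflecting the fact that longer scales are more reliable, all else being equal.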

Split-half reliability is an estimate of reliability known as internal consistency; it measures the extent to which the questions in the survey all measure the same underlying construct. **Cronbach’s alpha** is another measure of internal consistency reliability. For surveys or assessments with an even number of questions, Cronbach’s alpha is equivalent to the average reliability across all possible combinations of split-halves. Most analysis software will also routinely calculate, for each question or questionnaire item in the scale, the value of Cronbach’s alpha if that questionnaire item were deleted. These values can be examined to judge whether the reliability of the scale can be improved by removing any of the questionnaire items, as demonstrated in the example below.

The scale that is constructed from these 6 questionnaire items has a Cronbach’s alpha of 0.866. The 4^{th} questionnaire item (Q4) has the weakest correlation with the other items, and removing this questionnaire item from the scale would improve the reliability, increasing Cronbach’s alpha to 0.893.
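The calculation behind alpha and the "alpha if item deleted" diagnostic can be sketched as follows; the data below are made up for illustration and do not reproduce the 0.866 and 0.893 figures from the example above.

```python
# Made-up illustration of Cronbach's alpha and the "alpha if item deleted"
# diagnostic, computed with only the standard library.
import statistics

def cronbachs_alpha(items):
    """items: one list of responses per questionnaire item (same respondents)."""
    k = len(items)
    sum_item_variances = sum(statistics.variance(item) for item in items)
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's total
    return (k / (k - 1)) * (1 - sum_item_variances / statistics.variance(totals))

# Each inner list holds one item's responses across six respondents.
items = [
    [4, 2, 3, 5, 1, 4],
    [5, 1, 3, 4, 2, 4],
    [4, 2, 4, 5, 1, 3],
    [4, 2, 3, 5, 2, 4],
    [5, 1, 3, 5, 1, 4],
]
print(f"alpha = {cronbachs_alpha(items):.3f}")
for i in range(len(items)):
    without_item = items[:i] + items[i + 1:]
    print(f"alpha if item {i + 1} deleted = {cronbachs_alpha(without_item):.3f}")
```

If dropping a particular item raises alpha above the full-scale value, that item correlates weakly with the rest and is a candidate for removal, exactly as described for Q4 in the example.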

In this blog we have discussed various approaches to assessing reliability. Reliability and validity are interconnected. If a survey has poor reliability it will have poor validity. However, good reliability does not necessarily mean a survey will have good validity. Soon we’ll post another blog considering validity.


The post Data in Devon 2019 appeared first on Select Statistical Consultants.

Select were pleased to attend the conference and our Senior Consultant and Acting MD, Sarah, was delighted to be asked to present at the event, holding a session on ‘What is R and Why Should You Use It?’.

In her talk, Sarah discussed the value of using R for data analysis. R is a programming language, specifically designed for statistical computing and graphics and is both free and open source.

R has been developed with visualisation in mind – there’s a huge range of different types of charts, graphs and plots available. Analyses conducted in R can also be turned into dynamic and interactive web applications using the package Shiny, which Sarah demonstrated during her talk.

With R, all the different processes required to turn data to insights – from data processing and manipulation to complex statistical analyses to visualisations and creating interactive dashboards – can be accommodated within a single platform.

We really enjoyed attending the event and it was a great opportunity to promote the benefits of R to other data professionals in the South West!

The post Data in Devon 2019 appeared first on Select Statistical Consultants.

]]>The post Select Works with PG Online to Explore the Effectiveness of its Teaching Materials appeared first on Select Statistical Consultants.

We were asked to analyse the 2018 school-level GCSE and A level results, published by the Department for Education (DfE), for schools that had purchased PG Online’s materials, comparing these with similar schools that were not PG Online customers. In addition to exam result data, we used other publicly available school demographic data, so that the schools we compared were similar in as many respects as possible.

“Select have been meticulous in their attention to detail.”

We found that the average GCSE point score of schools that had purchased PG Online’s materials for the new 9-1 GCSE syllabus was, on average, 0.12 points higher (with a 95% confidence interval of (0.05, 0.20)) than that of other schools, taking into account differences in schools’ background characteristics, such as average GCSE Computer Science score the previous year, school type, region, the proportion of students with special educational needs (SEN) and the proportion eligible for free school meals (FSM; a proxy for deprivation). This can also be expressed approximately as 3 students out of every 25 being expected to achieve one grade higher at GCSE.

There was less data available for the A level model than for the GCSE model, owing to the DfE suppressing results for schools where they were based on fewer than 5 students. From the data available, we again found that the average A level point score of schools that had purchased PG Online’s A level materials was, on average, 0.23 points higher (with a 95% confidence interval of (0.07, 0.39)) than that of other schools, again taking into account differences in schools’ background characteristics (specifically, average A level Computer Science score the previous year and the proportion of students eligible for FSM). This is equivalent to just under 6 students out of every 25 being expected to achieve one grade higher at A level.

PG Online produced the following infographic, illustrating our findings:

Robert Heathcote, Director at PG Online, said of the work,

*“Select have been meticulous in their attention to detail and care in the output of their results, factoring in many external variables that we hadn’t previously thought possible. Their analysis was accurate, in-depth and on time, and has enabled us to test a key part of our mission which informs our future development.”*

Further details of the analyses and the results can be found in the full report published on PG Online’s website.

The post Select Works with PG Online to Explore the Effectiveness of its Teaching Materials appeared first on Select Statistical Consultants.

]]>The post Restaurant Ratings Model: Visualising the Key Findings appeared first on Select Statistical Consultants.

]]>We often use a logistic regression model to answer questions about the strength of factors associated with a binary outcome; for example, “What are the biggest factors associated with stroke occurrence?”. The logistic regression model can be tricky to interpret as the results are often presented as relative probabilities. For example, we might say that having a medical history of strokes increases the odds of having a stroke by 50% relative to someone who doesn’t have a medical history of strokes. The clever part about this visualisation is that we see the results as absolute probabilities – for example that the probability of having a stroke for those with a medical history of strokes is 40%. This probability statement is less convoluted and can be derived from exactly the same model.
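The conversion between the two scales is mechanical: the odds are p/(1−p), and an odds ratio multiplies the odds rather than the probability. A minimal Python sketch, assuming a hypothetical 30% baseline stroke probability (the post does not state the baseline, and the numbers here are illustrative only):

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)

def prob(o):
    """Convert odds back to a probability."""
    return o / (1 + o)

# Hypothetical baseline: 30% stroke probability without a history of strokes
baseline_p = 0.30
odds_ratio = 1.5                     # "increases the odds by 50%"

new_odds = odds(baseline_p) * odds_ratio   # odds ratio acts on the odds scale
new_p = prob(new_odds)                     # back to an absolute probability
print(f"Absolute probability with a stroke history: {new_p:.1%}")
```

Note that the absolute probability depends on the baseline: the same 50% increase in odds implies a different probability change for a low-risk group than for a high-risk one, which is exactly why absolute-probability visualisations are easier to read.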

To test the concept’s suitability to questions outside of political science, we found some open source data about restaurants – their characteristics and customer ratings – and we started by asking the question: “What are the biggest factors associated with restaurant ratings?”. We used a logistic regression model to predict the probability that a restaurant will receive a positive rating in a customer review. The model estimates how the expected probability of receiving a positive rating is related to restaurant characteristics. For example, are restaurants that have on-site parking rated more highly on average? Do we expect restaurants in different locations to attract different ratings?

Here is our version of The Economist’s graph:

It turned out that, on average, a restaurant had a 42% chance of receiving a positive rating. The graph is centred on 42% and the bars show the differences from the average; the longer the bar, the bigger the deviation. The bars indicate the estimated probability of a positive rating for a restaurant within a group of restaurants which all have the feature labelled at the end of the bar, but otherwise have the average characteristics of the sample. So, for example, the average restaurant within a group of restaurants which have formal dress codes, but otherwise reflect the sample distribution of characteristics, is predicted to have a 93% chance of receiving a positive customer rating. By contrast, the average restaurant within a group of restaurants which have permissive smoking policies, but have all other characteristics at the average rates, is predicted to have a 32% chance of receiving a positive customer rating.
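Numerically, each bar is just the fitted logistic curve evaluated with one feature switched on and everything else held at the sample average. A small Python sketch (the coefficient values here are hypothetical, reverse-engineered only so the output roughly matches the 42%, 93% and 32% figures quoted above; they are not the actual model estimates):

```python
import math

def sigmoid(z):
    """Inverse logit: convert a log-odds value to a probability."""
    return 1 / (1 + math.exp(-z))

# Hypothetical coefficients on the log-odds scale; the intercept is chosen so
# the sample-average restaurant has roughly a 42% chance of a positive rating.
intercept = -0.32
coefs = {"formal_dress": 2.9, "smoking_permitted": -0.44, "parking": 0.05}

baseline = sigmoid(intercept)        # ~42% for the average restaurant
for feature, beta in coefs.items():
    p = sigmoid(intercept + beta)    # toggle one feature, hold the rest at average
    print(f"{feature}: {p:.0%} ({(p - baseline) * 100:+.0f} pp vs baseline)")
```

The "pp vs baseline" differences are exactly what the green and red bars encode: deviations, in percentage points, from the centre line at 42%.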

The graph works so well because the green and red bars allow viewers to quickly assess which restaurant features are important. We can see that dress code, smoking policy, and the availability of other services are clearly important factors associated with restaurant rating, but parking, alcohol selection, and ambience are clearly not. We can also compare across attributes. For example, a formal dress code is associated with a probability of a positive rating about 40 percentage points higher than the probability associated with smoking confined to the bar area.

The tools available for creating flexible, eye-catching graphics have never been better. We can create graphs such as this using packages in R, our standard statistical analysis software. In fact, we produced this graph with the same package (ggplot2) that the BBC uses to create the visualisations that accompany their stories. Of course, the statistical details are an integral part of any analysis and should be considered when interpreting the results; but as a gateway to the key findings, striking visualisations are very useful for us and our clients.

The post Restaurant Ratings Model: Visualising the Key Findings appeared first on Select Statistical Consultants.

]]>The post Congratulations to HiLo on Global Safety Award appeared first on Select Statistical Consultants.

]]>Select were delighted to provide statistical support to HiLo, carrying out an independent review of their risk modelling and providing statistical advice on the approach. We hope to work with them again on future projects.

The post Congratulations to HiLo on Global Safety Award appeared first on Select Statistical Consultants.

]]>The post Jo Joins Women in Statistics and Data Science SIG appeared first on Select Statistical Consultants.

]]>The SIG aims to raise the profile of women working in statistics and data science, to advocate for opportunities for women in these fields, to support them in their work, and to share experiences.

One of our senior consultants, Jo, was pleased to be accepted as a volunteer on the committee.

Speaking of her involvement, Jo said, “My secondary school encouraged me and others, particularly young women, to pursue science and technology, and my mother encouraged me more generally not to restrict choices and thinking based on stereotypes. Not all young women have those influences and it would be good to provide positive role models and to contribute to showcasing careers that involve statistics, to encourage those interested in STEM subjects”, adding that, “while from my experience, the field of statistics has more gender balance than in some other STEM areas (there were proportionally more women in my statistics Masters class than in my school technical drawing class) I am keen to see women represented at all levels of the profession.”

The post Jo Joins Women in Statistics and Data Science SIG appeared first on Select Statistical Consultants.

]]>The post EU Freedom of Movement & the Migrant Workforce: How have the numbers changed? appeared first on Select Statistical Consultants.

]]>The latest release (February 2019) from the ONS for the UK labour market shows that 2.27 million EU nationals were working in the UK as of October to December 2018, out of a total of 32.60 million people aged over 16 in work (7.0%). While this is a slight increase on the 2.14 million observed in January to March 2016, plotting the data over time (see the right-hand plot in Figure 1) shows an apparent levelling off of the number of EU nationals employed in the UK. This follows a steady increase over the last 10 years, as noted in our previous blog on migrants in the UK workforce. A similar pattern is also observed for EU-born workers (see the left-hand plot in Figure 1).

The ONS also notes, in their February 2019 release of the October to December 2018 figures, that compared with the same period the year before (2017), there were 61 thousand fewer EU nationals working in the UK, whereas the numbers of UK nationals and non-EU nationals working in the UK increased by 372 thousand and 130 thousand, respectively. We should note, however, that while there has been a year-on-year fall, since the referendum (indicated by the red vertical line in Figure 1) the number of EU migrants in the workforce has remained broadly flat, in contrast to the steadily increasing trend over the previous decade.

The latest figures available from the ONS, as presented above, are for the last quarter of 2018. It will be interesting to see if and how the picture changes when the data for 2019 become available and as we approach the Article 50 deadline.

More than two years on from our post on EU freedom of movement numbers, Figure 2 shows how migration to and from the UK changed between 2015 and 2017. The bar chart shows the estimated total number of UK migrants in EU and non-EU countries (green bars) for 2017 and 2015, together with the number of migrants in the UK from EU and non-EU countries (mustard bars) for those years. These figures are taken from the United Nations Population Division mid-2015 and mid-2017 estimates of migrant stock, where a migrant is defined as “a person who is living in a country other than his or her country of birth”.

We can see that the largest change has been the decrease in the number of UK migrants outside the EU, while the number of migrants from the UK in the EU has remained at a similar level. In terms of immigration to the UK, we see the opposite: non-EU migrants have reduced in number, whereas the number of EU migrants in the UK has increased. Despite this, we still see that the majority of migrants living in the UK are from non-EU countries, though between 2015 and 2017 the gap has slightly narrowed.

Due to the time it can take to gather and process data before they become available for public use, we do not yet have the full picture of current UK migration statistics and how these may have changed since the EU referendum over two years ago. However, there have clearly been changes in the interim – in particular, it’s interesting to see that the number of EU-born workers and EU nationals working in the UK has not continued to increase at the same steady rate as over the previous decade. In an upcoming post, we will update the figures from some of our other blogs in our EU series, including a look at the UK’s trading partners.

The post EU Freedom of Movement & the Migrant Workforce: How have the numbers changed? appeared first on Select Statistical Consultants.

]]>The post An Alternative Approach to Evaluating Interventions in Education appeared first on Select Statistical Consultants.

]]>The EEF is dedicated to breaking the link between disadvantage and educational achievement, and as such requires its evaluators to analyse trial results for the subset of pupils who are eligible for free school meals. In many situations subgroup analyses are viewed as bad practice, as they are prone to being under-powered. However, EEF evaluators are now required to present power calculations for the analysis of this pre-specified subset of pupils.

ZhiMin proposed a new approach using a Pupil Advantage Index to estimate the outcomes for all pupils in the dataset. This approach involves building a model which takes account of all available background information, including free school meal eligibility, along with any interactions. This exploits the heterogeneity of the data, recognising that pupils and education are multifaceted and complex, rather than considering background characteristics individually in subgroup analyses. As an alternative to subgroup analyses, rather than answering the question *“does this intervention work, on average, for this subset of pupils?”*, the Pupil Advantage Index asks *“for what kinds of pupils does this intervention work?”*. As well as identifying which pupils benefited most from a particular intervention in a trial, it can also inform future decisions by answering the question *“which intervention(s) are best suited to my particular set of pupils?”*

“This was a really interesting seminar”, said Jo. “It was great to learn of the latest developments in the analysis of education data and to hear about further analyses that are being conducted on the wealth of data collected by the EEF”.

The post An Alternative Approach to Evaluating Interventions in Education appeared first on Select Statistical Consultants.

]]>The post Meta-analysis: Reducing Salmonella in Animal Feed appeared first on Select Statistical Consultants.

]]>*Salmonella Typhimurium* is a pathogen which can cause gastroenteritis in many mammals, including humans. Select were asked by Anitox to combine the results from 90 individual studies which each estimated the mean percentage reduction of *Salmonella Typhimurium* in animal feed following the application of a pathogen-control product (Berge, 2015).

The studies were conducted for different types of animal feed and under different conditions, including: whether recontamination had occurred, the concentration of the pathogen-controlling additive, and the sample size, i.e., the number of culture plates tested. As a result, each study yielded a different effect size for the additive in reducing the presence of *Salmonella Typhimurium*.

The aim of the project was to collate the results from the individual studies to give overall estimates of the effects of the additive for different feed types and doses, allowing for other differences in the study conditions.

When dealing with multiple studies estimating similar effects, a meta-analysis can be used to collate the results, while accounting for how much evidence is provided by each study.

The studies were split into groups based on the type of feed (poultry feed, pet food or protein meal) considered, the additive dosage used, and whether recontamination had occurred or not. This stratification ensured that only those studies designed to estimate similar effects were combined in each case.

However, within each of these groups, the treatment effects estimated by the different studies clearly varied more than we would expect by chance. To account for this heterogeneity in the observed treatment effect between studies, a random-effects statistical model was fitted to the data, following a similar approach to that described in our previous case study: ‘Meta-analysis: Combining Results from Multiple Studies’. This approach allows for real differences in the treatment effect due to, for example, differences in the location, the batch of animal feed, or the study protocol (such as the length of time over which the product was applied) used in each study.
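One common way to fit such a random-effects model is the DerSimonian–Laird estimator, which estimates the between-study variance τ² from the heterogeneity statistic and adds it to each study's own variance before pooling. The Python sketch below illustrates the method only; the effect sizes and standard errors are hypothetical, not the Anitox study data:

```python
import numpy as np

def dersimonian_laird(effects, ses):
    """Pool study effects with a DerSimonian-Laird random-effects model."""
    effects, ses = np.asarray(effects, float), np.asarray(ses, float)
    w = 1 / ses**2                                   # fixed-effect weights
    fixed = np.sum(w * effects) / np.sum(w)
    q = np.sum(w * (effects - fixed)**2)             # Cochran's Q heterogeneity
    df = len(effects) - 1
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)                    # between-study variance
    w_star = 1 / (ses**2 + tau2)                     # random-effects weights
    pooled = np.sum(w_star * effects) / np.sum(w_star)
    se = np.sqrt(1 / np.sum(w_star))
    return pooled, se, tau2

# Hypothetical per-study mean % reductions and their standard errors
effects = [92.0, 85.0, 97.0, 78.0, 90.0]
ses = [3.0, 5.0, 2.5, 6.0, 4.0]
est, se, tau2 = dersimonian_laird(effects, ses)
print(f"Pooled reduction: {est:.1f}% "
      f"(95% CI {est - 1.96*se:.1f} to {est + 1.96*se:.1f}), tau^2 = {tau2:.1f}")
```

When τ² is estimated as positive, the random-effects weights are more even than the fixed-effect weights, so no single large study dominates the pooled estimate.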

When performing a meta-analysis, as well as producing an estimate of the overall average effect, we can also use forest plots as a means of visualising the analysis. These plots provide a way of visually comparing and combining the studies, representing each individual study, alongside the estimated overall average effect (and the associated confidence intervals), on the same axis. Figure 1 below shows an example of such a forest plot.

By using a meta-analysis, we were able to condense the information from 90 studies conducted under different conditions into a set of clear and consistent results for each separate treatment effect of interest. This helped Anitox to better understand the effects of their pathogen-control product, and to optimise treatment conditions and application technology accordingly.

By using a meta-analysis to aggregate the results of different studies, we can draw together many individual, smaller trials, gaining higher statistical power overall. This means our final estimates are more robust than those from any single study, allowing us to make the best use of the individual trial results.

The post Meta-analysis: Reducing Salmonella in Animal Feed appeared first on Select Statistical Consultants.

]]>The post Presenting the Results of a Multinomial Logistic Regression Model: Odds or Probabilities? appeared first on Select Statistical Consultants.

]]>When fitting a multinomial logistic regression model, one generally wants to understand what motivates the choice made by the student (e.g. what are the key drivers) and how those drivers affect the choice made. For instance, in our example, we might be interested in understanding how the choice of programme varies for students with different maths scores.

As we saw in the model coefficient table in our previous blog, we get two coefficients, one of -0.11 for the comparison between a general and an academic programme and one of -0.14 for the comparison between a vocational and an academic programme. As they are both negative, this tells us that as maths score increases the log-odds of choosing either a general or a vocational programme (instead of an academic programme) decrease. But what does this actually mean? And which programme is actually the most popular choice?

To help understand the relationship between maths score and programme choice, we can plot the predictions from the models for a hypothetical student, assuming all other drivers in the model are fixed (i.e. assuming the student is from the middle socio-economic status (SES) group, attends a public school, and has average prior reading and science scores).

In Figure 1, we use the model coefficients to look at the linear relationship (on the log-odds scale) between programme choice and maths score. We can see that, as the maths score increases, the log odds of choosing a vocational vs. academic course decrease faster than the log odds of choosing a general vs. academic course, reflecting the difference in the size of their model coefficients.

What is interesting from this plot is that for maths scores below 45, both sets of log odds are positive indicating that choosing either of the two programmes is more likely than choosing an academic programme.

We can also see that for a maths score of just above 60 the two lines cross over, i.e. the odds of a general vs. academic choice are now higher than the odds of a vocational vs. academic course. But what actually are those odds?

To obtain the odds of either set of choices, we can take the exponential of the predicted log odds above and again plot them against maths score in Figure 2.

Figure 2 highlights the non-linear nature of the relationship between maths score and the odds of either choice. The difference between the odds is more pronounced for lower maths scores, with the gap narrowing as maths scores increase. As in the case of a logistic regression, the odds are a measure of the relative association between maths score and programme choice. For example, for a maths score of 40, the odds of choosing a general versus an academic programme are 2.1, while the odds of choosing a vocational versus an academic course are 4.4. This means that a student with such a maths score is 2.1 times as likely to choose a general course as an academic course, and 4.4 times as likely to choose a vocational course as an academic one.

In addition to what we saw in Figure 1, for maths scores from about 55 upwards both sets of odds are actually quite small, getting close to 0, indicating that both general and vocational courses are very unlikely to be chosen.

Both the odds and log-odds plots are useful and accurate representations of the model coefficients. However, they only provide relative measures of the association between maths score and programme choice, using the academic programme as the reference for the comparisons. They are thus useful for understanding students’ relative preferences between programmes.

At the same time, the model coefficients cannot directly be used to assess which course is most likely to be chosen by an average student attending public school from a middle socio-economic background, as a function of maths score. To get that information, the odds above need to be converted to the predicted probability of each outcome (see Figure 3).
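As a sketch of that conversion: with a multinomial logit, the two log-odds (each relative to the academic baseline) map to absolute probabilities via a softmax. The slopes below are the coefficients quoted earlier (−0.11 and −0.14); the intercepts are hypothetical values chosen only to roughly reproduce the figures discussed, not the actual fitted values:

```python
import math

def choice_probs(maths, a_gen=5.14, a_voc=7.08, b_gen=-0.11, b_voc=-0.14):
    """Convert multinomial-logit log-odds (vs the academic baseline)
    into absolute probabilities for the three programmes."""
    lo_gen = a_gen + b_gen * maths      # log-odds of general vs academic
    lo_voc = a_voc + b_voc * maths      # log-odds of vocational vs academic
    denom = 1 + math.exp(lo_gen) + math.exp(lo_voc)
    return {"academic": 1 / denom,
            "general": math.exp(lo_gen) / denom,
            "vocational": math.exp(lo_voc) / denom}

for score in (40, 50, 60):
    probs = choice_probs(score)
    best = max(probs, key=probs.get)
    print(score, {k: round(v, 2) for k, v in probs.items()}, "->", best)
```

With these illustrative intercepts the most likely choice flips from vocational to academic as the maths score rises past the low 50s, mirroring the pattern described for Figure 3.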

It is only when the results of the model are converted to probabilities that it becomes obvious that for students with a maths score up to just over 50, a vocational course is more likely to be chosen than either of the other two, while for students with higher scores, an academic course is the most likely choice. The other interesting point is that, across the whole range of observed maths scores, an average student of middle socio-economic status from a public school, with average reading and science scores, is unlikely ever to choose a general programme.

As we saw above, the coefficients obtained directly from the model provide an immediate indication of the direction of the relationships, with the conversion to odds ratios giving an estimate of the relative changes in the odds of choosing one alternative option versus the reference option. However, it is only when converting those odds back to probabilities (as in Figure 3) that one can really see the relationships between the explanatory variables and the likelihood of each outcome.

The output from a multinomial logistic regression model may appear complicated at first, but converting the coefficients back to probabilities makes the model easier to interpret, and thus helps us gain useful and actionable insights from it.

In most practical cases, as in the example given here, one is often more interested in how likely each outcome actually is, rather than the relative chance (or odds) of observing one outcome versus another. For instance, a business with a new pricing policy might want to know how sensitive online, in-store or phone customers are to different prices, and might not be so interested in the changes in the balance between online vs. in-store customers, and phone vs. in-store customers as price varies.

The post Presenting the Results of a Multinomial Logistic Regression Model: Odds or Probabilities? appeared first on Select Statistical Consultants.

]]>The post Will it be Turkey this Christmas? appeared first on Select Statistical Consultants.

]]>Today, it seems like everyone has turkey at Christmas, but what about the rest of the year? If we look at the data available from the Department for Environment, Food and Rural Affairs for the number of turkeys slaughtered per year, we can see a definite spike around December.

From Figure 1 we can see that, over the last 20 or so years, nearly 1 million more turkeys have been slaughtered around December each year than in May. The British tend to enjoy turkey as a seasonal meat – turkey numbers increase throughout September, October and November before peaking in December. Think about all the Christmas-related food sold around this season – turkey and stuffing sandwiches are not sold only in December!

Since turkeys were originally an American import, how does the UK annual pattern compare to that of the US? Americans have less of a specific tradition around Christmas dinner – “Turkey day” for them refers to Thanksgiving, which takes place on the fourth Thursday of November.

Looking at data available from the United States Department of Agriculture, we can see these trends reflected. Figure 2 below shows UK and US turkey production in pounds, estimated for the UK using an average turkey weight of 14 pounds, and scaled by each country’s population, so the graph tells us how many pounds of turkey are produced each month per person.

There are two interesting things to note from this plot. Firstly, the smoothed average line shows an increase in the USA’s production of turkeys in October and November, in the lead-up to Thanksgiving, compared to the rest of the year. Interestingly, this is followed by a drop in December – the opposite of what happens in the UK. Secondly, there is much less variation in turkey production over the year than in the UK, implying that Americans tend to eat turkey at a reasonably steady rate all year long, and not just as a special treat during the holidays.

We can also look at whether turkey consumption has changed by plotting the annual total weight of turkey slaughtered, in pounds, for both countries over the last 24 years (Figure 3 below). It is noticeable from the two previous graphs that there is a lot of variation across years, particularly in the UK.

Since the beginning of the century, it appears that UK turkey consumption has been steadily decreasing – it nearly halved between 1995 and 2007 – while in the US there has been a slow increase. Could this be due to turkey being replaced by other meats, fish or vegetarian alternatives at Christmas? And what about you: what are you having for Christmas dinner?

The post Will it be Turkey this Christmas? appeared first on Select Statistical Consultants.

]]>