Select Statistical Consultants

A Randomised Controlled Trial of Cobalamin in Dogs

Jo Morrison — Mon, 26 Feb 2024 10:31:59 +0000

Select were delighted to hear that work carried out in partnership with our client ADM Protexin, a supplement and probiotic manufacturer in the UK, has been published in a peer-reviewed journal. The randomised controlled trial involved dogs with vitamin B12 deficiency (hypocobalaminemia) and has been published in the Journal of Small Animal Practice. The dogs in the trial were given treatment to improve their vitamin B12 levels (cobalamin supplementation). Dogs were randomised to receive supplementation either as a daily oral dose or as a weekly parenteral dose (via injection), so that the efficacy and tolerability of the two routes of administration could be compared.

The team at Select conducted the statistical analysis for this trial and produced a report, which our client then used to write the journal article.

The outcome measures we analysed were collected at the start of the trial, at week 7 and at week 13. As there were multiple measures per dog, we used mixed effects models to test for differences in the trial outcomes between the oral supplemental group and the parenteral supplemental group. The team found that oral administration is as effective and well-tolerated as an injection and could therefore be considered as a suitable treatment option.
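The post does not give the model specification. As a minimal, hypothetical sketch of why repeated measures per dog matter, the code below uses the classical ANOVA (method-of-moments) estimator for a balanced random-intercept model to split the variation in an outcome into a between-dog part and a within-dog part. The function name and data are invented for illustration; a real trial analysis would fit a full mixed effects model (with fixed effects for treatment group and visit) using dedicated software.

```python
from statistics import mean

def variance_components(groups):
    """ANOVA estimates for a balanced random-intercept model y_ij = mu + u_i + e_ij.

    groups: one list of repeated measurements per dog, all the same length.
    Returns (between-dog variance, within-dog variance, intraclass correlation).
    """
    k = len(groups)       # number of dogs
    n = len(groups[0])    # measurements per dog
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes a balanced design")
    grand = mean(v for g in groups for v in g)
    dog_means = [mean(g) for g in groups]
    # Mean squares between dogs and within dogs
    ms_between = n * sum((m - grand) ** 2 for m in dog_means) / (k - 1)
    ms_within = sum((v - m) ** 2 for g, m in zip(groups, dog_means) for v in g) / (k * (n - 1))
    sigma2_u = max((ms_between - ms_within) / n, 0.0)  # truncate negative estimates at zero
    sigma2_e = ms_within
    total = sigma2_u + sigma2_e
    icc = sigma2_u / total if total > 0 else 0.0
    return sigma2_u, sigma2_e, icc

# Hypothetical B12 readings at baseline, week 7 and week 13 for three dogs
print(variance_components([[1, 1, 1], [2, 2, 2], [3, 3, 3]]))
```

A high intraclass correlation means measurements from the same dog are far from independent, which is why treating every measurement as a separate observation would overstate the evidence.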

The authors acknowledged our contribution saying, “Our gratitude goes to Sarah Littler and Jo Morrison at Select Statistics, who provided statistical analysis.”

The journal article can be accessed here: https://doi.org/10.1111/jsap.13705.

The post A Randomised Controlled Trial of Cobalamin in Dogs appeared first on Select Statistical Consultants.

Clustering Types of Advice for Citizens Advice Scotland

Jo Morrison — Fri, 28 Jul 2023 10:12:35 +0000

The Challenge

Citizens Advice Scotland (CAS) bureaux provide advice to clients from all walks of life, covering a wide range of information and advice needs: as their website says, “whoever you are and whatever your problem”. CAS knew, from their experience and qualitative research, that the advice sought by their clients was often complex and inter-connected. After a client sought advice on one topic, conversations with advisors often revealed other issues, which led to advice being given on further topics. For example, clients seeking advice about pensions often also received advice about disability/carers and about tax, and many clients seeking relationship advice also received legal advice and advice about housing. CAS wanted to better understand the inter-connectedness of the advice needs of their clients and asked us to analyse the co-incidences of advice received by their clients.

The Approach

CAS supplied data from their database which contained details about the type of advice sought by their clients as well as information such as the method of contact and the dates on which advice was given. In dialogue with CAS we used this data to calculate variables that captured the key characteristics of the advice received by clients, such as:

the number of sessions of advice given (clients can return multiple times for follow-up advice),
the duration of the advice (how long the client case took from the initial to the final session),
the number of different types of advice given, and
the number of units of advice given for each of the advice types.

These variables allowed us to measure the simplicity or complexity of the initial and subsequent advice received and how different advice types interacted with one another.

We then used cluster analysis to group clients based on the characteristics and co-incidences of the types of advice they received. Cluster analysis groups observations (in this case, clients) that are similar together into clusters and separates observations that are dissimilar into different clusters based on the key characteristics defined above. We identified a total of 15 clusters in the data.
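The post does not say which clustering algorithm was used. Purely as an illustration, the sketch below applies k-means (Lloyd's algorithm) to standardised per-client variables such as number of sessions, duration and number of advice types. The variable choice, the deterministic farthest-first initialisation and the toy data are all assumptions.

```python
import math

def standardise(rows):
    """Rescale each column to mean 0 and SD 1 so no variable dominates the distances."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def farthest_first(rows, k):
    """Deterministic start: the first row, then repeatedly the row farthest from all chosen centres."""
    centres = [list(rows[0])]
    while len(centres) < k:
        far = max(rows, key=lambda r: min(math.dist(r, c) for c in centres))
        centres.append(list(far))
    return centres

def kmeans(rows, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centre assignment and centre updates."""
    centres = farthest_first(rows, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(row, centres[j])) for row in rows]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres

# Hypothetical clients: [number of sessions, duration in days, number of advice types]
clients = [[1, 0, 1], [1, 2, 1], [2, 1, 1], [20, 300, 5], [22, 310, 6], [25, 280, 5]]
labels, _ = kmeans(standardise(clients), k=2)
print(labels)
```

Standardising first matters here because duration in days would otherwise swamp the session and advice-type counts in the distance calculation.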

To visualise the clusters, we produced the bubble plot in Figure 1, below. The plot is divided into two parts.

The circles in the top rectangle illustrate the characteristics of the advice sessions (for example the total number of occasions or sessions of advice and the duration from the first to the last session of advice). Here the size of the circles represents the average (mean) value for a client case in each cluster. For example, we can see from the plot that the mean number of total sessions is largest for cluster 15, followed by clusters 9 and 3 (with mean values 25, 8 and 7 sessions respectively).

The circles in the bottom rectangle of the plot illustrate the types of advice received. The size of the circles represents the proportion of units of different types of advice given to client cases in each cluster. For example, Cluster 13 predominantly contains cases involving advice about Consumer and Legal Proceedings. The size of each cluster (the percentage of clients in it) is shown at the bottom.

Figure 1: Bubble-plot of cluster characteristics
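The quantities shown in the bubble plot are simple per-cluster summaries: cluster size, mean values of the case characteristics, and the share of advice units by type. A minimal sketch, assuming a hypothetical record format with a cluster label, a session count and units of advice per type for each client case:

```python
from collections import Counter, defaultdict

def cluster_profiles(cases):
    """Per-cluster summaries of the kind shown in a bubble plot.

    cases: list of dicts with hypothetical keys 'cluster', 'sessions' and
    'units' (a dict mapping advice type -> units of advice given).
    """
    sessions = defaultdict(list)
    units = defaultdict(Counter)
    for case in cases:
        sessions[case["cluster"]].append(case["sessions"])
        units[case["cluster"]].update(case["units"])  # adds the unit counts per type
    profiles = {}
    for cluster, counts in sessions.items():
        total_units = sum(units[cluster].values())
        profiles[cluster] = {
            "size_pct": 100 * len(counts) / len(cases),      # share of all clients
            "mean_sessions": sum(counts) / len(counts),      # a top-panel circle
            "unit_share": {t: u / total_units                # bottom-panel circles
                           for t, u in units[cluster].items()},
        }
    return profiles

cases = [
    {"cluster": 1, "sessions": 2, "units": {"Debt": 3, "Housing": 1}},
    {"cluster": 1, "sessions": 4, "units": {"Debt": 1}},
    {"cluster": 2, "sessions": 1, "units": {"Immigration": 2}},
    {"cluster": 2, "sessions": 1, "units": {"Immigration": 2}},
]
profiles = cluster_profiles(cases)
print(profiles[1])
```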

While some clusters are predominantly about one type of advice, e.g. Utilities and Communications (Cluster 10) and Immigration, Asylum and Nationality (Cluster 14), other clusters are about multiple advice types. For example, Cluster 4 shows that advice about Universal Credit, Disability or Carers, Crisis or Exceptional Circumstances and Charitable Support often coincide. The cases in Cluster 3 (Disability and Carers), Cluster 9 (Debt) and Cluster 15 (Health and Community Care) were long and complex, involving a relatively high number of sessions over a long duration.

After the cluster analysis, we were able to compare the demographics of clients in each cluster and to explore the sequence of advice given to clients in each cluster (the advice journey). We produced Sankey plots, also known as river plots, like the one below to illustrate the complexity of advice journeys. These plots show flows through a system (here, the sessions of a client’s advice journey), with the width of each flow representing the number of clients following that path. Each client’s advice journey is unique, so we could not show every journey. For pragmatic reasons, we limited the plots to the 20 most common journeys in each cluster over the first 3 sessions.

Figure 2 shows a river plot for Cluster 14 which comprises cases involving advice about Immigration, Asylum and Nationality. For the 20 most common advice journeys in Cluster 14 (33% of the total journeys in the cluster) the plot shows the proportion of client cases receiving different types of advice in their first three sessions (labelled S1, S2 and S3). A large proportion of advice given in each session is solely about Immigration, but other common advice co-incidences are Universal Credit, Travel/Transport, and Working Benefits, though there are many more co-incidences of advice types.

Figure 2: River plot for Cluster 14
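Restricting a river plot to the most common journeys over the first three sessions, and reporting what share of clients those journeys cover (33% for Cluster 14), can be sketched as follows. The journey format and function name are assumptions for illustration.

```python
from collections import Counter

def top_journeys(journeys, n_sessions=3, top=20):
    """Count advice journeys over the first few sessions and keep the most common.

    journeys: one list per client; each element is the set of advice types
    given in that session (a hypothetical format for this sketch).
    Returns the top journeys with their counts, and the share of clients they cover.
    """
    # frozenset makes each session hashable and ignores the order of types within it
    paths = Counter(tuple(frozenset(s) for s in j[:n_sessions]) for j in journeys)
    most_common = paths.most_common(top)
    coverage = sum(count for _, count in most_common) / len(journeys)
    return most_common, coverage

journeys = [
    [{"Immigration"}],
    [{"Immigration"}],
    [{"Immigration"}, {"Universal Credit"}],
    [{"Immigration", "Travel/Transport"}],
]
common, coverage = top_journeys(journeys, top=1)
print(common, coverage)
```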

Overall, these river plots illustrated the complexity of the co-incidences of advice given to clients. Even within the clusters that have the simplest co-incidence structure (e.g., predominantly one advice type) and restricting to the 20 most common advice journeys, the sequence and combination of advice given is complex. This reflects the nature of people’s lives: many problems are inter-connected and multi-faceted, and the issue on which a client initially seeks advice often leads to advice on other, sometimes more substantive, issues.

The Value

The data we analysed was hugely complex and had never been explored in detail before. We were able to synthesise the information in the data and provide insights for CAS. By understanding the co-incidences of types of advice sought by clients, CAS can more quickly identify additional needs that clients may have when they first seek help. While some organisations provide advice in only one area, for example debt or relationships, it is important that organisations such as CAS, which provide a broad spectrum of advice, are able to demonstrate to potential funders that the service they provide addresses multi-faceted issues and problems.

“Select Statistics did a remarkable job for us. Having given them a large and disjointed set of data, not only did they successfully wrangle this into a useable shape but also conducted analysis that helped us gain an understanding of our data (and, by association, our clients) that we’ve never had before. The work carried out by Lynsey and Jo has been genuinely transformative for us in a number of ways, and I very much hope we get the chance to work with them again.”

Linda Hutton – Senior Research Officer

The post Clustering Types of Advice for Citizens Advice Scotland appeared first on Select Statistical Consultants.

Key Drivers of Mental Wellbeing in the Film and TV Industry

Jo Morrison — Mon, 05 Jun 2023 13:24:55 +0000

The Film and TV Charity aims to provide a supportive community for everyone working behind the scenes in film, TV and cinema; a community that works together to make their industry work better. The Charity provides a range of resources including legal advice, a 24-hour helpline, financial advice and grants, toolkits and emotional support.

One of the Charity’s specific aims is to support and promote mental wellbeing. To help this, the Charity now conducts regular surveys to both provide a barometer of mental wellbeing among the people they support and to understand more about the issues facing those working in the FTV industry.

Select were delighted to be approached by the Charity who were looking to conduct a more in-depth analysis of the responses to their latest Looking Glass Survey to delve further than the headline findings. They asked us to create a statistical model in order to arrive at deeper insights.

The Challenge

The Looking Glass survey included a series of questions that comprise an established measure of mental wellbeing: the Short Warwick Edinburgh Mental Wellbeing Scale (SWEMWBS). The responses to these questions were scored (1 for strongly disagree to 5 for strongly agree) and summed to give a score to measure mental wellbeing. Researchers at the FTV Charity used the SWEMWBS scores to benchmark the mental wellbeing of their respondents against norms for the UK population.
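A minimal sketch of the summed scoring described above, assuming the seven-item SWEMWBS with each item already coded 1 to 5 (missing responses are not handled). Published SWEMWBS guidance also describes converting the raw total to a metric score using a conversion table, which is not reproduced here; the function name is hypothetical.

```python
VALID_ITEM_SCORES = {1, 2, 3, 4, 5}

def swemwbs_raw_score(item_scores):
    """Sum the seven SWEMWBS item scores (each 1 to 5) into a raw total of 7 to 35."""
    if len(item_scores) != 7:
        raise ValueError("SWEMWBS has seven items")
    if any(s not in VALID_ITEM_SCORES for s in item_scores):
        raise ValueError("each item must be scored from 1 to 5")
    return sum(item_scores)

print(swemwbs_raw_score([3, 4, 3, 2, 4, 3, 3]))
```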

The FTV Charity then wanted to dig deeper into the responses to their survey. They wanted to understand what the key drivers of mental wellbeing were. By understanding which aspects of working life impact, both positively and negatively, on mental wellbeing, the Charity can focus its activities and target initial support in areas that stand to gain the most benefit.

The Approach

Our plan was to use a statistical model to explore the key drivers of mental wellbeing. This model would show how the responses to the survey questions were associated with mental wellbeing (as measured by the SWEMWBS score).

However, the FTV Charity’s survey, as with many other surveys, comprised many questions. The Charity’s own report said that the “Looking Glass ’21 had 59 questions, each with up to 12 sub-questions offering up to five possible answers [on a five-point Likert scale] to each question.” Researchers at the Charity recognised that there were “so many statistical relationships in this dataset that we would never be able to report more than a small fraction of them”. An additional feature of this and other surveys is that many of the survey questions were interconnected or correlated.

Therefore, prior to fitting a statistical model, we used factor analysis to take advantage of the correlations and the richness of the data. Factor analysis is a technique that identifies which questions are correlated and combines the individual question responses to create more robust measures of the underlying trait that they collectively assess. (We explain factor analysis using a practical example in our blog.) We worked closely with researchers at the Charity to define the factors, as they had clear ideas, informed by the Charity’s own empirical experience and by other research, about which aspects of working in the FTV industry were likely to be associated (positively and negatively) with mental wellbeing.

In the factor analysis, we included questions from the survey that asked respondents’ opinions or attitudes on topics such as financial security, views of their colleagues, the working culture around reporting behaviours, the nature of networking and work-life balance.

Having synthesised the survey responses into factors, we then fitted a statistical model with the SWEMWBS score (the measure of mental wellbeing) as the outcome. Our model included all the factors that we had derived; the UCLA loneliness scale score (the other established scale embedded in the survey); selected individual survey questions that were not included in the factor analysis; and a number of background characteristics that could be associated with mental wellbeing, for example age, whether the respondent was an employee or freelancer, and the number of hours worked last week.

The figure below shows the results of our model, in descending order of each variable or component’s contribution to mental wellbeing.

Variables associated with mental wellbeing scores, in order of contribution.

The percentage points shown on the plot are the amount by which the explanatory power of the model changes when the variable in question is removed from the model, and the direction of the bars shows whether each variable is positively or negatively associated with mental wellbeing.
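The contribution measure described above is a drop-one comparison. A minimal sketch, assuming an ordinary least-squares model and R² as the measure of explanatory power (the post does not say exactly which model or measure was used); the helper names and data are hypothetical. The sign of each variable's coefficient, not computed here, would give the direction of its bar.

```python
def _solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * pv for x, pv in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def r_squared(columns, y):
    """R^2 from an OLS fit of y on an intercept plus the given predictor columns."""
    X = [[1.0, *row] for row in zip(*columns)] if columns else [[1.0] for _ in y]
    p = len(X[0])
    xtx = [[sum(r[a] * r[b] for r in X) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(p)]
    beta = _solve(xtx, xty)  # normal equations
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in X]
    ybar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def drop_one_contributions(predictors, y):
    """Percentage-point fall in R^2 when each predictor is removed from the full model."""
    full = r_squared(list(predictors.values()), y)
    return {
        name: 100 * (full - r_squared([v for k, v in predictors.items() if k != name], y))
        for name in predictors
    }

# Hypothetical data: the outcome depends on x1 only
x1 = list(range(1, 11))
x2 = [1, 0] * 5
y = [3 * v + 1 for v in x1]
contrib = drop_one_contributions({"x1": x1, "x2": x2}, y)
print(contrib)
```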

The largest contribution was loneliness, which was negatively associated with mental wellbeing. The next 5 contributors were work-related factors. Those who responded positively about career development opportunities, workplace culture, the impact of changes to working practices due to Covid and their work-life balance tended to have more positive mental wellbeing. Respondents who reported that they were struggling financially tended to have lower mental wellbeing.

The Value

The combination of factor analysis and a statistical model meant that we were able to incorporate a large amount of information from the survey data in an economical way. Our analyses enabled the Charity to dig deeper into their data and “identify the key influences on mental health and the strength of their effects”. Knowing in which areas the charity and the wider industry should focus efforts has already generated conversations and actionable ideas about initiatives and provision of support aimed at improving mental wellbeing.

You can read more details of our work by downloading the FTV Charity’s report (in which they provide a link to our report).

“By developing a statistical model of the mental health of film and TV workers, Select enabled us to mine the information from our biennial survey to an extent that we had not been able to do before. The findings were insightful and generated much constructive discussion in our organisation plus a successful published report. We would definitely use them again. Indeed, we have already been back to them for further advice about the data we hold.”

David Steele – Research and Insight Manager

The post Key Drivers of Mental Wellbeing in the Film and TV Industry appeared first on Select Statistical Consultants.

Age Standardisation or Norming of Educational Tests

Jo Morrison — Wed, 29 Mar 2023 11:30:46 +0000

Select recently had the opportunity to work with Taylor and Francis, who publish books and academic journals. They were looking to update and publish a 5th edition of the Renfrew Expressive Vocabulary Test, and approached us to provide statistical support.

The Expressive Vocabulary Test contains a set of flashcards, each of which has an illustration of a word: either a noun, an adjective or a verb. Children are shown the flashcards one at a time and are asked to name as many words as they can. The test is designed to assess the speech and language of children aged 3 to 11.

The Challenge

Raw scores on a test (the total number of questions answered correctly) often increase with age: older children tend to score more highly than younger children. Therefore, raw scores on their own are not very informative. The Renfrew Expressive Vocabulary Test is perhaps fairly unusual among tests in that it spans a wide age range of 9 years, over a period when children’s vocabulary expands considerably. So the raw scores of older children will be considerably higher than those of younger children. How do we make a fair comparison?

The Solution

To compare raw scores fairly across children of different ages, we used a statistical model to translate raw scores onto a scale that is independent of age. Taking account of the effect of age in this way is known as age-adjusting, age-standardising or age-norming.

The children are grouped according to their age (we used 6-month age bands) and, within each age group, percentiles of raw score are calculated (for example, the 5th, 10th, 20th, etc. percentiles). A statistical model is then fitted to the set of percentiles with age as a covariate. There is a range of different models that can be applied, and the aim is to obtain a model that fits the observed data (the test scores) well while avoiding over-fitting.
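The analysis itself used the cNORM R package, which fits a more sophisticated continuous-norming regression model. As a much simpler, hedged sketch of the same two steps (percentiles within 6-month bands, then a model in age), the code below computes the 5th to 95th percentiles in each band and fits a separate straight line in age to each percentile. The straight lines, function names and data are assumptions; a real norming model would usually need curvature and a check that fitted percentile curves do not cross.

```python
from statistics import quantiles

PERCENTILES = list(range(5, 100, 5))  # 5th, 10th, ..., 95th

def band_percentiles(ages_months, scores, band_width=6):
    """Group children into age bands; return {band midpoint: 19 percentiles of raw score}."""
    bands = {}
    for age, score in zip(ages_months, scores):
        start = (age // band_width) * band_width
        bands.setdefault(start + band_width / 2, []).append(score)
    # quantiles(..., n=20) returns the 19 cut points at 5%, 10%, ..., 95%
    return {mid: quantiles(vals, n=20) for mid, vals in sorted(bands.items()) if len(vals) >= 2}

def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (intercept, slope)."""
    xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
             / sum((x - xbar) ** 2 for x in xs))
    return ybar - slope * xbar, slope

def fit_percentile_curves(band_pcts):
    """Fit a separate straight line in age (band midpoint) for each percentile."""
    mids = list(band_pcts)
    return {p: fit_line(mids, [band_pcts[m][i] for m in mids])
            for i, p in enumerate(PERCENTILES)}

# Hypothetical data: 18 children aged 36-53 months; scores rise by 10 points per band
ages = list(range(36, 54))
scores = [(a // 6 - 6) * 10 + a % 6 for a in ages]
band_pcts = band_percentiles(ages, scores)
curves = fit_percentile_curves(band_pcts)
print(curves[50])
```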

The plot below shows the results of our model. The dots are the observed data (the percentiles of raw scores of the children in each age group) and the lines show our fitted model.

Figure 1: Plot of percentiles by age.

Produced using cNORM. Lenhard, A., Lenhard, W., Gary, S. (2018). Continuous Norming (cNORM). The Comprehensive R Archive Network, Package cNORM, available: https://CRAN.R-project.org/package=cNORM

The plot clearly shows that not only do raw scores increase with age, but also that the scores of older children vary more than the scores of younger children (the percentiles are further apart for older age-groups compared to younger ages).

There may also be some evidence of a small ceiling effect (the maximum score was 100). The focus of the test, however, is to identify children who may need additional support, i.e. those at the lower end of the score range for their age.

Using the results of our model, we provided a look-up table of raw scores by age. After testing a child, practitioners use this table to convert the child’s raw score to an age-appropriate percentile rank, which tells them how the child’s score compares with those of other children of the same age.
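A look-up table of this kind can be generated from fitted percentile curves. A minimal sketch, assuming each percentile curve is a straight line in age given as (intercept, slope); the curve values below are hypothetical and not the published norms.

```python
def percentile_rank(raw_score, age_months, curves):
    """Highest tabulated percentile whose fitted score at this age is at or below the raw score.

    curves: {percentile: (intercept, slope)} describing straight lines in age,
    a hypothetical format for this sketch. Returns 0 for scores below every curve.
    """
    rank = 0
    for p in sorted(curves):
        intercept, slope = curves[p]
        if raw_score >= intercept + slope * age_months:
            rank = p
    return rank

def lookup_table(curves, ages_months, max_score):
    """Precompute the percentile rank for every (age in months, raw score) pair."""
    return {(age, s): percentile_rank(s, age, curves)
            for age in ages_months for s in range(max_score + 1)}

curves = {5: (0.0, 0.5), 50: (10.0, 0.5), 95: (20.0, 0.5)}  # hypothetical fitted lines
table = lookup_table(curves, ages_months=[40, 46], max_score=100)
print(table[(40, 35)], table[(46, 35)])
```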

The Value

It is good practice to update or recalibrate educational and psychological assessments that have been in use for a long time to ensure they remain relevant to the skills being tested and the current characteristics of the population. As well as new words, the updated edition of this test adds verbs and adjectives, updates the illustrations and extends the age range.

The model that we fitted across this wider age range and our look-up table allow practitioners, usually Speech and Language Therapists, to assess children’s vocabulary relative to their peers, and to identify children who would benefit from additional support in the development of their language skills.

“Select provided us with invaluable guidance throughout our project, from key considerations as the journey started and expert advice on a complex re-standardisation process, to the creation of an accessible statistical model to assess children’s vocabulary. In addition to this, the statistician we worked with, Jo, was friendly, collaborative, and speedy to reply to (and problem solve) our many and varied queries and we have no doubt we would work with Select again – and hope to in the future!”

Clare Ashworth – Senior Editor, Routledge Education (Informa Group)

The post Age Standardisation or Norming of Educational Tests appeared first on Select Statistical Consultants.

The Film TV Charity publish their Mind-Craft report

Jo Morrison — Mon, 10 Oct 2022 12:35:17 +0000

The Film TV Charity has chosen World Mental Health Day 2022 to publish their Mind-Craft report.

The report explores the factors that are associated with the mental wellbeing of people working behind the scenes in the film, TV and cinema industry. It is based on responses to the second edition of the Looking Glass survey conducted by the charity, who are keen to better understand how working life impacts on the mental health and wellbeing of the workers they support.

Select were delighted to be approached by the Film TV Charity who were looking to conduct a more in-depth analysis of the survey responses: to build on and delve further than the headline findings.

We worked with members of the charity’s research team to identify aspects of working life that they hypothesised (based on the charity’s own empirical experience and other research) were associated with positive and negative mental wellbeing. We identified which questions in the survey best measured these aspects. We then used factor analysis to group questions with related responses into a smaller number of factors, thereby capturing economically a large amount of the information contained in the dataset.

Following this we tested the hypotheses using a statistical model. The Looking Glass survey included a series of questions that comprise an established measure of mental wellbeing: the Short Warwick Edinburgh Mental Wellbeing Scale, and so our model could explore the associations with this embedded measure.

Our model showed that the main contributor to mental wellbeing was loneliness (which has also been found to be a key driver of wellbeing in research conducted outside the film and TV sector), followed by a number of aspects of working life measured by the survey, principally: opportunities for career development; the culture, values and the way people communicate in a workplace; whether the respondent was struggling financially; and their work-life balance.

Reflecting on the work, Jo said, “It was great to work with colleagues at FTVC. By combining our analytical skills with FTVC’s knowledge and experience of their industry, we created a model that provided valuable insight. The results have prompted many conversations and discussions, supported by anecdotal evidence, which is helping the charity broaden the support they can offer.”

The post The Film TV Charity publish their Mind-Craft report appeared first on Select Statistical Consultants.

IMA/RSS Workshop: Involving Employers in the Development of the Mathematical Sciences Curriculum

Sarah Littler — Mon, 11 Jul 2022 15:26:49 +0000

At the end of June, Sarah was thrilled to be invited to talk at the Institute of Mathematics and its Applications’ (IMA) workshop, “Involving Employers in the Development of the Mathematical Sciences Curriculum”. The goal of the workshop was to review the various ways that employers can work with maths departments to develop curricula that motivate and excite students whilst improving their employment prospects. Sarah was delighted to talk about her role as a statistical consultant and the skills required by those looking to pursue a career in statistical consultancy, both technical/statistical know-how and the equally important wider skills, including problem-solving, communication, time management, prioritisation and continuing professional development (CPD). The workshop, which was part of a wider Higher Education Teaching and Learning Series, was also supported by the Royal Statistical Society (RSS), in collaboration with the European Social Fund (ESF) Smart Specialisation.

The event featured short talks from members of academia and industry, followed by a panel session covering the following key questions:

Do we need to move beyond asking employers for “skills” of graduate applicants to employers and University staff co-designing modules?
How do we make sure students with academic interests in, for example, pure maths or theoretical physics have the skills that employers want?
What is the role of the library/careers service in teaching employability skills and how do we get the students to engage?

Speaker presentations are to be made available on GitHub and will be accessible via the following link: https://github.com/cmcneile/EmployMaths. Further resources on the topic can also be found in this booklet: “Employability development for HE mathematics and statistics: case studies of successful practice”; and via the Sigma Network’s Employability Special Interest Group (SIG).

Speaking about the event, Sarah commented that, “it was a great opportunity to engage with educators and to contribute to the conversation on how to support students’ preparedness for a successful career after undertaking Science, Technology, Engineering, and Mathematics (STEM) subjects at University.”

The post IMA/RSS Workshop: Involving Employers in the Development of the Mathematical Sciences Curriculum appeared first on Select Statistical Consultants.

Analysing Diversity Data for the Academy of Medical Sciences

Lynsey McColl — Mon, 31 Jan 2022 11:21:26 +0000

The Challenge

The Academy of Medical Sciences is an independent body in the UK representing the broad range of medical sciences. It is made up of around 1,300 Fellows who are elected from fields across the biomedical and health sciences including laboratory science, clinical academic medicine, veterinary science, dentistry, medical and nursing care, and other professions allied to medical science. The Academy’s mission is to advance biomedical and health research and its translation into benefits for society. It achieves this through promoting excellence, developing talented researchers, influencing research and policy and engaging patients, the public and professionals.

The Academy is committed to working towards full equality of opportunity within the organisation and its work, and publishes a yearly diversity report based on data collected internally on all key work areas. For the 2019-20 diversity report, the Academy approached Select Statistics and Inclusive Recruiting (a recruitment and Equality, Diversity and Inclusion (EDI) consultancy who teach, train and transform workplaces to become more equitable) to provide an independent analysis and assessment of how inclusive and diverse the organisation is.

The Solution

We worked in partnership with Vanessa Johnson-Burgess, CEO at Inclusive Recruiting, to produce the diversity report collaboratively. Select Statistics provided the quantitative focus to the report, analysing data and producing frequency tables, visualisations, and conducting hypothesis tests. Vanessa interpreted the diversity data and translated the results into what they meant for EDI at the Academy and what actions they might take. We worked together at each step to ensure that the figures and tables produced, and their interpretation, were both statistically rigorous and meaningful and actionable from an EDI point of view.

For each of the Academy’s key work areas (e.g., governance, fellowship, grants etc.), tables of breakdowns by gender, ethnicity and disability were provided and visualised using horizontal bar charts, an example of which is given below. The left-hand bar chart gives the percentage breakdown (with the total number of people in each category written in each bar) and the right-hand bar chart shows the breakdown of the counts. Key points from each bar chart were provided; for example: out of 1,329 fellows, there are 88 (7%) Black, Asian and/or Minority Ethnic (BAME) Fellows of which 50 (57%) are clinical and 38 (43%) are non-clinical.

AWB = Any White Background; BAME = Black, Asian and/or Minority Ethnic; PNS = Prefer Not to Say.
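The percentages in the key-point example can be checked directly from the counts quoted (1,329 Fellows, 88 BAME Fellows, of whom 50 are clinical and 38 non-clinical). A minimal sketch; the helper name is illustrative:

```python
def pct(part, whole):
    """Percentage rounded to a whole number, as quoted in the report."""
    return round(100 * part / whole)

fellows, bame = 1329, 88
clinical, non_clinical = 50, 38
assert clinical + non_clinical == bame  # the two sub-groups make up the BAME total
print(pct(bame, fellows), pct(clinical, bame), pct(non_clinical, bame))  # prints: 7 57 43
```

Note that the 57% and 43% are shares of the 88 BAME Fellows, not of all 1,329 Fellows.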

The BAME grouping is made up of 15 combined categories, which may limit our understanding of how diverse the Academy and its work is across different ethnicities. Therefore, where possible we provided an additional breakdown of ethnicities to better understand the BAME category (see, for example, the bar charts of the percentage breakdown of each category within BAME for Clinical, Non-Clinical and Total Fellows below). The statistical analysis and EDI narrative combined were able to support the recommendation that differing identities of race should be treated separately as these identity groups have a different and separate experience of discrimination and marginalisation. The report included recommendations to improve this.

Tables of success rates for Fellow and grant applications were provided and statistical hypothesis tests were applied to test, for example, whether there was evidence of a statistically significant difference in success rates between different groups of applicants (e.g., males and females).
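The post does not name the tests used. One common choice for comparing two success rates is a two-proportion z-test, sketched below with hypothetical counts; for small counts, Fisher's exact test would usually be preferred.

```python
import math

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference between two success rates.

    Uses the pooled success rate under the null hypothesis of no difference;
    z squared equals the Pearson chi-squared statistic for the 2x2 table
    (without continuity correction).
    """
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

# Hypothetical applicants: 60 of 100 successful in one group, 40 of 100 in the other
z, p = two_proportion_z_test(60, 100, 40, 100)
print(round(z, 3), round(p, 4))
```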

Using the above information, Vanessa from Inclusive Recruiting provided an EDI narrative reflecting on the key points and beginning to unpick some of the assumptions, understandings, systems and processes behind the data. This provided a stimulus for the Academy to begin asking questions to help understand how to progress in its inclusion journey. From the EDI narrative, key recommendations were provided to help the Academy develop an action plan to advance its diversity and inclusion work.

The Value

Based on the findings, 8 key recommendations were provided in the report to be taken forward to progress the EDI journey and impact for the Academy. By bringing together Select’s statistical know-how and Inclusive Recruiting’s expertise as champions of equity, diversity & inclusion, we ensured that meaningful insights were extracted from robust, data driven evidence.

Following a presentation of our results at an Academy Council Meeting, the Fellows agreed that bolder action was needed by the Academy for sustained change. As the Academy’s Diversity Champions stated in the foreword, “While the Academy has made efforts to achieve greater diversity and inclusion across all its activities, this report tells us there is much more to do. It shows that the Academy’s work towards equality is an ongoing journey. Things do not improve overnight, or even from year to year, without deliberate and thoughtful actions. This report crystallises our desire to shift from ’chipping away’ to bolder action for real, sustained change.” Further information on the report findings and a copy of the report can be downloaded direct from the Academy: AMS Annual Diversity Report.

“Working with Select was a fantastic experience. Their rigorous statistical analysis of our data formed the bedrock of evidence in what turned out to be a seminal diversity report for us. It was great that Lynsey responded to positively to our request to work in partnership with our EDI consultants, because the combination proved to be a total powerhouse. Lynsey was patient and generous with her time and helped us all draw deeper understanding from the data we had collected. We remain in contact and look forward to working together again in the future.”

Nick Hillier – Director of Communications and Engagement

The post Analysing Diversity Data for the Academy of Medical Sciences appeared first on Select Statistical Consultants.

Getting More from your Survey Questions with Factor Analysis

Jo Morrison — Thu, 13 Jan 2022 11:03:55 +0000

Surveys can be a rich source of information: as well as factual questions, they often ask about attitudes, behaviours and activities. The results from a survey analysis can also provide more than just percentages, averages and crosstabulations.

Factor analysis is a statistical technique that combines questions that are related (correlated) into a smaller number of factors, to create more robust measures.

By combining questions or variables, and using the resulting measures rather than analysing and reporting the questions individually, factor analysis acts as a dimension-reduction technique (other dimension-reduction techniques include Principal Component Analysis [PCA], for example). And because it is based on correlations, it can help to avoid some of the problems of collinearity that can arise in analyses. Furthermore, factors can often provide more meaningful results by capturing overall, intrinsic characteristics and qualities rather than individual, separate questions.

It is worth noting, though, that factor analysis can be used with many types of data, not just survey responses. It can be used to analyse, for example, items bought in shops or supermarkets, time spent in different office areas (solo pods, meeting rooms, conference spaces, etc.), patient-reported outcomes (PROs) (e.g., of pain or depression), and so on. Hence, factor analysis can not only help you to understand your students’, your customers’ or your employees’ attitudes and opinions, but can also help to uncover their preferences and behaviours, via transaction or office utilisation data, for example.

In this blog we show factor analysis in action.

What Is a Factor?

A factor, sometimes called a latent trait or construct, is an intrinsic characteristic or quality. Factors are multi-faceted and difficult to measure directly. Examples include qualities such as empathy, IQ, self-confidence, or ethos.

The theory of factor analysis is that these deeper level factors or latent traits underpin your actions and attitudes and also influence your responses to questions about these topics.

Example Data

To illustrate factor analysis, we use some data from the Organisation for Economic Co-operation and Development’s (OECD) Programme for International Student Assessment (PISA) as an example. The PISA study runs every 3 years, across many countries, and assesses the numeracy, literacy and science knowledge and skills of 15-year-old students. The OECD make PISA data available for secondary analyses.

PISA also includes a teacher questionnaire. The 2018 questionnaire asked teachers to answer the following set of questions:

Our example is based on the responses of teachers in the UK.

Factor Analysis

The first step in factor analysis is to calculate the correlations between each pair of questions. As the responses are on a Likert scale (from ‘strongly disagree’ to ‘strongly agree’), and so are ordered categorical (ordinal) data rather than on a continuous scale, we calculate polychoric correlations, which are appropriate for these sorts of data (unlike, for example, Pearson product-moment correlations).
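As a rough illustration of what a polychoric correlation estimates, here is a minimal two-step sketch in Python (NumPy and SciPy), assuming each item is coded 0 to 4. The category thresholds are first fixed from each item's marginal proportions, and the latent correlation is then chosen to maximise the likelihood of the observed cross-tabulation under a standard bivariate normal model. The function names and the simulated data are hypothetical; in R, ready-made implementations such as psych's polychoric() are the more usual route.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import multivariate_normal, norm

BIG = 8.0  # stands in for +/- infinity; normal mass beyond 8 SDs is negligible


def thresholds(x, n_cat):
    """Latent-normal cut points implied by the cumulative category proportions."""
    counts = np.bincount(x, minlength=n_cat)
    cum = np.cumsum(counts)[:-1] / counts.sum()
    cum = np.clip(cum, 1e-6, 1 - 1e-6)  # avoid infinite cut points for empty categories
    return np.concatenate(([-BIG], norm.ppf(cum), [BIG]))


def polychoric(x, y, n_cat=5):
    """Two-step polychoric correlation between two items coded 0..n_cat-1.

    Step 1 fixes the thresholds from the marginal proportions; step 2 finds the
    latent correlation rho that maximises the likelihood of the observed
    contingency table under a standard bivariate normal model.
    """
    a, b = thresholds(x, n_cat), thresholds(y, n_cat)
    table = np.zeros((n_cat, n_cat))
    np.add.at(table, (x, y), 1)  # observed cross-tabulation of the two items
    corners = np.array([[ai, bj] for ai in a for bj in b])

    def neg_loglik(rho):
        mvn = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])
        F = mvn.cdf(corners).reshape(len(a), len(b))  # joint CDF at every corner
        # Probability of each cell of the table by inclusion-exclusion
        p = F[1:, 1:] - F[:-1, 1:] - F[1:, :-1] + F[:-1, :-1]
        return -np.sum(table * np.log(np.clip(p, 1e-12, None)))

    res = minimize_scalar(neg_loglik, bounds=(-0.99, 0.99), method="bounded")
    return res.x


# Hypothetical usage: two 5-point items driven by latent scores with rho = 0.6
rng = np.random.default_rng(1)
latent = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=2000)
cuts = [-1.0, -0.3, 0.4, 1.2]
item1 = np.digitize(latent[:, 0], cuts)  # categories 0..4
item2 = np.digitize(latent[:, 1], cuts)
r_poly = polychoric(item1, item2)
r_pearson = np.corrcoef(item1, item2)[0, 1]
print(f"polychoric: {r_poly:.2f}, Pearson on the codes: {r_pearson:.2f}")
```

Because the codes 0 to 4 are treated as equally spaced numbers, the Pearson correlation of the same simulated items will typically be somewhat smaller than the latent correlation. That attenuation is what the polychoric approach is designed to avoid.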

In the next step, the appropriate number of factors to be extracted is determined (ensuring, for example, that a sufficient proportion of the variation in the data is explained), and factor solutions are calculated based on the correlations (e.g., using the fa() function in the psych package in the statistical software R). Factor analysis groups together questions that are highly correlated to derive a smaller set of factors that retain a high proportion of the information in the original questions. We won’t go into the detail of how this is done here (as it’s not the focus of this post), but in our example we find that 2 factors are potentially a good solution.
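The following is a sketch of the extraction step in Python with NumPy, assuming a correlation matrix has already been calculated. It first counts eigenvalues above 1. This is the Kaiser rule, one common rule of thumb among several, alongside scree plots and parallel analysis. It also reports the share of variation that the retained factors explain. It then extracts loadings by iterated principal axis factoring and applies a varimax rotation.

This is a swapped-in illustration rather than what fa() does by default. Unless told otherwise, psych's fa() uses minimum residual extraction (fm = "minres") with an oblique oblimin rotation, although fm = "pa" and rotate = "varimax" are available options. The correlation matrix below is hypothetical, built from a known two-factor structure.

```python
import numpy as np


def principal_axis(R, n_factors, max_iter=500, tol=1e-6):
    """Iterated principal axis factoring of a correlation matrix R."""
    R = np.asarray(R, dtype=float)
    h2 = 1 - 1 / np.diag(np.linalg.inv(R))  # start from squared multiple correlations
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)  # replace the 1s with current communalities
        vals, vecs = np.linalg.eigh(reduced)
        top = np.argsort(vals)[::-1][:n_factors]
        loadings = vecs[:, top] * np.sqrt(np.clip(vals[top], 0, None))
        new_h2 = np.sum(loadings**2, axis=1)
        if np.max(np.abs(new_h2 - h2)) < tol:
            break
        h2 = new_h2
    return loadings


def varimax(L, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation, which pushes each item towards a single factor."""
    p, k = L.shape
    rot = np.eye(k)
    crit = 0.0
    for _ in range(max_iter):
        LR = L @ rot
        u, s, vt = np.linalg.svd(L.T @ (LR**3 - LR @ np.diag(np.sum(LR**2, axis=0)) / p))
        rot = u @ vt
        if np.sum(s) < crit * (1 + tol):
            break
        crit = np.sum(s)
    return L @ rot


# Hypothetical correlation matrix: items 1-4 share one factor, items 5-7 another
true_L = np.zeros((7, 2))
true_L[:4, 0] = 0.8
true_L[4:, 1] = 0.7
R = true_L @ true_L.T
np.fill_diagonal(R, 1.0)

eigenvalues = np.linalg.eigvalsh(R)[::-1]  # largest first
n_factors = int(np.sum(eigenvalues > 1))  # Kaiser rule: eigenvalues above 1
explained = eigenvalues[:n_factors].sum() / len(R)
loadings = varimax(principal_axis(R, n_factors))
print(n_factors, round(explained, 2))
print(np.round(loadings, 2))
```

Since this matrix was built from exact loadings of 0.8 and 0.7, the recovered loadings should reproduce them up to sign. With real survey data, the different rules for choosing the number of factors can disagree, so interpretability usually also informs the final choice.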

A key aim of factor analysis is to obtain factors that are interpretable. To interpret the factors, we look at the “factor loadings” from the factor analysis output. Each factor has a set of factor loadings corresponding to the input questions. These can be interpreted as the correlations between each input question and the factor, the underlying latent construct (strictly, this holds when the factors are uncorrelated, i.e., for an orthogonal solution).

We identify the questions that are strongly correlated with each of the underlying factors. Strong correlations are indicated by values close to +1 or -1 (positive and negative correlations, respectively); weaker correlations are indicated by values closer to zero.

In the example below, we see that the 1st, 2nd, 4th and 6th statements are strongly correlated with factor 1, and the 3rd, 5th and 7th statements are strongly correlated with factor 2. (These are indicated by the shaded cells.) We note, though, that statements 8, 9 and 10 also correlate with the factors, but to a lesser extent.

Interpreting the Factors

Factor 1 – Satisfaction with Teaching

The strong correlations between the statements with shaded cells and factor 1 indicate that teachers who agreed that “The advantages of being a teacher clearly outweigh the disadvantages” also tended to agree with the statement “If I could decide again, I would still choose to work as a teacher”. In addition, teachers who agreed with these first 2 statements also tended to disagree (indicated by the negative correlation) with the statements “I regret that I decided to become a teacher” and “I wonder whether it would have been better to choose another profession.”

The converse is also true: teachers who disagreed with the first two statements tended to agree with the latter two shaded statements.

Collectively these four statements provide a measure of teachers’ satisfaction with being a teacher: their satisfaction with their profession.

Factor 2 – Satisfaction with Their School

In factor 2, the strong correlations between the 3rd, 5th and 7th statements (with shaded cells) and the factor indicate that teachers who agreed that “I enjoy working at this school” also tended to agree with the statement “I would recommend my school as a good place to work”, and additionally tended to disagree (indicated by the negative correlation) with the statement “I would like to change to another school if that were possible”. Again, the converse is also true: teachers who disagreed with the first two statements tended to agree with the third.

Collectively these three statements provide a measure of teachers’ satisfaction with their particular school.

All in All, I Am Satisfied with My Job

Ideally, the correlations between statements and factors should show associations between each statement and only one of the factors (or neither of the factors). In the factor loading matrix above, the final statement (“All in all, I am satisfied with my job”) is positively correlated with both Factor 1: teachers’ satisfaction with their profession, and Factor 2: teachers’ satisfaction with their school, though the correlations are weaker than for the shaded statements (and are described as moderate rather than strong). It is quite sensible in terms of interpretation that teachers’ overall job satisfaction is (positively) related to both their satisfaction with the profession and with their school. However, since the correlations associated with this statement are not strong, the factors may be improved by excluding this statement, and the other statements with small correlations, from the analysis, as they may be adding more noise than information.
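Once loadings are available, checking which statements load on a single factor, on several factors, or on none can be automated. The sketch below is a minimal Python version. The cut-offs of 0.6 (strong) and 0.3 (weak) are assumptions, since conventions vary. The loading matrix is invented for illustration and is not the PISA result discussed above.

```python
import numpy as np


def describe_loadings(L, strong=0.6, weak=0.3):
    """Label each item by how it loads; the cut-offs are rules of thumb, not fixed rules."""
    labels = []
    for row in np.abs(np.asarray(L)):  # the sign only gives the direction of the association
        strong_f = [j + 1 for j, v in enumerate(row) if v >= strong]
        moderate_f = [j + 1 for j, v in enumerate(row) if weak <= v < strong]
        if len(strong_f) + len(moderate_f) > 1:
            factors = ", ".join(map(str, sorted(strong_f + moderate_f)))
            labels.append(f"cross-loads on factors {factors}")
        elif strong_f:
            labels.append(f"loads on factor {strong_f[0]}")
        elif moderate_f:
            labels.append(f"moderate loading on factor {moderate_f[0]}")
        else:
            labels.append("weak on all factors: candidate for exclusion")
    return labels


# Hypothetical two-factor loading matrix for ten statements
loadings = np.array([
    [0.85, 0.10], [0.80, 0.15], [0.05, 0.75], [-0.78, -0.05], [0.12, 0.82],
    [-0.70, -0.10], [0.08, -0.72], [0.35, 0.20], [0.20, 0.25], [0.45, 0.40],
])
labels = describe_loadings(loadings)
for item, label in enumerate(labels, start=1):
    print(f"statement {item}: {label}")
```

The final row mimics the pattern described above for the overall job-satisfaction statement: moderate loadings on both factors rather than a strong loading on one.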

What Next? Using the Factors

Having identified and interpreted the factors, we can use the data and the factor solution to calculate factor scores: in this case, we use the teachers’ responses to calculate the ‘satisfaction with the profession’ and ‘satisfaction with their school’ measures for each teacher. (A factor score is a weighted combination of a teacher’s responses to the questions, with the weights derived from the factor loadings; it measures the relative magnitude of each factor (i.e., trait) for that teacher.)
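One standard way to compute such weighted combinations is the regression (Thurstone) method. In this method, the item weights are obtained by multiplying the inverse of the correlation matrix R by the loading matrix L.

Below is a minimal NumPy sketch, assuming standardised numeric responses and data simulated from a known two-factor model. Real Likert responses are ordinal, so treating them as numeric here is a simplification. Other scoring methods are also used, such as Bartlett scores, or simply averaging the strongly loading items after reversing negatively worded ones.

```python
import numpy as np


def regression_scores(Z, R, L):
    """Thurstone (regression) factor scores: standardised responses times R^-1 L."""
    weights = np.linalg.solve(R, L)  # one column of item weights per factor
    return Z @ weights


# Hypothetical data simulated from a two-factor model with 7 items
rng = np.random.default_rng(0)
L = np.zeros((7, 2))
L[:4, 0] = 0.8
L[4:, 1] = 0.7
factors = rng.standard_normal((1000, 2))  # each respondent's true trait levels
noise = rng.standard_normal((1000, 7)) * np.sqrt(1 - np.sum(L**2, axis=1))
X = factors @ L.T + noise

Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardise each item
R = np.corrcoef(Z, rowvar=False)
scores = regression_scores(Z, R, L)  # in practice L would be the estimated loadings
recovery = [np.corrcoef(scores[:, j], factors[:, j])[0, 1] for j in range(2)]
print(np.round(recovery, 2))
```

With these simulated data, the scores correlate strongly with the true trait levels, but not perfectly. This is a reminder that factor scores are estimates.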

Teachers’ scores, whether they are high or low, or nearer the average, will reflect (because they are calculated from) their levels of agreement and disagreement and the strength of their opinions. And so, we have taken responses to (in this case) 10 categorical variables and created two scale measures (continuous variables), which provide more robust measures of teachers’ satisfaction than the survey questions individually.

While factor analysis is a technique in its own right, it is not usually the end point of an analysis. The derived factors can be really useful in subsequent analyses. They can be used to compare or describe different groups of teachers, for example to test hypotheses such as whether older teachers are more satisfied with their profession than younger teachers. They can be used in statistical models, for example to explore whether and how students’ outcomes vary according to their teachers’ levels of satisfaction, or to identify the drivers of teachers’ satisfaction. They can also be used with cluster analysis to identify groups of teachers according to their characteristics. Similarly, in another context, such as a customer or brand survey, we could investigate whether customer satisfaction is associated with a particular customer demographic, or cluster customers into different groups based on their attitudes, opinions, preferences, and shopping behaviours, to better understand your customer base and brand positioning, and target products and/or advertisements accordingly.
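For example, comparing the average ‘satisfaction with the profession’ score between older and younger teachers might use Welch's two-sample t-test. The sketch below uses SciPy's ttest_ind with simulated, hypothetical scores. It treats the estimated factor scores as if they were measured without error, which is a simplification. Latent variable approaches, such as multi-group confirmatory factor analysis, handle this more formally.

```python
import numpy as np
from scipy import stats

# Hypothetical 'satisfaction with the profession' factor scores for two age groups
rng = np.random.default_rng(42)
older = rng.normal(loc=0.2, scale=1.0, size=300)
younger = rng.normal(loc=0.0, scale=1.0, size=300)

# Welch's t-test (no equal-variance assumption), one-sided: are older teachers more satisfied?
result = stats.ttest_ind(older, younger, equal_var=False, alternative="greater")
print(f"mean difference = {older.mean() - younger.mean():.2f}, "
      f"t = {result.statistic:.2f}, one-sided p = {result.pvalue:.3f}")
```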

The post Getting More from your Survey Questions with Factor Analysis appeared first on Select Statistical Consultants.

Careers in Statistics: Female experiences of a career in data

Jo Morrison — Thu, 06 Jan 2022 09:25:48 +0000

In November, our Senior Statistical Consultant, Jo, was delighted to chair the speaker session of a Royal Statistical Society (RSS) event, “Careers in Statistics: Female experiences of a career in data”. The event was co-organised by the RSS’s Women in Data Science and Statistics special interest group (of which Jo is a member) and the Young Statisticians Section, which unites young statisticians from all sectors and provides resources for the community of young statisticians.

The session, which aimed to offer advice and encouragement to early-career women pursuing a career in statistics or data science, featured short talks followed by a panel session answering questions from the audience.

The six guest speakers were women whose careers spanned both academia and industry, including Professor Deborah Ashby, past President of the RSS and Director of the School of Public Health at Imperial College London, where she holds the Chair in Medical Statistics and Clinical Trials and was Founding Co-Director of the Imperial Clinical Trials Unit. The speakers described their career paths and how they had followed their interests. They spoke of the opportunities, decisions and advice taken along the way. The event generated a lot of discussion from different perspectives, with questions from the audience covering where to get experience, how to find the key opportunities, balancing the commitments of family life and a career, and experiences of founding and running a company.

Jo commented that, “it was a great opportunity to share experiences and gain advice from women who between them had a wide variety of experience, and fantastic too that we were joined by an audience of over 130!”.

The post Careers in Statistics: Female experiences of a career in data appeared first on Select Statistical Consultants.

LMS Women in Mathematics Day 2021

Sarah Littler — Mon, 29 Mar 2021 16:10:23 +0000

Last week, our Senior Statistical Consultant Sarah was delighted to be invited to give a talk at the London Mathematical Society (LMS) Women in Mathematics Day 2021, speaking to mathematicians of all genders and backgrounds, at any stage of their career, about what it’s like to work as a statistical consultant.

The event, sponsored by the London Mathematical Society, the Royal Statistical Society South West Local Group, and the University of Plymouth, aimed to promote interest and careers in mathematics for women. Female speakers from a range of career stages across mathematics and statistics, from academia and industry, gave talks and took part in a lively panel discussion with questions from attendees (videos are available on YouTube). A poster competition was also run for women mathematicians at undergraduate, postgraduate and early-career level, with the three best entries receiving cash prizes of £100 each.

Speaking about the event, Sarah said, “I was thrilled to be invited to speak at this fantastic event, which encouraged people from all backgrounds to consider a career in mathematics. I hope that the attendees found the range of work being undertaken by women in mathematics as inspiring as I did, and that days like these can help grow the next generation of mathematicians and statisticians.”

The post LMS Women in Mathematics Day 2021 appeared first on Select Statistical Consultants.

The Academy of Medical Sciences Diversity Report

Lynsey McColl — Tue, 09 Feb 2021 17:03:01 +0000

The Academy of Medical Sciences plays a hugely important role in supporting and advancing the UK’s biomedical and health research and its translation into benefits for society. Select were thrilled to partner with Vanessa Johnson-Burgess from Inclusive Recruiting to deliver the Academy’s most recent diversity report. Blending our analytical skills with Vanessa’s background in diversity, equity and inclusion brought a unique and detailed perspective to this important report.

Using the diversity data that the Academy collected throughout the year across key areas such as grants, careers, and policy, Select carried out an analysis and visualisation of the Academy’s diversity. Using our analysis, Vanessa was then able to provide an equality, diversity, and inclusion (EDI) narrative to bring the evidence to life and highlight key recommendations to help the Academy achieve greater diversity and inclusion across all its activities.

Following the report, the Academy’s Diversity Champions added a foreword stating that:

“While the Academy has made efforts to achieve greater diversity and inclusion across all its activities, this report tells us there is much more to do. It shows that the Academy’s work towards equality is an ongoing journey. Things do not improve overnight, or even from year to year, without deliberate and thoughtful actions. This report crystallises our desire to shift from ‘chipping away’ to bolder action for real, sustained change.”

The post The Academy of Medical Sciences Diversity Report appeared first on Select Statistical Consultants.

Oxford-AstraZeneca Vaccine: Why the Uncertainty?

Jo Morrison — Tue, 15 Dec 2020 15:47:59 +0000

As we reported in our previous blog, the preliminary results from phase III trials of COVID-19 vaccines have been reported in the media. The vaccine developed and trialled by Pfizer and BioNTech is reported to be 95% effective and Moderna’s is 94.5% effective.

In contrast to these precise figures, reports about the vaccine being developed by Oxford University and AstraZeneca say it “may be up to 90 per cent effective”. The results from the trial show that the vaccine is 70% effective, but “with potential for that to rise to 90 per cent”.

These results are less conclusive than those from the other studies because of a dosing error. Scientists noticed that some trial participants had weaker adverse reactions to the vaccine than expected, and quickly realised that these participants’ first dose, of the two-dose regimen, had been given at half strength. This was not part of the trial’s design: it departed from what should have happened according to the trial’s protocol. After the error was reported to the regulators, the trial was allowed to continue with the doses as originally planned.

Reporting the results for the trial overall (following the protocol, as required), the vaccine is 70% effective; however, for the group of participants who received an initial half dose, the results show the vaccine to be 90% effective. This is good news: a serendipitous consequence of an accident. Many important scientific discoveries have been the result of an accident, penicillin being one well-known example.


While these are genuine results from the trial, the scientists are being cautious in their interpretation. This is for a number of reasons.

While the data have been analysed according to the protocol, finding the vaccine to be 70% effective overall, this is actually an average across two subgroups of participants, which may be problematic. As one expert explained “You’ve taken two studies for which different doses were used and come up with a composite that doesn’t represent either of the doses.”
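Here is a small numerical sketch of why the overall figure is a composite. Suppose each subgroup's vaccine and placebo arms are the same size, an assumption made here for simplicity. Then the pooled efficacy is an average of the subgroup efficacies, weighted by each subgroup's share of the placebo cases. It therefore lands between them and need not describe either dosing regimen. The case counts below are hypothetical, not the trial's data.

```python
def efficacy(vacc_cases, placebo_cases):
    """Efficacy when both arms have the same size and follow-up: 1 - case ratio."""
    return 1 - vacc_cases / placebo_cases


# Hypothetical (vaccinated cases, placebo cases) for two dosing subgroups
half_dose = (2, 20)
full_dose = (28, 80)

ve_half = efficacy(*half_dose)  # 0.90
ve_full = efficacy(*full_dose)  # 0.65
ve_pooled = efficacy(half_dose[0] + full_dose[0], half_dose[1] + full_dose[1])  # 0.70

# The pooled figure equals the subgroup efficacies averaged with placebo-case weights
total_placebo = half_dose[1] + full_dose[1]
ve_weighted = (half_dose[1] * ve_half + full_dose[1] * ve_full) / total_placebo
print(f"{ve_half:.2f} {ve_full:.2f} {ve_pooled:.2f} {ve_weighted:.2f}")
```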

Reporting results for the two subgroups separately also has problems. The original intention of the trial was for everyone to receive two equal doses and the sample size was calculated accordingly. Analysing the data as two subgroups is likely to mean that each analysis is underpowered. This means that the number of participants in the sample (the number in the subgroup) is too small to be confident that the results are not due to chance, and that the same results would be obtained if the study were to be repeated with another sample. This would particularly apply to the smaller group (where the vaccine was found to be 90% effective).
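To see how a smaller subgroup translates into more uncertainty, the sketch below computes an exact (Clopper-Pearson) 95% confidence interval for efficacy. The interval is conditional on the total number of cases and assumes equal-sized arms with equal follow-up. This is one standard approach, not necessarily the method used in the trial, and the counts are hypothetical.

```python
from scipy.stats import binomtest


def efficacy_ci(vacc_cases, placebo_cases, level=0.95):
    """Point estimate and exact (Clopper-Pearson) interval for vaccine efficacy.

    Assumes equal-sized arms with equal follow-up, so that, given the total
    number of cases, the number in the vaccinated arm is binomial with
    p = (1 - VE) / (2 - VE).
    """
    n = vacc_cases + placebo_cases
    ci = binomtest(vacc_cases, n).proportion_ci(confidence_level=level, method="exact")

    def to_ve(p):
        return 1 - p / (1 - p)  # inverts p = (1 - VE) / (2 - VE)

    # to_ve decreases as p increases, so p's upper limit gives VE's lower limit
    return to_ve(vacc_cases / n), to_ve(ci.high), to_ve(ci.low)


for label, counts in [("half dose", (2, 20)), ("full dose", (28, 80)), ("pooled", (30, 100))]:
    est, lo, hi = efficacy_ci(*counts)
    print(f"{label}: {est:.2f} (95% CI {lo:.2f} to {hi:.2f}), width {hi - lo:.2f}")
```

With these made-up counts, the interval for the small half-dose subgroup is noticeably wider than the interval for the pooled data.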

An additional reason to be careful when interpreting the results is that, since they were obtained from an unplanned subgroup of participants, the characteristics of the subgroup may not reflect those of the general population. The full sample of the trial was designed to reflect the general population, but since those who received the initial half dose were the first participants to take part, features of the way the trial was rolled out may mean that this group is not representative of the population. Indeed, the participants who received the initial half dose were all under 55, so we do not know the effectiveness of the initial half dose in older people, who are more severely affected by COVID-19.

So while the preliminary results of the Oxford University and AstraZeneca trial seem encouraging, and full results will be published in a peer-reviewed journal which will allow further scrutiny of the data, there is only one way to resolve these issues. Oxford University and AstraZeneca are now conducting a new trial to properly test the half dose regimen.

The post Oxford-AstraZeneca Vaccine: Why the Uncertainty? appeared first on Select Statistical Consultants.

What’s in a name: Statistics vs Data Science

Jo Morrison — Mon, 07 Dec 2020 14:30:00 +0000

Whilst the Select team are very much missing working together in the office, there have been positives in the move to working more remotely. One of these is that many meetings and gatherings have moved online, meaning that they are open to a wider pool of participants who may not normally have been able to attend in person. Last week, one of our senior consultants, Jo, virtually joined a Royal Statistical Society (RSS) meeting, hosted by the Glasgow local group, discussing the differences between statistics and data science. Surprisingly, Jo wasn’t the furthest afield, as one participant hailed from Perth, Australia!

The discipline of data science has boomed in the past decade, along with the increasing volumes of data being generated and stored. Those who have the skills to manipulate, analyse and visualise big data to gain insights are in demand. But a question that is asked repeatedly among statisticians, data scientists and others is whether statistics and data science truly are different, or whether, underneath the buzzwords of “machine learning” and “AI”, data science is a rebranding of statistics. This is not a new question; it has been the subject of many discussions, for example in one of our previous blogs, in the media and at other RSS events.

This meeting explored the differences and the commonalities of statistics and data science.

There are certainly a lot of techniques in common (for example, classification and regression trees, or clustering), and the two fields often have different names or terminology for the same techniques (for example, Gaussian process regression and kriging, or hypothesis testing and A/B testing). Both roles also require good communication skills, programming skills and the ability to extract usable insights from data.

Some subtle potential differences, or differences in emphasis, were also discussed. While statisticians have programming skills, data scientists may use a wider range of programming languages. Statisticians are possibly seen as being more focussed on assumptions than data scientists. Many traditional statistical techniques are interpretable, whereas more modern machine learning techniques such as random forests or neural networks are ‘black boxes’, producing predictions rather than interpretations. But all that said, the two disciplines have far more in common than they have differences. Both focus on applying analytical techniques to solve real-world problems with data. And while data science is seen as new and fashionable, a lot of the machine learning techniques applied by data scientists are statistical and commonly applied by statisticians in their work.

The post What’s in a name: Statistics vs Data Science appeared first on Select Statistical Consultants.

COVID-19 Vaccines: What Does 90% Effective Mean?

Sally Hunton — Wed, 25 Nov 2020 16:31:38 +0000

In the past few weeks, we’ve heard from multiple pharmaceutical companies that interim analyses of their COVID-19 vaccine trials are showing surprisingly good results. Pfizer and BioNTech announced their vaccine candidate was found to be more than 90% effective, and less than a week later Moderna announced that their vaccine was protecting 94.5% of people.

But what does this actually mean? How do we know what percentage of a population a vaccine is able to protect when we don’t know how many people may have been exposed to the virus? In standard clinical drug trials, the treatments are given to people known to be suffering from the disease in question. This is not the case with infectious diseases, where not everyone will be exposed to the infection, and even those who are exposed may not develop the disease. Researchers don’t know for certain who will contract COVID-19, which makes it difficult to determine how well the vaccine has protected people.


All clinical trials go through several phases, and it’s not until phase III that they look at how well the treatment or vaccine works in a large number of participants. Usually half of the participants who are enrolled into a trial will receive the vaccine and the other half (known as the control group) will receive a placebo: typically an injection that looks identical to the vaccine but has no therapeutic effect. Using a placebo means that we can calculate the proportion of participants in the control group who contract the disease during the trial (sometimes called the attack rate), and then compare this to the group who did receive the active vaccine.

Trials are usually run blind, meaning that the participants don’t know whether they receive the treatment or the placebo. In double-blind trials, the scientists as well as the participants do not know which participants receive the treatment or placebo. So scientists running the trial will count the number of participants who are confirmed to have COVID-19, without knowing whether those people had been given the candidate vaccine or the placebo. Once the number of cases reaches a pre-specified number (determined by a sample size calculation), the results are “unblinded” and it is revealed how many of the cases were contracted by participants given the vaccine compared with those given the placebo.

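To calculate how effective the vaccine is, we compare the proportion of positive cases in the vaccinated group to the proportion of positive cases in the placebo group:

$$\text{efficacy} = 1 - \frac{\text{cases}_{\text{vaccine}} / N_{\text{vaccine}}}{\text{cases}_{\text{placebo}} / N_{\text{placebo}}}$$

But, since we have the same number of participants in each group ($N_{\text{vaccine}} = N_{\text{placebo}}$), we can simply calculate

$$\text{efficacy} = 1 - \frac{\text{cases}_{\text{vaccine}}}{\text{cases}_{\text{placebo}}}$$

and, multiplying this by 100, we get the percentage effectiveness of the vaccine.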

Now, looking at the results that have been in the news, we can use this to interpret what they actually mean. Pfizer’s interim analysis reported that the number of cases in their trial participants had reached 94. At this point they would have unblinded the trial so that the scientists could see the number of cases among individuals who had been given the candidate vaccine. To reach 90% effectiveness, they would have found 8 or fewer cases among the vaccinated group out of the 94 total.

More recently, Pfizer and BioNTech have announced that they’ve reached the end of their phase III study. They report that there were 170 confirmed cases of COVID-19 among their study participants, with 162 in the placebo group and still only 8 in the vaccinated group, increasing the estimated effectiveness to 95%.

Moderna also recently published their interim analysis results. Out of 95 cases, only 5 were in the vaccine group which gives an effectiveness of 94.5%.
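The calculations above are easy to reproduce. This Python sketch applies the equal-arm formula, 1 minus the ratio of vaccinated to placebo cases, to the case counts quoted in this post. It also finds the largest number of vaccinated-group cases, out of 94, that is compatible with at least 90% effectiveness. The function names are illustrative.

Note that the simple ratio gives about 94.4% for Moderna rather than the reported 94.5%. The small difference presumably reflects details of Moderna's own calculation.

```python
def efficacy(vacc_cases, placebo_cases):
    """1 - (vaccinated cases / placebo cases), valid when the two arms are the same size."""
    return 1 - vacc_cases / placebo_cases


def max_vaccinated_cases(total_cases, target):
    """Largest number of vaccinated-arm cases still consistent with the target efficacy."""
    return max(k for k in range(total_cases)
               if efficacy(k, total_cases - k) >= target)


print(max_vaccinated_cases(94, 0.90))  # Pfizer/BioNTech interim, 94 cases -> 8
print(round(efficacy(8, 162), 3))      # Pfizer/BioNTech final, 8 vs 162 -> 0.951
print(round(efficacy(5, 90), 3))       # Moderna interim, 5 of 95 cases vaccinated -> 0.944
```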

More and more companies are announcing similarly positive results, including encouraging results from the Oxford vaccine. Thanks to the use of placebos, researchers are able to calculate how effective a vaccine is, which is essential for the vaccine to gain approval.

The post COVID-19 Vaccines: What Does 90% Effective Mean? appeared first on Select Statistical Consultants.

What the A Level Grade Scandal Can Show Us about Algorithmic Bias

Sally Hunton — Tue, 06 Oct 2020 10:21:28 +0000

Recently, we have seen a particularly visible example of artificial intelligence making the news. UK A level grades were calculated using an algorithm because students were unable to sit exams due to the Covid-19 pandemic. There was outrage, since many students received grades substantially lower than those they had been expecting, had been predicted, or needed to confirm their university place. After the backlash, the national governments each reversed their position and allowed students to be awarded their predicted grades instead.

Artificial intelligence (AI) is a term used for a huge range of automated decision-making techniques based on data rather than human evaluation. These techniques are being used increasingly often in many sectors, from healthcare to social media and personalised advertising, helped by improvements in, and easier access to, the right technology.

While AI can improve efficiency and sometimes accuracy for many tasks, there are a lot of ways in which it can be unfair. This is usually referred to as algorithmic bias. Examples include giving arbitrary population groups preferential treatment, for example discrimination based on gender or race.

Algorithmic bias can happen at several different stages: data collection, model development, and deployment of the results and insights. A big problem is often the data that underpin the models. Biases can be introduced if the data used is not properly representative of the population you are interested in. If you build an algorithm based on a data source consisting of 90% men, features of the data which are more closely associated with men than women will be given more weight in the outcome, despite the fact that the algorithm may be used in decision making about men and women equally. This may sound like an exaggerated example, but since data collection can be very difficult, time-consuming and expensive (often due to ethical and privacy considerations), it is far simpler to use existing data, even if it is not ideal.
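A toy simulation can make the 90% men example concrete. In the hypothetical data below, the relationship between a feature x and the outcome y differs between men and women; all the numbers are assumptions chosen for illustration. A single linear model fitted to data that are 90% men learns essentially the men's relationship. Its prediction error on a balanced population is then much larger for women.

```python
import numpy as np

rng = np.random.default_rng(0)


def simulate(n, share_men):
    """Hypothetical data in which the x-y relationship differs between two groups."""
    men = rng.random(n) < share_men
    x = rng.normal(size=n)
    group_slope = np.where(men, 2.0, 0.5)  # assumed, purely illustrative group effects
    y = group_slope * x + rng.normal(scale=0.5, size=n)
    return x, y, men


# Train a single linear model on data that are 90% men
x_tr, y_tr, _ = simulate(5000, share_men=0.9)
design = np.column_stack([np.ones_like(x_tr), x_tr])
intercept, slope = np.linalg.lstsq(design, y_tr, rcond=None)[0]

# Evaluate it on a population that is half men, half women
x_te, y_te, men_te = simulate(5000, share_men=0.5)
pred = intercept + slope * x_te
mse_men = np.mean((y_te[men_te] - pred[men_te]) ** 2)
mse_women = np.mean((y_te[~men_te] - pred[~men_te]) ** 2)
print(f"fitted slope {slope:.2f}; error for men {mse_men:.2f}, for women {mse_women:.2f}")
```

Here the model is not told which group each person belongs to, so it cannot learn separate relationships. The point of the sketch is only that the dominant group in the training data dominates what is learned.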

Even if your data is perfectly representative, biases can be brought in based on historical trends. An algorithm can only “know” what it has learned from the data used to build it. Some criticisms of the A level results algorithm have related to its use of only a narrow set of historical data. The algorithm places most weight on a student’s school’s performance over the past three years. This has resulted in preferential treatment for private and high-achieving state schools, while students from lower-achieving schools have a restricted potential to achieve the highest grades, due to their school’s historical performance and not their own personal attainment. While this method has allowed this year’s results to be in keeping with the past few years of grades, it means that, at the individual level, many students were treated unfairly.

The A level results algorithm is an example of a case where individuals are acutely aware of how fair the model has been for them: grades are a very clear outcome, and students not getting what they were expecting can have a clear and lasting impact on their future in terms of getting into university and beginning their careers. This is noteworthy, as AI decisions are often used within large systems, which can allow biases to go unchecked. Public engagement with the idea of an algorithm that has the potential for misuse and unfairness will hopefully show that we should be very careful in using AI going forward. We should not blindly accept the outcomes of algorithms, since there are many ways in which biases can make the results unfair.

The outcry over the results did lead to the governments’ U-turn in allowing students to use their teacher-predicted grades. Despite the grade inflation this will result in (when asked, teachers responded that their predicted grades reflected what they thought a student could get on a good day), since each predicted grade was made by a teacher with personal knowledge of the individual student, there is less of a chance of systematic bias based on school.

The post What the A Level Grade Scandal Can Show Us about Algorithmic Bias appeared first on Select Statistical Consultants.

Assessing the Diagnostic Accuracy of a Dyslexia Screening Tool

Sally Hunton — Mon, 10 Feb 2020 11:10:14 +0000

The Challenge

Dyslexia is a learning difficulty estimated to affect up to 1 in 10 of the UK population. It can cause people difficulty with learning to read and write, comprehension of written instructions, carrying out sequences of directions, and planning and organisation. By getting the appropriate diagnosis, those with dyslexia can benefit from support with various management techniques. Children with dyslexia can make the most of educational interventions such as one-to-one lessons with a specialist teacher, and extra time on exams, allowing them to reach their full potential. Assistance is also available in the workplace where employers are required by law to make reasonable adjustments for those with dyslexia.

In order to receive a formal diagnosis, people with dyslexia must have an in-depth assessment carried out by an educational psychologist or qualified dyslexia specialist. These assessments can cost upwards of £500, which can be a barrier to those seeking a diagnosis and getting support with their learning difficulties.

Pico Educational Systems have developed the QuickScan Questionnaire and QuickScreen Dyslexia Test (QS Dyslexia Tests) in order to offer an accessible, online dyslexia screening tool. The QS Dyslexia Tests, which are part of the British Dyslexia Association (BDA) Assured Scheme, provide an indication of dyslexia (as well as a detailed report of the test results and valuable insights into learning styles, abilities and difficulties) at a much lower cost than a full educational psychologist’s assessment. This can help individuals to identify whether they show symptoms of dyslexia and obtain suggestions for training and development. Those who are identified as being at a higher risk of having the condition may then choose to undertake an in-depth assessment to formally confirm their diagnosis.

An essential step in the evaluation of any diagnostic or screening test is to assess its accuracy. How useful a screening tool such as this is depends in part on how much confidence we can have in the indication it provides.

The Solution

In order to help assess the accuracy of the QuickScreen Dyslexia Test, we carried out an analysis of observational data compiled by Pico Educational Systems to produce the following diagnostic accuracy measures:

Sensitivity (true positive rate),
Specificity (true negative rate),
Positive predictive value (PPV), and
Negative predictive value (NPV).

The sensitivity of a diagnostic test indicates how good it is at identifying people with the condition in question. It tells us the probability that someone who is dyslexic is identified as such by the test. The specificity of a diagnostic test indicates how good it is at identifying people who do not have the condition; that is, the probability that someone who is not dyslexic is identified as not dyslexic by the test.

The predictive values tell us the “post-test probabilities”. So, the positive predictive value (PPV) tells us the probability of a participant being dyslexic given a positive indication from the test. Likewise, if someone has a negative indication from the test, the negative predictive value (NPV) will tell them the probability that they do not have dyslexia, i.e., that the indication was correct.
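The predictive values can be read directly off a 2×2 table, or obtained from sensitivity, specificity and the prevalence of dyslexia via Bayes’ theorem. This minimal sketch shows both routes using a hypothetical table (TP = 90, FN = 10, FP = 20, TN = 180); the function names and numbers are illustrative assumptions, not taken from the analysis. It also shows that the PPV falls when dyslexia is rarer in the tested population, even though sensitivity and specificity are unchanged.

```python
def ppv(tp: int, fp: int) -> float:
    """P(dyslexic | positive indication) = TP / (TP + FP)."""
    return tp / (tp + fp)


def npv(tn: int, fn: int) -> float:
    """P(not dyslexic | negative indication) = TN / (TN + FN)."""
    return tn / (tn + fn)


def ppv_bayes(sens: float, spec: float, prev: float) -> float:
    """PPV from sensitivity, specificity and prevalence (Bayes' theorem)."""
    true_pos = sens * prev
    false_pos = (1 - spec) * (1 - prev)
    return true_pos / (true_pos + false_pos)


def npv_bayes(sens: float, spec: float, prev: float) -> float:
    """NPV from sensitivity, specificity and prevalence (Bayes' theorem)."""
    true_neg = spec * (1 - prev)
    false_neg = (1 - sens) * prev
    return true_neg / (true_neg + false_neg)


if __name__ == "__main__":
    # Hypothetical table: TP=90, FN=10, FP=20, TN=180 (100 dyslexic, 200 not).
    print(f"PPV = {ppv(90, 20):.3f}, NPV = {npv(180, 10):.3f}")
    # Prevalence in that table is 100/300; Bayes' theorem gives the same PPV.
    print(f"PPV at prevalence 1/3: {ppv_bayes(0.9, 0.9, 1 / 3):.3f}")
    # Same test, but only 10% prevalence: PPV drops to 0.5.
    print(f"PPV at prevalence 0.1: {ppv_bayes(0.9, 0.9, 0.1):.3f}")
```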

In addition to the accuracy measures, we provided confidence intervals to capture the uncertainty in the estimates. Rather than taking a naïve approach to calculating the measures (i.e., simply taking observed proportions), we used a transformation and applied a so-called continuity correction to provide more reliable estimates and confidence intervals.
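The post does not say which transformation was used. One common choice, shown here purely as an illustration and not necessarily the method used in this analysis, is to add 0.5 to both the “success” and “failure” counts (a continuity correction), build a Wald interval on the logit scale, and back-transform. Unlike the naïve interval, it always stays inside (0, 1) and is defined even when every case is classified correctly. The function name `logit_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def _expit(x: float) -> float:
    """Inverse of the logit transformation."""
    return 1.0 / (1.0 + math.exp(-x))


def logit_ci(successes: int, n: int, level: float = 0.95) -> tuple[float, float, float]:
    """Continuity-corrected proportion with a logit-scale Wald interval.

    Adds 0.5 to both the success and failure counts, so the estimate is
    (successes + 0.5) / (n + 1) and the interval is defined even when the
    observed proportion is 0 or 1. Returns (estimate, lower, upper).
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    a = successes + 0.5        # corrected successes
    b = n - successes + 0.5    # corrected failures
    centre = math.log(a / b)   # logit of the corrected proportion
    se = math.sqrt(1 / a + 1 / b)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return _expit(centre), _expit(centre - z * se), _expit(centre + z * se)


if __name__ == "__main__":
    # Hypothetical: all 50 non-dyslexic people correctly screened negative.
    est, lo, hi = logit_ci(50, 50)
    print(f"specificity = {est:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

For 50 out of 50, the naïve proportion is exactly 1 and its Wald interval has zero width; the corrected estimate is 50.5/51 (about 0.990), with an interval strictly below 1 that reflects the limited sample size.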

The QuickScreen Test has five possible outcomes (None, Borderline, Mild, Moderate and Strong). Therefore, each of the four diagnostic measures was calculated for each possible indication, as well as for combinations of these outcomes. For example, we estimated that 96.6% of non-dyslexic individuals will receive a QuickScreen indication of “None or Borderline”.
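One way to handle an ordered five-level outcome is to choose a cut-off and treat that category and all stronger ones as a positive indication. This sketch, assuming that cumulative approach and using made-up counts per category, computes sensitivity and specificity for every possible cut-off; the function name and data are hypothetical.

```python
CATEGORIES = ["None", "Borderline", "Mild", "Moderate", "Strong"]


def accuracy_by_cutoff(
    dyslexic: list[int], not_dyslexic: list[int]
) -> dict[str, tuple[float, float]]:
    """Sensitivity and specificity for each possible positivity cut-off.

    dyslexic[i] and not_dyslexic[i] count the people in each group whose
    indication was CATEGORIES[i]. For a cut-off at CATEGORIES[k], an
    indication of CATEGORIES[k] or stronger counts as positive, and anything
    weaker counts as negative. Returns {cut-off: (sensitivity, specificity)}.
    """
    if len(dyslexic) != len(CATEGORIES) or len(not_dyslexic) != len(CATEGORIES):
        raise ValueError("expected one count per category")
    n_dys, n_not = sum(dyslexic), sum(not_dyslexic)
    results = {}
    for k in range(1, len(CATEGORIES)):
        tp = sum(dyslexic[k:])       # dyslexic, indication at or above cut-off
        tn = sum(not_dyslexic[:k])   # not dyslexic, indication below cut-off
        results[CATEGORIES[k]] = (tp / n_dys, tn / n_not)
    return results


if __name__ == "__main__":
    # Hypothetical counts per category (not the QuickScreen data).
    dys = [2, 8, 20, 40, 30]        # 100 dyslexic people
    not_dys = [150, 40, 6, 3, 1]    # 200 non-dyslexic people
    for cutoff, (sens, spec) in accuracy_by_cutoff(dys, not_dys).items():
        print(f"positive = {cutoff} or stronger: sens {sens:.2f}, spec {spec:.3f}")
```

With the cut-off at Mild, specificity is the proportion of non-dyslexic people receiving “None or Borderline”, the same kind of combined measure quoted above (0.95 for these made-up counts). Raising the cut-off trades sensitivity for specificity.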

The Value

By calculating reliable diagnostic measures, we can describe how well the QuickScreen Test performs, assessing how likely it is to provide an indication that is consistent with an independent educational psychologist’s assessment. Furthermore, the predictive values provide important information for individual participants, answering the question “How likely is it that I do or don’t have dyslexia, given the indication I have received?”

“We are really pleased with the report Select provided on their initial analysis of our QuickScreen dyslexia test results. It doesn’t get much better than that for a first round – beautifully presented and very clearly set out.”

Dr Dee Walker – Dyslexia Consultant, Pico Educational Systems Ltd

The post Assessing the Diagnostic Accuracy of a Dyslexia Screening Tool appeared first on Select Statistical Consultants.

Capturing the Decade in a Statistic

Sally Hunton — Fri, 31 Jan 2020 11:56:31 +0000

In our last blog post, we discussed the winner of the Royal Statistical Society’s annual Statistics of the Year competition. With the start of 2020 and the end of a decade, the RSS also released their Statistics of the Decade. These are stand-out statistics that the judging panel felt captured the spirit of some of the biggest issues of the last ten years.

For the UK, the Statistic of the Decade was announced to be 0.3%: the estimated average annual increase in UK productivity in the decade or so since the financial crisis. To put this into context, it means that the UK has experienced its worst decade for productivity growth since the early 1800s, a marked contrast with the 2% growth in productivity seen in the decade before the financial crisis. As the Executive Director of the RSS explains, “Most people won’t have paid attention to a dull sounding number on productivity. But we think it is probably the most important UK statistic of the last decade as productivity is the single biggest key to our shared prosperity.”
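To see why the gap between 0.3% and 2% matters, this small sketch compounds each rate over ten years. It assumes, for illustration, a constant rate over exactly ten years and that the 2% pre-crisis figure is also an average annual rate; actual year-on-year growth varied.

```python
def cumulative_growth_pct(annual_rate: float, years: int) -> float:
    """Total % growth after compounding a constant annual growth rate."""
    return ((1 + annual_rate) ** years - 1) * 100


if __name__ == "__main__":
    for label, rate in [("0.3% a year", 0.003), ("2% a year", 0.02)]:
        print(f"{label} for 10 years: {cumulative_growth_pct(rate, 10):.1f}% in total")
```

Under these assumptions, 0.3% a year adds up to about 3% over the decade, whereas 2% a year compounds to about 22%.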

We see a more positive message in one of the highly commended statistics: women now hold 30.6% of all board positions in the UK’s 350 biggest listed companies, up from just 9.5% in early 2011, a huge improvement over the decade. This is an excellent illustration of what can be achieved when an issue such as this is highlighted and championed. The 30% Club, in particular, was instrumental in championing and targeting improvements in FTSE 100 companies.
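A change in a percentage can be described in two ways that are easily confused: the absolute change in percentage points and the relative change. This short sketch computes both for the board-position figures; the function name is illustrative.

```python
def describe_change(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (absolute change in percentage points, relative change as a ratio)."""
    return after_pct - before_pct, after_pct / before_pct


if __name__ == "__main__":
    points, ratio = describe_change(9.5, 30.6)
    print(f"+{points:.1f} percentage points; {ratio:.1f} times the early-2011 share")
```

The share of board positions held by women rose by about 21 percentage points, so it is now a little over three times what it was in early 2011.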

The International Statistic of the Decade describes deforestation in the Amazon rainforest. Around 24,000 square miles of the rainforest have been lost over the last 10 years, an area that is difficult for anyone to truly visualise. Instead, the winning statistic was 8.4 million: the estimated accumulated deforestation in the Amazon rainforest expressed as a number of football pitches. By using something visual and familiar from people’s everyday lives, we are better able to grasp the magnitude of the issue.
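As a rough check on the conversion, this sketch turns 24,000 square miles into football pitches. The pitch size (105 m × 68 m, a commonly used size for professional pitches) is our assumption, not the RSS’s. With it, the result comes out at roughly 8.7 million, in the same ballpark as the 8.4 million quoted; the difference reflects the assumed pitch dimensions and the rounding in “around 24,000 square miles”.

```python
SQ_KM_PER_SQ_MILE = 1.609344 ** 2   # 1 mile is exactly 1.609344 km
PITCH_LENGTH_M = 105                # assumed pitch length (metres)
PITCH_WIDTH_M = 68                  # assumed pitch width (metres)


def area_in_pitches(area_sq_miles: float) -> float:
    """Convert an area in square miles into a number of football pitches."""
    area_m2 = area_sq_miles * SQ_KM_PER_SQ_MILE * 1_000_000
    return area_m2 / (PITCH_LENGTH_M * PITCH_WIDTH_M)


if __name__ == "__main__":
    print(f"about {area_in_pitches(24_000) / 1e6:.1f} million pitches")
```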

Professor Jennifer Rogers, chair of the judging panel and the RSS’s vice-president for external affairs, said: “Much has been discussed regarding the environment in the last few years, and the judging panel felt this statistic was highly effective in capturing one of the decade’s worst examples of environmental degradation.”

These statistics take hugely important issues and encapsulate them in a single number with a clear message. They demonstrate how important it is, for issues that affect us all, to make statistics meaningful to everyone.

The post Capturing the Decade in a Statistic appeared first on Select Statistical Consultants.

Women in Data UK Conference

Sally Hunton — Fri, 20 Dec 2019 16:48:07 +0000

Last month, our consultant Sally was thrilled to have the opportunity to attend the Women in Data UK conference. Women in Data (WiD) UK is the UK’s largest network of female data professionals and, with only 25% of current data professionals being women, it plays a hugely important role in providing positive examples and success stories to promote interest in STEM among a diverse range of people.

This yearly conference aims to educate and inspire the WiD UK community through high-profile speakers, development sessions, training, networking and all-day exhibits. There was a wide range of speakers and topics, including improving workplace communication, managing stakeholders’ expectations, and how artificial intelligence can be used for social good.

Speaking about the event, Sally said, “My highlight of the day was hearing from Dame Stephanie Shirley who talked about her experiences as a pioneer in IT, starting her own extremely successful company despite facing all the difficulties of being a woman in business in the 1960s. It was a very inspiring day and a great opportunity to hear about the careers of a range of successful women in data.”

The post Women in Data UK Conference appeared first on Select Statistical Consultants.

2019 Statistics of the Year: 58% and 72.6 years

Jo Morrison — Fri, 20 Dec 2019 10:04:51 +0000

The Royal Statistical Society (RSS) has just announced the winners of its annual Statistics of the Year competition.

“Statistics have a remarkable power in their ability to help us understand the key issues of the day”

RSS Vice President Jennifer Rogers, chair of this year’s judging panel.

The winning UK statistic is 58%, which is the percentage of those in the UK in relative poverty who live in a working household. One of the judges, Kelly Beaver of Ipsos MORI, said, “This stark statistic really highlights one of the biggest issues facing the UK – in-work poverty. While it could be seen as positive that more people are in work, this figure shows that employment doesn’t necessarily mean an escape from poverty.”

The winning international statistic is 72.6 years. This is the estimated global life expectancy for those born in 2019 and is a new record high.

As with last year’s competition winners, we expect that these winning statistics will attract a lot of media coverage.

One of the key goals of the RSS is for statistics to be used effectively in the public interest, providing evidence for the good of society, and these winning statistics are certainly contributing to public debate. Indeed, last year the RSS won a ‘Best Campaign on a Shoestring’ award for its competition. The campaign was very successful: it raised the profile of the winning statistics themselves, showed how data can increase public awareness of important issues, and highlighted the important role that statistics play in our lives.

Last year’s winning international statistic was 90.5%: “the proportion of plastic waste that has never been recycled”. This is a startling statistic and speaks volumes about the plastic waste we generate.

Earlier this year, the UK government published its own plastic statistic: sales of single-use plastic bags have fallen by 90% since the introduction of the 5p charge. A reduction in the use of plastic bags is good news for reducing waste and for the environment. However, as with many statistics, there is more to the story than initially meets the eye. Channel 4’s FactCheck team point out that the figures in the government’s press release relate only to single-use plastic bags and don’t include any data on ‘bags for life’, which are typically sturdier and use more plastic.

This illustrates that we shouldn’t always take statistics at face value; we should look deeper and ask questions, thinking carefully and critically about what they do and don’t tell us. Another goal of the RSS is for society to be more statistically literate, enabling people to make informed decisions through understanding data, probability and risk. So, when faced with statistics, we should ask questions such as: What data are they based on? Were other factors taken into account? Has the way things are counted or calculated changed? Who was included in the sample? Are any claims of causality justified? Could the causal link be the other way around? Or could there be another external cause or influencing factor?

Of course, we can be confident that the winning RSS statistics are accurate, are not misleading and have a verifiable source. It is great to see bona fide statistics generating interest. The RSS competition has again been successful in illustrating how data and statistics can draw attention to important issues and contribute to informed debate.

The post 2019 Statistics of the Year: 58% and 72.6 years appeared first on Select Statistical Consultants.

Select Sleuths Save Framed Detective!

Sarah Littler — Thu, 28 Nov 2019 15:35:32 +0000

Last week the Select team ventured out of the comfort and safety of our offices at Oxygen House, leaving behind our statistical codes and entering an office full of cryptic codes and puzzles, and a sinister tale of betrayal and danger…

Our challenge was to tackle the mystery of “The Shadow Darkens” and help clear the good name of Detective Jack Armstrong. Arriving in Jack’s office and the world of 1940s film noir, we followed the clues, took on a series of increasingly fiendish puzzles, and assembled the evidence in order to earn our freedom and make our escape. Despite some jumpy moments (where the spring-loaded clues quite literally hit us between the eyes!), we successfully made it out, largely unscathed, and with a few minutes to spare!

We had a fantastic time testing our sleuthing skills and would like to thank Red House Mysteries for a fun-filled afternoon, which was followed by some delicious, hearty grub and a debrief at The Fat Pig.

Images courtesy of Red House Mysteries Ltd.

The post Select Sleuths Save Framed Detective! appeared first on Select Statistical Consultants.