Student Voice

Using Machine Learning for Automated Language Analysis

By David Griffin

Humans are naturally slow at recognising patterns in large bodies of text. Yet this ability is vitally important for many applications, from monitoring political discourse to assessing the behaviour of machine learning systems.

To address this challenge, Zhong et al. (2022) recently published a paper outlining the methods they used to recognise and describe differences between two distributions of text. In short, they sought to produce a criterion that applies more often to one text distribution than to the other. To illustrate this, the authors used the simple example of social media comments related to the current SARS-CoV-2 pandemic. If publicly posted comments from two consecutive years were considered, each year would form a distinct distribution. The criterion used to differentiate between the two distributions might be that one contains more optimistic language. In this way, machine learning could be employed to efficiently gauge public opinion on the pandemic or, more specifically, on a government’s response to it.

The method developed by Zhong et al. (2022) uses Generative Pre-trained Transformer 3 (GPT-3) (Brown et al., 2020), a machine learning model capable of producing human-like text in response to inputs. In this case, the inputs were the two text distributions to be differentiated.

Because GPT-3 can only accept a limited amount of text as input, large distributions cannot be used in their entirety. Consequently, a sample from each distribution must be used instead. From these samples, natural language hypotheses are proposed. These ‘candidate hypotheses’, as the authors described them, should each capture a property of the sample that enables its source distribution to be differentiated from the other. In the social media comment example outlined previously, a candidate hypothesis could be “is optimistic about the pandemic” (Zhong et al., 2022).
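
To make this concrete, the sketch below shows how a handful of comments sampled from two distributions might be formatted into a single prompt that asks a language model to complete a differentiating description. The toy comments, prompt wording, and function name are illustrative assumptions, not the authors’ exact prompt.

    import random

    # Toy stand-ins for the two text distributions (hypothetical data).
    distribution_a = [
        "hospitalisations are reducing in my area",
        "finally booked a holiday for the summer",
        "case numbers look much better this week",
    ]
    distribution_b = [
        "hospitalisations are increasing again",
        "another lockdown feels inevitable",
        "worried about the new variant",
    ]

    def build_hypothesis_prompt(samples_a, samples_b, n=3):
        # Format a small sample from each distribution into a prompt that
        # asks a language model to complete a differentiating hypothesis.
        group_a = "\n".join("- " + s for s in random.sample(samples_a, n))
        group_b = "\n".join("- " + s for s in random.sample(samples_b, n))
        return ("Group A comments:\n" + group_a + "\n\n"
                "Group B comments:\n" + group_b + "\n\n"
                "Compared with Group B, each comment in Group A ")

    print(build_hypothesis_prompt(distribution_a, distribution_b))
    # The text a model such as GPT-3 generates to complete this prompt is
    # treated as a candidate hypothesis, e.g. "is optimistic about the pandemic".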

Though GPT-3 produces human-like text, it was not specifically designed to generate hypotheses in this manner. As a result, there was no existing corpus (a collection of written and spoken material that can be used to train machine learning models) for the authors to draw on.

To overcome this challenge, they developed a procedure for fine-tuning GPT-3 to propose better hypotheses. This involved three stages:

  1. A list of candidate hypotheses was first compiled, combining hypotheses written by hand with others generated by GPT-3. Using the social media example, a hypothesis might be “is optimistic about the pandemic”.

  2. GPT-3 was then used to generate both positive and negative samples for each candidate hypothesis. Positive samples are those which fulfil the criteria of a given hypothesis; negative samples are those which do not. A positive sample comment in the given example might be the optimistic statement, “hospitalisations are reducing”. A negative sample comment could be “hospitalisations are increasing”.

  3. The generated samples were then manually screened to ensure that positive samples truly fulfilled the criteria of the respective hypothesis and that negative samples truly did not (a sketch of stages 2 and 3 follows this list).
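
The following minimal sketch illustrates the shape of stages 2 and 3 under stated assumptions: the model call is replaced by canned outputs, and the human screening step is represented by a stand-in function, so it shows the structure of the pipeline rather than the authors’ implementation.

    # Hypothetical sketch of stages 2 and 3: generating, then manually
    # screening, positive and negative samples for each candidate hypothesis.
    candidate_hypotheses = ["is optimistic about the pandemic"]

    def generate_samples(hypothesis, positive, n=5):
        # In the real pipeline this would prompt a model such as GPT-3,
        # e.g. "Write a social media comment that <hypothesis>." (or one
        # that does not); canned outputs keep the sketch self-contained.
        sample = ("hospitalisations are reducing" if positive
                  else "hospitalisations are increasing")
        return [sample] * n

    def passes_manual_screening(sample, hypothesis, positive):
        # Stage 3 is a human judgement; this stand-in accepts every sample.
        return True

    verified = []
    for hypothesis in candidate_hypotheses:
        for positive in (True, False):
            for sample in generate_samples(hypothesis, positive):
                if passes_manual_screening(sample, hypothesis, positive):
                    verified.append({"hypothesis": hypothesis,
                                     "text": sample,
                                     "positive": positive})

    print(len(verified), "verified samples")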

At this stage in the process, the authors had, for each hypothesis, a large selection of samples which definitively fulfilled or did not fulfil it. These verified samples were then used to fine-tune GPT-3 to propose hypotheses of its own. In this way, GPT-3 was fine-tuned to produce a hypothesis in natural language in response to inputted samples from two distinct distributions.
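
For illustration, the sketch below arranges verified samples into prompt/completion training records in the JSONL layout accepted by OpenAI-style fine-tuning endpoints. The file name, prompt wording, and data are assumptions; the authors’ actual training setup may differ.

    import json

    # Hypothetical verified samples carried over from the screening stage.
    verified = [
        {"hypothesis": "is optimistic about the pandemic",
         "text": "hospitalisations are reducing", "positive": True},
        {"hypothesis": "is optimistic about the pandemic",
         "text": "hospitalisations are increasing", "positive": False},
    ]

    # Group the verified texts by hypothesis so each training record pairs
    # a positive group with a negative group.
    by_hypothesis = {}
    for record in verified:
        groups = by_hypothesis.setdefault(record["hypothesis"],
                                          {True: [], False: []})
        groups[record["positive"]].append(record["text"])

    # Write prompt/completion pairs: the prompt shows the two groups of
    # samples, and the completion is the hypothesis the model should learn
    # to produce.
    with open("hypothesis_finetune.jsonl", "w") as f:
        for hypothesis, groups in by_hypothesis.items():
            pos = "\n".join("- " + t for t in groups[True])
            neg = "\n".join("- " + t for t in groups[False])
            prompt = ("Group A comments:\n" + pos + "\n\n"
                      "Group B comments:\n" + neg + "\n\n"
                      "Compared with Group B, each comment in Group A ")
            f.write(json.dumps({"prompt": prompt,
                                "completion": " " + hypothesis}) + "\n")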

In testing on 54 binary classification datasets (Zhong et al., 2021), the fine-tuned system developed by Zhong et al. (2022) produced outputs similar to manual human annotations in a promising 76% of cases.

There are several challenges to classifying text in this manner, however, as stressed by the authors. Natural language is inherently ambiguous and open to interpretation. Both the language used within the distributions and the hypotheses themselves can be imprecise or carry biases, and this can be further exacerbated by cultural and social differences. The results on the 54 classification datasets used to test the system also had to be validated by hand using manual annotation.

This is time- and resource-intensive; however, at present there is no alternative. Yet the significance of these challenges pales when the range of potential applications for this form of automated language analysis is considered. The authors stress that, while general text distributions were the focus of this work, applications could include the analysis of anything that produces a language output. This could include analysing and comparing a vast range of human experiences, from tastes (Nozaki & Nakamoto, 2018) to traumatic events (Demszky et al., 2019).

Furthermore, it could even be applied in forms of psychological profiling, through the identification of writing patterns associated with specific psychological signatures (Boyd & Pennebaker, 2015). The authors suggest that, in effect, the list of potential applications is endless.

FAQ

Q: How does the approach of using GPT-3 for text analysis compare to traditional methods of text analysis in terms of efficiency and accuracy, especially in educational settings where Student Voice is paramount?

A: The approach of using GPT-3 for text analysis, as described by Zhong et al. (2022), represents a significant advancement over traditional methods in both efficiency and accuracy. Traditional text analysis often relies on manual coding and analysis, which can be time-consuming and subject to human error. GPT-3, on the other hand, can process and analyse large volumes of text rapidly, identifying patterns and nuances that might be missed by human analysts. This is particularly beneficial in educational settings where capturing and understanding Student Voice is crucial. By efficiently processing student feedback, comments, and discussions, GPT-3 can provide insights into student experiences and needs more quickly and accurately, supporting timely and informed decision-making in education.

Q: What implications does the method developed by Zhong et al. (2022) have for enhancing Student Voice in educational research and policy-making?

A: The method developed by Zhong et al. (2022) has significant implications for enhancing Student Voice in educational research and policy-making. By employing GPT-3 to analyse text distributions, such as student feedback and social media comments, educators and policymakers can gain deeper insights into the opinions, experiences, and needs of students. This method allows for the efficient and accurate analysis of vast amounts of text, making it easier to identify trends, concerns, and suggestions from students. Consequently, this can lead to more student-centred educational policies and practices, as decisions can be based on a comprehensive understanding of Student Voice, ensuring that educational strategies are more responsive and tailored to student needs.

Q: How does the method address potential biases in text analysis, and what are the implications for analysing texts that represent diverse Student Voices?

A: The method described by Zhong et al. (2022) addresses potential biases in text analysis through a combination of manual screening and the fine-tuning of GPT-3 using both positive and negative samples. This approach helps mitigate biases that may arise from the inherent ambiguity of natural language and the cultural or social context of the texts. By ensuring that the samples used for training GPT-3 are diverse and representative of different viewpoints, the method can more accurately capture and reflect diverse Student Voices. This is crucial for educational research and policy-making, as it ensures that the insights gained from text analysis are inclusive and reflective of the entire student body. By addressing biases and enhancing the representation of diverse voices, this method supports the development of more equitable and inclusive educational policies.

References

[Source Paper] Zhong, R., Snell, C., Klein, D., Steinhardt, J. 2022. Summarizing differences between text distributions with natural language (Preprint).
DOI: 10.48550/arXiv.2201.12323

[1] Boyd, R. L. and Pennebaker, J. W. 2015. Did Shakespeare write double falsehood? Identifying individuals by creating psychological signatures with text analysis. Psychological science, 26(5):570–582.
DOI: 10.1177/0956797614566658

[2] Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., et al. 2020. Language models are few-shot learners. Advances in Neural Information Processing Systems 33 (NeurIPS 2020).
DOI: 10.48550/arXiv.2005.14165

[3] Demszky, D., Garg, N., Voigt, R., Zou, J. Y., Gentzkow, M., Shapiro, J. M., Jurafsky, D. 2019. Analyzing polarization in social media: Method and application to tweets on 21 mass shootings. NAACL 2019.
DOI: 10.48550/arXiv.1904.01596

[4] Nozaki, Y., Nakamoto, T. 2018. Predictive modeling for odor character of a chemical using machine learning combined with natural language processing. PloS one, 13 (6): e0198475.
DOI: 10.1371/journal.pone.0198475

[5] Zhong, R., Lee, K., Zhang, Z., Klein, D. 2021. Adapting language models for zero-shot learning by meta-tuning on dataset and prompt collections. Findings of the Association for Computational Linguistics: EMNLP 2021, 2856–2878. Dominican Republic.
DOI: 10.48550/arXiv.2104.04670
