Perceived Persuasiveness Scale


We developed and validated a scale to measure the perceived persuasiveness of messages to be used in digital behaviour interventions. A literature review inspired the initial scale items. The scale was developed using Exploratory and Confirmatory Factor Analysis on data from a study with 249 ratings of healthy eating messages. The construct validity of the scale was established using ratings of 573 email security messages. ArguMessage, a persuasive message generation system, was used to create the corpus of messages for the studies in this work.


The research questions addressed are:

1. What is a reliable scale to measure perceived persuasiveness?

2. How valid is the developed perceived persuasiveness scale?


Study 1: Development of a Perceived Persuasiveness Scale

We conducted a study to develop a rating scale to measure the ‘perceived persuasiveness’ of messages. The aim was to obtain a scale with good internal consistency and with three items per factor.

Participants. The participants for this study were recruited by sharing a link to the study via social media and mailing lists. The study had four validation questions to check whether participants were rating the messages randomly. After removing such participants, a total of 92 participants rated 249 messages.

Table 1

Procedure. Each participant was shown a set of five messages (see Table 1), each promoting healthy eating. These messages were based on different argumentation schemes and were produced in the evaluation study using ArguMessage. We picked these argumentation schemes because the corresponding Cialdini principles were more effective than the others (see Study 2 in Personalising Messages using User Characteristics). Each message was rated using 34 scale items (the items marked with * in Table 2 act as validation checks) on a 7-point Likert scale ranging from ‘strongly disagree’ to ‘strongly agree’ (see Table 1 and Figure 1). Finally, participants were given the option to provide feedback.

Figure 1

Results. First, we checked the Kaiser-Meyer-Olkin Measure of Sampling Adequacy, which was greater than 0.90. According to this measure, values in the 0.90s indicate that the sampling adequacy is “marvelous”. Next, we investigated the inter-item correlations. For the factor analysis, all the 7-point scale items were treated as ordinal measures. To further filter the items and identify the factors, we conducted an Exploratory Factor Analysis (EFA) using Principal Component Analysis extraction and Varimax rotation with Kaiser Normalization. Varimax rotation was used because the matrix was confirmed to be orthogonal (the Component Correlation Matrix shows that the majority of the correlations were less than 0.5). We obtained three factors (see Table 2). We named the first factor Effectiveness, as its items relate to changes in user behaviour and attitude and to the attainment of user goals. We named the second Quality, as its items relate to characteristics of message strength such as trustworthiness and appropriateness. We named the third Capability, as its items relate to the potential for motivating users to change behaviour. We removed the 13 items that cross-loaded on different factors (the items marked ® in Table 2). This resulted in the reduced set of scale items for the three factors (see Table 3). We checked Cronbach's Alpha for the items belonging to each of the three factors separately. It was greater than 0.9 for each factor, which indicates “excellent” scale reliability.
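The internal consistency check can be illustrated with a small sketch of the standard Cronbach's Alpha formula, alpha = k/(k-1) × (1 − sum of item variances / variance of the summed scores), for k items. This is not the analysis code used in the study; the function name `cronbach_alpha` and the toy ratings are hypothetical and serve only to show the computation for one factor.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's Alpha for one factor.

    rows: one list per rating, holding one score per scale item.
    """
    n_items = len(rows[0])
    if n_items < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two ratings")
    # Sample variance of each item (column) across all ratings.
    item_variances = [
        variance([row[i] for row in rows]) for i in range(n_items)
    ]
    # Sample variance of the summed score of each rating.
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("summed scores have zero variance")
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical 7-point ratings for a three-item factor (not study data).
toy_ratings = [
    [6, 7, 6],
    [2, 3, 2],
    [5, 5, 6],
    [4, 4, 3],
    [7, 6, 7],
]
print(f"alpha = {cronbach_alpha(toy_ratings):.3f}")
```

In practice the function would be applied once per factor, to the columns holding that factor's items.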

Table 2

Next, we conducted Confirmatory Factor Analyses (CFA) to determine the validity of the scale, and to confirm the factors and items by checking the model fit. Based on these analyses, 8 items were removed because they had high Standardized Residual Covariances (greater than 0.4) with several other items. The removed items are marked ® in Table 3.

Table 3

Table 4 shows the resulting scale of 9 items. The final Confirmatory Factor Analysis, extracting the three factors and their items, resulted in a Tucker-Lewis Index (TLI) of 0.988, a Comparative Fit Index (CFI) of 0.993, and a Root Mean Square Error of Approximation (RMSEA) of 0.054. A cut-off value nearing 0.95 for TLI and CFI (the higher the better) and a cut-off value nearing 0.06 for RMSEA (the lower the better) are required to establish an acceptable fit between the hypothesized model and the observed data. In the resulting scale, the TLI and CFI are above 0.95 and the RMSEA is below 0.06, which shows an acceptable model fit.
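For readers unfamiliar with these indices, the sketch below shows how TLI, CFI and RMSEA are commonly derived from the chi-square statistics of the fitted model and of a baseline (independence) model. It is an illustration under assumptions, not the procedure of the statistics package used in the study: the function name `fit_indices` is hypothetical, the RMSEA uses the N − 1 convention (some packages use N), and the chi-square values in the usage example are made up, since the paper does not report them.

```python
from math import sqrt


def fit_indices(chi2_model, df_model, chi2_base, df_base, n_obs):
    """Return (TLI, CFI, RMSEA) from model and baseline chi-square statistics."""
    # Non-centrality estimates, floored at zero.
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, 0.0)

    # CFI: relative reduction in non-centrality compared with the baseline.
    denom = max(d_model, d_base)
    cfi = 1.0 if denom == 0 else 1.0 - d_model / denom

    # TLI: compares the chi-square/df ratios of baseline and fitted model.
    # (Undefined when the baseline ratio equals 1; not handled in this sketch.)
    ratio_base = chi2_base / df_base
    ratio_model = chi2_model / df_model
    tli = (ratio_base - ratio_model) / (ratio_base - 1.0)

    # RMSEA: misfit per degree of freedom, scaled by sample size (N - 1).
    rmsea = sqrt(d_model / (df_model * (n_obs - 1)))
    return tli, cfi, rmsea


# Made-up chi-square values for illustration only (not results of the study).
tli, cfi, rmsea = fit_indices(
    chi2_model=48.0, df_model=24, chi2_base=1500.0, df_base=36, n_obs=249
)
print(f"TLI = {tli:.3f}, CFI = {cfi:.3f}, RMSEA = {rmsea:.3f}")
```

TLI and CFI reward improvement over the baseline model, whereas RMSEA penalises misfit per degree of freedom, which is why higher is better for the former and lower is better for the latter.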

Table 4

Study 2: Validation of the Perceived Persuasiveness Scale

Next, we conducted a study to determine the construct validity of the developed scale. We replicated the scale-testing in the domain of email security using another data set.

Participants. The participants for this study were recruited by sharing a link to the study via social media and mailing lists. After removing the invalid participants (as before), a total of 134 participants rated 573 messages.

Table 5

Procedure. Each participant was shown a set of five messages (see Table 5) that promote email security, again based on argumentation schemes. Each message was rated using the scale that resulted from Study 1 (see Table 4 and Figure 2). Finally, participants were given the option to provide feedback.

Figure 2

Results. To determine the construct validity of the scale developed in Study 1 and to replicate the scale-testing, we:

1. Used an 80-20 split validation on the original dataset of Study 1. With this specific split, the developed scale resulted in an acceptable model fit for both the 80% subset (TLI = 0.975, CFI = 0.985, RMSEA = 0.081) and the 20% subset (TLI = 0.975, CFI = 0.985, RMSEA = 0.080).

2. Used the dataset obtained from the validation in Study 2. With this dataset, the developed model resulted in an acceptable fit (TLI = 0.984, CFI = 0.990, RMSEA = 0.071).
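The 80-20 split in the first step can be sketched as follows. This is a minimal illustration, assuming the split is made at the level of individual ratings and by a seeded random shuffle; the paper does not state how the split was drawn, and the function name `split_ratings`, the seed and the integer identifiers are hypothetical.

```python
import random


def split_ratings(ratings, train_fraction=0.8, seed=0):
    """Shuffle ratings reproducibly and split them into two disjoint subsets."""
    shuffled = list(ratings)
    # A dedicated, seeded generator keeps the split reproducible.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical identifiers standing in for the 249 ratings of Study 1.
rating_ids = list(range(249))
larger, smaller = split_ratings(rating_ids, train_fraction=0.8, seed=42)
print(len(larger), len(smaller))
```

The CFA would then be fitted separately on each subset. Fixing the seed matters because the reported fit values belong to one specific split, and a different shuffle could give slightly different values.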


We developed and validated a perceived persuasiveness scale for use in studies on digital behaviour interventions; no such scale existed previously. We conducted two studies in different domains to develop and validate this scale, namely healthy eating and email security. The validated scale has 3 factors (Effectiveness, Quality, and Capability) and 9 scale items. It can be used to improve such studies and will make it easier to compare results across studies and domains. We plan to use the scale to study the impact of message personalization across domains.

The work presented in this paper has several limitations. Firstly, we validated the scale in only two domains (healthy eating and email security), and this validation needs to be extended to more domains. Secondly, the reliability of the scale over time needs to be verified. To investigate this, we need to perform a test-retest experiment in which participants complete the same scale on the same items twice, with an interval of several days between the two measurements. This would also need to be done in multiple domains. Thirdly, we need to repeat our studies into the impact of message types with more messages and in more domains.