Discussion:
counting the number of commas in a string variable
Farnood Saghi
2020-09-04 19:55:50 UTC
I have a variable in my dataset named 'Symptoms'. The corresponding question in my questionnaire gave participants a list of symptoms and asked them to choose the ones they had experienced (multiple selections were allowed). For scoring, the number of symptoms each participant selected is their score for that question. After I imported my data into SPSS, this variable became a string containing all the symptoms the participant selected, separated by commas. I want to convert this to a numeric value, and I figure the easiest way is to write a custom function that counts the commas in each field and adds 1 to give the number of selected symptoms (number of commas + 1). I know I'm supposed to write the syntax for this function in the syntax editor, but I don't know how to write it. Please help. Thanks in advance.
Bruce Weaver
2020-09-04 21:50:19 UTC
There might be a Python function to count the number of commas in a string, but here's an old-fashioned NPR (no Python required) approach that works on a toy dataset I made up.

NEW FILE.
DATASET CLOSE ALL.

DATA LIST LIST / Symptoms (A150).
BEGIN DATA
"fever, dry cough, fatigue"
"aches and pains, sore throat, diarrhoea, conjunctivitis"
"headache"
"loss of taste or smell, skin rash, discolouration of fingers or toes, difficulty breathing or shortness of breath"
""
"chest pain or pressure, loss of speech or movement"
END DATA.

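* Let scratch variable #L = length of string variable Symptoms (trailing blanks excluded).
COMPUTE #L = CHAR.LENGTH(Symptoms).
* Set N to 1 if Symptoms is non-empty and to 0 otherwise.
COMPUTE N = #L GT 0.
* Step through the string one character at a time and add 1 to N for each comma.
LOOP # = 1 TO 500.
COMPUTE N = SUM(N, CHAR.SUBSTR(Symptoms,#,1) EQ ",").
* Stop once the loop index has passed the last character.
END LOOP IF # GT #L.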
FORMATS N (F2.0).
VARIABLE LABELS N "Number of symptoms".
LIST.


OUTPUT from LIST:

The variables are listed in the following order:

LINE 1: Symptoms

LINE 2: N


Symptoms: fever, dry cough, fatigue
N: 3

Symptoms: aches and pains, sore throat, diarrhoea, conjunctivitis
N: 4

Symptoms: headache
N: 1

Symptoms: loss of taste or smell, skin rash, discolouration of fingers or toes, difficulty breathing or shortness of breath
N: 4

Symptoms:
N: 0

Symptoms: chest pain or pressure, loss of speech or movement
N: 2


Number of cases read: 6 Number of cases listed: 6
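
For anyone curious about the Python route mentioned above, the same "commas + 1" rule might look something like the sketch below. It is only a rough illustration in plain Python: `count_symptoms` is a made-up helper name, not part of SPSS or its Python integration, and hooking it up to an SPSS dataset is not shown. It assumes, like the syntax above, that a blank value means no symptoms and that no symptom name contains a comma.

```python
def count_symptoms(symptoms: str) -> int:
    """Return the number of comma-separated items (commas + 1), or 0 if blank."""
    # Strip surrounding blanks first, since fixed-width string values are
    # typically padded with trailing spaces.
    text = symptoms.strip()
    if not text:
        return 0
    # str.count returns the number of non-overlapping occurrences of ",".
    return text.count(",") + 1


if __name__ == "__main__":
    # A few rows from the toy dataset above (hypothetical usage).
    rows = [
        "fever, dry cough, fatigue",
        "aches and pains, sore throat, diarrhoea, conjunctivitis",
        "headache",
        "",
    ]
    for row in rows:
        print(count_symptoms(row), row)
```

Like the loop version, this counts separators rather than items, so a stray trailing comma would inflate the score by one.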
Farnood Saghi
2020-09-05 00:57:37 UTC
Thank you so much! It worked like a charm. You've no idea how much time you saved me :)