Sample output 3

The table below shows a ranked list of terms extracted automatically from a corpus of blog posts about chronic obstructive pulmonary disease. The variants grouped together in the second column are lowercased versions of all surface forms of a term found in the corpus, including any misspellings. For each term, the variants are ordered by their frequency of occurrence in the corpus. The score in the rightmost column is called termhood and is calculated as a function of the term's frequency and its length (the number of tokens, excluding stopwords). Terms with equal scores share the same rank.

Rank  Term variants                          Score
1     copd                                   153.5899
      chronic obstructive pulmonary disease
      copd disease
2     chronic disease                        76.6994
3     pulmonary rehab                        13.7763
      pulmanory rehab
4     breathe easy                           9.9351
5     vitamin d                              9.7041
6     lung transplantation                   8.3178
      lung transplant
      lung transplants
      lung transplantations
7     copd blog                              6.4378
8     breathe easy groups                    6.0424
      breath easy groups
      breathe easy group
9     chest infection                        5.5452
      chest infections
9     quality of life                        5.5452
10    blood pressure                         4.8520
11    copd exacerbation                      4.8283
      exacerbation of copd
12    lung function                          4.5055
13    ube                                    4.3944
      upper body ergometer
13    british lung foundation                4.3944
14    rehab room                             4.1589
15    lung diseases                          4.0203
16    easy group                             3.9278
17    shortness of breath                    3.4657
18    support group                          3.2347
      support groups
19    blue badge                             2.7726
19    strength endurance                     2.7726
19    flu jab                                2.7726
      flu jabs
19    wealth of information                  2.7726
20    pulmonary rehab course                 2.1972
      pulmonary rehab courses
20    team relay event                       2.1972
21    hope air                               2.0794
21    right arm                              2.0794
21    cold weather                           2.0794
21    front door                             2.0794
21    flu shots                              2.0794
21    immune system                          2.0794
22    rehab course                           1.7329
23    hospital admission                     1.3863
23    antitrypsin deficiency                 1.3863
      antitrysin deficiency
23    blood test                             1.3863
23    lung capacity                          1.3863
23    car park                               1.3863
      parking car
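
The description above says only that termhood is a function of frequency and length. The following is a minimal Python sketch of one such function, not the tool's actual implementation. It assumes that the score of a term that is not nested inside a longer term is ln(length) multiplied by frequency. This assumption is consistent with several scores in the table: for example, a frequency of 8 for "quality of life" (length 2 once "of" is excluded) gives 5.5452. The stopword list, the function names and the frequencies in the example are hypothetical. The sketch does not cover single-token terms (for which ln(1) = 0) or the adjustment for nested terms.

```python
import math

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


def term_length(term):
    """Count the tokens in a term, excluding stopwords."""
    return sum(1 for token in term.lower().split() if token not in STOPWORDS)


def termhood(term, frequency):
    """Assumed score for a non-nested term: ln(length) * frequency."""
    return math.log(term_length(term)) * frequency


def dense_rank(scores):
    """Sort terms by descending score; terms with equal scores share a rank."""
    ranked = []
    rank = 0
    previous = None
    for term, score in sorted(scores.items(), key=lambda item: -item[1]):
        score = round(score, 4)
        if score != previous:
            rank += 1
            previous = score
        ranked.append((rank, term, score))
    return ranked


# Hypothetical frequencies; the source does not report raw counts.
frequencies = {
    "quality of life": 8,
    "chest infection": 8,
    "blood pressure": 7,
    "british lung foundation": 4,
}
scores = {term: termhood(term, freq) for term, freq in frequencies.items()}
for rank, term, score in dense_rank(scores):
    print(f"{rank:<6}{term:<39}{score:.4f}")
```

Ranks here are assigned densely, so the rank following a tie is not skipped. This mirrors the numbering in the table, where two terms ranked 9 are followed by rank 10.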