Computing Frequencies

This exercise uses the adjectives.txt, nouns.txt, and verbs.txt files produced in the part-of-speech extraction step.

Import the Counter class to count word frequencies.
```
from collections import Counter
```

Set the file paths.

```
nouns_output_file = 'nouns.txt'
verbs_output_file = 'verbs.txt'
adjectives_output_file = 'adjectives.txt'
```

Count the frequencies of the nouns, verbs, and adjectives. Each file holds one word per line, so every line is counted as one occurrence.

```
# Read each file, split it into lines, and count identical lines.
with open(nouns_output_file, 'r', encoding='utf-8') as nouns_file:
    noun_counts = Counter(nouns_file.read().splitlines())

with open(verbs_output_file, 'r', encoding='utf-8') as verbs_file:
    verb_counts = Counter(verbs_file.read().splitlines())

with open(adjectives_output_file, 'r', encoding='utf-8') as adjectives_file:
    adjective_counts = Counter(adjectives_file.read().splitlines())
```
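To see what `Counter` does with the lines of a file, here is a minimal sketch that uses a small in-memory string instead of the real files. The sample words and the helper name `count_lines` are made up for illustration. Unlike the code above, the helper also skips blank lines, which may or may not matter for your files.

```python
from collections import Counter


def count_lines(text):
    """Count how often each non-empty line occurs in text (hypothetical helper)."""
    # Strip whitespace and drop blank lines so they are not counted as words.
    words = [line.strip() for line in text.splitlines() if line.strip()]
    return Counter(words)


# Made-up sample standing in for the contents of nouns.txt (one word per line).
sample = "дом\nгод\nдом\n\nчеловек\nдом\nгод\n"
counts = count_lines(sample)

print(counts["дом"])  # 3
print(counts["год"])  # 2
print(counts[""])     # 0, because the blank line was skipped
```

A `Counter` returns 0 for a key it has never seen, so looking up a missing word does not raise an error.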

Print the 10 most frequent words for each part of speech.

```
# most_common(10) returns up to 10 (word, count) pairs, highest count first.
print("Top 10 nouns:")
for word, count in noun_counts.most_common(10):
    print(f"{word}: {count}")
print("\nTop 10 verbs:")
for word, count in verb_counts.most_common(10):
    print(f"{word}: {count}")
print("\nTop 10 adjectives:")
for word, count in adjective_counts.most_common(10):
    print(f"{word}: {count}")
```
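The behavior of `most_common` can be checked on a tiny example. This is only an illustrative sketch; the word list is invented and is not output from the exercise.

```python
from collections import Counter

# Invented sample: "быть" appears 3 times, "мочь" twice, "сказать" once.
counts = Counter(["быть", "мочь", "быть", "сказать", "мочь", "быть"])

# most_common(n) returns a list of (word, count) tuples, highest count first.
top_two = counts.most_common(2)
print(top_two)  # [('быть', 3), ('мочь', 2)]

# Asking for more entries than exist simply returns all of them.
print(len(counts.most_common(10)))  # 3
```

This is why the loops above are safe even when a file contains fewer than 10 distinct words.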

Putting all the steps together gives the complete code:

```
from collections import Counter

# Input files: one word per line, produced by the part-of-speech extraction step.
nouns_output_file = 'nouns.txt'
verbs_output_file = 'verbs.txt'
adjectives_output_file = 'adjectives.txt'

# Count how often each word occurs in each file.
with open(nouns_output_file, 'r', encoding='utf-8') as nouns_file:
    noun_counts = Counter(nouns_file.read().splitlines())

with open(verbs_output_file, 'r', encoding='utf-8') as verbs_file:
    verb_counts = Counter(verbs_file.read().splitlines())

with open(adjectives_output_file, 'r', encoding='utf-8') as adjectives_file:
    adjective_counts = Counter(adjectives_file.read().splitlines())

# Print the 10 most frequent words for each part of speech.
print("Top 10 nouns:")
for word, count in noun_counts.most_common(10):
    print(f"{word}: {count}")
print("\nTop 10 verbs:")
for word, count in verb_counts.most_common(10):
    print(f"{word}: {count}")
print("\nTop 10 adjectives:")
for word, count in adjective_counts.most_common(10):
    print(f"{word}: {count}")
```
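The three read-and-print blocks differ only in the file name and the label, so they could be folded into a loop. The following is one possible sketch, not the tutorial's own code: the helper names `top_words` and `report` are hypothetical, and the check for missing files is an addition so that the snippet runs even outside the exercise folder.

```python
from collections import Counter
from pathlib import Path


def top_words(path, n=10):
    """Return the n most frequent lines in the file at path (hypothetical helper)."""
    text = Path(path).read_text(encoding="utf-8")
    return Counter(text.splitlines()).most_common(n)


def report(files, n=10):
    """Print the top n words for each (label, path) pair, skipping missing files."""
    for label, path in files:
        if not Path(path).exists():
            print(f"{label}: {path} not found")
            continue
        print(f"Top {n} {label}:")
        for word, count in top_words(path, n):
            print(f"{word}: {count}")
        print()


report([
    ("nouns", "nouns.txt"),
    ("verbs", "verbs.txt"),
    ("adjectives", "adjectives.txt"),
])
```

With this layout, adding another part of speech only requires one more `(label, path)` pair in the list.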


Click the arrow button to run the cell.

The cell prints the ten most frequent nouns, verbs, and adjectives, each with its count.

This completes the frequency counting. The output shows that further stopword handling is needed: words such as который and свой should be added to the stopword list.
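One way to act on this observation is to filter a counter before ranking it. The sketch below is a minimal illustration under stated assumptions: the set `extra_stopwords` contains only the two words named above, the helper `drop_stopwords` is hypothetical, and the counts are invented rather than taken from the exercise output.

```python
from collections import Counter

# Hypothetical extra stopwords: just the two examples mentioned in the text.
extra_stopwords = {"который", "свой"}


def drop_stopwords(counts, stopwords):
    """Return a new Counter without the entries whose word is a stopword."""
    return Counter({w: c for w, c in counts.items() if w not in stopwords})


# Invented counts for illustration only; not real output from the exercise.
sample_counts = Counter({"который": 40, "свой": 35, "новый": 12, "большой": 9})
filtered = drop_stopwords(sample_counts, extra_stopwords)

print(filtered.most_common(2))  # [('новый', 12), ('большой', 9)]
```

Filtering into a new `Counter` leaves the original counts untouched, so the effect of different stopword lists can be compared without rereading the files.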