Extracting Specific Parts of Speech

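Let's extract the nouns, verbs, and adjectives from tagged_words.txt, a file containing morphological analysis results, and save the words for each part of speech to a separate text file. Each whitespace-separated token in the file has the form word/TAG.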

First, set the path of the input file and the paths of the output files for nouns, verbs, and adjectives.

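```
# Input: word/TAG tokens produced by morphological analysis
tagged_input_file = 'tagged_words.txt'

# One output file per part of speech
nouns_output_file = 'nouns.txt'
verbs_output_file = 'verbs.txt'
adjectives_output_file = 'adjectives.txt'
```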

Create empty lists to hold the results.
```
nouns = []
verbs = []
adjectives = []
```
Read the tagged word file and classify each word as a noun, verb, or adjective.
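```
with open(tagged_input_file, 'r', encoding='utf-8') as infile:
    for line in infile:
        # Split the line into whitespace-separated word/TAG tokens
        word_list = line.strip().split()

        for word_tag in word_list:
            # Tokens without a '/' carry no tag and are skipped
            if '/' in word_tag:
                # Split only at the last '/' into word and tag
                word, tag = word_tag.rsplit('/', 1)

                if 'NOUN' in tag:
                    nouns.append(word)
                elif 'VERB' in tag:
                    verbs.append(word)
                elif 'ADJF' in tag or 'ADJS' in tag:  # both adjective tags
                    adjectives.append(word)
```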
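Each token word_tag is split at '/' into the word (word) and its part-of-speech tag (tag). The second argument of rsplit('/', 1) limits the split to a single cut at the last '/', so a word that itself contains '/' stays intact. Tokens without any '/' are skipped. The checks use `in`, so a tag only needs to contain NOUN, VERB, ADJF, or ADJS to match. Because adjectives have two tags, ADJF and ADJS, a word whose tag contains either one is classified as an adjective and added to the adjectives list.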
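To see why the second argument of rsplit matters, and how the tag checks behave, here is a small sketch. The helper names split_token and classify_tag and the sample tokens are hypothetical and not part of the tutorial's code; it assumes each token has the form word/TAG. The tag names ADJF and ADJS appear to match the OpenCorpora tagset used by tools such as pymorphy2, where they denote full and short adjective forms; that is an inference from the tag names, not something stated here.

```python
def split_token(word_tag):
    """Split a hypothetical 'word/TAG' token at its last '/' only."""
    word, tag = word_tag.rsplit('/', 1)
    return word, tag


def classify_tag(tag):
    """Mirror the if/elif chain above: return a category name or None."""
    if 'NOUN' in tag:
        return 'noun'
    elif 'VERB' in tag:
        return 'verb'
    elif 'ADJF' in tag or 'ADJS' in tag:
        return 'adjective'
    return None  # any other tag is ignored


# split() cuts a word containing '/' into pieces...
print('a/b/NOUN'.split('/'))    # ['a', 'b', 'NOUN']
# ...while rsplit('/', 1) splits only once, at the last '/'.
print(split_token('a/b/NOUN'))  # ('a/b', 'NOUN')

for tag in ['NOUN', 'VERB', 'ADJF', 'ADJS', 'PREP']:
    print(tag, '->', classify_tag(tag))
```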

Next, save the results to text files, one word per line.

```
# Mode 'w' creates or overwrites each file; '\n' ends every word's line
with open(nouns_output_file, 'w', encoding='utf-8') as nouns_file:
    for noun in nouns:
        nouns_file.write(noun + '\n')

with open(verbs_output_file, 'w', encoding='utf-8') as verbs_file:
    for verb in verbs:
        verbs_file.write(verb + '\n')

with open(adjectives_output_file, 'w', encoding='utf-8') as adjectives_file:
    for adjective in adjectives:
        adjectives_file.write(adjective + '\n')
```
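The three write loops differ only in the list and the output path, so the pattern can be factored into one helper. This is a sketch under the same assumptions (UTF-8, one word per line); write_lines is a hypothetical name, and the sample words are made up. Note that file.writelines does not insert separators, so each word still needs its own '\n'.

```python
import os
import tempfile


def write_lines(path, words):
    """Write each word on its own line (UTF-8); mode 'w' overwrites the file."""
    with open(path, 'w', encoding='utf-8') as f:
        # writelines adds no separators, so append '\n' to every word
        f.writelines(word + '\n' for word in words)


# Usage sketch in a temporary directory with hypothetical sample words.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'nouns.txt')
    write_lines(path, ['cat', 'house'])
    with open(path, encoding='utf-8') as f:
        print(repr(f.read()))  # 'cat\nhouse\n'
```

In the tutorial's setting, the three loops would become calls such as write_lines(nouns_output_file, nouns).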

Finally, print how many nouns, verbs, and adjectives were found.

```
print('Number of nouns:', len(nouns))
print('Number of verbs:', len(verbs))
print('Number of adjectives:', len(adjectives))
```
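Because len counts every occurrence, a noun that appears three times is counted three times. If the number of distinct words or their frequencies is of interest, a set or collections.Counter provides it. This is an optional extension, shown on a hypothetical sample list rather than the real output.

```python
from collections import Counter

# Hypothetical sample; in the tutorial, nouns is filled from tagged_words.txt.
nouns = ['cat', 'house', 'cat', 'tree', 'cat']

print('Number of nouns:', len(nouns))                   # 5 (every occurrence)
print('Distinct nouns:', len(set(nouns)))                # 3
print('Most frequent:', Counter(nouns).most_common(2))  # [('cat', 3), ('house', 1)]
```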

Putting all the steps together gives the complete code:

```
tagged_input_file = 'tagged_words.txt'
nouns_output_file = 'nouns.txt'
verbs_output_file = 'verbs.txt'
adjectives_output_file = 'adjectives.txt'

nouns = []
verbs = []
adjectives = []

# Read word/TAG tokens and sort each word by its tag
with open(tagged_input_file, 'r', encoding='utf-8') as infile:
    for line in infile:
        word_list = line.strip().split()

        for word_tag in word_list:
            if '/' in word_tag:
                # Split only at the last '/'
                word, tag = word_tag.rsplit('/', 1)

                if 'NOUN' in tag:
                    nouns.append(word)
                elif 'VERB' in tag:
                    verbs.append(word)
                elif 'ADJF' in tag or 'ADJS' in tag:
                    adjectives.append(word)

# Save each list, one word per line
with open(nouns_output_file, 'w', encoding='utf-8') as nouns_file:
    for noun in nouns:
        nouns_file.write(noun + '\n')

with open(verbs_output_file, 'w', encoding='utf-8') as verbs_file:
    for verb in verbs:
        verbs_file.write(verb + '\n')

with open(adjectives_output_file, 'w', encoding='utf-8') as adjectives_file:
    for adjective in adjectives:
        adjectives_file.write(adjective + '\n')

# Report how many words of each part of speech were found
print('Number of nouns:', len(nouns))
print('Number of verbs:', len(verbs))
print('Number of adjectives:', len(adjectives))
```
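The complete code repeats the same pattern once per part of speech. One way to remove the repetition is to drive both the classification and the file writing from a single table of tag markers. The sketch below is a restructured alternative, not the tutorial's code: the names POS_TAGS and extract_pos, the output_paths parameter, and the sample input line are hypothetical. It keeps the original behavior: substring tag checks in the same if/elif order, a split at the last '/', and one word per output line.

```python
import os
import tempfile

# Tag substrings per category, checked in the same order as the if/elif chain.
POS_TAGS = [
    ('nouns', ('NOUN',)),
    ('verbs', ('VERB',)),
    ('adjectives', ('ADJF', 'ADJS')),
]


def extract_pos(input_path, output_paths):
    """Sort words from a 'word/TAG' file by category and write one file each.

    output_paths maps 'nouns', 'verbs', and 'adjectives' to output file paths.
    Returns a dict with the number of words found per category.
    """
    found = {name: [] for name, _ in POS_TAGS}
    with open(input_path, 'r', encoding='utf-8') as infile:
        for line in infile:
            for word_tag in line.split():  # split() also strips the newline
                if '/' not in word_tag:
                    continue  # untagged token
                word, tag = word_tag.rsplit('/', 1)
                for name, markers in POS_TAGS:
                    if any(marker in tag for marker in markers):
                        found[name].append(word)
                        break  # first match wins, like if/elif
    for name, words in found.items():
        with open(output_paths[name], 'w', encoding='utf-8') as f:
            f.writelines(w + '\n' for w in words)
    return {name: len(words) for name, words in found.items()}


# Usage sketch on a hypothetical two-line input in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'tagged_words.txt')
    with open(src, 'w', encoding='utf-8') as f:
        f.write('cat/NOUN runs/VERB big/ADJF\nfast/ADJS dog/NOUN of/PREP\n')
    outs = {name: os.path.join(tmp, name + '.txt') for name, _ in POS_TAGS}
    print(extract_pos(src, outs))  # {'nouns': 2, 'verbs': 1, 'adjectives': 2}
```

Adding another category then means adding one entry to POS_TAGS and one output path, instead of another list, another elif branch, and another write loop.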


Run the cell by clicking its arrow button. It prints the number of nouns, verbs, and adjectives that were found.

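This completes the part-of-speech extraction. The words for each part of speech are saved to nouns.txt, verbs.txt, and adjectives.txt; because the paths are relative, the files are created in the current working directory.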