xyzg198891's personal blog http://blog.sciencenet.cn/u/xyzg198891

Counting word frequencies in Python

2357 views | 2016-11-4 11:51 | Personal category: Python | System category: Research notes | Tag: word frequency

import re

from collections import Counter


# Return the result as a string with one "word count" pair per line
def printByLine(tuples):
    return '\n'.join(' '.join(map(str, t)) for t in tuples)


# Return the (word, count) pairs sorted alphabetically by word
def countsSortedAlphabetically(counter, **kw):
    return sorted(counter.items(), key=lambda item: item[0], **kw)


# Open the file, read its contents, and convert them to lower case
with open("test.txt") as myfile:
    text = myfile.read().lower()

# Match words and save them in a list
words = re.findall(r"\w+", text)

# Count the words and keep the 10 most common as a list of (word, count) tuples
counter = Counter(words).most_common(10)


print(counter)
print()
print(printByLine(counter))
print()
print(printByLine(countsSortedAlphabetically(dict(counter))))


# write() requires a string argument, so the list is converted with str()
# I can't write printByLine results into test_result.txt for the moment
with open("test_result.txt", 'w') as f:
    f.write(str(counter))
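The note in the script says the line-by-line output could not be written to the result file. A minimal sketch of one way this might be done, assuming Python 3 and a file opened in text mode, is shown below. The names `top_words`, `format_by_line` and `write_report` are hypothetical helpers introduced only for this illustration and are not part of the original script. The sample sentence and the temporary directory are likewise assumptions, used so the example runs without a `test.txt` file.

```python
import os
import re
import tempfile
from collections import Counter


def top_words(text, n=10):
    """Return the n most common lowercase words as (word, count) pairs."""
    return Counter(re.findall(r"\w+", text.lower())).most_common(n)


def format_by_line(pairs):
    """Render each (word, count) pair as 'word count' on its own line."""
    return "\n".join(" ".join(map(str, pair)) for pair in pairs)


def write_report(pairs, path):
    """Write the line-by-line report to path as text (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as out:
        out.write(format_by_line(pairs) + "\n")


# Sample text stands in for the contents of test.txt (an assumption).
sample = "The cat and the dog and the bird"
pairs = top_words(sample, 2)

# Write into a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    result_path = os.path.join(tmp, "test_result.txt")
    write_report(pairs, result_path)
    with open(result_path, encoding="utf-8") as written:
        print(written.read())
```

Because the formatted report is already a single string, it can be passed to `write()` directly once the file is opened in text mode (`'w'`) rather than binary mode (`'wb'`).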




https://wap.sciencenet.cn/blog-645111-1012675.html
