A First Look at Clustering Algorithms: DBSCAN

Posted by intergret on October 22, 2012 at 21:00, 4 comments / 5331 views

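       DBSCAN is a simple, density-based clustering algorithm. This implementation uses the center-based approach, in which the density of a data point is measured by the number of other data points inside a square (the neighborhood) of side 2*Eps centered on that point. Based on this density, data points fall into three classes:
   
Core point: the number of points in its neighborhood is at least the given threshold MinPts.
   
Border point: the point is not a core point, but its neighborhood contains at least one core point.
   
Noise point: the point is neither a core point nor a border point.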
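The three classes above can be sketched on a small in-memory data set. This is a minimal illustration, not the listing from this post: the function name `classify_points`, the sample coordinates, and the values `eps=2` and `min_pts=3` are assumptions chosen for the example. It uses the same square neighborhood and the same "at least MinPts neighbors" test.

```python
from collections import defaultdict

def classify_points(points, eps, min_pts):
    """Label each point index as 'core', 'border' or 'noise'.

    The neighborhood of a point is the square of side 2*eps centered on it;
    the point itself is not counted among its neighbors.
    """
    neighbors = defaultdict(list)
    for i, (x1, y1) in enumerate(points):
        for j in range(i + 1, len(points)):
            x2, y2 = points[j]
            if abs(x1 - x2) <= eps and abs(y1 - y2) <= eps:
                neighbors[i].append(j)
                neighbors[j].append(i)

    # Core points have at least min_pts neighbors.
    core = {i for i in range(len(points)) if len(neighbors[i]) >= min_pts}

    labels = {}
    for i in range(len(points)):
        if i in core:
            labels[i] = "core"
        elif any(j in core for j in neighbors[i]):
            labels[i] = "border"   # not core, but next to a core point
        else:
            labels[i] = "noise"    # neither core nor border
    return labels, neighbors

# Hypothetical sample: a tight group of four points, one point on its edge,
# and one isolated point.
sample = [(0, 0), (1, 0), (0, 1), (1, 1), (3, 1), (10, 10)]
labels, _ = classify_points(sample, eps=2, min_pts=3)
```

With these assumed values, the four grouped points come out as core points, (3, 1) as a border point, and (10, 10) as noise.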
    
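With the points classified this way, clustering proceeds as follows: each core point is placed in the same cluster as all core points in its neighborhood, and each border point is placed in the same cluster as one of the core points in its neighborhood.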
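The merging rule described here can be sketched with a union-find (disjoint-set) structure. Note that this is a swapped-in technique for illustration: the listing in this post relabels a flat list of group ids instead. The function name `cluster` and the neighbor lists below are hypothetical.

```python
def cluster(n_points, neighbors, core):
    """Return one cluster label per point, with None for noise points."""
    parent = list(range(n_points))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Core points that are neighbors end up in the same cluster.
    for i in core:
        for j in neighbors.get(i, []):
            if j in core:
                parent[find(j)] = find(i)

    labels = [None] * n_points
    for i in range(n_points):
        if i in core:
            labels[i] = find(i)
        else:
            # A border point joins the cluster of the first core neighbor found.
            for j in neighbors.get(i, []):
                if j in core:
                    labels[i] = find(j)
                    break
    return labels

# Hypothetical neighbor lists: two dense groups {0, 1, 2} and {4, 5, 6},
# point 3 borders the first group, point 7 has no neighbors at all.
neighbors = {
    0: [1, 2], 1: [0, 2, 3], 2: [0, 1],
    3: [1],
    4: [5, 6], 5: [4, 6], 6: [4, 5],
}
core = {0, 1, 2, 4, 5, 6}
labels = cluster(8, neighbors, core)
```

Because a border point may lie near core points of more than one cluster, which cluster it joins depends on iteration order, both here and in the rule as stated above.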

      For details, see the blog post: http://blog.sina.com.cn/s/blog_62186b460101ard2.html


1. [Code] [Python]

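# coding=utf-8
import pylab as pl
from collections import defaultdict,Counter

# Load the data: each line of the file "points" holds one point as "x#y"
with open("points","r") as pointsFile:
	points = [[int(value) for value in eachpoint.split("#")[:2]] for eachpoint in pointsFile if eachpoint.strip()]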

# Find the neighbors of each data point; the neighborhood is the square of side 2*Eps centered on the point
Eps = 10
surroundPoints = defaultdict(list)
for idx1,point1 in enumerate(points):
	for idx2,point2 in enumerate(points):
		if (idx1 < idx2):
			if(abs(point1[0]-point2[0])<=Eps and abs(point1[1]-point2[1])<=Eps):
				surroundPoints[idx1].append(idx2)
				surroundPoints[idx2].append(idx1)

# Points with at least MinPts neighbors (that is, more than 4) are core points
MinPts = 5
corePointIdx = [pointIdx for pointIdx,surPointIdxs in surroundPoints.items() if len(surPointIdxs)>=MinPts]

# Non-core points whose neighborhood contains a core point are border points
borderPointIdx = []
for pointIdx,surPointIdxs in surroundPoints.items():
	if (pointIdx not in corePointIdx):
		for onesurPointIdx in surPointIdxs:
			if onesurPointIdx in corePointIdx:
				borderPointIdx.append(pointIdx)
				break

# Noise points are neither border points nor core points
noisePointIdx = [pointIdx for pointIdx in range(len(points)) if pointIdx not in corePointIdx and pointIdx not in borderPointIdx]

corePoint = [points[pointIdx] for pointIdx in corePointIdx]	
borderPoint = [points[pointIdx] for pointIdx in borderPointIdx]
noisePoint = [points[pointIdx] for pointIdx in noisePointIdx]

# pl.plot([eachpoint[0] for eachpoint in corePoint], [eachpoint[1] for eachpoint in corePoint], 'or')
# pl.plot([eachpoint[0] for eachpoint in borderPoint], [eachpoint[1] for eachpoint in borderPoint], 'oy')
# pl.plot([eachpoint[0] for eachpoint in noisePoint], [eachpoint[1] for eachpoint in noisePoint], 'ok')

groups = [idx for idx in range(len(points))]

# Put each core point in the same cluster as all core points in its neighborhood
for pointidx,surroundIdxs in surroundPoints.items():
	for oneSurroundIdx in surroundIdxs:
		if (pointidx in corePointIdx and oneSurroundIdx in corePointIdx and pointidx < oneSurroundIdx):
			# Merge the two clusters by relabeling every member of the second one
			for idx in range(len(groups)):
				if groups[idx] == groups[oneSurroundIdx]:
					groups[idx] = groups[pointidx]

# Put each border point in the same cluster as one of the core points in its neighborhood
for pointidx,surroundIdxs in surroundPoints.items():
	for oneSurroundIdx in surroundIdxs:
		if (pointidx in borderPointIdx and oneSurroundIdx in corePointIdx):
			groups[pointidx] = groups[oneSurroundIdx]
			break

# Take the 3 largest clusters
wantGroupNum = 3
finalGroup = Counter(groups).most_common(wantGroupNum)
finalGroup = [onecount[0] for onecount in finalGroup]

group1 = [points[idx] for idx in range(len(points)) if groups[idx]==finalGroup[0]]
group2 = [points[idx] for idx in range(len(points)) if groups[idx]==finalGroup[1]]
group3 = [points[idx] for idx in range(len(points)) if groups[idx]==finalGroup[2]]

pl.plot([eachpoint[0] for eachpoint in group1], [eachpoint[1] for eachpoint in group1], 'or')
pl.plot([eachpoint[0] for eachpoint in group2], [eachpoint[1] for eachpoint in group2], 'oy')
pl.plot([eachpoint[0] for eachpoint in group3], [eachpoint[1] for eachpoint in group3], 'og')

# Plot the noise points in black
pl.plot([eachpoint[0] for eachpoint in noisePoint], [eachpoint[1] for eachpoint in noisePoint], 'ok')	

pl.show()
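
The listing reads its input from a file named "points", with one "x#y" pair of integers per line. The original data set is not included in this post, so the following is only a sketch for producing synthetic stand-in data in that format. The function names, the cluster layout, and every parameter value are assumptions made for illustration.

```python
import random

def make_point_lines(n_clusters=3, per_cluster=60, n_noise=20, seed=0):
    """Return synthetic points as "x#y" strings (hypothetical test data)."""
    rng = random.Random(seed)
    lines = []
    # A few roughly Gaussian blobs around random centers.
    for _ in range(n_clusters):
        cx, cy = rng.randint(50, 250), rng.randint(50, 250)
        for _ in range(per_cluster):
            x = int(round(rng.gauss(cx, 8)))
            y = int(round(rng.gauss(cy, 8)))
            lines.append("%d#%d" % (x, y))
    # Uniformly scattered points that act as background noise.
    for _ in range(n_noise):
        lines.append("%d#%d" % (rng.randint(0, 300), rng.randint(0, 300)))
    return lines

def write_points(path="points", **kwargs):
    """Write the synthetic points to a file, one "x#y" pair per line."""
    with open(path, "w") as f:
        f.write("\n".join(make_point_lines(**kwargs)) + "\n")

lines = make_point_lines()
```

Calling `write_points()` once before running the listing creates the input file. Eps and MinPts in the listing may need adjusting for data generated this way.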

2. [Image] DBscan.png




Comments (4)

  • #1: 小小小新, posted 2013-12-29 14:46
    Which line imports the data?
  • #2: HaiziGe, posted 2016-03-10 21:52
    Could you share the source data? My email is 707895571@qq.com. Thanks a lot~
  • #3: 马富天, posted 2016-05-29 13:07
    Also asking for the data source~~ My email is 335134463@qq.com
  • #4: 马富天, posted 2016-05-29 13:14
    Could you please share the data source?