Reducing database access when bulk-querying foreign-key data in Django

2019-03-23

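When a Django model has a foreign key, you can reach the attributes of the related object directly through the model instance that holds the key. Take student course enrollment as an example:

A StudentCourseRelation object has two foreign keys: one to the student and one to the course class.
student_course is an instance of this enrollment relation.
You can then write student_course.student.name to get the student's name directly. That does not mean this step comes without an extra database query.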

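In fact, when you obtain a StudentCourseRelation object student_course, only the student id and the course id are loaded into memory. Reaching any attribute of the student or course through student_course.student or student_course.course requires an extra database query. This holds even if you only read the id, as in student_course.student.id: the extra query still runs. Django keeps the loaded object on that one instance, so repeated access on the same instance is free, but every row of a queryset is a separate instance, and a loop over N rows can still issue N extra queries. This makes the approach very inefficient and unsuitable for large batches of data, such as statistical analysis jobs.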
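To make the cost visible, here is a toy sketch in plain Python. It is not Django's implementation: STUDENT_TABLE, fetch_student and QUERY_COUNT are hypothetical stand-ins for the student table, a SELECT by primary key, and a query counter. It only mimics the behaviour described above, where student_id is a plain value and student triggers a lookup.

```python
# Toy model of lazy foreign-key loading; NOT Django's real implementation.
QUERY_COUNT = 0
STUDENT_TABLE = {1: "Alice", 2: "Bob"}  # hypothetical stand-in for the student table


class Student:
    def __init__(self, id, name):
        self.id = id
        self.name = name


def fetch_student(student_id):
    """Pretend to run a SELECT by primary key and count the round trip."""
    global QUERY_COUNT
    QUERY_COUNT += 1
    return Student(student_id, STUDENT_TABLE[student_id])


class StudentCourse:
    def __init__(self, student_id):
        self.student_id = student_id  # plain column value, already in memory
        self._student_cache = None

    @property
    def student(self):
        # The first access on this instance fetches the row;
        # later accesses on the same instance reuse it.
        if self._student_cache is None:
            self._student_cache = fetch_student(self.student_id)
        return self._student_cache


rows = [StudentCourse(1), StudentCourse(1), StudentCourse(2)]

# Reading the raw foreign key value costs no query.
ids_cheap = [sc.student_id for sc in rows]
queries_after_cheap = QUERY_COUNT

# Going through the related object costs one query per row.
ids_costly = [sc.student.id for sc in rows]
queries_after_costly = QUERY_COUNT
```

Both lists hold the same ids, but the second one paid one simulated query for each of the three rows, even though two rows point at the same student.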

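Suppose you need to read all of the data, and that data includes foreign keys. A faster method I came up with is:

Read all rows of each model separately and build a dict keyed by id. Then, continuing the example above, use student_course.student_id rather than student_course.student.id to get the foreign key id, and look up the corresponding data in the dict.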
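The dict lookup described above can be sketched in plain Python. The namedtuples and sample rows below are made up for illustration; in a real project the three lists would come from one query per model, and a title field on the course is an assumption.

```python
from collections import namedtuple

# Hypothetical in-memory rows standing in for three querysets.
Student = namedtuple("Student", "id name")
Course = namedtuple("Course", "id title")
Enrollment = namedtuple("Enrollment", "student_id course_id")

students = [Student(1, "Alice"), Student(2, "Bob")]
courses = [Course(10, "Algebra"), Course(11, "Physics")]
enrollments = [Enrollment(1, 10), Enrollment(1, 11), Enrollment(2, 10)]

# One pass per model to build id -> object lookups.
student_by_id = {s.id: s for s in students}
course_by_id = {c.id: c for c in courses}

# Resolve both foreign keys in memory, using only the raw id columns.
report = [
    (student_by_id[e.student_id].name, course_by_id[e.course_id].title)
    for e in enrollments
]
```

The number of queries stays fixed at one per model no matter how many enrollment rows there are. In recent Django versions, QuerySet.in_bulk() called without arguments should return the same kind of dict keyed by primary key.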

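Here is a real test:

There are about 28,000 students and 270,000 enrollment records. Roughly 1,000 of the students are not enrolled in any course. We want to find the students with no enrollments.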

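def get_abnormal_student():
    scs = StudentCourse.objects.all()
    students = Student.objects.all()
    # Index every student by primary key.
    stu = {}
    for s in students:
        stu[s.id] = s
    # Drop each student that appears in at least one enrollment record.
    for sc in scs:
        if sc.student.id in stu:      # first sc.student.id
            stu.pop(sc.student.id)    # second sc.student.id
    # Whoever is left has no enrollments.
    for s in stu:
        print(s, stu[s].name)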

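This code runs slowly because sc.student.id loads the related student from the database for each enrollment record. In testing it took 159,769 ms, roughly 160 seconds (163 seconds on a second run).
After changing the first sc.student.id to sc.student_id, it took 19,844 ms, roughly 20 seconds (20 seconds on a second run). The remaining sc.student.id then runs only while the student is still in the dict, which is probably why most of the queries disappear.
After also changing the second sc.student.id to sc.student_id, it took only 3,371 ms, roughly 3 seconds (3 seconds on a second run).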

def get_abnormal_student2():
    students = Student.objects.all()
    for s in students:
        # One COUNT query per student.
        if StudentCourse.objects.filter(student=s).count() == 0:
            print(s.name)

This is another approach; it takes 19 seconds.
Replacing StudentCourse.objects.filter(student=s).count() with s.studentcourse_set.count() brings it down to 17 seconds.
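For comparison, here is a minimal sketch using Python's built-in sqlite3 module rather than Django. It shows why the per-student count still costs one query per student, and how a single NOT EXISTS query could return the same answer. The table and column names are assumptions chosen to resemble the models above, and this variant was not part of the timings reported here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE student_course (
        id INTEGER PRIMARY KEY,
        student_id INTEGER REFERENCES student(id)
    );
    INSERT INTO student VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO student_course (student_id) VALUES (1), (1), (2);
""")

# Per-student approach: one COUNT query for every student (N + 1 round trips).
per_student = []
for student_id, name in conn.execute("SELECT id, name FROM student").fetchall():
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM student_course WHERE student_id = ?",
        (student_id,),
    ).fetchone()
    if count == 0:
        per_student.append(name)

# Single-query approach: let the database find students without enrollments.
single_query = [
    name
    for (name,) in conn.execute(
        """
        SELECT s.name FROM student AS s
        WHERE NOT EXISTS (
            SELECT 1 FROM student_course AS sc WHERE sc.student_id = s.id
        )
        ORDER BY s.id
        """
    )
]
```

In Django, a filter on the reverse relation such as Student.objects.filter(studentcourse__isnull=True) should express a similar single query, assuming the default reverse name for the foreign key.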
