[Notes] Blog Circle of Friends

Posted on 2023-10-21, updated on 2024-04-28

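Preface

Set up a blog circle of friends (a feed of recent posts from the blogs on your friend-links page) on any blog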

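Download the project

A Linux environment is recommended. In my testing on macOS, some dependencies could not be downloaded, so the program could not run

git clone https://github.com/Rock-Candy-Tea/hexo-circle-of-friends.git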

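Modify the code

Custom crawl rules

The crawler ships without crawl rules for the NexT theme, so I modified part of the code to add them. The actual rules have to follow the structure of your own blog's friend-links page

In the get_next_url() method, define the values to crawl: the avatar (avatar), the friend link (link), and the name (name)
The method name can carry a custom theme name, as in get_xxx_url(); when calling the global handler at the end of the method, set the fifth argument to that theme name: self.handle(avatar, link, name, queue, "xxx")
In the __init__() method, add the custom theme name to the self.strategies tuple: self.strategies = (..., "xxx")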

hexo-circle-of-friends/hexo_circle_of_friends/utils/get_url.py

class GetUrl:

    def __init__(self):
        self.strategies = ("common1", "common2", "butterfly", "fluid", "matery", "nexmoe", "stun", "sakura", "volantis", "Yun", "stellar", "next")

    def get_next_url(self, response, queue):

        # Avatar URLs of the friend links
        avatar = response.css(".link-grid .link-grid-container .link-grid-image::attr(data)").extract()

        link = response.css(".link-grid .link-grid-container a::attr(href)").extract()
        # Strip the 302 redirect prefix from any link that has one
        prefix = "/302.html?target="
        link = [url[len(prefix):] if url.startswith(prefix) else url for url in link]

        name = response.css(".link-grid .link-grid-container p::text").extract()
        # Keep only the 1st, 3rd, 5th, ... text nodes (indices 0, 2, 4, ...)
        name = name[::2]

        # Hand the results to the global handler, tagged with the theme name
        self.handle(avatar, link, name, queue, "next")
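
The two post-processing steps in get_next_url() can be tried outside the crawler. The following is only a minimal sketch on made-up sample data: the helper names strip_redirect_prefix and every_other are hypothetical and not part of the project, and the sample assumes each link card yields two text nodes (a name followed by a description), which may not match your own page.

```python
PREFIX = "/302.html?target="  # redirect prefix seen on the author's links page


def strip_redirect_prefix(links, prefix=PREFIX):
    """Return the links with the redirect prefix removed where present."""
    return [url[len(prefix):] if url.startswith(prefix) else url for url in links]


def every_other(items):
    """Keep the 1st, 3rd, 5th, ... items (indices 0, 2, 4, ...)."""
    return items[::2]


# Made-up sample data standing in for what the CSS selectors would extract
sample_links = ["/302.html?target=https://example.com/", "https://example.org/"]
sample_names = ["Alice", "Alice's blog", "Bob", "Bob's blog"]

clean_links = strip_redirect_prefix(sample_links)
clean_names = every_other(sample_names)
print(clean_links)
print(clean_names)
```

If your links page puts the name and description in different elements, the every-other-item step is probably unnecessary and a narrower CSS selector would be the simpler fix.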

Configure your own friend-links page as the crawl starting point

When using custom crawl rules, set the theme to the custom theme name: theme: "xxx"

hexo-circle-of-friends/hexo_circle_of_friends/fc_settings.yaml

LINK: [
     { link: "https://loli.fj.cn/links/", theme: "next" },
]
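
A mismatch between the theme in the settings and the registered strategy names is easy to miss. The following is a rough sketch of a sanity check, assuming the names are compared literally; the function unknown_themes is hypothetical, and the settings are written out by hand as Python data rather than read from fc_settings.yaml.

```python
# Strategy names as listed in GetUrl.__init__ earlier in this post
STRATEGIES = ("common1", "common2", "butterfly", "fluid", "matery", "nexmoe",
              "stun", "sakura", "volantis", "Yun", "stellar", "next")

# The LINK setting from fc_settings.yaml, copied by hand as Python data
LINK = [
    {"link": "https://loli.fj.cn/links/", "theme": "next"},
]


def unknown_themes(links, strategies=STRATEGIES):
    """Return the entries whose theme has no matching strategy name."""
    return [entry for entry in links if entry["theme"] not in strategies]


problems = unknown_themes(LINK)
print(problems)
```

An empty list means every configured theme has a strategy name to match.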

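Deploy the runtime environment with Docker

This actually deploys the runtime environment that yyyzyyyz has already configured, then overrides the crawler project inside it with the locally modified copy

/root/hexo-circle-of-friends: the local path of the project

docker run -di --name circle -p 8000:8000 -v /tmp/:/tmp/ -v /root/hexo-circle-of-friends:/home/fcircle_src yyyzyyyz/fcircle:latest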

Run the crawler in Docker

docker exec circle nohup python3 -u ./hexo_circle_of_friends/run.py > /dev/null 2>&1 &

Run the web service in Docker

Run the web service in Docker so that the crawled data can be accessed over HTTP

docker exec circle nohup python3 -u ./api/main.py > /dev/null 2>&1 &
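
Once the web service is up, the data can be checked from a script before any front end exists. The following is a minimal sketch using only the Python standard library. The /all path and the article_data, author, and title fields are taken from the front-end code later in this post; the function names fetch_all and summarize are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_all(base_url, timeout=10):
    """GET <base_url>/all and return the decoded JSON body."""
    with urlopen(f"{base_url}/all", timeout=timeout) as resp:
        return json.load(resp)


def summarize(payload):
    """Return one 'author: title' line per article in the payload."""
    return [f'{item["author"]}: {item["title"]}' for item in payload["article_data"]]
```

On the Docker host, something like summarize(fetch_all("http://127.0.0.1:8000")) should list the crawled articles, assuming the port mapping 8000:8000 from the docker run command above.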

Set up a scheduled task

Set up a scheduled task that crawls every 30 minutes

crontab -e

*/30 * * * * docker exec circle nohup python3 -u ./hexo_circle_of_friends/run.py > ./log1.txt &
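
In standard cron syntax, a minute field of */30 means every 30th minute counted from 0. The following small sketch works out which minutes that matches; expand_step is a hypothetical helper for illustration and not part of cron or of this project.

```python
def expand_step(step, low=0, high=59):
    """Values matched by a cron field of the form */step over low..high."""
    return [value for value in range(low, high + 1) if (value - low) % step == 0]


minutes = expand_step(30)
print(minutes)
```

The crawl therefore starts on the hour and at half past, not 30 minutes after the previous run finished.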

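Front-end rendering

Render the data by sending an asynchronous request

The project does include its own front-end deployment options, but I felt they did not fit my current theme well, so I wrote a minimal one myself

<ip>: the server's IP address and port (fetch needs a full URL here, so include the scheme, e.g. http://)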

<!doctype html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport"
          content="width=device-width, user-scalable=no, initial-scale=1.0, maximum-scale=1.0, minimum-scale=1.0">
    <meta http-equiv="X-UA-Compatible" content="ie=edge">
    <title>Circle of Friends</title>
</head>
<body>

<h1>Circle of Friends</h1>
<div id="app">
</div>

<script data-pjax="" type="module">
    async function init() {
        let response = await fetch("<ip>/all");
        let result = await response.json();
        
        for (const item of result["article_data"]) {
            let a = document.createElement("a");
            a.href = item["link"];
            a.classList.add("article_item");
            a.innerHTML = `
                    <div class="article_title">${item["title"]}</div>
                    <div class="article_author">${item["author"]}  Posted on ${item["created"]}  Updated on ${item["updated"]}</div>
                    <br>
                `;
            document.getElementById("app").append(a);
        }

    }

    await init();

</script>

</body>
</html>
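
The page above inserts titles and author names from other blogs directly as HTML. If you would rather produce the markup ahead of time, the following is a rough sketch that builds similar elements in Python and escapes every value. The field names come from the script above; render_items is a hypothetical helper, and the date fields are left out for brevity.

```python
from html import escape


def render_items(payload):
    """Build <a class="article_item"> markup like the page script, with escaping."""
    parts = []
    for item in payload["article_data"]:
        parts.append(
            f'<a class="article_item" href="{escape(item["link"], quote=True)}">'
            f'<div class="article_title">{escape(item["title"])}</div>'
            f'<div class="article_author">{escape(item["author"])}</div>'
            "</a>"
        )
    return "\n".join(parts)
```

Escaping keeps a title such as one containing angle brackets from being interpreted as markup on your own page.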

Done

References

Rock-Candy-Tea/hexo-circle-of-friends
Friend-Link Circle of Friends official documentation