
MediaCrawler: Self-media crawler
Free

MediaCrawler is a self-media crawler that extracts data from several social media platforms. It can search for posts and comments by keyword or by specific ID, and it stores results in SQLite or MySQL. It requires Python 3.11, plus Node.js for some platforms. Settings such as comment extraction and the database backend are configurable. The documentation covers setup, project architecture, and troubleshooting, and lists options for donations and developer support. The project uses uv for dependency management and Playwright for browser automation.
Crawls data from various social media platforms.
Allows searching for posts and comments using keywords or specific IDs.
Supports SQLite and MySQL for data storage.
Uses `uv` for consistent Python dependency management.
Employs Playwright for browser interaction.
Offers customizable options in `config/base_config.py`.
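As a rough illustration of what options in `config/base_config.py` might look like, here is a minimal sketch. The names `ENABLE_GET_COMMENTS`, `SAVE_DATA_OPTION`, `SUPPORTED_STORES`, and `validate_store` are assumptions made for this example and may not match the real file, so check the project's own config before copying anything.

```python
# Hypothetical excerpt in the style of config/base_config.py.
# All option names here are assumptions for illustration only.

ENABLE_GET_COMMENTS = True   # assumed flag: also collect comments for each post
SAVE_DATA_OPTION = "sqlite"  # assumed setting: "sqlite" or "mysql"

# The two database backends named in this overview.
SUPPORTED_STORES = {"sqlite", "mysql"}


def validate_store(option: str) -> str:
    """Return the normalized store name, or raise if it is not supported."""
    normalized = option.strip().lower()
    if normalized not in SUPPORTED_STORES:
        raise ValueError(f"unsupported store: {option!r}")
    return normalized


if __name__ == "__main__":
    print("comments enabled:", ENABLE_GET_COMMENTS)
    print("store:", validate_store(SAVE_DATA_OPTION))
```

Validating the storage choice up front, as sketched here, would surface a mistyped backend name before a crawl starts rather than after data has been collected.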
Install Python 3.11 and Node.js (version >= 16.0.0).
Run `uv sync` to install the Python dependencies.
Install the Playwright browser drivers with `playwright install`.
Configure settings in `config/base_config.py` (e.g., enable comment extraction).
Run the crawler with a command such as `python main.py --platform <platform> --lt qrcode --type search`, where `<platform>` stands for the platform to crawl.
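The run command above can also be assembled from a script, which may help when crawling on a schedule. The sketch below is not part of MediaCrawler: `build_crawl_command` and `run_crawl` are hypothetical helpers, and the platform code must be supplied by the caller because this overview does not list the accepted values. Only the flags `--platform`, `--lt`, and `--type` come from the command shown above.

```python
import subprocess
import sys


def build_crawl_command(
    platform: str,
    login_type: str = "qrcode",
    crawl_type: str = "search",
) -> list[str]:
    """Assemble the argument list for main.py using the flags named above."""
    if not platform:
        raise ValueError("a platform code is required after --platform")
    return [
        sys.executable, "main.py",   # use the current Python interpreter
        "--platform", platform,      # caller-supplied platform code
        "--lt", login_type,          # login type, e.g. qrcode
        "--type", crawl_type,        # crawl mode, e.g. search
    ]


def run_crawl(platform: str) -> int:
    """Run the crawler from the project root and return its exit code."""
    completed = subprocess.run(build_crawl_command(platform), check=False)
    return completed.returncode
```

Calling `run_crawl("your-platform-code")` from the project root would launch the same command as the manual step; passing the arguments as a list avoids shell quoting issues.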
Track and analyze content related to specific keywords or topics.
Collect posts and comments from multiple platforms for research or analysis.
Gather data for sentiment analysis, trend identification, and other analytical purposes.
For collecting and analyzing social media data for research projects.
To gather and process social media data for business intelligence.
To use and customize the crawler for specific data extraction needs.
MediaCrawler is an open-source project and is available for free.