Compare commits


5 Commits

Author SHA1 Message Date
emilis 4fcb8a9771 Added visibility configuration 2021-09-30 18:53:33 +02:00
emilis 52cb0c5ea2 Added GPT-2 training resource to readme 2021-09-30 16:57:29 +02:00
emilis 9564aeddaa Fixed panic from messages too long for telegram (prob not good) 2021-09-23 21:47:37 +02:00
emilis af00fc5c4b Fixed panic for block_on by using smol, this feels wrong 2021-09-01 23:06:46 +02:00
emilis 0dbbde5835 Improved readme 2021-09-01 19:49:57 +02:00
7 changed files with 144 additions and 25 deletions

Cargo.lock (generated, 41 lines added)

```diff
@@ -58,6 +58,17 @@ dependencies = [
 "slab",
 ]
+[[package]]
+name = "async-fs"
+version = "1.5.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "8b3ca4f8ff117c37c278a2f7415ce9be55560b846b5bc4412aaa5d29c1c3dae2"
+dependencies = [
+"async-lock",
+"blocking",
+"futures-lite",
+]
 [[package]]
 name = "async-global-executor"
 version = "2.0.2"
@@ -111,6 +122,17 @@ dependencies = [
 "event-listener",
 ]
+[[package]]
+name = "async-net"
+version = "1.6.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "5373304df79b9b4395068fb080369ec7178608827306ce4d081cba51cac551df"
+dependencies = [
+"async-io",
+"blocking",
+"futures-lite",
+]
 [[package]]
 name = "async-process"
 version = "1.1.0"
@@ -1206,6 +1228,7 @@ dependencies = [
 "rand 0.8.4",
 "serde",
 "serde_json",
+"smol",
 "telegram-bot",
 "thiserror",
 "tokio 0.2.25",
@@ -2457,6 +2480,24 @@ dependencies = [
 "maybe-uninit",
 ]
+[[package]]
+name = "smol"
+version = "1.2.5"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "85cf3b5351f3e783c1d79ab5fc604eeed8b8ae9abd36b166e8b87a089efd85e4"
+dependencies = [
+"async-channel",
+"async-executor",
+"async-fs",
+"async-io",
+"async-lock",
+"async-net",
+"async-process",
+"blocking",
+"futures-lite",
+"once_cell",
+]
 [[package]]
 name = "socket2"
 version = "0.3.19"
```


```diff
@@ -19,3 +19,4 @@ mammut = "0.13.0"
 thiserror = "1.0.26"
 misskey = "0.2.0"
 url = "2.2.2"
+smol = "1.2.5"
```


```diff
@@ -1,18 +1,82 @@
 # izzilis gpt-2 bot
-Meant to be used with a finetuned GPT-2 model
+Meant to be used with a [finetuned GPT-2 model](https://medium.com/ai-innovation/beginners-guide-to-retrain-gpt-2-117m-to-generate-custom-text-content-8bb5363d8b7f)
 ## Usage
-just run it, it'll make a `bot_config.json` file in the running path
-fill it out
-run again
-## Docker Usage
-The dockerfile makes a few assumptions which you must fulfill:
-* the gpt python files & model are in the gpt/ directory (this directory doesn't exist, please make it (it must contain both `generate_unconditional_samples.py` and the `models` directory containing your finetuned model))
-* you have ran the bot already. Once to generate a `bot_config.json` and a second time, once the aforementioned file is generated and filled out by the user, `fediverse.toml` from authenticating with the instance.
+To run the bot, you need a valid `bot_config.json` file at the path where you're running the bot.
+If you do not have one, izzilis will generate a default one for you to fill out.
+This bot currently *requires* a Telegram bot token to work, as there is no option to disable curation.
+To create a bot, please use the [Botfather](https://t.me/botfather). Once created, and running, set a channel for the bot to post curation options in via the `/setmain` bot command. Usually this requires sending `/setmain@bot_username` to the chat with the bot (can be group chats).
+This default uses the `Misskey` publisher. If you want to publish to Mastodon, Pleroma, or any other mastodon-compatible API, please replace `Misskey` in the `publisher` object with `Mastodon`.
+## Config values
+| Name | Value |
+|--------------------|------------------------------------------------------------------------------------------------|
+| `python_path` | The path to the system's python3 interpreter |
+| `model_name` | The name of the GPT-2 model to use (see gpt-2 docs) |
+| `temperature` | The `temperature` value to call gpt-2 with (see gpt-2 docs) |
+| `top_k` | The `top_k` value to call gpt-2 with (see gpt-2 docs) |
+| `gpt_code_path` | The path to where the gpt-2 source & models are located |
+| `interval_seconds` | See [interval_seconds](#interval_seconds) |
+| `bot_token` | Telegram Bot API token |
+| `chat_ref` | The chat reference ID for the telegram bot, leave at 0, will be filled once `/setmain` is sent |
+| `post_buffer` | How many curated samples the bot will hold at maximum at a time |
+| `publisher` | See [publisher](#publisher) |
+### interval_seconds
+| Name | Value |
+|-------|-------------------------------------------------|
+| `min` | Minimum amount of seconds to wait between posts |
+| `max` | Maximum amount of seconds to wait between posts |
+### publisher
+The publisher can currently hold one of two JSON objects, named either `Mastodon` or `Misskey`, which determines which posting API it will use. Whether the object is `Misskey` or `Mastodon`, it has the following members:
+| Name | Value |
+|--------------|------------------------------------------------------------------------------------------------------------|
+| `base_url` | The base URL of the instance |
+| `token` | The auth token for the account, leave empty for `Mastodon` as you will be prompted to log in and authorize |
+| `visibility` | The visibility scope of the statuses to be posted. These differ between publishers, see the list below |
+**Misskey visibility values**
+* `Public` (Global)
+* `Home`
+* `Followers`
+* `Specified` (DMs)
+**Mastodon-comptaibles visibility values**
+* `public` (Global)
+* `unlisted` (Home)
+* `private` (Followers only)
+* `direct` (DMs)
+An example `Misskey` publisher entry looks like this:
+```json
+"publisher": {
+"Misskey": {
+"base_url": "",
+"token": "",
+"visibility": "Public"
+}
+}
+```
+And for mastodon/pleroma:
+```json
+"publisher": {
+"Mastodon": {
+// Token will be set by izzilis on authentication (requires interactive)
+"base_url": "",
+"visibility": "public"
+}
+}
+```
```
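The README pairs each Misskey visibility value with a Mastodon-compatible one (`Public`/`public`, `Home`/`unlisted`, `Followers`/`private`, `Specified`/`direct`), and the two publishers spell them differently. Switching the `publisher` object from `Misskey` to `Mastodon` therefore presumably also means changing `visibility`, for example from `Public` to `public`. A minimal sketch of that translation over plain strings, following the README's pairings; `misskey_to_mastodon` and `mastodon_to_misskey` are hypothetical helpers, not part of izzilis:

```rust
/// Hypothetical helper: translate a Misskey `visibility` value into the
/// Mastodon-compatible value the README pairs it with.
pub fn misskey_to_mastodon(vis: &str) -> Option<&'static str> {
    match vis {
        "Public" => Some("public"),     // Global
        "Home" => Some("unlisted"),     // Home
        "Followers" => Some("private"), // Followers only
        "Specified" => Some("direct"),  // DMs
        _ => None,                      // not a value listed in the README
    }
}

/// Hypothetical helper: the reverse direction.
pub fn mastodon_to_misskey(vis: &str) -> Option<&'static str> {
    match vis {
        "public" => Some("Public"),
        "unlisted" => Some("Home"),
        "private" => Some("Followers"),
        "direct" => Some("Specified"),
        _ => None,
    }
}
```

Separately, the Mastodon example above carries a `//` comment; assuming `bot_config.json` is parsed with serde_json (which is in the dependency list), that line would have to be removed, since serde_json does not accept comments.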


```diff
@@ -25,14 +25,15 @@ pub struct MinMax {
 #[derive(Serialize, Deserialize, Debug, Clone)]
 pub enum Publisher {
-Misskey(FediverseConfig<String>),
-Mastodon(FediverseConfig<Option<mammut::Data>>),
+Misskey(FediverseConfig<String, misskey::model::note::Visibility>),
+Mastodon(FediverseConfig<Option<mammut::Data>, mammut::status_builder::Visibility>),
 }
 #[derive(Serialize, Deserialize, Debug, Clone)]
-pub struct FediverseConfig<T> {
+pub struct FediverseConfig<T, V> {
 pub base_url: String,
 pub token: T,
+pub visibility: V,
 }
 impl Default for Config {
@@ -53,6 +54,7 @@ impl Default for Config {
 publisher: Publisher::Misskey(FediverseConfig {
 base_url: "".to_string(),
 token: "".to_string(),
+visibility: misskey::model::note::Visibility::Public,
 }),
 }
 }
```
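The config change adds a second type parameter, `V`, to `FediverseConfig`, so each `Publisher` variant carries its own crate's visibility enum (`misskey::model::note::Visibility` or `mammut::status_builder::Visibility`). A std-only sketch of the same shape, assuming hypothetical stand-in enums in place of the real crate types, `Option<String>` in place of `Option<mammut::Data>`, and no serde derives:

```rust
/// Hypothetical stand-in for `misskey::model::note::Visibility`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum MisskeyVisibility { Public, Home, Followers, Specified }

/// Hypothetical stand-in for `mammut::status_builder::Visibility`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum MastodonVisibility { Public, Unlisted, Private, Direct }

/// Same shape as in the commit: `T` is the token type, `V` the visibility type.
#[derive(Debug, Clone, PartialEq)]
pub struct FediverseConfig<T, V> {
    pub base_url: String,
    pub token: T,
    pub visibility: V,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Publisher {
    Misskey(FediverseConfig<String, MisskeyVisibility>),
    // `Option<String>` stands in for `Option<mammut::Data>`.
    Mastodon(FediverseConfig<Option<String>, MastodonVisibility>),
}

impl Default for Publisher {
    /// Mirrors the commit's default: an empty Misskey entry posting publicly.
    fn default() -> Self {
        Publisher::Misskey(FediverseConfig {
            base_url: String::new(),
            token: String::new(),
            visibility: MisskeyVisibility::Public,
        })
    }
}
```

Because the visibility type is fixed per variant, a Misskey value cannot end up inside a Mastodon entry; with the real serde derives, a mismatched string would presumably be rejected when the config is loaded rather than at posting time.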


```diff
@@ -98,8 +98,8 @@ async fn main() -> Result<(), Box<dyn Error>> {
 )
 .into_stream()
 .try_filter(|message| {
-let not_empty = !message.is_empty();
-async move { not_empty }
+let criteria = !message.is_empty() && message.chars().count() <= 4096;
+async move { criteria }
 }),
 )
 .map_err(|e| Box::new(e) as Box<dyn Error>);
```
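The new `try_filter` predicate drops generated samples that are empty or longer than 4096 characters; the Telegram Bot API documents the `sendMessage` text as 1-4096 characters after entity parsing. `chars().count()` counts Unicode scalar values, whereas `len()` counts UTF-8 bytes and would reject non-ASCII samples that are actually short enough. Whether scalar values match Telegram's own counting in every case (emoji, for instance) is an assumption. The sketch restates the predicate as a standalone function and, as a hypothetical alternative to discarding over-long samples, truncates on a character boundary; `is_postable`, `truncate_chars` and `TELEGRAM_MAX_CHARS` are illustrative names, not from the codebase:

```rust
/// Limit used by the commit's filter (hypothetical constant name).
pub const TELEGRAM_MAX_CHARS: usize = 4096;

/// Same criteria as the commit: non-empty and at most 4096 Unicode scalar values.
pub fn is_postable(message: &str) -> bool {
    !message.is_empty() && message.chars().count() <= TELEGRAM_MAX_CHARS
}

/// Hypothetical alternative: keep the first `max` chars instead of dropping the sample.
/// Slicing at a `char_indices` offset never splits a multi-byte character.
pub fn truncate_chars(message: &str, max: usize) -> &str {
    match message.char_indices().nth(max) {
        Some((byte_idx, _)) => &message[..byte_idx],
        None => message, // already `max` chars or fewer
    }
}
```

Truncating would put a cut-off sample up for curation instead of silently skipping it; which behavior is preferable is a judgment call.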
```diff
@@ -146,7 +146,7 @@ async fn resolve_publisher(
 ) -> Result<Either<MisskeyPublisher, MastodonPublisher>, Box<dyn Error>> {
 let publisher = match &config.publisher {
 config::Publisher::Misskey(cfg) => {
-Either::Left(MisskeyPublisher::new(&cfg.base_url, cfg.token.clone())?)
+Either::Left(MisskeyPublisher::new(&cfg.base_url, cfg.token.clone(), cfg.visibility)?)
 }
 config::Publisher::Mastodon(cfg) => {
 let app = AppBuilder {
@@ -158,6 +158,7 @@ async fn resolve_publisher(
 let mut registration = Registration::new(cfg.base_url.clone());
 registration.register(app)?;
+let vis = cfg.visibility;
 let mastodon = if let Some(data) = cfg.token.clone() {
 Mastodon::from_data(data.clone())
@@ -173,13 +174,14 @@ async fn resolve_publisher(
 config.publisher = Publisher::Mastodon(FediverseConfig {
 base_url: cfg.base_url.clone(),
 token: Some(fedi.data.clone()),
+visibility: vis.clone(),
 });
 config.save(CONFIG_PATH)?;
 fedi
 };
-Either::Right(MastodonPublisher::new(mastodon))
+Either::Right(MastodonPublisher::new(mastodon, vis))
 }
 };
 Ok(publisher)
```
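In `resolve_publisher`, `cfg` borrows from `config.publisher`, which the Mastodon branch overwrites once a token has been obtained. Copying `cfg.visibility` into `vis` before that assignment appears to be what lets `MastodonPublisher::new(mastodon, vis)` use the value afterwards. Since `cfg` is a reference, the mammut visibility type must be `Copy` for `let vis = cfg.visibility;` to compile, which also makes the later `vis.clone()` redundant but harmless. A std-only sketch of the same pattern; `Visibility`, `Publisher`, `Config`, `resolve` and the token string are all hypothetical:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Visibility { Public, Unlisted }

#[derive(Debug, Clone, PartialEq)]
pub enum Publisher {
    Misskey { token: String },
    Mastodon { token: Option<String>, visibility: Visibility },
}

pub struct Config {
    pub publisher: Publisher,
}

/// Hypothetical analogue of the Mastodon branch: copy the visibility out,
/// overwrite `config.publisher` if no token is stored yet, then keep using the copy.
pub fn resolve(config: &mut Config) -> Option<Visibility> {
    match &config.publisher {
        Publisher::Misskey { .. } => None,
        Publisher::Mastodon { token, visibility } => {
            // Copy while the shared borrow of `config.publisher` is still live.
            let vis = *visibility;
            if token.is_none() {
                // `token` and `visibility` are not used after this, so the
                // borrow has ended and the assignment is accepted.
                config.publisher = Publisher::Mastodon {
                    token: Some("token-from-login".to_string()),
                    visibility: vis,
                };
            }
            // Returning `Some(*visibility)` here would be rejected by the borrow checker.
            Some(vis)
        }
    }
}
```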


```diff
@@ -1,15 +1,23 @@
-use std::{error::Error, pin::Pin, task::{Context, Poll}};
+use std::{
+error::Error,
+pin::Pin,
+task::{Context, Poll},
+};
 use futures::Sink;
 use mammut::{status_builder::Visibility, Mastodon, StatusBuilder};
 pub struct MastodonPublisher {
 mastodon: Mastodon,
+post_visibility: Visibility,
 }
 impl MastodonPublisher {
-pub fn new(mastodon: Mastodon) -> Self {
-Self { mastodon }
+pub fn new(mastodon: Mastodon, vis: Visibility) -> Self {
+Self {
+mastodon: mastodon,
+post_visibility: vis,
+}
 }
 }
@@ -22,7 +30,7 @@ impl Sink<String> for MastodonPublisher {
 fn start_send(self: Pin<&mut Self>, item: String) -> Result<(), Self::Error> {
 let mut post = StatusBuilder::new(item);
-post.visibility = Some(Visibility::Public);
+post.visibility = Some(self.post_visibility);
 self.mastodon.new_status(post)?;
 Ok(())
 }
```


```diff
@@ -1,17 +1,18 @@
 use futures::Sink;
-use misskey::{ClientExt, HttpClient};
+use misskey::{Client, ClientExt, HttpClient, model::note::Visibility};
 use std::{error::Error, task::Poll};
-use tokio::runtime::Runtime;
 use url::Url;
 pub struct MisskeyPublisher {
 client: HttpClient,
+post_visibility: Visibility,
 }
 impl MisskeyPublisher {
-pub fn new(url: &String, token: String) -> Result<Self, Box<dyn Error>> {
+pub fn new(url: &String, token: String, vis: Visibility) -> Result<Self, Box<dyn Error>> {
 Ok(Self {
 client: HttpClient::with_token(Url::parse(url)?, token)?,
+post_visibility: vis,
 })
 }
 }
@@ -27,9 +28,9 @@ impl Sink<String> for MisskeyPublisher {
 }
 fn start_send(self: std::pin::Pin<&mut Self>, item: String) -> Result<(), Self::Error> {
-let mut runtime = Runtime::new()?;
-let fut = self.client.create_note(item);
-runtime.block_on(fut)?;
+let mut req = self.client.build_note();
+let req = req.text(item).visibility(self.post_visibility).as_request();
+smol::block_on(self.client.request(req))?;
 Ok(())
 }
```
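The Misskey publisher now drives the request with `smol::block_on` instead of building a tokio `Runtime` per post. A plausible cause of the original panic (an assumption, not confirmed by the diff): `main` is `async` and tokio 0.2 is a dependency, and tokio panics when a runtime is started from inside a thread that is already running one. To show what "blocking on a future" means, here is a minimal std-only `block_on` in the spirit of `smol::block_on` or the pollster crate: it polls the future on the current thread and parks until the waker fires. It uses `std::task::Wake` (stable since Rust 1.51) and drives no I/O reactor or timers, so it is a conceptual illustration, not a drop-in replacement; `ThreadWaker`, `YieldOnce` and `yield_once` are hypothetical names:

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread blocked in `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Poll `fut` on the current thread until it completes, parking between polls.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            // `park` returns immediately if `unpark` already ran;
            // a spurious wakeup just leads to another poll.
            Poll::Pending => thread::park(),
        }
    }
}

/// Test helper: a future that is `Pending` once (after waking itself), then `Ready`.
pub struct YieldOnce {
    yielded: bool,
}

pub fn yield_once() -> YieldOnce {
    YieldOnce { yielded: false }
}

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            Poll::Ready(())
        } else {
            self.yielded = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}
```

Either way, `start_send` still blocks whichever thread polls the sink until the HTTP request finishes; a non-blocking `Sink` would instead store the request future and drive it from `poll_ready`/`poll_flush`.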