Compare commits

5 Commits

Author SHA1 Message Date
emilis 4fcb8a9771 Added visibility configuration 2021-09-30 18:53:33 +02:00
emilis 52cb0c5ea2 Added GPT-2 training resource to readme 2021-09-30 16:57:29 +02:00
emilis 9564aeddaa Fixed panic from messages too long for telegram (prob not good) 2021-09-23 21:47:37 +02:00
emilis af00fc5c4b Fixed panic for block_on by using smol, this feels wrong 2021-09-01 23:06:46 +02:00
emilis 0dbbde5835 Improved readme 2021-09-01 19:49:57 +02:00
7 changed files with 144 additions and 25 deletions

Cargo.lock (generated, 41 lines changed)

@ -58,6 +58,17 @@ dependencies = [
"slab",
]
[[package]]
name = "async-fs"
version = "1.5.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8b3ca4f8ff117c37c278a2f7415ce9be55560b846b5bc4412aaa5d29c1c3dae2"
dependencies = [
"async-lock",
"blocking",
"futures-lite",
]
[[package]]
name = "async-global-executor"
version = "2.0.2"
@ -111,6 +122,17 @@ dependencies = [
"event-listener",
]
[[package]]
name = "async-net"
version = "1.6.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5373304df79b9b4395068fb080369ec7178608827306ce4d081cba51cac551df"
dependencies = [
"async-io",
"blocking",
"futures-lite",
]
[[package]]
name = "async-process"
version = "1.1.0"
@ -1206,6 +1228,7 @@ dependencies = [
"rand 0.8.4",
"serde",
"serde_json",
"smol",
"telegram-bot",
"thiserror",
"tokio 0.2.25",
@ -2457,6 +2480,24 @@ dependencies = [
"maybe-uninit",
]
[[package]]
name = "smol"
version = "1.2.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "85cf3b5351f3e783c1d79ab5fc604eeed8b8ae9abd36b166e8b87a089efd85e4"
dependencies = [
"async-channel",
"async-executor",
"async-fs",
"async-io",
"async-lock",
"async-net",
"async-process",
"blocking",
"futures-lite",
"once_cell",
]
[[package]]
name = "socket2"
version = "0.3.19"

@ -19,3 +19,4 @@ mammut = "0.13.0"
thiserror = "1.0.26"
misskey = "0.2.0"
url = "2.2.2"
smol = "1.2.5"

@ -1,18 +1,82 @@
# izzilis gpt-2 bot
Meant to be used with a finetuned GPT-2 model
Meant to be used with a [finetuned GPT-2 model](https://medium.com/ai-innovation/beginners-guide-to-retrain-gpt-2-117m-to-generate-custom-text-content-8bb5363d8b7f)
## Usage
just run it, it'll make a `bot_config.json` file in the running path
To run the bot, you need a valid `bot_config.json` file at the path where you're running the bot.
If you do not have one, izzilis will generate a default one for you to fill out.
fill it out
This bot currently *requires* a Telegram bot token to work, as there is no option to disable curation.
To create a bot, please use the [Botfather](https://t.me/botfather). Once the bot is created and running, set the chat where it posts curation options with the `/setmain` bot command. Usually this means sending `/setmain@bot_username` in a chat that includes the bot (group chats work too).
run again
The default configuration uses the `Misskey` publisher. To publish to Mastodon, Pleroma, or any other Mastodon-compatible API, replace `Misskey` in the `publisher` object with `Mastodon`.
## Docker Usage
## Config values
The dockerfile makes a few assumptions which you must fulfill:
| Name | Value |
|--------------------|------------------------------------------------------------------------------------------------|
| `python_path` | The path to the system's python3 interpreter |
| `model_name` | The name of the GPT-2 model to use (see gpt-2 docs) |
| `temperature` | The `temperature` value to call gpt-2 with (see gpt-2 docs) |
| `top_k` | The `top_k` value to call gpt-2 with (see gpt-2 docs) |
| `gpt_code_path` | The path to where the gpt-2 source & models are located |
| `interval_seconds` | See [interval_seconds](#interval_seconds) |
| `bot_token` | Telegram Bot API token |
| `chat_ref` | The chat reference ID for the telegram bot, leave at 0, will be filled once `/setmain` is sent |
| `post_buffer` | How many curated samples the bot will hold at maximum at a time |
| `publisher` | See [publisher](#publisher) |
* the gpt python files & model are in the `gpt/` directory. This directory doesn't exist by default, so please create it; it must contain both `generate_unconditional_samples.py` and the `models` directory containing your finetuned model.
* you have already run the bot twice: once to generate a `bot_config.json`, and a second time, after filling that file out, to generate `fediverse.toml` by authenticating with the instance.
### interval_seconds
| Name | Value |
|-------|-------------------------------------------------|
| `min` | Minimum number of seconds to wait between posts |
| `max` | Maximum number of seconds to wait between posts |
### publisher
The publisher can currently hold one of two JSON objects, named either `Mastodon` or `Misskey`, which determines which posting API it will use. Whether the object is `Misskey` or `Mastodon`, it has the following members:
| Name | Value |
|--------------|------------------------------------------------------------------------------------------------------------|
| `base_url` | The base URL of the instance |
| `token` | The auth token for the account, leave empty for `Mastodon` as you will be prompted to log in and authorize |
| `visibility` | The visibility scope of the statuses to be posted. These differ between publishers, see the list below |
**Misskey visibility values**
* `Public` (Global)
* `Home`
* `Followers`
* `Specified` (DMs)
**Mastodon-compatible visibility values**
* `public` (Global)
* `unlisted` (Home)
* `private` (Followers only)
* `direct` (DMs)
An example `Misskey` publisher entry looks like this:
```json
"publisher": {
"Misskey": {
"base_url": "",
"token": "",
"visibility": "Public"
}
}
```
And for Mastodon/Pleroma (leave out `token`; izzilis sets it after interactive authentication):
```json
"publisher": {
"Mastodon": {
"base_url": "",
"visibility": "public"
}
}
```
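The two visibility lists above pair up one to one. The following std-only sketch parses the Mastodon-compatible strings and maps them to the Misskey names given in the README. The enum and function names are hypothetical stand-ins, not the types from `mammut` or `misskey`, and the bot itself deserializes these values with serde rather than `FromStr`.

```rust
use std::str::FromStr;

/// Stand-in for the Mastodon-compatible visibility values listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MastodonVisibility {
    Public,
    Unlisted,
    Private,
    Direct,
}

/// Stand-in for the Misskey visibility values listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MisskeyVisibility {
    Public,
    Home,
    Followers,
    Specified,
}

impl FromStr for MastodonVisibility {
    type Err = String;

    /// Accepts exactly the lowercase spellings from the README list.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "public" => Ok(Self::Public),
            "unlisted" => Ok(Self::Unlisted),
            "private" => Ok(Self::Private),
            "direct" => Ok(Self::Direct),
            other => Err(format!("unknown visibility: {other}")),
        }
    }
}

/// Maps each Mastodon-compatible scope to the Misskey scope the README pairs it with.
fn to_misskey(v: MastodonVisibility) -> MisskeyVisibility {
    match v {
        MastodonVisibility::Public => MisskeyVisibility::Public,
        MastodonVisibility::Unlisted => MisskeyVisibility::Home,
        MastodonVisibility::Private => MisskeyVisibility::Followers,
        MastodonVisibility::Direct => MisskeyVisibility::Specified,
    }
}
```

For example, `"unlisted".parse::<MastodonVisibility>()` yields `Ok(MastodonVisibility::Unlisted)`, which `to_misskey` turns into `MisskeyVisibility::Home`.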

@ -25,14 +25,15 @@ pub struct MinMax {
#[derive(Serialize, Deserialize, Debug, Clone)]
pub enum Publisher {
Misskey(FediverseConfig<String>),
Mastodon(FediverseConfig<Option<mammut::Data>>),
Misskey(FediverseConfig<String, misskey::model::note::Visibility>),
Mastodon(FediverseConfig<Option<mammut::Data>, mammut::status_builder::Visibility>),
}
#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct FediverseConfig<T> {
pub struct FediverseConfig<T, V> {
pub base_url: String,
pub token: T,
pub visibility: V,
}
impl Default for Config {
@ -53,6 +54,7 @@ impl Default for Config {
publisher: Publisher::Misskey(FediverseConfig {
base_url: "".to_string(),
token: "".to_string(),
visibility: misskey::model::note::Visibility::Public,
}),
}
}
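The hunk above gives `FediverseConfig` a second type parameter so each publisher can carry its own visibility type. A minimal std-only sketch of that shape follows. The visibility enums and `needs_interactive_login` are hypothetical stand-ins, the token is simplified to `Option<String>` instead of `Option<mammut::Data>`, and the serde derives are left out.

```rust
/// Stand-in for `misskey::model::note::Visibility`.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum MisskeyVisibility {
    Public,
    Home,
    Followers,
    Specified,
}

/// Stand-in for `mammut::status_builder::Visibility`.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum MastodonVisibility {
    Public,
    Unlisted,
    Private,
    Direct,
}

/// `T` is the token type and `V` the visibility type, as in the diff.
#[derive(Debug, Clone, PartialEq)]
struct FediverseConfig<T, V> {
    base_url: String,
    token: T,
    visibility: V,
}

/// Each variant fixes both type parameters for its backend.
#[derive(Debug, Clone, PartialEq)]
enum Publisher {
    Misskey(FediverseConfig<String, MisskeyVisibility>),
    Mastodon(FediverseConfig<Option<String>, MastodonVisibility>),
}

/// Mirrors the default from the diff: Misskey, empty fields, public posts.
fn default_publisher() -> Publisher {
    Publisher::Misskey(FediverseConfig {
        base_url: String::new(),
        token: String::new(),
        visibility: MisskeyVisibility::Public,
    })
}

/// Hypothetical helper: a Mastodon config without a stored token still needs a login.
fn needs_interactive_login(publisher: &Publisher) -> bool {
    match publisher {
        Publisher::Misskey(_) => false,
        Publisher::Mastodon(cfg) => cfg.token.is_none(),
    }
}
```

Because the visibility type is fixed per variant, a Misskey-only value such as `Specified` cannot be placed in a Mastodon config by accident; the compiler rejects it.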

@ -98,8 +98,8 @@ async fn main() -> Result<(), Box<dyn Error>> {
)
.into_stream()
.try_filter(|message| {
let not_empty = !message.is_empty();
async move { not_empty }
let criteria = !message.is_empty() && message.chars().count() <= 4096;
async move { criteria }
}),
)
.map_err(|e| Box::new(e) as Box<dyn Error>);
@ -146,7 +146,7 @@ async fn resolve_publisher(
) -> Result<Either<MisskeyPublisher, MastodonPublisher>, Box<dyn Error>> {
let publisher = match &config.publisher {
config::Publisher::Misskey(cfg) => {
Either::Left(MisskeyPublisher::new(&cfg.base_url, cfg.token.clone())?)
Either::Left(MisskeyPublisher::new(&cfg.base_url, cfg.token.clone(), cfg.visibility)?)
}
config::Publisher::Mastodon(cfg) => {
let app = AppBuilder {
@ -158,6 +158,7 @@ async fn resolve_publisher(
let mut registration = Registration::new(cfg.base_url.clone());
registration.register(app)?;
let vis = cfg.visibility;
let mastodon = if let Some(data) = cfg.token.clone() {
Mastodon::from_data(data.clone())
@ -173,13 +174,14 @@ async fn resolve_publisher(
config.publisher = Publisher::Mastodon(FediverseConfig {
base_url: cfg.base_url.clone(),
token: Some(fedi.data.clone()),
visibility: vis.clone(),
});
config.save(CONFIG_PATH)?;
fedi
};
Either::Right(MastodonPublisher::new(mastodon))
Either::Right(MastodonPublisher::new(mastodon, vis))
}
};
Ok(publisher)
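The new `try_filter` criterion above drops any sample longer than 4096 characters, which the commit message itself calls "prob not good". As a sketch of one alternative, the std-only code below first restates the criterion as a plain function and then splits an over-long message into chunks instead of discarding it. `fits_limit` and `split_by_chars` are hypothetical helpers that are not part of the repository, and whether chunked curation messages suit the bot is an open design question.

```rust
/// The limit used by the filter in the diff above.
const TELEGRAM_MAX_CHARS: usize = 4096;

/// Same criterion as the diff: non-empty and at most `max_chars` characters.
/// `chars().count()` counts Unicode scalar values, not bytes, so it can be
/// smaller than `len()` for non-ASCII text.
fn fits_limit(message: &str, max_chars: usize) -> bool {
    !message.is_empty() && message.chars().count() <= max_chars
}

/// Hypothetical alternative: split a message into chunks of at most
/// `max_chars` characters each, never cutting inside a character.
fn split_by_chars(message: &str, max_chars: usize) -> Vec<String> {
    assert!(max_chars > 0, "max_chars must be positive");
    let mut chunks = Vec::new();
    let mut current = String::new();
    let mut count = 0;
    for ch in message.chars() {
        if count == max_chars {
            // The current chunk is full: store it and start a new one.
            chunks.push(std::mem::take(&mut current));
            count = 0;
        }
        current.push(ch);
        count += 1;
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

With these helpers, `split_by_chars(&sample, TELEGRAM_MAX_CHARS)` returns pieces that each pass `fits_limit`, so nothing generated is silently lost.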

@ -1,15 +1,23 @@
use std::{error::Error, pin::Pin, task::{Context, Poll}};
use std::{
error::Error,
pin::Pin,
task::{Context, Poll},
};
use futures::Sink;
use mammut::{status_builder::Visibility, Mastodon, StatusBuilder};
pub struct MastodonPublisher {
mastodon: Mastodon,
post_visibility: Visibility,
}
impl MastodonPublisher {
pub fn new(mastodon: Mastodon) -> Self {
Self { mastodon }
pub fn new(mastodon: Mastodon, vis: Visibility) -> Self {
Self {
mastodon: mastodon,
post_visibility: vis,
}
}
}
@ -22,7 +30,7 @@ impl Sink<String> for MastodonPublisher {
fn start_send(self: Pin<&mut Self>, item: String) -> Result<(), Self::Error> {
let mut post = StatusBuilder::new(item);
post.visibility = Some(Visibility::Public);
post.visibility = Some(self.post_visibility);
self.mastodon.new_status(post)?;
Ok(())
}

@ -1,17 +1,18 @@
use futures::Sink;
use misskey::{ClientExt, HttpClient};
use misskey::{Client, ClientExt, HttpClient, model::note::Visibility};
use std::{error::Error, task::Poll};
use tokio::runtime::Runtime;
use url::Url;
pub struct MisskeyPublisher {
client: HttpClient,
post_visibility: Visibility,
}
impl MisskeyPublisher {
pub fn new(url: &String, token: String) -> Result<Self, Box<dyn Error>> {
pub fn new(url: &String, token: String, vis: Visibility) -> Result<Self, Box<dyn Error>> {
Ok(Self {
client: HttpClient::with_token(Url::parse(url)?, token)?,
post_visibility: vis,
})
}
}
@ -27,9 +28,9 @@ impl Sink<String> for MisskeyPublisher {
}
fn start_send(self: std::pin::Pin<&mut Self>, item: String) -> Result<(), Self::Error> {
let mut runtime = Runtime::new()?;
let fut = self.client.create_note(item);
runtime.block_on(fut)?;
let mut req = self.client.build_note();
let req = req.text(item).visibility(self.post_visibility).as_request();
smol::block_on(self.client.request(req))?;
Ok(())
}
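The last hunk swaps a freshly built tokio `Runtime` for `smol::block_on` inside the synchronous `start_send`. The diff does not explain the original panic, so the sketch below only illustrates what a `block_on` call does in general: it polls a future on the current thread and parks the thread until the future is woken. This is a std-only toy, not smol's implementation, and `ThreadWaker` and `YieldOnce` are hypothetical names.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread blocked in `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Drives `fut` to completion on the calling thread.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            // Sleep until the waker unparks this thread, then poll again.
            Poll::Pending => thread::park(),
        }
    }
}

/// Demo future: reports `Pending` once after requesting a wake-up.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}
```

Calling `block_on(async { YieldOnce(false).await; 7 })` polls twice and returns `7`. A call like this blocks the calling thread for the duration of the request, which may be the source of the "this feels wrong" remark in the commit message.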