Here is a dataset adjacent and similar to [BOORU CHARS](https://nyaa.si/view/1740396) project but **devoted to erotic posing and scenes**.
It covers **rating:questionable** images kicked off from BOORU CHARS and some "least pornografic" subset
of **rating:explicit** with filtered active genitalia usage, extensive bodily fluids presence and penis exposure.
Posting dates range from very raise of imageboards era until 10.2024.
There is a substantial share of furry for objects diversity.
515850 images processed and arranged mostly like in BOORU CHARS :
- images initially filtered Mpixels>=0.48, shorter_side>=600 px, volume>=60000 bytes, no animations
stripes dropped or cropped to aspect ratio 0.5..2
- PNG converted to JPG with downsampling to longer side mostly in range 1024..1920 px
- verbose file naming used **"%website% - %id% - %up_to_3_copyrights% ~ %up_to_5_characters% (%up_to_2_artists%).jpg"**
files uniquely identified by "%website%+%id%"
- some general image statistics got with EXIFTOOL and [IMAGE MAGICK](https://imagemagick.org)
- deep content analisys include
- [CRAFT text detector](https://github.com/fcakyon/craft-text-detector) to estimate total size and number of text pieces
- [Ultralitics YOLO](https://github.com/ultralytics/ultralytics) based torso components detector with custom PyTorch model
- all real-life photos and no-character scenes, most of comic and N-koma, overtexted images and line-arts filtered out
- a lot of semi-automated analysis completed to eliminate evident hentai, when tags and detections thoroughly used
- images deduplicatied using AntiDupl up to 0.5% similarity along with BOORU CHARS, series limited to most representative entries
Sophisticated clustering implemented :
- by aspect ratio { 7x10 +/-4% ; 3x4 +/-10% ; 1x1 +/-20% ; 3x2 +/-40% ; 2x3 +/-40% } it's crusial for scene composition
- by source imageboard rating { X = Explicit , Q = questionable + others + unknown }
- by detected head-count { 0 heads = letter A , 2 = B , 3-5 heads = C , 6+ heads = D , 1 head = letter E }
- for single-head scenes (E) by character pose { F - frontal, P - profile / from behind, S - legs splitted, U - others / undefined }
Sorting and volume splitting inside cluster used "attractiveness score function" == "colorful and textless".
A volume typically contains 1000-2000 files. Bottom score images meticulously reviewed.
Folder/archive name identify cluster and volume number inside cluster starting from 00.
Supporting metadata include :
- original and calculated attributes for each image
- 15349213 imageboard tags
- 1843396 results of torso components detection
- actual versions of PyTorch models for object detection
- additional README
Keep in mind this release is first of all
**a dataset of character-centric art in effective local format suited for batch processing**
and then
**a representative catalog of anime/gameCG/cartoon/furry copyrights, characters and artists for visual estimation**
but **not
offer high image resolution and pretending on completeness**.
Some statistics about release :
- main sources are : **gelbooru** 146831, **e621**(furry) 118196, **yande-re** 89442 and **danbooru** 81542
another ~80k pics got from allthefallen(loli) + tbib(e621) + sankaku + anime-pictures + konachan + (a little bit) zerochan and safebooru
- 654573 heads detected (from which 214362 are non-humans) and also 451558 topless and 230703 bottomless bodies
- image count with 'loli' tag 70287 exceed sum of all [NYAA SFW releases](https://nyaa.si/view/1917404)
```
@REM -- file mask can be used like query for extract e.g. for specific artist
xcopy /s "A:\BOORU_ECCHI\*(*nanashi*nlo*)*" C:\TEMP\
-- sophisticated processing possible using database (copy-paste result to just_do_it.BAT)
select 'xcopy "'||be.fpath||'\'||be.fname||'" C:\TEMP\' xcpy
from be
join be_tags d on d.booru=be.booru and d.fid=be.fid
where d.tag='tribadism' -- some sort of yuri play, mostly excluded (only 62 images left)
-- fox, rabbit, bear and bird heads together - it may be Star Fox or FNAF
select 'xcopy "'||fpath||'\'||fname||'" C:\TEMP\' xcpy
from (
select fpath, fname, listagg(distinct obj,',') objs
from be
join be_yolo d on d.booru=be.booru and d.fid=be.fid and obj in (18,19,21,25)
group by fpath, fname
) where objs='18,19,21,25' -- internal class IDs
```
Sample contact list for volume 1x1.Q\1x1-EP00 == Questionable , Single-head , profile / from_behind POV , "best quality" volume

Torso components detection changes while undressing

Comments - 0