i just got sent a long and heartfelt email from someone about how OCRbot helps them and how they couldn't use fedi without it

they have vision issues and can't really caption the images themselves, and rely on OCRbot to do it for them

this is the first time i've ever seen someone with a physical disability talk about OCRbot and um, it feels really special and also quite strange that so much responsibility has been placed on it and myself

i think i might go and clean up OCRbot's code a little now

are there any features you'd like to see implemented in OCRbot? not things like "be able to tell if it's a twitter post" or better text reading, because i can't really do that

more like, usability improvements

OCRbot's code is really messy and i can't think of any way to fix it besides the nuclear option of a total rewrite

the problem with writing code for fedi is that because you have to interact with so many instances running so many different types of servers, you often can't just take the easy route and ask the API for everything

for example, if OCRbot's own instance doesn't have the post that OCRbot is being called on, it has to manually scrape the other instance's JSON to find it, because the alternative would be getting API credentials for every instance in existence which is, a bit silly

it looks like the way i did it with OCRbot relies on public API accessibility being enabled on the server, which it is by default, but with all the fedi archiving drama going on, lots of people are switching that off, which is good for privacy but bad for bots like this one
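
a rough sketch of what "relying on public API accessibility" can look like, not necessarily how OCRbot itself does it. it assumes a Mastodon-style post URL (https://host/@user/id) and uses Mastodon's `GET /api/v1/statuses/:id` endpoint without credentials. the names `STATUS_URL`, `public_api_url` and `fetch_status` are made up for this example, and other server software will need different URL patterns.

```python
import json
import re
import urllib.request

# Matches Mastodon-style post URLs such as https://example.social/@user/12345
# (optionally with a trailing slash). Other server software uses other URL
# shapes, so this pattern is an assumption, not a general solution.
STATUS_URL = re.compile(r"^https://([^/]+)/@[^/]+/(\d+)/?$")


def public_api_url(post_url):
    """Turn a post's web URL into the matching public API URL, or None."""
    match = STATUS_URL.match(post_url)
    if match is None:
        return None
    host, status_id = match.groups()
    return f"https://{host}/api/v1/statuses/{status_id}"


def fetch_status(post_url, timeout=10):
    """Fetch the post's JSON without credentials.

    Raises urllib.error.HTTPError if the instance has switched off
    unauthenticated API access (or the post is not public).
    """
    api_url = public_api_url(post_url)
    if api_url is None:
        raise ValueError(f"unrecognised post URL: {post_url}")
    with urllib.request.urlopen(api_url, timeout=timeout) as response:
        return json.load(response)
```

when an instance turns public API access off, `fetch_status` fails with an HTTP error, which is exactly the case described above.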

there's no way around this besides manually requesting API keys from every instance in existence

and yes i know the code is gross but if there was a better way to do this than regex hacks i would be using it

also i don't wanna make unfounded claims or anything but

OCRbot uses tesseract to caption images, and a few months after it started getting popular, mastodon added a built-in feature that uses tesseract to caption images :bun:

@lynnesbian Not sure there's anything, it's as simple as tagging it, not really much of an issue usability wise

@luna the alternate language support is kinda confusing

you have to do e.g. @​OCRbot jp

@lynnesbian I didn't even know about that, that's cool!

You could try to add simple heuristics to guess what language the text is in

Might be tricky though :s

@luna if OCRbot sees something written in hiragana and you don't specify japanese it'll try to read them as english characters and output gibberish

i guess i could have a thing where if the output has no english words it re-scans it as other languages? but that seems messy
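
a minimal sketch of that re-scan idea, under some loud assumptions: `COMMON_ENGLISH` is a tiny stand-in word list, `looks_english` and `ocr_with_fallback` are hypothetical names, and the OCR call is passed in as a function so the logic doesn't depend on any particular library. with pytesseract it could be `lambda img, lang: pytesseract.image_to_string(img, lang=lang)`, using tesseract's own language codes like `jpn` rather than the bot's `jp`.

```python
import re

# Tiny stand-in word list; a real bot would want a proper dictionary.
COMMON_ENGLISH = {"the", "a", "and", "of", "to", "in", "is", "it", "you", "that"}


def looks_english(text):
    """True if the text contains at least one common English word."""
    words = re.findall(r"[a-z]+", text.lower())
    return any(word in COMMON_ENGLISH for word in words)


def ocr_with_fallback(image, ocr, fallbacks=("jpn", "rus")):
    """Scan as English first, then retry with other languages if that fails.

    `ocr` is any callable taking (image, lang) and returning text.
    Returns a (language, text) pair.
    """
    text = ocr(image, "eng")
    if looks_english(text):
        return "eng", text
    for lang in fallbacks:
        retry = ocr(image, lang)
        if retry.strip():
            return lang, retry
    # Nothing better turned up, so hand back the original English scan.
    return "eng", text
```

it is messy in the way predicted above: gibberish can contain a stray "a" or "it" by accident, and the first fallback language that returns any text at all wins, whether or not it's the right one.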

@lynnesbian Yeah that's what I was thinking of but it'd be pretty tough

I'm surprised OCR is language dependent at all, at least for most alphabets. Or does it perform translation too?

@luna if you translate something that uses the same alphabet as english, like uhh latin, it will turn out fine

but if it's cyrillic or kanji or whatever it will try to understand the characters as latin ones and that doesn't work so well

@lynnesbian hmmm

Can you get like a "confidence" measure from the OCR

And if it's only like, 25% sure or whatever, try a different language?

Probably still not a very reliable measure, but it would probably work for things like japanese
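
A sketch of the confidence idea, assuming the bot goes through pytesseract: `pytesseract.image_to_data(image, lang=..., output_type=pytesseract.Output.DICT)` returns a dict with parallel `text` and `conf` lists, where `conf` is a per-word score out of 100 and -1 marks boxes with no word in them. The function names `mean_confidence` and `pick_language` and the 25 threshold are made up for illustration, and the functions only work on those dicts so they don't need tesseract to run.

```python
def mean_confidence(data):
    """Average per-word confidence (0-100) from an image_to_data-style dict."""
    scores = []
    for text, conf in zip(data["text"], data["conf"]):
        conf = float(conf)  # some pytesseract versions return strings
        # conf is -1 for layout boxes that contain no recognised word
        if conf >= 0 and text.strip():
            scores.append(conf)
    return sum(scores) / len(scores) if scores else 0.0


def pick_language(results, threshold=25.0):
    """Pick the first language whose mean confidence clears the threshold.

    `results` is a non-empty dict mapping language codes to
    image_to_data-style dicts, in the order they should be tried.
    Falls back to the best-scoring language if none clears the threshold.
    """
    scored = {lang: mean_confidence(data) for lang, data in results.items()}
    for lang, score in scored.items():
        if score >= threshold:
            return lang
    return max(scored, key=scored.get)
```

For example, `pick_language({"eng": eng_data, "jpn": jpn_data})` keeps English whenever it scores 25 or more, and only then looks at Japanese. A real bot would probably scan the extra languages lazily instead of running every scan up front.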

@lynnesbian I can't really think of anything better than that, sorry :<

@lynnesbian I actually hadn't heard of this before you mentioned it. This seems like an interesting project.

@lynnesbian I feel like, as a programmer, I've been trying so hard to get to a place where I can work on and substantively improve "messy code" while resisting the urge to do intrusive refactors or expensive rewrites. it's hard :(

@lynnesbian oh good, i'm not the only one to sometimes use 2 column indentation

@monorail i always use it because it's easy to tell when something's indented and takes up the least space

@lynnesbian i used to use it more when i was golfing a lot but it's still nice sometimes

@lynnesbian I think disabling it also breaks ebooks

@u it does, i'm "working on a solution" for it with fedibooks but i've been really burned out with that project so it won't be arriving anytime soon

@lynnesbian it's cool, I just wanted you to know in case you weren't aware

@u the solution involves becoming an activitypub actor and using a bunch of stuff to authenticate yourself, it's messy but doable

@lynnesbian Is the URL always well-formed? You could try splitting on slashes then, maybe

@luna i could yes, but you never know when someone's gonna give you /1234/ instead of /1234 and it already works this way

@lynnesbian Would a trailing slash matter? That'll just give you an empty element at the end of your list, shouldn't affect the position of everything else
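
A quick sketch of the splitting approach, assuming a post URL shaped like https://host/@user/1234 (the function name `status_id_from_url` is just for this example). The trailing slash only adds an empty string at the end of the list, so dropping empty pieces handles both forms.

```python
def status_id_from_url(url):
    """Return the last non-empty path segment of a post URL."""
    parts = url.split("/")
    # "https://host/@user/1234/" splits into
    # ["https:", "", "host", "@user", "1234", ""]: the trailing slash only
    # adds an empty string at the end, so earlier positions do not move.
    non_empty = [part for part in parts if part]
    return non_empty[-1]
```

Counting from the front also works here, since index 4 is the ID with or without the trailing slash, but taking the last non-empty piece doesn't depend on how deep the path is.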

@lynnesbian It makes sense to do it that way, even if it's not the "optimal" solution

@lynnesbian Honestly I'm more shocked the mastodon library you're using doesn't provide a better way to get status IDs and such

seems strange

@lynnesbian why put the burden on yourself? if a server wants to be able to use your service, they should be coming to you.

@Sapphicgiraffic it's kinda difficult with OCRbot

if you're user A on instance A and you want an image on instance B captioned, it's beyond your control to ask instance B to provide an API key to OCRbot

@lynnesbian i mean, average user A can’t force typical instance B to immediately provide an API key under ordinary circumstances, but user A can put social pressure on the admin of instance B over time, through public discussion, informal request, etc., to provide the key so that future images are easier for OCRbot to access.

i feel like adoption of OCRbot is a social problem, not a technical one, which necessitates a social solution.

@lynnesbian hail lynne, for she has done what nobody else could: get gorgamel to add a feature to myasstodon

@lynnesbian I've thought about this every time I talk about or use that feature

@lynnesbian ah, I think I just showed them your post about OCRBot before I learned about their condition and I felt bad since that
