
Recently I have been playing with various GUIs for the Whisper transcription software, and Buzz has definitely won the showdown. It is almost completely keyboard accessible. The exception is the toolbar, which you need to explore with NVDA's object navigation or an equivalent in your screen reader of choice. Buzz handles downloading the models, the FFmpeg conversion and everything else that would otherwise have required the command line. As far as I can tell it works with Whisper.cpp, and it can be localized into other languages.
Now I can finally listen to podcasts in all the languages I can't speak. I love it when technology enhances my access to knowledge and helps me do my work even better for those who benefit from it.
github.com/chidiwilliams/buzz
#Accessibility #Audio #Languages #OpenSource

GitHub - chidiwilliams/buzz: Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI's Whisper.
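For readers curious about the command-line step Buzz spares you: the whisper.cpp documentation states that it works with 16-bit WAV files sampled at 16 kHz and suggests converting other formats with ffmpeg. Below is a minimal Python sketch of that manual conversion, assuming ffmpeg is installed and on your PATH. The function names are hypothetical, and this shows the manual route rather than Buzz's own code.

```python
import shlex
import subprocess


def whisper_cpp_ffmpeg_args(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command converting any input to 16 kHz, mono, 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-i", src,            # input file (MP3, M4A, video, ...)
        "-ar", "16000",       # resample to 16 kHz
        "-ac", "1",           # downmix to a single (mono) channel
        "-c:a", "pcm_s16le",  # encode as 16-bit little-endian PCM
        dst,                  # output WAV path
    ]


def convert_for_whisper_cpp(src: str, dst: str) -> None:
    """Run the conversion; raises subprocess.CalledProcessError if ffmpeg fails."""
    subprocess.run(whisper_cpp_ffmpeg_args(src, dst), check=True)


if __name__ == "__main__":
    # Only print the command, so running this sketch touches nothing on disk
    print(shlex.join(whisper_cpp_ffmpeg_args("podcast.mp3", "podcast.wav")))
```

Building the argument list separately from running it makes the command easy to inspect (or paste into a terminal) before anything is converted.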

@Piciok Hi Pawel: I tried Buzz under Windows, but I don't find it really accessible with NVDA. I can import a file, but I cannot export it and can't find the transcribed text. Could you give me a hint? Best greetings from Marburg and thanks in advance.

@Radiojens @Piciok Apparently this thing is made with Python and Qt6, so the technical chances of pushing its accessibility well beyond the current state are good.

@radiorobbe @Radiojens Yes, I wanted to suggest NVDA's object navigation as well. I usually navigate to the toolbar, which is one object above and to the left of the table with the loaded file, and find the "Open Transcript" button there. I also hope that either the software will receive the needed improvements or that somebody writes an NVDA add-on for it. Apart from the toolbar, the edit box with the transcript is the inaccessible part, but I just export the result to a TXT file and work with a regular text editor from there.

@Piciok @radiorobbe Ah, and do you use Whisper or something else? The log file says "error loading Whisper.dll", although the file is there in the program folder. And which model do you use with imported files? I tried large for the best results.

@Radiojens @radiorobbe I use the regular Whisper (actually the Whisper.cpp implementation, I think) with the large model. Here are the steps:
1. I import the file using Ctrl+O.
2. I set up the options for the transcription job as I like them: the mechanism is Whisper, the model is large, the language is set to automatic detection, and everything else is left at the defaults.
3. I click Run and wait. I am eventually moved to the table where the progress of the task is reported.
4. I wait for it to finish, i.e. until the second column says "Completed".
5. I navigate to the toolbar. I use the laptop layout of NVDA, so I'll explain it using that keymap:
A. I move the navigator object to my system focus by pressing NVDA+Backspace;
B. I navigate out of the table object with NVDA+Shift+Up arrow;
C. I then navigate two objects to the left by pressing NVDA+Shift+Left arrow twice, which brings me to the toolbar;
D. I expand that object with NVDA+Shift+Down arrow;
E. I navigate to the right with NVDA+Shift+Right arrow until I find the "Open Transcript" button;
F. I move the focus to my navigator object with NVDA+Shift+M;
G. I activate the button by pressing NVDA+Enter.
6. A new window opens, showing the transcript in an inaccessible edit field that can't be operated with the keyboard. Press Tab to reach the "Export" button, pick the format you need from the context menu that pops up, and save the file wherever you like.
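For comparison, the GUI steps above (large model, automatic language detection, export to a TXT file) correspond roughly to the following command-line sketch. It assumes the openai-whisper Python package, which is the reference Python implementation and not the Whisper.cpp backend Buzz may be using, plus an installed ffmpeg, which that package calls to decode audio. `transcript_path` and `transcribe_to_txt` are hypothetical helper names.

```python
import sys
from pathlib import Path


def transcript_path(audio_path: str) -> Path:
    """Place the transcript next to the audio file, with a .txt extension."""
    return Path(audio_path).with_suffix(".txt")


def transcribe_to_txt(audio_path: str, model_name: str = "large") -> Path:
    """Transcribe with openai-whisper and save plain text (hypothetical helper)."""
    # pip install openai-whisper; imported here so transcript_path works without it
    import whisper

    model = whisper.load_model(model_name)  # downloads the model on first use
    # language=None asks Whisper to detect the spoken language automatically
    result = model.transcribe(audio_path, language=None)
    out = transcript_path(audio_path)
    out.write_text(result["text"].strip() + "\n", encoding="utf-8")
    print(f"Detected language: {result['language']}")
    return out


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python transcribe.py podcast.mp3
    print(transcribe_to_txt(sys.argv[1]))
```

Writing plain text to a file mirrors the workaround described in this thread: the result can then be read in any accessible text editor.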

I hope this helped. If not, and you think it's a good idea, we could continue the conversation somewhere else and arrange a remote session so that I can try to see what the problem might be on your end.

Jens Bertrams

@Piciok @radiorobbe Okay, thank you, it worked. Not always, but mostly. - Thanks.