Robohub.org
ep. 358

Underwater Human-Robot Interaction #ICRA2022 with Michael Fulton

by Abate De Mey
22 July 2022




How do people communicate when they are underwater? With body language, of course.

Marine environments present a unique set of challenges that render many technologies developed for land applications useless. Communicating with sound, at least in the way people use it to speak, is one of them.

In his presentation at ICRA 2022, Michael Fulton tackles this challenge by using body language to communicate with an autonomous underwater vehicle (AUV). Tune in for more.

His poster can be viewed here.

Michael Fulton

Michael Fulton is a Ph.D. candidate at the University of Minnesota Twin Cities. His research focuses on underwater robotics, particularly applications where robots work with humans: human-robot interaction and robot perception using computer vision and deep learning, with the intent of creating systems that can work collaboratively with humans in challenging environments.



transcript

Abate: [00:00:00] So tell me a little bit about your presentation earlier today.

Michael Fulton: Yeah, so today I was presenting my collaborative work with Jungseok Hong and my advisor Junaed Sattar on diver approach. Basically, the problem is: when you have an AUV and a diver working together underwater, it's important that they be close together when they want to communicate, whether it's the diver doing gestures to the AUV to tell it, "Go do this task, go look at this area."

Or if it's the AUV talking to the diver, maybe it's telling them, "Hey, I found this cool thing over here, you should come check it out." In either of those situations, you need to be close together, right? However, for AUVs to be useful underwater, they need to leave the diver. They need to go do searching, carry items or tools and materials, that kind of thing.

So this is the problem we have, right? We need to be close to talk, but we need to be far away to do stuff. To fix this, we need a capability for diver approach: we need to be able to search for the diver, find them, and approach them to an appropriate distance and orientation for communication.

Our algorithm is called ADROC, Autonomous Diver-Relative Operator Configuration. It's a monocular-vision-based method: we do this diver approach based on only monocular vision, because we wanted to keep it as cheap as possible. No sonar, no stereo vision, as minimal sensing as we could manage it with. And basically, the way the algorithm works is, instead of trying to do monocular depth estimation, which you can get decent accuracy on but which sometimes needs high computational power...

Instead of doing that, we realized that what we actually need to know is: is the distance the diver is currently at "good enough"? Is it close enough for us to do the communication part of things?

Abate: So you need a rough estimate?

Michael Fulton: Yeah. You need, you need a very rough general estimate. I don’t care if the, if the robot’s, you know, one meter away or 1.1, you know, 0.9 0.7.

It doesn't really matter to me as long as it's close enough, rough enough. The way we did this is by using shoulder width as a prior piece of information, because we know from the biomedical literature that there's a range human shoulder widths come in. We know the average of that range.

We know where most people's shoulder widths fall. From that, we can calculate the expected pixel width between the shoulders at a close-enough distance for communication. And then we just compare: is the diver's shoulder width smaller than that? Okay, we need to come closer.

Is it larger than that? Okay, we need to back up. And the way we do the actual calculation of the shoulder width is a two-step process. We can use a diver detector, which takes an image of the scene, finds the diver, and draws a bounding box around them; we can use the width of that as a kind of proxy for shoulder width.

But it's not super accurate, right? The diver could be partly on their side; there are lots of things that can change the bounding box width without changing shoulder width. So that gets us a very, very rough estimate, and if we just approached based on that, the AUV would be way off on distance, because the bounding box changes a lot.

What doesn't change a lot is the actual shoulder width; that remains stable. So we also use a diver pose estimation algorithm to get key points on the shoulders and calculate the distance between them. And so it's this cascaded approach where, basically, from far away the detector works.

We've actually run this from as far as 15 meters away, and that lets you center the diver in the image and start getting closer to them. And then as you get closer, probably about six to seven meters is the effective range where you can actually start detecting the key points for the shoulders, and then you get accurate distance.

Well, not distance estimation, but distance ratio calculation. We call this the pseudo-distance, because it's not really a distance, but it functions as one.
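To make the idea concrete, here is a minimal sketch of that pseudo-distance comparison under a pinhole camera model. The constants and the two helper calls (detect_shoulder_keypoints, detect_diver) are hypothetical placeholders for a pose estimator and a diver detector, not the published ADROC implementation:

```python
# Minimal sketch of the pseudo-distance comparison, assuming a pinhole camera
# model. Constants and helper calls are hypothetical, not the authors' code.

AVG_SHOULDER_WIDTH_M = 0.44    # assumed average adult shoulder width
TARGET_DISTANCE_M = 1.0        # assumed "close enough to communicate" distance
FOCAL_LENGTH_PX = 800.0        # camera focal length in pixels, from calibration

# Expected shoulder width in pixels when the diver is exactly at the target distance.
EXPECTED_SHOULDER_PX = FOCAL_LENGTH_PX * AVG_SHOULDER_WIDTH_M / TARGET_DISTANCE_M

def observed_shoulder_px(image):
    """Cascaded measurement: prefer shoulder keypoints (accurate, shorter range),
    fall back to the diver bounding box (rough, works from farther away)."""
    shoulders = detect_shoulder_keypoints(image)   # hypothetical pose-estimator call
    if shoulders is not None:
        (lx, ly), (rx, ry) = shoulders
        return ((rx - lx) ** 2 + (ry - ly) ** 2) ** 0.5
    box = detect_diver(image)                      # hypothetical diver-detector call
    if box is not None:
        return 0.6 * box.width                     # bounding-box width as a crude shoulder proxy
    return None                                    # no diver visible at all

def approach_command(image):
    """One control decision: compare what we see against what we expect."""
    px = observed_shoulder_px(image)
    if px is None:
        return "search"                            # hand off to the search behaviour
    pseudo_distance = EXPECTED_SHOULDER_PX / px    # ~1.0 at the target; >1 means too far
    if pseudo_distance > 1.1:
        return "move_forward"                      # diver looks too small: close the gap
    if pseudo_distance < 0.9:
        return "back_up"                           # diver looks too big: too close
    return "hold"                                  # inside the communication band
```

Calibrating AVG_SHOULDER_WIDTH_M to a specific diver's measured width, as discussed below, tightens the final stand-off distance without changing any of the logic.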

Abate: So I mean, one of the nice things that you said in your presentation is that even in different poses and orientations, the space between your shoulders stays relatively the same.

But on the flip side, say my shoulders and your shoulders are different widths.

Michael Fulton: They are different. But when you look at the magnitude of the difference compared to the magnitude of the scene, it’s actually very small. Right. Like, I would say just on a rough guess, I’d say the difference between our shoulder width is a few centimeters mm-hmm right.

I can't remember my exact shoulder width; it was something like 40-something centimeters. When we're using that as our signal for the distance, a difference of a couple of centimeters does make a difference, but it doesn't wreck things.

We can still work with it. And like I said in the presentation earlier, we can run it off the average diver shoulder width. But if you are going down with an AUV and you know you're going to work [00:05:00] with it, you could also calibrate it to your exact shoulder width. We did this a few times and it works.

The algorithm works regardless, but if you calibrate it to your exact shoulder width, you can get a really nice final distance for the approach. It works really nicely calibrated to the specific shoulder width, but it works generally on the average as well.
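His point that a couple of centimeters doesn't wreck things follows directly from the pinhole relation in the sketch above: the robot holds where the observed width matches the expected width, so the settle distance scales linearly with the ratio of true to assumed shoulder width. A worked example with hypothetical numbers:

```python
# Worked example (hypothetical numbers): how a shoulder-width error shifts the
# final stand-off. The robot holds where observed width == expected width, so
# settle_distance = target_distance * (actual_width / assumed_width).
assumed_width_m = 0.44    # average shoulder width used by the controller
actual_width_m = 0.42     # this particular diver's true shoulder width
target_distance_m = 1.0   # desired communication distance

settle_m = target_distance_m * actual_width_m / assumed_width_m
print(f"settles at ~{settle_m:.2f} m")   # ~0.95 m: about 5 cm short, still close enough
```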

Abate: Is there any difference between, say, taking these measurements and images above ground versus underwater? Does water distort that measurement?

Michael Fulton: Yeah, absolutely. Underwater vision in general: there's distortion of color, there's turbidity, particulate matter, bubbles, lots of things. So that side of underwater vision kind of is the way it is.

All underwater vision work suffers from this. There is a really lively thread of work on underwater image enhancement, which mostly attempts to deal with light and color changes. That helps a bit, but it doesn't help a ton with this. So that's the visual side of things.

When we're talking more about the learning side of things: our diver detector is trained on images of divers, so it knows what they look like, and it approaches them easily. The body pose estimator we use is TRT Pose from NVIDIA-AI-IOT, and it's trained on terrestrial imagery. The thing about that is that in those terrestrial images, people are standing or sitting; nobody is sideways, right? Because on land we can't go sideways. But in the water we can; people are sideways all the time.

They're swimming, they're floating. And this actually causes problems for ADROC: if somebody is in a vastly different orientation, it's a lot harder. Which is why, if you read the paper, you'll see we made a couple of simplifying assumptions. One of them was that there's only one diver in the scene, because while we're looking into discriminating between divers right now, the algorithm doesn't do that.

It'll approach whichever one it sees first. The other simplifying assumption we made was that the diver is generally upright. We didn't tell people they had to stay a hundred percent straight up and down, but we said, stay mostly upright. And when we tried it on people who were sideways, it still does work, but not as well.

Abate: Yeah. So this is an area where you can definitely see a path to improvement.

Michael Fulton: Absolutely.

Abate: It's not really a challenge, it's just a matter of getting the data and fitting to it, yeah.

Michael Fulton: With underwater robotics, ground truth is always a huge, huge problem. And for labeling something like pose, it's not so much that it's difficult work, but the labeling is going to take months.

But actually, this is why ICRA is great. I was talking with somebody on Monday night, or no, Saturday or Sunday night, and they were telling me about some pose network I should try. So I'm going to go home and try it on our data and see if it works any better.

Abate: Yeah.

Michael Fulton: I think there are two main areas of improvement. No, three, three areas of improvement. Pose estimation we already talked about.

The second big one is search behavior. Our search behavior for this was really simple: if you don't see the diver, turn. But there are some obvious improvements that can be made there, things like: if we lose track of the diver, we should turn in the direction that we last saw them.

Or if we're trying to cover a large space, maybe turning isn't going to be enough. I said earlier we ran this from 15 meters away; I would guess, and I don't have data, that past 30 meters it's not going to work, because we just can't see anything. So for a space that's 30 meters or larger, which open-water underwater environments are, you're going to need to be able to do more than just turning. The robot is going to need to actually search the space somehow, and that, I think, is a whole big thing on its own.
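The baseline behavior he describes is small enough to sketch. This is an illustration only, with the turn-toward-last-seen improvement added; the camera and thruster interfaces are hypothetical, and detect_diver is the same placeholder as in the earlier sketch:

```python
# Minimal sketch of the simple search behaviour ("if you don't see the diver,
# turn"), plus turning toward the last-seen side. Interfaces are hypothetical.
import time

def search_for_diver(camera, thruster, last_seen_side=None, timeout_s=60.0):
    """Yaw in place until a diver is detected, preferring the side where the
    diver was last seen. Returns True if a diver is found before the timeout."""
    direction = last_seen_side or "left"        # arbitrary default turn direction
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        image = camera.read()                   # hypothetical camera interface
        if detect_diver(image) is not None:     # placeholder detector from above
            return True                         # found: hand off to the approach behaviour
        thruster.yaw(direction)                 # hypothetical actuation: keep turning
    return False                                # not found: a wider search pattern is needed
```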

And then the other big thing on its own is what I said earlier about diver discrimination: being able to tell the difference between diver A and diver B. I don't really care if it's this guy versus that guy versus that girl.

It doesn't matter who specifically, but I do want the algorithm to be able to manage multiple divers in the scene, knowing which one it's … approached before. And when we first came up with this idea, the idea was that we're going to turn on the robot and it's going to go up to everybody and ask, "Hey, are you my operator?"

I still really want to do that. So if we get the diver discriminator working well enough…

Abate: And that will be through gestures? They'll say, like, …

Michael Fulton: Yeah. So it'll come up to the diver and it'll do… so, I've done this work with motion-based communication, robot communication via motion. The robot's going to come up and it's going to kind of… you ever seen a dog ask to play fetch with you?

It's going to kind of go, "Hey, hey, hey, are you? Are you?" And then the diver will say yes, or, "No, I'm not your operator," and it'll go, okay, I'll cross you off the list and search for the next [00:10:00] person. That's where this work hopefully goes in the future. My work in general, my thesis work, is about robot communication and interaction underwater.

I think I mentioned this briefly in the talk: underwater human-robot collaboration is a brand-new field. This didn't exist before the early 2000s, partially because the AUVs that are reasonable to work with underwater have only been around since the 2000s…

Abate: They were created in the 2000s.

Michael Fulton: Yes.

Abate: And that was the impetus for why working with a robot underwater is even a concept we're talking about now.

Michael Fulton: Yes. Because the first AUVs were in, like, the sixties, and those were these big ocean-going submarine things for oceanography. Great work, really important stuff, but they're bigger than you and I are.

And you can interact with that, but it's not really what they're for; they're for doing these long deployments that humans can't do. We're now seeing, in underwater robotics, the advent of collaborative AUVs. It's a new thing that's coming up, and you can see it in the work: underwater HRI papers weren't written 20 years ago.

Maybe somebody wrote one 20 years ago that I don't know about and they're going to get mad at me, but I've only seen ones dating back to the early 2000s. And now there are a few here and there; I've presented a couple at ICRA now. And while we're not yet at the point where the AUVs and the people are actually working together, I don't know of anybody who's actually doing collaborative work with AUVs at, like, a company.

But it's coming. It's coming soon. And in particular, for me, I'm really interested in environmental conservation and biological remediation: trash cleanup, oil spills, and either eradicating invasive species or preserving endangered species.

What's happening right now, around the world, is that some scientist is diving with all these undergrads for hours a day. I want to be able to give them robots that are cheap and openly available. And my big part of it is robots that they can communicate with in a way that's not onerous for them to learn.

I don't want these scientists to have to learn Python or C++ or ROS and learn how to program these robots. I want them to be able to use my communication frameworks and my task management frameworks so that they can task these AUVs with different pieces of work: go find me this type of marine life.

Go find me this trash, tell me where to go pick up this trash, bring me tools, carry samples for me. This kind of stuff, I think, is very much within the realm of possibility, and the work that I and the other great Ph.D. students, master's students, and undergrads, and our advisor, do at the Interactive Robotics and Vision Lab is actively moving us towards that.

We're getting perception capabilities, navigation and mapping capabilities. You saw in the marine robotics talks all these different things: the acoustic localization, the GoPro-based vision for mapping, all this stuff. It's all pieces of the puzzle. And the piece that I'm most interested in is the human-robot interaction part, because it's such an interesting, challenging environment.

There are so many assumptions that you make terrestrially that just aren't there. Like the big one: if you're communicating with a robot, you kind of expect to talk to it and have it talk back. You can't do that underwater. You've got to…

Abate: Yeah. There's no voice.

Michael Fulton: There’s no voice. There’s a breathing apparatus in your mouth.

And you can hear, but not really well. So I've developed motion- and light-based communication. I'm trying sound, but nonverbal sound, so tones instead of words.

Abate: Yeah. And what's interesting too is that there are a lot of industry examples, like offshore wind and offshore structures being built, where the divers are not going to get replaced.

Michael Fulton: No, no. Not any time soon.

Abate: Yeah. They have such an incredibly difficult job to automate. And because of that, they're also somewhat hard to find, and must be expensive.

Michael Fulton: It's dangerous, too.

Abate: And dangerous.

Michael Fulton: Yeah. People die every year.

Abate: So you want to do everything you can to make that diver the most efficient version of themselves possible.

Michael Fulton: And safer, and easier. You know, it is hard work. Like you said, it's hard to find people who do this, because there are lots of scuba-certified people, right?

It's a common pastime, but for technical diving and diving for commercial purposes, there are not too many of them out there. I mean, [00:15:00] in the grand scheme of things, it's a rarer field, and so much important work is in there. There's this quote I really like; I don't know if it's actually his, but it's attributed to Leonardo da Vinci: "Water is the driving force of all life on our planet."

I really believe that. Obviously there are the scientific reasons, photosynthesis, climate stuff, but also just so much commerce depends on ocean environments. The internet: we have cables under the sea, all of this stuff. You need AUVs. There are some places where we want to replace divers with AUVs.

But we really want to augment the divers who are currently doing work underwater with these collaborative AUVs. Partially because, you're right, it's going to be a long time before they're replaced, if ever; it's such a challenging field. But also, personally, I really like the idea of robots making people's lives better.

And sometimes replacing them in jobs is the way towards that; there are some jobs so dangerous, so dull, so dirty that you don't want anybody to do them. But there are a lot of jobs where people depend on this for their livelihood. I don't want to replace these people. I want to make their lives easier.

I want to make their lives easier, and I want to make it possible for them to do more interesting work. We think of ourselves as such an advanced society, right? We go to space, we go to Mars, and yet a ridiculous amount of our ocean is unexplored. We don't know what so much of the life that exists in our ocean is like. There's so much basic science there that's undone, because the environment is so inhospitable.

You need air tanks, there are pressure considerations, there's a maximum depth you can dive to. So anything that you're doing underwater is automatically a hundred times harder, a hundred times more costly, more effortful.

And this is where AUVs come in. My advisor said this really well in the session: we want to enhance underwater divers by having divers do the things AUVs can't, and having AUVs do the things divers can't. I think that's a perfect summation of where this field is headed.

Abate: Awesome. Thank you.

Michael Fulton: Yeah, no problem. Thank you for asking me.








Abate De Mey Podcast Leader and Robotics Founder









