
How to identify the request queue of a Linux block device

I am working on a driver that connects hard disks over the network. There is a bug: if I enable two or more hard disks on the computer, only the first one has its partitions examined and identified. The result is that if I have one partition on hda and one on hdb, as soon as I connect hda there is one mountable partition, and hda1, once mounted, gets a blkid of xyz123. But when I go on to mount hdb1, it comes up with the same blkid, and the driver is in fact reading it from hda, not hdb.

So I think I have found the place where the driver gets confused. Below is a debug output, including a dump_stack, that I placed at the first spot that appears to access the wrong device.

Here is the relevant code:

/*basically, this is just the request_queue processor. In the log output that
  follows, the second device, (hdb) has just been connected, right after hda
  was connected and hda1 was mounted to the system. */

void nblk_request_proc(struct request_queue *q)
{
    struct request *req;
    ndas_error_t err = NDAS_OK;

    dump_stack();

    while((req = NBLK_NEXT_REQUEST(q)) != NULL)
    {
        dbgl_blk(8,"processing queue request from slot %d",SLOT_R(req));

        if (test_bit(NDAS_FLAG_QUEUE_SUSPENDED, &(NDAS_GET_SLOT_DEV(SLOT_R(req))->queue_flags)))  {
            printk ("ndas: Queue is suspended\n");
            /* Queue is suspended */
#if ( LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,31) )
            blk_start_request(req);
#else
            blkdev_dequeue_request(req);
#endif

Here is a log output. I have added some comments to help show what is happening and where the bad call appears to occur.

/* Just below here you can see "slot" mentioned many times. This is the 
     identification for the network case in which the hd is connected to the 
     network. So you will see slot 2 in this log because the first device has 
     already been connected and mounted. */

  kernel: [231644.155503] BL|4|slot_enable|/driver/block/ctrldev.c:281|adding disk: slot=2, first_minor=16, capacity=976769072|nd/dpcd1,64:15:44.38,3828:10
  kernel: [231644.155588] BL|3|ndop_open|/driver/block/ops.c:233|ing bdev=f6823400|nd/dpcd1,64:15:44.38,3720:10
  kernel: [231644.155598] BL|2|ndop_open|/driver/block/ops.c:247|slot =0x2|nd/dpcd1,64:15:44.38,3720:10
  kernel: [231644.155606] BL|2|ndop_open|/driver/block/ops.c:248|dev_t=0x3c00010|nd/dpcd1,64:15:44.38,3720:10
  kernel: [231644.155615] ND|3|ndas_query_slot|netdisk/nddev.c:791|slot=2 sdev=d33e2080|nd/dpcd1,64:15:44.38,3696:10
  kernel: [231644.155624] ND|3|ndas_query_slot|netdisk/nddev.c:817|ed|nd/dpcd1,64:15:44.38,3696:10
  kernel: [231644.155631] BL|3|ndop_open|/driver/block/ops.c:326|mode=1|nd/dpcd1,64:15:44.38,3720:10
  kernel: [231644.155640] BL|3|ndop_open|/driver/block/ops.c:365|ed open|nd/dpcd1,64:15:44.38,3724:10
  kernel: [231644.155653] BL|8|ndop_revalidate_disk|/driver/block/ops.c:2334|gendisk=c6afd800={major=60,first_minor=16,minors=0x10,disk_name=ndas-44700486-0,private_data=00000002,capacity=%lld}|nd/dpcd1,64:15:44.38,3660:10
  kernel: [231644.155668] BL|8|ndop_revalidate_disk|/driver/block/ops.c:2346|ed|nd/dpcd1,64:15:44.38,3652:10

  /* So at this point the hard disk is added (gendisk=c6...) and the identifications
     all match the network device. The driver is now about to begin scanning the 
     hard drive for existing partitions. The little 'ed' at the end of the previous
     line indicates that revalidate_disk has finished its job.

     Also, I think the request queue is indicated by the output dpcd1 near the very
     end of the line. 

     Now below we have entered the function that is pasted above. In the function
     you can see that the slot can be determined by the queue. And the log output
     after the stack dump shows it is from slot 1. (The first network drive that was
     already mounted.) */

  kernel: [231644.155677]  ndas-44700486-0:Pid: 467, comm: nd/dpcd1 Tainted: P           2.6.32-5-686 #1
  kernel: [231644.155711] Call Trace:
  kernel: [231644.155723]  [<fc5a7685>] ? nblk_request_proc+0x9/0x10c [ndas_block]
  kernel: [231644.155732]  [<c11298db>] ? __generic_unplug_device+0x23/0x25
  kernel: [231644.155737]  [<c1129afb>] ? generic_unplug_device+0x1e/0x2e
  kernel: [231644.155743]  [<c1123090>] ? blk_unplug+0x2e/0x31
  kernel: [231644.155750]  [<c10cceec>] ? block_sync_page+0x33/0x34
  kernel: [231644.155756]  [<c108770c>] ? sync_page+0x35/0x3d
  kernel: [231644.155763]  [<c126d568>] ? __wait_on_bit_lock+0x31/0x6a
  kernel: [231644.155768]  [<c10876d7>] ? sync_page+0x0/0x3d
  kernel: [231644.155773]  [<c10876aa>] ? __lock_page+0x76/0x7e
  kernel: [231644.155780]  [<c1043f1f>] ? wake_bit_function+0x0/0x3c
  kernel: [231644.155785]  [<c1087b76>] ? do_read_cache_page+0xdf/0xf8
  kernel: [231644.155791]  [<c10d21b9>] ? blkdev_readpage+0x0/0xc
  kernel: [231644.155796]  [<c1087bbc>] ? read_cache_page_async+0x14/0x18
  kernel: [231644.155801]  [<c1087bc9>] ? read_cache_page+0x9/0xf
  kernel: [231644.155808]  [<c10ed6fc>] ? read_dev_sector+0x26/0x60
  kernel: [231644.155813]  [<c10ee368>] ? adfspart_check_ICS+0x20/0x14c
  kernel: [231644.155819]  [<c10ee138>] ? rescan_partitions+0x17e/0x378
  kernel: [231644.155825]  [<c10ee348>] ? adfspart_check_ICS+0x0/0x14c
  kernel: [231644.155830]  [<c10d26a3>] ? __blkdev_get+0x225/0x2c7
  kernel: [231644.155836]  [<c10ed7e6>] ? register_disk+0xb0/0xfd
  kernel: [231644.155843]  [<c112e33b>] ? add_disk+0x9a/0xe8
  kernel: [231644.155848]  [<c112dafd>] ? exact_match+0x0/0x4
  kernel: [231644.155853]  [<c112deae>] ? exact_lock+0x0/0xd
  kernel: [231644.155861]  [<fc5a8b80>] ? slot_enable+0x405/0x4a5 [ndas_block]
  kernel: [231644.155868]  [<fc5a8c63>] ? ndcmd_enabled_handler+0x43/0x9e [ndas_block]
  kernel: [231644.155874]  [<fc5a8c20>] ? ndcmd_enabled_handler+0x0/0x9e [ndas_block]
  kernel: [231644.155891]  [<fc54b22b>] ? notify_func+0x38/0x4b [ndas_core]
  kernel: [231644.155906]  [<fc561cba>] ? _dpc_cancel+0x17c/0x626 [ndas_core]
  kernel: [231644.155919]  [<fc562005>] ? _dpc_cancel+0x4c7/0x626 [ndas_core]
  kernel: [231644.155933]  [<fc561cba>] ? _dpc_cancel+0x17c/0x626 [ndas_core]
  kernel: [231644.155941]  [<c1003d47>] ? kernel_thread_helper+0x7/0x10

  /* Here is the output of the driver's debug statements. It shows that this
     operation is being performed on the first device's request queue. */

  kernel: [231644.155948] BL|8|nblk_request_proc|/driver/block/block26.c:494|processing queue request from slot 1|nd/dpcd1,64:15:44.38,3408:10
  kernel: [231644.155959] BL|8|nblk_handle_io|/driver/block/block26.c:374|struct ndas_slot sd = NDAS GET SLOT DEV(slot 1)
  kernel: [231644.155966] |nd/dpcd1,64:15:44.38,3328:10
  kernel: [231644.155970] BL|8|nblk_handle_io|/driver/block/block26.c:458|case READA call ndas_read(slot=1, ndas_req)|nd/dpcd1,64:15:44.38,3328:10
  kernel: [231644.155979] ND|8|ndas_read|netdisk/nddev.c:824|read io: slot=1, cmd=0, req=x00|nd/dpcd1,64:15:44.38,3320:10

I hope this is enough background information. Perhaps the obvious question at this point is: "when and where are the request_queues allocated?"

Well, that is handled a little before the add_disk function; add_disk is the first line of the log output.

slot->disk = NULL;
spin_lock_init(&slot->lock);
slot->queue = blk_init_queue(
    nblk_request_proc, 
    &slot->lock
);

As far as I know, this is standard procedure. Back to my original question: can I look up the request queue somewhere and make sure it is incremented or unique for each new device, or does the Linux kernel just use one queue per major number? I want to understand why this driver is loading the same queue for two different block stores, and determine whether that causes the duplicate blkid during initial registration.

Thank you for looking into this situation.

2 Answers

  • 0
    Queue = blk_init_queue(sbd_request, &Device.lock);
    
  • 1

    I am sharing the solution to the bug that led me to post this question, although it does not actually answer the question of how to identify a device's request queue.

    In the code above, consider the following:

    if (test_bit(NDAS_FLAG_QUEUE_SUSPENDED, 
           &(NDAS_GET_SLOT_DEV(SLOT_R(req))->queue_flags)))
    

    Well, "SLOT_R(req)" was causing the trouble. It is defined elsewhere and returns the gendisk device:

    #define SLOT_R(_request_) SLOT((_request_)->rq_disk)
    

    This returns the disk, but it does not yield the correct value in the various operations later on. So once additional block devices were loaded, this macro basically kept returning 1 (I think it was being treated as a boolean), and all requests therefore piled up in the request queue of disk 1.

    The fix was to access the correct disk identification value that is already stored in the disk's private_data when the disk is added to the system.

    Correct identifier definition:
       #define SLOT_R(_request_) ( (int) _request_->rq_disk->private_data )
    
    How the correct disk number was stored.
       slot->disk->queue = slot->queue;
       slot->disk->private_data = (void*) (long) s;  /* 's' is the disk id */
       slot->queue_flags = 0;
    

    Now the correct disk id is returned from private_data, so all requests go to the correct queue.
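    The fix hinges on round-tripping a small integer through the void* private_data field: store the slot id as a pointer-sized value at add-disk time, then cast it back in the request path. A minimal user-space sketch of that cast pattern (struct gendisk_sim is an illustrative stand-in, not the kernel's struct gendisk):

    ```c
    #include <stdio.h>

    /* User-space stand-in for the relevant field of struct gendisk. */
    struct gendisk_sim {
        void *private_data;
    };

    int main(void)
    {
        struct gendisk_sim disk;
        int s = 2;  /* the slot/disk id, as stored at add-disk time */

        /* Store: widen to long first so the int fits cleanly in a pointer. */
        disk.private_data = (void *)(long)s;

        /* Retrieve: cast back through long, as the corrected SLOT_R does. */
        int slot = (int)(long)disk.private_data;

        printf("slot=%d\n", slot);
        return 0;
    }
    ```

    The double cast through long matters on 64-bit targets, where casting a pointer directly to int can truncate and draws a compiler warning.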

    As noted above, this does not show how to identify the queue itself. An uneducated guess might be:

    x = (int) _request_->rq_disk->queue->id;
    

    Ref: the request_queue definition in Linux: http://lxr.free-electrons.com/source/include/linux/blkdev.h#L270&321

    Thanks everyone for the help!
